A Comparison of Deep Learning Methods for Urban Traffic Forecasting using Floating Car Data
Available online at www.sciencedirect.com
ScienceDirect
Transportation Research Procedia 47 (2020) 195–202
www.elsevier.com/locate/procedia
22nd EURO Working Group on Transportation Meeting, EWGT 2019, 18-20 September 2019, Barcelona, Spain
Juan José Vázquez a,*, Jamie Arjona a, MªPaz Linares a, Josep Casanovas-Garcia a,b
a Universitat Politècnica de Catalunya, Barcelona 08034, Spain
b Barcelona Supercomputing Center, Barcelona 08034, Spain
Abstract
Cities today must address the challenge of sustainable mobility, and traffic state forecasting plays a key role in
mitigating traffic congestion in urban areas. For example, predicting path travel time is a crucial issue in
navigation and route planning applications. Furthermore, the pervasive penetration of information and communication
technologies makes floating car data an important source of real-time data for intelligent transportation system
applications. This paper deals with the problem of forecasting urban traffic when floating car data is available. A
comparison of four deep learning methods is presented to demonstrate the capabilities of the neural network
approaches (recurrent and/or convolutional) in solving the traffic forecasting problem in an urban context. Different
tests are proposed in order to not only evaluate the developed deep learning models, but also to analyze how the
penetration rates of floating cars affect forecasting accuracy. The presented experiments were designed according to
a microscopic traffic simulation approach in order to emulate floating car data fleets, which provide vehicle
position and speed, and to validate the obtained results. Finally, some conclusions and further research are
presented.
© 2020 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the 22nd EURO Working Group on Transportation Meeting.
Keywords: urban traffic forecast; deep learning; floating car data.
1. Introduction
Traffic forecasting has been an active research topic since the late 1970s. It is a very general problem that
includes different sub-problems with different degrees of complexity, depending on aspects such as context, data
source, predicted variables, and the prediction horizon, among others. This research focuses on traffic forecasting
in urban contexts using floating car data (FCD) to predict the average speed of the roads on the network.
The existing literature presents many different approaches to solving the forecasting problem, which van Lint et
al. (2012) classify as follows: naïve methods, which make no model assumptions; parametric methods, whose
structures are predetermined according to theoretical considerations and whose parameters are fitted with data; and
non-parametric methods, whose structures and parameter values are both determined from data.
Due to increases in the quantity and variety of data sources, as well as in the computational capabilities of new
systems, the trend in recent years has shifted in favor of non-parametric methods, specifically machine learning
methods. In the field of traffic forecasting, deep learning (DL) proposals have proven to give more accurate
predictions, and the use of these methods has increased markedly in recent years. For this reason, our work here
focuses on developing DL-based models.
This contribution is organized as follows. First, Section 2 summarizes a literature review of traffic forecasting
using DL methods. Then, Section 3 specifies the selected methods to be compared. Section 4 presents the
computational experiments by detailing the simulation scenarios, the proposed experimental design, the hyper-
parameter optimization and the obtained results. Finally, Section 5 describes the final conclusions and some future
research.
2. Related work
Vlahogianni et al. (2014) and Lana et al. (2018) systematically examine recent developments in data-driven
traffic forecasting methods. The evolution between these two works shows a clear increase in the use of non-
parametric methods in this field. In particular, DL is the most salient and recommended approach in recent
proposals.
Long short-term memory (LSTM) methods are some of the most used DL approaches in studies on time series
and other sequential data such as traffic data. These methods are considered a subfamily of recurrent neural
networks (RNN) and are able to learn long-term dependencies while remembering information for long periods. The
proposals of Duan et al. (2016), Liu et al. (2017), and Du et al. (2018) are good examples of applying LSTM to
traffic forecasting. Fu et al. (2017) compare an LSTM model with a gated recurrent unit (GRU) model, which is an
NN method similar to LSTM and suggested by Cho et al. (2014). Their results indicate that GRU outperforms LSTM
in traffic forecasting.
In order to add spatial information to the previous LSTM methods, some authors propose merging LSTM with
convolutional neural networks (CNN). In this way, Yu et al. (2017) present a spatiotemporal recurrent convolutional
networks model (SRCN), which takes as input a set of static images that represent the network-wide traffic speeds.
Moreover, Cheng et al. (2017) introduce an end-to-end framework called DeepTransport, in which CNN and RNN
are utilized to obtain spatial-temporal traffic information. In addition, Cui et al. (2018) propose an approach that
merges CNN and LSTM, which they call a High-Order Graph Convolutional Long Short-Term Memory Neural Network
(HGC-LSTM). This applies CNN to the network graph encoded as a matrix, which is a similar format to images. The
experiments presented in these papers demonstrate that the methods capture the complex relationships in the
spatiotemporal domain and outperform traditional state-of-the-art DL methods.
Due to the lack of a generic suite for testing these kinds of solutions under the same conditions, it is difficult to
compare different methods using the results from the original papers. Therefore, our proposal implements four of the
most relevant proposals introduced above in order to compare them properly. In particular, we consider two
recurrent neural network approaches (LSTM and GRU), and two of the previously mentioned combined solutions,
SRCN and the HGC-LSTM methods.
3. Selected models
As indicated in the previous section, we implemented four different DL methods in order to perform traffic
forecasting in urban contexts, using FCD to predict the average speed of the network road sections. Before delving
into the different methods, the common initial data format should be defined. A new dataset is generated from the
source FCD in order to represent the state of the network in different time periods of a predefined duration. The
form of the new dataset $S$ is $S \in \mathbb{R}^{N \times M}$, where $N$ is the number of time windows of that
duration and $M$ is the number of road sections. So, for a time period $i$ and a section $j$, $S_{ij}$ represents the
average speed of all recorded vehicles in $j$ during $i$. $S_{ij}$ is a missing value when the original data has no
records for a section $j$ at time period $i$. Because the proposed models do not accept missing values, they are
imputed by performing a k-nearest neighbor imputation.
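As a minimal sketch of this imputation step, the following Python fragment fills each missing entry of $S$ with the mean speed of the same section in the $k$ most similar time windows. The distance (mean absolute difference over the sections observed in both windows), the value of `k`, and the function name `knn_impute` are assumptions made for illustration; the paper does not specify them.

```python
import math

def knn_impute(S, k=2):
    """Fill None entries of S (rows = time windows, columns = road sections)."""
    def distance(a, b):
        # Mean absolute difference over sections observed in both windows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return sum(abs(x - y) for x, y in shared) / len(shared)

    filled = [row[:] for row in S]
    for i, row in enumerate(S):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other windows in which section j was observed.
            candidates = [
                (distance(row, other), other[j])
                for r, other in enumerate(S)
                if r != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = [speed for _, speed in candidates[:k]]
            if nearest:  # entries without any usable neighbour stay None
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Toy data (hypothetical speeds in km/h): 4 time windows x 3 sections.
S = [
    [30.0, 42.0, 18.0],
    [31.0, None, 19.0],
    [50.0, 60.0, 40.0],
    [29.0, 40.0, 17.0],
]
S_imputed = knn_impute(S, k=2)
```

In this toy example the missing speed is taken from the first and fourth windows, which are the closest to the incomplete one.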
The LSTM and GRU models can be defined as a sequence of one or more specific layers (LSTM and GRU,
respectively), which are all connected to each other. The last layer of each model is a fully connected NN layer that
transforms the output of the last recurrent layer into the desired format (in this case, one value for each predicted
road section speed). Therefore, given the state of the network in a period ($S_t$), these models are able to predict
the state of the network in the next time period ($S_{t+1}$).
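The one-step formulation above can be illustrated by the way training samples might be built from the state matrix. This is only a sketch: the function name `make_pairs` and the `horizon` parameter (measured in time windows) are assumptions introduced here, not part of the paper.

```python
def make_pairs(S, horizon=1):
    """Return (input state, target state) pairs separated by `horizon` windows."""
    if horizon < 1 or horizon >= len(S):
        raise ValueError("horizon must be between 1 and len(S) - 1")
    # The state at window t is the input; the state at t + horizon is the target.
    return list(zip(S[:-horizon], S[horizon:]))

# Toy state matrix (hypothetical speeds): 4 time windows x 2 sections.
S = [[30.0, 42.0], [31.0, 41.0], [28.0, 39.0], [25.0, 35.0]]
pairs = make_pairs(S)
```

With `horizon=1` each state is paired with the state of the next time period, which matches the description above.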
For the SRCN method, the input data incorporates the spatial relationships of the data, and each input state is
codified as an image. To achieve this, an image template is defined by mapping the network model to a grid, where
each cell represents a pixel. Thus, to build the state image for a period, each pixel of the image is filled by
computing the average speed of 𝑆𝑆 for that period and for all the road sections of the corresponding cell.
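The image construction just described can be sketched as follows. The grid size, the section-to-cell mapping `cell_of_section`, and the value used for cells without any road section (`empty=0.0`) are hypothetical choices for illustration; the paper does not state them.

```python
def state_image(speeds, cell_of_section, rows, cols, empty=0.0):
    """Build a rows x cols image whose pixels average the speeds of their sections."""
    sums = [[0.0] * cols for _ in range(rows)]
    counts = [[0] * cols for _ in range(rows)]
    for section, speed in enumerate(speeds):
        r, c = cell_of_section[section]
        sums[r][c] += speed
        counts[r][c] += 1
    # Cells that contain no road section receive the `empty` value.
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else empty for c in range(cols)]
        for r in range(rows)
    ]

# One row of S (hypothetical speeds for 4 sections) mapped onto a 2 x 2 grid.
speeds_t = [30.0, 50.0, 20.0, 40.0]
cell_of_section = [(0, 0), (0, 0), (0, 1), (1, 1)]
image_t = state_image(speeds_t, cell_of_section, rows=2, cols=2)
```

Here the first two sections fall in the same cell, so their speeds are averaged into a single pixel.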
The inputs of the HGC-LSTM model are also 2-dimensional. Although they are not images, they are numerical
matrices that can be interpreted as images. Given a predefined value $K$, and for a time period $t$, the input is
computed with the following equation: $TT_t = [TT_t^1, \ldots, TT_t^K]$. Each element of the array is computed by
$TT_t^k = (FFR \odot \tilde{A}^k) \cdot S_t$, where $\odot$ is the element-wise product of matrices.
$FFR \in \mathbb{R}^{M \times M}$ is a binary matrix where $FFR_{ij} = 1$ if a path exists from section $i$ to
section $j$ in a time up to the duration of one time window. $\tilde{A}^k \in \mathbb{R}^{M \times M}$ is also a
binary matrix, where $\tilde{A}^k_{ij} = 1$ if a path exists from section $i$ to section $j$ and crossing exactly
$k - 1$ sections.
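A small worked sketch of this input computation is given below. It assumes that the k-th binary matrix can be obtained from a section adjacency matrix `A` by repeated binary products (a walk of exactly k steps exists), and that the final product with the state is a matrix-vector product; both readings, together with the toy network and the function names, are assumptions for illustration.

```python
def next_power(reach, A):
    """Binary product: 1 where a walk with one more step exists."""
    n = len(A)
    return [[int(any(reach[i][m] and A[m][j] for m in range(n)))
             for j in range(n)] for i in range(n)]

def hgc_input(A, FFR, S_t, K):
    """Return K rows, the k-th being (FFR * A~^k, element-wise) times S_t."""
    n = len(A)
    reach = [row[:] for row in A]  # k = 1: direct successors of each section
    rows = []
    for _ in range(K):
        mask = [[FFR[i][j] * reach[i][j] for j in range(n)] for i in range(n)]
        rows.append([sum(mask[i][j] * S_t[j] for j in range(n)) for i in range(n)])
        reach = next_power(reach, A)
    return rows

# Toy chain of three sections 0 -> 1 -> 2 (hypothetical network and speeds).
A = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
FFR = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
S_t = [10.0, 20.0, 30.0]
TT_t = hgc_input(A, FFR, S_t, K=2)
```

The result has one row per order k and one column per section, which is the 2-dimensional, image-like input mentioned above.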
Despite the differences in the input generation process, the SRCN and the HGC-LSTM models share the
same structure. Because of the new format of the input, these models combine the RNN layers with some extra CNN
layers. The following four consecutive parts comprise their structure. (1) The first part of the model is a set of CNN
layers, each one composed of a convolutional and a pooling layer. (2) A flatten layer transforms the output of this
first part into the required RNN format. (3) In order to incorporate the temporal relation of data into the model, a set
of RNN layers (LSTM or GRU) is added after the flatten one. (4) The last layer is a fully connected layer that
transforms the output to the desired format. This structure allows the models to take images as inputs and extract
spatial and temporal information from these inputs.
4. Computational experiments
Having defined the proposed methods, we now describe the comparative methodology used to evaluate them. Our
work here proposes a traffic simulation approach to generating the needed FCD input. In contrast to using real FCD,
generating data through simulation allows creating data for a great variety of scenarios in order to study the
performance of models in different situations. In addition, this saves much of the time and cost required for
collecting real data. That said, the choice between real and simulated data depends on the goal. If the goal
is to use the traffic forecasting model in a real scenario, real data is highly recommended. If, instead, the goal is
to compare different methods and evaluate their general performance under different conditions, simulated data is a
better option.
The source FCD is generated using Aimsun (2018), a microscopic traffic simulator able to model the interactions
for each vehicle and also collect data from them individually. From the simulation, a record (vehicle identifier,
speed, section, and lane) is collected for each connected vehicle in every pre-defined period. In the following
proposed experiments, the collection period is 10 seconds.
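To make the link between these records and the state matrix concrete, the sketch below aggregates per-vehicle records into average speeds per time window and section. The timestamp field, the 300-second window length, and the function name `aggregate` are assumptions for illustration; only the record fields (vehicle identifier, speed, section, lane) and the 10-second collection period come from the text.

```python
from collections import defaultdict

def aggregate(records, window_s, n_windows, n_sections):
    """Average speeds per (time window, section); None marks cells with no records."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for time_s, vehicle_id, speed, section, lane in records:
        key = (int(time_s // window_s), section)
        sums[key] += speed
        counts[key] += 1
    return [
        [sums[(i, j)] / counts[(i, j)] if counts[(i, j)] else None
         for j in range(n_sections)]
        for i in range(n_windows)
    ]

# Hypothetical records: (time in seconds, vehicle id, speed, section, lane).
records = [
    (0, "v1", 30.0, 0, 1),
    (10, "v1", 34.0, 0, 1),
    (10, "v2", 20.0, 1, 1),
    (300, "v2", 25.0, 1, 2),
]
S = aggregate(records, window_s=300, n_windows=2, n_sections=2)
```

Cells left as `None` correspond to sections without any connected vehicle in that window, which is where missing values arise.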
4.1. Scenarios
In this study, we use two different urban traffic networks in Spain to evaluate the performance of the forecasting
models: Camp Nou and Amara (see Fig. 1). The former represents a small area of Barcelona composed of 4 nodes
and 22 sections. The latter, Amara, is a district in San Sebastian composed of 105 nodes and 192 sections. Using
these urban scenarios allows us to analyze the performance of the forecasting models using different network
features, such as size, capacity, and the topology of the roads, among others.
Fig. 1. Camp Nou (left) and Amara (right) traffic simulation networks.
In order to evaluate the developed models, a partial factorial experimental design is defined. It is based on three
experimental factors:
Penetration rate: percentage of vehicles in the network used to generate the FCD. This factor is
presented at four levels: 25%, 50%, 75%, and 100%.
Prediction horizon: this defines how far ahead the model predicts the future. This factor is presented at
six levels: 5, 10, 15, 20, 40 and 60 minutes. The first three levels are considered to be short-term, and
the rest are long-term.
Training data: amount of historical data used to train the methods. This is quantified by the number of
days corresponding to the data. This factor is presented at three levels: 5, 10 and 15 days.
Every experiment is based on a common initial configuration set at a 100% penetration rate within a 5-minute
prediction horizon and using 5 days’ worth of data to train the models. For each experiment, the factor level is
changed and the rest maintain their initial values. In addition, every selected factor configuration is tested for each of
the four implemented models and in the two previously presented scenarios.
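One possible reading of this design, in which a single factor is varied at a time around the common initial configuration, can be enumerated as follows. The dictionary keys and variable names are illustrative assumptions; the levels, models, and scenarios are those listed in the text.

```python
BASELINE = {"penetration": 100, "horizon_min": 5, "training_days": 5}
LEVELS = {
    "penetration": [25, 50, 75, 100],
    "horizon_min": [5, 10, 15, 20, 40, 60],
    "training_days": [5, 10, 15],
}
MODELS = ["LSTM", "GRU", "SRCN", "HGC-LSTM"]
SCENARIOS = ["Camp Nou", "Amara"]

def factor_configurations():
    """Baseline plus every configuration that changes exactly one factor."""
    configs = [dict(BASELINE)]
    for factor, levels in LEVELS.items():
        for level in levels:
            if level != BASELINE[factor]:
                configs.append({**BASELINE, factor: level})
    return configs

configs = factor_configurations()
# Every configuration is tested for each model in each scenario.
runs = [(model, scenario, config)
        for config in configs for model in MODELS for scenario in SCENARIOS]
```

Under this reading the enumeration yields 11 distinct factor configurations and 88 model runs; the paper itself does not state these totals.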
The selected deep learning methods have a set of hyper-parameters that must be tuned, a process known as
hyper-parameter optimization. In particular, we optimize the hyper-parameters of the models for
every test by using a random search as an alternative to grid search and manual search. The results presented by
Bergstra & Bengio (2012) show that this strategy is able to find models that are as good or better within a small
fraction of the computation time needed by other search strategies. This algorithm consists of
generating a random set of possible configurations and selecting the best one based on its accuracy with the
validation dataset. In this case, 60 different random configurations are generated for each experiment.
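A minimal sketch of such a random search is shown below. The search space, the toy error function standing in for "train the model and measure its validation error", and all names are hypothetical; only the number of sampled configurations (60) comes from the text.

```python
import random

def random_search(search_space, validation_error, n_trials=60, seed=0):
    """Sample n_trials random configurations and keep the one with the lowest error."""
    rng = random.Random(seed)
    best_config, best_error = None, float("inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in search_space.items()}
        error = validation_error(config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error

# Hypothetical hyper-parameter ranges, not the ones used in the paper.
SEARCH_SPACE = {
    "layers": [1, 2, 3],
    "units": [32, 64, 128],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

def toy_validation_error(config):
    # Stand-in for training a model and evaluating it on the validation set.
    return (abs(config["layers"] - 2)
            + abs(config["units"] - 64) / 64
            + abs(config["learning_rate"] - 1e-3))

best_config, best_error = random_search(SEARCH_SPACE, toy_validation_error)
```

Fixing the seed makes the search reproducible, which helps when the same procedure is repeated for every experiment.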
Fig. 2. MAE of the four models, depending on the FCD penetration ratio in Camp Nou (left) and Amara (right).
In order to evaluate the performance of the implemented models, two different forecast error measures are used.
A forecast error measure quantifies how well the forecasted values $\hat{y} \in \mathbb{R}^n$ match the observed ones
$y \in \mathbb{R}^n$. The error measures used to evaluate the models’ accuracy are mean absolute error (Equation 1)
and root mean squared error (Equation 2).
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \quad (1)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \quad (2)$$
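For reference, the two error measures can be computed as in the short sketch below; the function names and the sample values are illustrative and not taken from the experiments.

```python
import math

def mae(y, y_hat):
    """Mean absolute error between observed and forecasted values."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean squared error between observed and forecasted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

# Hypothetical observed and forecasted section speeds.
observed = [30.0, 40.0, 50.0, 20.0]
forecast = [32.0, 37.0, 50.0, 25.0]
mae_value = mae(observed, forecast)
rmse_value = rmse(observed, forecast)
```

Because the squared term weights large deviations more heavily, the root mean squared error is never smaller than the mean absolute error on the same data.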
The first experiments analyze the performance of the forecasting models using four different FCD penetration
rates (25%, 50%, 75%, and 100%). The results presented in Table 1, Table 2 and Fig. 2 show that the best models
for the highest penetration rates (100% and 75%) are LSTM and GRU. In the Amara scenario, which is the largest,
these two models are also the best for the lowest penetration rates (50% and 25%). In contrast, for the Camp Nou
scenario, the HGC-LSTM model is the best option for the lowest penetration values. A lower penetration rate also
reduces prediction accuracy, especially in the smallest scenario. Even so, the errors for the lowest penetration rate
remain reasonably good for forecasting urban traffic.
In general, the GRU and LSTM methods outperform the others in almost all the short-term experiments
performed for both the Camp Nou and Amara scenarios (see Tables 3 and 4). In the long-term case, the accuracy of
the four options is very similar for the two scenarios. The prediction error of the GRU and LSTM models is practically
constant across long-term and short-term experiments, so the prediction horizon is not critical for them. The
performance of the convolutional methods is also very similar for all situations, with the exception of the smallest
prediction horizons in the Amara scenario, where the results are worse than the others.
Lastly, the results of the training data experiments are presented (see Table 5). The methods that perform better,
in general, are LSTM and GRU. In particular, GRU shows a smaller error for the Camp Nou network, and LSTM is
the best option for the Amara scenario. The convolutional solutions are better for the Amara network in twice the
number of cases, but the difference is small. Although the accuracy of the proposed forecasting methods generally
increases with more data, the difference is not critical. This is especially true for the Amara network, where the
improvement is minimal. Thus, the implemented models show good performance in traffic forecasting with 5 days
of training data.
5. Conclusions
This project deals with the traffic forecasting problem, which has been an active research topic for the last 40 years.
Traffic forecasting plays a key role in mitigating some traffic and transportation problems. In particular, this project
focuses on traffic forecasting in urban networks using floating car data (FCD).
Four deep learning methods were implemented in order to perform traffic forecasting. The results of the
performed experiments show that these solutions are able to predict traffic speeds with good performance.
Specifically, recurrent methods (LSTM and GRU) present smaller errors than convolutional ones (SRCN and
HGC-LSTM). Therefore, in the tested scenarios, the convolutional component does not appear to be needed to extract
spatial information.
In terms of penetration rate, a higher rate reduces the prediction error. However, the methods predict reasonably
well even with the smallest tested penetration (25%). Similarly, the use of more training data increases the accuracy,
although the improvements are small, and 5 days of data are enough to train the four tested methods.
Furthermore, the presented computational experiments determine that the implemented models are able to
perform accurate traffic forecasting regardless of scenario size and prediction horizon. These results inspire further
research to complement the performed experiments, such as extending the experimental design by adding more
levels for the proposed factors as well as by considering new factors. Although the forecasting models in the
literature usually test smaller scenarios than Amara, the use of a larger network could be interesting for evaluating
the feasibility of the models in terms of their forecasting and computational capabilities. Furthermore, in order to
perform more realistic predictions, differentiating section lanes could pose a highly interesting challenge in
detecting traffic congestion.
FCD can at times be insufficient for covering all the network sections, and machine learning forecasting of a
variable without any historical data does not make sense. Nevertheless, different approaches can be applied to
solving this. For example, secondary methods may be used, the missing values can be extrapolated, or new data
sources could be added. Aside from cases of missing values, including new data sources can complement the FCD
and improve forecasting accuracy. The new data could be of the same type as that which we used here (speeds from
loop sensors), or it could be completely different (exogenous variables like weather conditions or calendar events).
Acknowledgements
Throughout this work, the authors have benefited from the support of inLab FIB team at Universitat Politècnica
de Catalunya. This research was funded by Secretaria d’Universitats i Recerca de la Generalitat de Catalunya (2017-
SGR-1749) and under the Industrial Doctorate Program (2016-DI-79).
References
https://doi.org/10.1109/ITSC.2017.8317886
van Lint, H., & van Hinsbergen, C. (2012). Short-Term Traffic and Travel Time Prediction Models. In Artificial Intelligence Applications to
Critical Transportation Issues (pp. 22–41). https://doi.org/10.17226/22690
Vlahogianni, E. I., Karlaftis, M. G., & Golias, J. C. (2014). Short-term traffic forecasting: Where we are and where we’re going. Transportation
Research Part C: Emerging Technologies, 43, 3–19. https://doi.org/10.1016/j.trc.2014.01.005
Yu, H., Wu, Z., Wang, S., Wang, Y., & Ma, X. (2017). Spatiotemporal recurrent convolutional networks for traffic prediction in transportation
networks. Sensors (Switzerland). https://doi.org/10.3390/s17071501