Urban_Traffic_Prediction_from_Mobility_Data_Using_Deep_Learning

This document explores the application of deep learning for urban traffic prediction, emphasizing its potential to enhance the accuracy and efficiency of traffic modeling using various mobility data sources. It discusses traditional methods of traffic prediction and highlights the limitations of manually crafted features, proposing future research directions to leverage deep learning's capabilities. The article aims to inspire further research in the integration of deep learning with urban traffic prediction systems.


EXPLORING DEEP LEARNING FOR EFFICIENT AND RELIABLE MOBILE SENSING

Urban Traffic Prediction from Mobility Data Using Deep Learning


Zhidan Liu, Zhenjiang Li, Kaishun Wu, and Mo Li

Zhidan Liu and Kaishun Wu (corresponding author) are with the College of Computer Science and Software Engineering, Shenzhen University; Zhenjiang Li is with the Department of Computer Science, City University of Hong Kong; Mo Li is with the School of Computer Science and Engineering, Nanyang Technological University.

Digital Object Identifier: 10.1109/MNET.2018.1700411

IEEE Network • July/August 2018, page 40. 0890-8044/18/$25.00 © 2018 IEEE

Authorized licensed use limited to: Sathyabama Institute of Science and Technology. Downloaded on January 30,2025 at 07:51:22 UTC from IEEE Xplore. Restrictions apply.

Abstract

Traffic information is of great importance for urban cities, and accurate prediction of urban traffic has been pursued for many years. Urban traffic prediction aims to exploit sophisticated models to capture hidden traffic characteristics from substantial historical mobility data and then makes use of trained models to predict traffic conditions in the future. Due to its powerful capabilities of representation learning and feature extraction, emerging deep learning becomes a potent alternative for such traffic modeling. In this article, we envision the potential and broad usage of deep learning in predictions of various traffic indicators, for example, traffic speed, traffic flow, and accident risk. In addition, we summarize and analyze some early attempts that have achieved notable performance. By discussing these existing advances, we propose two future research directions to improve the accuracy and efficiency of urban traffic prediction on a large scale.

Introduction

Comprehensive urban traffic information benefits urban citizens' daily life and improves urban transportation efficiency. Accurate predictions of such traffic information are of great importance for route planning, navigation, and other mobility services. Urban traffic prediction generally applies traffic models to analyze various historical and real-time traffic data to predict traffic conditions in the future. Traffic speed, traffic flow, and accident risk are representative indicators of traffic conditions, and tremendous efforts have been made in the past decades to accurately predict such indicators by leveraging various types of mobility data and traffic models [1].

Traditionally, various infrastructure devices, including loop detectors, traffic cameras, and radars, have been deployed at some important road intersections to collect mobility data [2]. However, due to the high deployment and maintenance costs, it is prohibitive to widely adopt them on a city scale, which largely limits the coverage of traffic monitoring. Thanks to the popularity of ubiquitous sensing and Intelligent Transportation Systems (ITS) in recent years, we can gather unprecedented mobility data by exploiting a variety of mobile devices (e.g., smartphones and on-board GPS devices) and automatic fare collection (AFC) devices widely deployed by urban transit systems (e.g., subways, buses, and taxis). Such emerging big data substantially augment the data availability (coverage and fidelity) and also enrich data diversity, so that large-scale and reliable traffic predictions become viable.

To leverage such benefits, conventional methods utilize statistical models or machine learning models to predict traffic flows. They rely on human-crafted features to unveil and capture underlying traffic characteristics and further take instant traffic condition measurements as input, together with models built on the obtained features, to predict future traffic conditions. However, traffic flows can be influenced by various factors in practice, for example, transport regulations, weather conditions, and so on. These manually selected features have been shown to be inadequate to comprehensively describe traffic characteristics and thus cannot achieve accurate predictions [2].

Recently, unprecedented data availability and the ability to rapidly process these data have together made possible the immense development of deep learning theory [3]. Deep learning has drawn much attention due to its remarkable capability to automatically extract features from large-scale raw data, and has already been successfully applied in various domains, for example, computer vision and speech recognition.

Compared to classic machine learning models, for example, SVM and ANN, which only have a shallow architecture to capture features, deep learning models instead use a multi-layer (i.e., "deep") architecture to discover intricate structures and complex patterns, where different layers capture features from different perspectives and together form a multi-level abstraction.

In view of the powerful capabilities of deep learning, we envision the potential and broad usage and impact of its integration with rich mobility data in future urban traffic prediction. In this article, we introduce the basic components involved in the procedure of urban traffic prediction, including the types of input mobility data, traffic modeling, and various target traffic indicators, for example, traffic speed, traffic flow, and accident risk. We investigate the possible approaches of applying deep learning to various kinds of traffic predictions, and meanwhile discuss those early attempts that have already exploited deep learning for accurate predictions of various traffic indicators. By discussing these existing advances, we analyze the inherent match between deep learning and data-driven traffic prediction. Moreover, we also point out two potential research directions for future exploration, i.e., joint optimization of multi-source data and traffic modeling, and deep learning accelerated by parallel computing to speed up traffic predictions. To the best of our knowledge, this is the first article that examines and summarizes deep learning based urban traffic predictions, and we believe this work could inspire a variety of follow-up work in this area.

The rest of this article is organized as follows. First we introduce the concepts involved in urban traffic prediction. Then we discuss the potential of deep learning in traffic prediction and analyze existing attempts. Next we discuss possible directions to improve the accuracy and efficiency of large-scale traffic prediction. Finally, we conclude this article.

[Figure 1 diagram labels: data sources (loop detector, traffic camera, GPS-equipped taxi, GPS-equipped bus, AFC systems, social media data); real-time mobility data and historical mobility data; traffic modeling (feature extraction, feature selection, model training, model evaluation); persisted traffic model; outputs (traffic speed prediction, traffic flow prediction, traffic accident risk prediction).]
FIGURE 1. The basic components of urban traffic prediction.
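As a rough illustration of the workflow in Fig. 1, the sketch below walks one road segment through feature extraction, feature selection, model training, and model evaluation. Everything in it is an assumption made for illustration and is not taken from the article: the speeds are synthetic, the feature names (`lag_1`, `lag_2`, `lag_day`, `irrelevant`) are hypothetical, Pearson correlation stands in for the selection criterion, and ordinary least squares stands in for the traffic model.

```python
import math
import random

PERIOD = 48  # assumed: 48 half-hour intervals per day


def make_speeds(n_steps=720, seed=0):
    """Synthetic speeds (km/h) for one road segment: daily cycle plus noise."""
    rng = random.Random(seed)
    return [40 + 15 * math.sin(2 * math.pi * t / PERIOD) + rng.gauss(0, 2)
            for t in range(n_steps)]


def extract_features(speeds, seed=1):
    """Feature extraction: candidate features for predicting speeds[t]."""
    rng = random.Random(seed)
    rows, targets = [], []
    for t in range(PERIOD, len(speeds)):
        rows.append({
            "lag_1": speeds[t - 1],           # previous interval
            "lag_2": speeds[t - 2],           # two intervals ago
            "lag_day": speeds[t - PERIOD],    # same time on the previous day
            "irrelevant": rng.uniform(0, 1),  # deliberately uninformative
        })
        targets.append(speeds[t])
    return rows, targets


def pearson(xs, ys):
    """Sample Pearson correlation between two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def select_features(rows, targets, k=2):
    """Feature selection: keep the k candidates most correlated with the target."""
    score = {n: abs(pearson([r[n] for r in rows], targets)) for n in rows[0]}
    return sorted(score, key=score.get, reverse=True)[:k]


def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def train(rows, targets, names):
    """Model training: ordinary least squares via the normal equations."""
    xs = [[r[n] for n in names] + [1.0] for r in rows]  # trailing 1.0 is the bias
    d = len(names) + 1
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    xty = [sum(x[i] * y for x, y in zip(xs, targets)) for i in range(d)]
    return solve(xtx, xty)


def predict(weights, row, names):
    """Apply the trained (persisted) model to one feature row."""
    return sum(w * v for w, v in zip(weights, [row[n] for n in names] + [1.0]))


rows, targets = extract_features(make_speeds())
split = int(0.8 * len(rows))  # chronological train/test split
selected = select_features(rows[:split], targets[:split])
weights = train(rows[:split], targets[:split], selected)

# Model evaluation: mean absolute error versus a predict-the-mean baseline.
test_rows, test_y = rows[split:], targets[split:]
mae = sum(abs(predict(weights, r, selected) - y)
          for r, y in zip(test_rows, test_y)) / len(test_y)
mean_y = sum(targets[:split]) / split
baseline_mae = sum(abs(mean_y - y) for y in test_y) / len(test_y)
print(selected, round(mae, 2), round(baseline_mae, 2))
```

The hand-written `extract_features` step is exactly the part that depends on expert judgment in this kind of pipeline; swapping in other candidates changes what the downstream model can learn.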
Concepts of Urban Traffic Prediction

Urban traffic prediction concerns the prediction of traffic conditions made from a few seconds to a few hours into the future based on current and historical traffic information [1]. Many research efforts have been made to accurately model traffic indicators such as traffic speed, traffic flow, and accident risk, and produce anticipated traffic conditions. Figure 1 demonstrates the high-level procedure of urban traffic prediction, including mobility data collection, advanced traffic modeling, and targets of traffic predictions.

Mobility Data Collection

The mobility data involved in traffic predictions can be classified into the following categories.

Traffic Data from Infrastructures: Many infrastructure devices, e.g., loop detectors and traffic cameras, have been deployed in cities to continuously collect traffic data. The loop detectors are buried under traffic lanes of some important roads, and can detect vehicles passing by. Such measurements are used to calculate the traveling speed of each individual vehicle and also count the total number of vehicles passing by (i.e., traffic flow) within a period. Similarly, cameras are placed above road intersections and used to capture images of vehicles passing by. Based on computer vision techniques, traveling speeds of vehicles and traffic flows can also be derived.

Trajectory Data from Vehicles: In urban cities, a large number of public vehicles (e.g., taxis and buses) have been equipped with GPS devices, and thus can periodically report their status, including current location, traveling speed, direction, and so on. Those reports indicate the trajectories of vehicles that contain traffic condition measurements of the roads.

AFC Records from Transit Systems: Modern public transportation networks rely heavily on AFC devices to automatically collect transit fees from bus and subway passengers, who need to tap their smartcards on AFC readers when they get on and off buses or subways. Thus, AFC systems record the boarding/alighting (bus or subway) stations and times of passengers, and all such records can be used to construct a trip origin-destination (OD) matrix that reveals mobility flows.

Other Data Sources: There are other data sources useful for traffic predictions. For example, accident reports, which contain the location, severity, and event of each accident, provide helpful information to assess the potential accident risk of each location within a city. Social networking services can treat humans as sensors to probe the dynamics of a city, and thus social media data can help infer traffic anomalies (e.g., accidents) as well. Cellphone data indicate users' movements within a city at cell-tower levels, and provide hints for inferences of traffic conditions. In addition, sensing data from crowdsourcing systems also serve as an important data source for traffic prediction. All such data measure urban traffic from complementary perspectives.

Advanced Traffic Modeling

Urban traffic is complicated and usually non-linear, and thus some advanced traffic models are preferred, for example, statistical models or machine learning models, to capture the hidden traffic characteristics from mobility data and then facilitate the predictions based on input of real-time data.

As shown in Fig. 1, advanced traffic modeling is an iterative process that consists of several phases. To construct a traffic model, we first need to extract some desired values (i.e., features) from the raw mobility data. Such a set of features is correlated with the target traffic conditions. Taking the traffic condition c_i of a road segment s_i as an example, c_i is not only influenced by traffic conditions of s_i's neighboring road segments in the spatial dimension, but also impacted by time of the day (e.g., peak hours and non-peak hours) and day of the week (e.g., workday and weekend) in the temporal dimension. Those spatial-temporal factors together determine the evolution of c_i and play an important role in accurately predicting its future status. After the feature extraction phase, a small set of the most relevant features is further selected based on some criteria, for example, information entropy, to simplify the modeling and enhance the generalization capability of a model. After constructing the traffic model using only the most informative and non-redundant features, we can tune the parameters through massive training data and evaluate the derived model with testing data. The whole process of traffic modeling can be repeated until target prediction performances (e.g., accuracy) are achieved. The persisted traffic model is the one that encodes the traffic char-
acteristics and can be used for traffic prediction given the real-time input mobility data.

Existing works mainly rely on models like ARIMA, ANN, and SVM to capture complex traffic patterns [1]. When building such traffic models, feature extraction and selection are critically important as they will determine the final performance of a traffic model. These procedures, however, are heavily dependent on hand-crafted feature engineering, which calls for rich experience and expert knowledge.

Targets of Traffic Prediction

According to the prediction targets of interest, urban traffic predictions can be further subdivided into traffic speed prediction, traffic flow prediction, and traffic accident risk prediction. Table 1 summarizes these types of traffic predictions, as well as their involved data sources and desired output.

Category | Involved data sources | Desired output
Traffic speed prediction | Infrastructures, GPS-equipped vehicles | Average traffic speed (or congestion level)
Traffic flow prediction | Infrastructures, AFC systems | Total number of objects passing through a road/region
Traffic accident risk prediction | Infrastructures, AFC systems, social media data, historical accident reports | Accident risk probability for each road/region
TABLE 1. A summary of different urban traffic predictions.

Traffic Speed Prediction: Traffic speed is a widely adopted indicator to measure the traffic condition of one road segment, which is generally calculated as the average traveling speed of all sampled vehicles on a given road segment. Existing works derive such vehicular speed measurements either indirectly from data collected by loop detectors and cameras [2] or directly from GPS-equipped vehicles [4]. They construct a traffic speed model from historical data by adopting classic machine learning models, and take real-time sampled speeds as the input to predict future traffic speeds. The predicted traffic speeds can be translated to certain congestion levels (e.g., slow, normal, and fast) according to some mapping rules.

Traffic Flow Prediction: In general, traffic flow is defined as the total number of target objects (i.e., vehicles or humans) that pass through an area during a period. The area can be a road segment or a region in the city. Different from traditional works that hold many assumptions on human mobility, more recent approaches model and predict the traffic flow based on the realistic human mobility data collected from infrastructures and AFC systems. Traffic flows reveal the movements of crowds and potentially determine the traffic distributions [5].

Traffic Accident Risk Prediction: Traffic accidents, although rare, have serious impacts on urban traffic. Therefore, it is necessary to assess traffic accident risks for each specific road and region, which can be measured as likelihoods, that is, how likely it is that traffic accidents might occur on a road/region. Recent practices mainly associate accident risks with current traffic conditions and human mobility, and thus they develop models to mine relations between mobility data and historical accident reports for traffic accident risk prediction [6].

Deep Learning Based Traffic Prediction

A Primer on Deep Learning

Although there exist various forms of deep learning models, they share a common architecture as shown in Fig. 2, which contains an input layer, an output layer, and from several to more than a thousand hidden layers in between. Raw data initialize the values of the input layer while the output layer emits the desired inferences. All hidden layers are responsible for transforming states of the input layer into the expected inferences of the output layer by capturing the high-level abstractions. Each layer in the network contains a number of units, and the sizes could vary among different layers. Links exist between units of any two neighboring layers and each link is associated with a weight. Every unit has an activation function that determines how to calculate its own state based on units from the immediately previous layer and in turn exposes its state to the next layer. One of the most popular activation functions recently is the rectified linear unit (ReLU), which is a half-wave rectifier f(x) = max(x, 0). Next we will introduce several popular models that have already been well exploited [3].

[Figure 2 diagram labels: input layer, hidden layers, output layer.]
FIGURE 2. A deep neural network with fully-connected layers. It contains an input layer, an output layer, and many hidden layers. Each hidden layer contains a number of units that use an activation function (e.g., ReLU) to calculate the state based on units from the immediately previous layer.

Convolutional Neural Network (CNN): The CNN model is primarily designed to process 2-dimensional data, for example, images. As shown in Fig. 3a, a CNN model is composed of an input layer and an output layer, as well as multiple hidden layers, which could be the convolutional, pooling, or fully connected layers. The convolutional layers adopt convolutional filters, which apply certain transformations on the input data to capture their properties. Next follow pooling layers that combine the output of unit clusters at a previous layer into a single unit in the next layer by employing the max or min filter. A pooling layer learns more abstract representations of the data, and meanwhile acts as a form of dimensionality reduction to simplify the whole model. A fully connected layer is used to complete the inference.

Recurrent Neural Network (RNN): The RNN model is mainly used for tasks that involve sequential inputs, for example, speech and
language, due to its "memory" design in the form of a loop as shown in Fig. 3b. A loop allows information to be passed from one step to the next (Fig. 3b left). RNNs process an input sequence one element at a time, maintaining output results in the hidden units that implicitly persist information about the history of all past elements. When unfolding the loop, an RNN can be viewed as a stack of separate neural networks with some parameters of each network fed from the previous one (Fig. 3b middle). Such parameters act as the memory of RNN models. Inside the repeating neural networks of an RNN (Fig. 3b right), the input element x_t at time step t is concatenated with the output y_{t-1} of the previous time step, and the two are then fed together into an activation function (e.g., tanh) to derive the output y_t of the current time step. Such an architecture allows RNNs to capture temporal dynamics, but practice shows that RNNs cannot support long-term dependency [3]. Thus, an improved RNN called a Long Short-Term Memory network (LSTM) is proposed, which uses special hidden units (i.e., memory cells) to remember inputs for a long time. LSTM models are able to learn long sequences and automatically determine the optimal time lags for prediction.

Stacked Autoencoder (SAE): An autoencoder is a three-layer neural network with an input layer, an output layer, and a hidden layer, as shown in the left part of Fig. 3c. The target output is intentionally set as the input of the model, and thus the hidden layer aims to learn the representations of the input data, which can be viewed as a dimensionality reduction or encoding of input data. Due to this function, the hidden layer of an autoencoder is also called the feature layer. The SAE model links such feature layers in a stacked fashion to create higher-level abstractions of input data, which forms a deep architecture, as shown in the right part of Fig. 3c. One of the most popular autoencoder variants is the denoising autoencoder, which takes deliberately corrupted samples as the inputs while being forced to recover the original uncorrupted data. When stacking multiple denoising autoencoders, we thus derive a variant of SAE called a stacked denoising autoencoder (SdAE). Compared to the SAE model, SdAE is able to discover relatively stable features, which makes it robust against noisy inputs and thus perform much better.

[Figure 3 diagram labels: (a) input matrix, convolutional layer, pooling layer (max/min), convolutional layer, pooling layer (max/min), fully connected, output layer; (b) RNN loop with inputs x_t, outputs y_t, tanh unit, unfolded over time steps; (c) input layer x, hidden layers h_1 to h_n, output layer x', encode and decode.]
FIGURE 3. The architectures of various deep learning models: a) typical architecture of CNN model; b) typical architecture of RNN model; c) typical architecture of autoencoder and SAE model.

There are other deep learning models, such as the Restricted Boltzmann Machine (RBM) and Deep Belief Network (DBN). Table 2 presents a summary of the above models and their early adoptions in traffic predictions to be discussed later.

Model | Application scenarios | Referred works
CNN | 2-dimensional data (e.g., images, videos) | Speed prediction [7, 8]; flow prediction [5]
RNN | Sequential data (e.g., speech, language) | Speed prediction [8, 9]
LSTM | Long sequential data (e.g., speech, language) | Speed prediction [10]; flow prediction [11]
SAE (SdAE) | Representation learning | Flow prediction [2]; accident risk prediction [12]
RBM, DBN | Representation learning | Speed prediction [9, 13, 14]
TABLE 2. Summary of different deep learning models.

Traffic Speed Prediction

The traffic speed of one road segment is influenced by many factors in both the temporal and spatial dimensions, for example, time of day and traffic conditions of neighboring road segments. We have two ways to apply deep learning to extract such temporal-spatial features at either the individual road segment level or the whole road network level. On one hand, we can mine detailed traffic features for each individual road segment and then make use of the derived features to construct classic machine learning models for traffic speed prediction. An early attempt follows this idea and has proposed DeepSense [14], which exploits the RBM model to extract high-level features for building an SVM model to predict traffic speed. Specifically, for each target road segment s, a number of correlated segments are selected and their traffic speeds, certain states, time intervals, and geographical distances between them and s are fed into an RBM model to automatically discover helpful features for constructing the SVM model. Substantial taxi traces are used to train DeepSense, and the experiment results show that DeepSense achieves higher prediction accuracy than its competitors.

The following works have explored the possible applications of other deep learning models in this direction, e.g., the DBN model [13], the hybrid model of RBM and RNN [9], and the LSTM model [10]. Those works, however, primarily apply deep learning on temporal speed sequences of individual road segments for traffic prediction at a small network region.

On the other hand, we can consider traffic speed prediction at the road network scale and over a long time range, so that the temporal-spatial traffic speeds are essentially transformed into one 2-dimensional data matrix, which is the favorable input of CNN models. CNN is good at
capturing spatial features of 2-dimensional data and has been widely applied in image recognition tasks with prominent performances achieved. Inspired by such successes, Ma et al. [7] propose a CNN based method to learn urban traffic as images. They convert road network traffic dynamics into an image that represents the temporal and spatial relations of traffic as a matrix. Each row of the matrix describes the evolution of one road segment over time, while each column describes traffic conditions of the whole road network at a specific time step. They apply CNN to such images to extract network-scale features and use those features to build a fully connected neural network for network-wide traffic speed prediction. Experiments show that the CNN based method indeed has remarkable capability to process 2-dimensional data and outperforms the compared methods building on either conventional models (e.g., ANN) or other deep learning models (e.g., RNN and LSTM). By considering prediction errors, Wang et al. [8] further improve conventional CNN models with an additional error-feedback recurrent layer, which takes the output of CNN as the input and compensates the prediction errors using the prediction results of previous periods.

Traffic Flow Prediction

Similar to traffic speed, traffic flow in a specific area is also affected by temporal and spatial factors. Different from traffic speed, traffic flow should be considered on a large scale since human mobility usually covers a large area. Therefore, we divide the road network or the whole city into grids and place traffic flows into such a 2-dimensional gridded space to form instant traffic flow snapshots. Some deep learning models, especially CNN, could be used to discover latent traffic flow features from such snapshots to build the flow predictor. A notable attempt is made by Zhang et al. [5], who consider predictions of traffic inflow/outflow in each region of a city by exploiting historical mobility data, weather conditions, and holiday events. In this work, traffic inflow/outflow can be measured as the number of pedestrians, the number of vehicles driven on nearby roads, and any other measurements related to human mobility. To capture the complex temporal-spatial dependencies, the authors transform historical and current inflow/outflow data into image-like matrices, and separate them into three groups, denoting recent time, near history, and distant history. A CNN model retaining only convolution layers is applied to each group to hierarchically capture spatial structure information. A residual unit sequence is used to allow a CNN model to be appended with many layers. In addition, external factors like weather conditions and holiday events are considered through a fully connected neural network. The four components individually predict traffic inflow/outflow, and these predictions are then fused to derive the final result. In addition to the high-level flow statistics, Song et al. have proposed DeepTransport [11], which exploits LSTM models to predict and simulate an individual's future movements and transportation modes. Lv et al. [2] exploit SAE models to predict traffic flows on specific road segments. Such detailed information will better facilitate the management and planning of urban traffic.

Traffic Accident Risk Prediction

There are many factors related to traffic accidents, for example, traffic congestion, driver behavior, and road and weather conditions, and thus accident risk prediction is much more challenging. In general, traffic accident risks are highly correlated with human mobility, land usage, and historical traffic accidents. Thus, we can divide a city into grids and assign traffic flow and historical traffic accident data into these grids to form a mobility matrix and an accident matrix. Taking these matrices as inputs, certain deep learning models could be used to extract complex features for building a traffic accident risk predictor using a traditional machine learning model.

The only attempt we found is made by Chen et al. [12]. Their proposed method divides a city into regions and the time of day into intervals. For each time interval t and each region r, it calculates the risk level g_{r,t} from historical accident data, and the average human mobility density d_{r,t} from historical GPS records. The derived data form two kinds of matrices and are fed into SdAE to extract robust and stable features to construct a logistic regression model. Given real-time human mobility data, the method outputs a risk assessment map that can be used to provide people with early warning of possible traffic accidents. This method, however, only considers human mobility but does not take other factors into account, e.g., weather and land usage, for a comprehensive accident risk prediction.

Discussion and Future Directions

Why Deep Learning Fits Traffic Prediction

Urban traffic can be influenced by many factors, such as transport regulations, road conditions, weather conditions, stochastic events, land usage, and so on, which together make traffic patterns extremely complex. The hand-crafted features from prior statistical or machine learning models are essentially a series of hypotheses proposed to approximate the unknown relation about how such factors impact traffic status. Due to the inherent complexity and hardness of such relations, however, the manually selected features have been shown to be inadequate to comprehensively describe traffic characteristics and thus cannot achieve accurate prediction results.

Thanks to the deep architecture of multiple processing layers, deep learning is capable of automatically discovering the most representative features from a massive amount of mobility data, which is impossible for prior methods with shallow architectures. By inspecting pioneering studies, we highlight the general workflow to apply deep learning for traffic predictions as shown in Fig. 4. Instead of directly inputting mobility data into classic machine learning models, the raw data
are first fed into deep learning models to learn abstractions through many hidden layers. In general, low-level abstractions are first extracted from the input data and in turn fed to the following layers to form higher-level abstractions. Finally, such a hierarchy of abstractions automatically selects some high-level features that are simultaneously sensitive to subtle details, e.g., different times of day, and insensitive to irrelevant variations, e.g., the types of vehicles passing on roads. Building on such features, the derived traffic models will be more informative, stable, and robust, and thus they can achieve much better prediction performance. In essence, deep learning can be viewed as an excellent feature extractor, which avoids burdensome feature engineering while automatically learning good features using a general-purpose learning procedure.

FIGURE 4. Deep learning models hierarchically learn representations of mobility data and output high-level features to support classical machine learning models for better traffic predictions. [Diagram labels: deep learning based representation learning (hierarchy of abstractions); high-level features; classical machine learning model based predictor (e.g., ANN, SVM, logistic regression, ...); predictions.]

Future Research Directions
In this article, we propose two promising research directions that we consider crucial for this topic.
Joint Optimization of Multi-Source Data and Traffic Modeling: As introduced earlier, various types of mobility data can serve as deep learning's input, and multiple traffic condition indicators need to be predicted as well. An indicator may not be reliably inferred from any single mobility data source, yet how to select the most appropriate mobility data sources to satisfy each indicator's prediction requirement is so far unknown. In addition, even if such a selection could eventually be figured out, it is non-trivial to further determine suitable deep learning model details, e.g., the number of models, model types, layers, and so on, that fuse these mobility data sources and link the input to the output. Thus, applying deep learning to traffic prediction involves a joint optimization of data modality, model structure, and fusion methodology.
One possible solution we propose is to exploit all available mobility data sources for prediction based on a multi-model strategy. For each data source, we apply deep learning to capture its respective features and then produce one prediction. All predictions from the multiple traffic models can then be carefully fused to obtain a comprehensive result.
Such multi-model based traffic prediction is feasible and attractive, since we can exploit ensemble learning theory to integrate those models and their predictions for a better result. In practice, we may apply different deep learning models to different mobility data sources to obtain diverse traffic models, and adopt a weighted average strategy to compute the final prediction. The weights of the different models are determined through a training procedure. We thus sidestep the data selection issue and design the deep learning model for each data source independently.
Parallel Computing Promoted Deep Learning to Accelerate Traffic Predictions: To fully extract abstractions from mobility data, deep learning models are usually designed to contain hundreds to thousands of layers, and thus numerous parameters need to be tuned. Conventional computing systems are thus inadequate for such computationally intensive tasks. The problem becomes even more serious when multi-source mobility data are involved, where the storage and computation overheads will significantly increase. Therefore, scalable and efficient parallel computing systems (e.g., computer clusters) are preferable to store such big data and accelerate data processing, traffic modeling, and prediction.
It is attractive yet challenging to handle deep learning based traffic predictions in a distributed manner. Ideally, a deep learning job can be partitioned into a series of tasks running on different machines in parallel. However, achieving the best modeling performance while keeping both communication and computation costs to a minimum is quite difficult. First, how to parallelize this modeling job is unclear, and even if it is possible, how to merge the pieces of parameters learned on different machines into the complete final model for quick prediction remains to be explored. In addition, since mobility data are used as training data for traffic modeling, the wise placement of those data among machines is of great importance to reduce unnecessary data exchanges (i.e., training data and intermediate parameters) between machines.
To address these challenges, we can build our deep learning models by exploiting the parameter server (PS) architecture [15] to manage and synchronize the model parameters among machines. In the PS architecture, server nodes maintain the latest model parameters and make them available to worker nodes, while worker nodes update the model parameters using the assigned training data. Also, since nearby regions are correlated in traffic flows, we can place the mobility data among machines according to their geographical information to significantly reduce data transfer among machines when training the deep learning models. In practice, we can embed more domain knowledge of transportation into our model and system design to further improve the accuracy and efficiency of large-scale traffic predictions.

Conclusions
In this article, we envision the potential of rich mobility data and deep learning for urban traffic prediction, and discuss some pioneering attempts. Deep learning will advance traffic predictions through powerful representation learning and has shown initial successes. By discussing the existing advances, we proposed two research directions to further improve the accuracy and efficiency of traffic prediction on a large scale.
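As an illustration of the parameter server arrangement and the geography-aware data placement discussed under the second research direction, the following is a minimal single-process sketch, not an implementation from the article or from [15]. All names (`ParameterServer`, `Worker`, `shard_by_region`, `train`) and the toy data are hypothetical. A linear model stands in for a deep network, and synchronous gradient averaging is assumed purely for simplicity.

```python
from collections import defaultdict


class ParameterServer:
    """Holds the latest model parameters and applies averaged worker gradients."""

    def __init__(self, num_params, learning_rate):
        self.params = [0.0] * num_params
        self.learning_rate = learning_rate

    def pull(self):
        # Hand out a copy so workers cannot mutate server state directly.
        return list(self.params)

    def push(self, gradients):
        # Average the gradients from all workers, then take one SGD step.
        for i in range(len(self.params)):
            mean_grad = sum(g[i] for g in gradients) / len(gradients)
            self.params[i] -= self.learning_rate * mean_grad


class Worker:
    """Computes mean-squared-error gradients of a linear model on its shard."""

    def __init__(self, shard):
        self.shard = shard  # list of (features, target) pairs

    def gradient(self, params):
        grad = [0.0] * len(params)
        for features, target in self.shard:
            error = sum(w * x for w, x in zip(params, features)) - target
            for i, x in enumerate(features):
                grad[i] += 2.0 * error * x / len(self.shard)
        return grad


def shard_by_region(samples, num_workers, cell_size=1.0):
    """Group samples by grid cell and give runs of sorted cells to one worker."""
    cells = defaultdict(list)
    for x, y, features, target in samples:
        cell = (int(x // cell_size), int(y // cell_size))
        cells[cell].append((features, target))
    shards = [[] for _ in range(num_workers)]
    for rank, cell in enumerate(sorted(cells)):
        shards[rank * num_workers // len(cells)].extend(cells[cell])
    return shards


def train(samples, num_workers=2, rounds=500, learning_rate=0.5):
    """Run synchronous rounds: pull parameters, compute gradients, push."""
    server = ParameterServer(len(samples[0][2]), learning_rate)
    shards = shard_by_region(samples, num_workers)
    workers = [Worker(shard) for shard in shards if shard]  # skip empty shards
    for _ in range(rounds):
        params = server.pull()
        server.push([worker.gradient(params) for worker in workers])
    return server.params


# Hypothetical toy data: (x, y, features, target) with target = 3 * f + 2.
samples = [(i * 0.5, 0.0, [i / 8, 1.0], 3 * (i / 8) + 2) for i in range(8)]
learned = train(samples)  # approaches [3.0, 2.0] on this noiseless data
```

In this sketch the only traffic between workers and the server is the parameter vector and the gradients, while each region's training samples stay on the machine they were assigned to. A real deployment would run the workers on separate machines and would need to handle stragglers, failures, and unevenly sized shards.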

Acknowledgment
This research was supported by the NSF of SZU (No. 2018061); China NSFC Grant (No. 61601308 and No. 61472259); Guangdong NSF Grant (No. 2017A030312008); Guangdong Provincial Science and Technology Development Special Foundation (No. 2017A010101033); Shenzhen Science and Technology Foundation (No. JCYJ20170302140946299 and No. JCYJ20170412110753954); Fok Ying-Tong Education Foundation for Young Teachers in the Higher Education Institutions of China (No. 161064); Guangdong Talent Project (No. 2014TQ01X238 and No. 2015TX01X111); GDUPS Grant (2015); the ECS grant from the Research Grants Council of Hong Kong (No. CityU 21203516); and a GRF grant from the Research Grants Council of Hong Kong (No. CityU 11217817); Singapore MOE Tier 2 grant MOE2016-T2-2-023; and NTU CoE grant M4081879.

References
[1] E. I. Vlahogianni, M. G. Karlaftis, and J. C. Golias, “Short-Term Traffic Forecasting: Where We Are and Where We’re Going,” Transportation Research Part C: Emerging Technologies, vol. 43, part 1, 2014, pp. 3–19.
[2] Y. Lv et al., “Traffic Flow Prediction with Big Data: A Deep Learning Approach,” IEEE Trans. Intelligent Transportation Systems, vol. 16, no. 2, 2015, pp. 865–73.
[3] Y. LeCun, Y. Bengio, and G. Hinton, “Deep Learning,” Nature, vol. 521, no. 7553, 2015, pp. 436–44.
[4] Z. Liu et al., “Mining Road Network Correlation for Traffic Estimation via Compressive Sensing,” IEEE Trans. Intelligent Transportation Systems, vol. 17, no. 7, 2016, pp. 1880–93.
[5] J. Zhang, Y. Zheng, and D. Qi, “Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction,” Proc. of AAAI, 2017, pp. 1655–61.
[6] J. Sun et al., “A Dynamic Bayesian Network Model for Real-Time Crash Prediction using Traffic Speed Conditions Data,” Transportation Research Part C: Emerging Technologies, vol. 54, 2015, pp. 176–86.
[7] X. Ma et al., “Learning Traffic as Images: A Deep Convolutional Neural Network for Large-Scale Transportation Network Speed Prediction,” Sensors, vol. 17, no. 4, 2017, Article No. 818.
[8] J. Wang et al., “Traffic Speed Prediction and Congestion Source Exploration: A Deep Learning Method,” Proc. IEEE ICDM, 2016, pp. 499–508.
[9] X. Mao et al., “Large-Scale Transportation Network Congestion Evolution Prediction using Deep Learning Theory,” PloS one, vol. 10, no. 3, 2015.
[10] R. Yu et al., “Deep Learning: A Generic Approach for Extreme Condition Traffic Forecasting,” Proc. SIAM ICDM, 2017, pp. 777–85.
[11] X. Song, H. Kanasugi, and R. Shibasaki, “DeepTransport: Prediction and Simulation of Human Mobility and Transportation Mode at a Citywide Level,” Proc. IJCAI, 2016, pp. 2618–24.
[12] Q. Chen et al., “Learning Deep Representation from Big and Heterogeneous Data for Traffic Accident Inference,” Proc. AAAI, 2016, pp. 338–44.
[13] Y. Jia, J. Wu, and Y. Du, “Traffic Speed Prediction using Deep Learning Method,” Proc. IEEE ITSC, 2016, pp. 1217–22.
[14] X. Niu, Y. Zhu, and X. Zhang, “DeepSense: A Novel Learning Mechanism for Traffic Prediction with Taxi GPS Traces,” Proc. IEEE GLOBECOM, 2014, pp. 2745–50.
[15] M. Li et al., “Scaling Distributed Machine Learning with the Parameter Server,” Proc. USENIX OSDI, 2014, pp. 583–98.

Biographies
Zhidan Liu ([email protected]) received the B.E. degree in computer science and technology from Northeastern University, China, in 2009, and the Ph.D. degree in computer science and technology from Zhejiang University, China, in 2014. He is currently an assistant professor at Shenzhen University (SZU), China. His research interests include distributed sensing and mobile computing, big data analytics, and urban computing.

Zhenjiang Li ([email protected]) received the B.E. degree in computer science and technology from Xi’an Jiaotong University, Xi’an, China, in 2007, the M.Phil. degree in electronic and computer engineering and the Ph.D. degree in computer science and engineering from The Hong Kong University of Science and Technology (HKUST), Hong Kong, in 2009 and 2012, respectively. He is currently an assistant professor at City University of Hong Kong, Hong Kong. His research interests include wearable and mobile sensing, deep learning and data mining, distributed and edge computing.

Kaishun Wu ([email protected]) received his Ph.D. degree in computer science and engineering from HKUST, Hong Kong, in 2011. After that, he worked as a research assistant professor at HKUST. In 2013, he joined SZU as a professor. He is the inventor of six U.S. and 43 Chinese pending patents (13 are issued). He received the 2014 IEEE ComSoc Asia-Pacific Outstanding Young Researcher Award.

Mo Li ([email protected]) received the B.S. degree in computer science and technology from Tsinghua University, China, in 2004 and the Ph.D. degree in computer science and engineering from HKUST, Hong Kong, in 2009. He is currently an associate professor at Nanyang Technological University, Singapore. His research interests include networked and distributed sensing, wireless and mobile, cyber-physical systems, smart city, and urban computing.
