
Received June 25, 2019; accepted July 3, 2019; date of publication July 22, 2019; date of current version August 6, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2929692

T-LSTM: A Long Short-Term Memory Neural Network Enhanced by Temporal Information for Traffic Flow Prediction

LUNTIAN MOU 1 (Member, IEEE), PENGFEI ZHAO 2, HAITAO XIE 1, AND YANYAN CHEN 3
1 Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing, China
2 Department of Information, Beijing University of Technology, Beijing, China
3 Beijing Engineering Research Center of Urban Transport Operation Guarantee, Beijing University of Technology, Beijing, China

Corresponding author: Yanyan Chen ([email protected])


This work was supported by the National Key Research and Development Program of China under Grant 2017YFC0803903.

ABSTRACT Short-term traffic flow prediction is one of the most important issues in the field of intelligent
transportation systems. It plays an important role in traffic information service and traffic guidance. However,
complex traffic systems are highly nonlinear and stochastic, making short-term traffic flow prediction
a challenging issue. Although long short-term memory (LSTM) has a good performance in traffic flow
prediction, the impact of temporal features on prediction has not been exploited by existing studies. In this
paper, a temporal information enhancing LSTM (T-LSTM) is proposed to predict traffic flow of a single road
section. In view of the similar characteristics of traffic flow at the same time each day, the model can improve
prediction accuracy by capturing the intrinsic correlation between traffic flow and temporal information.
The experimental results demonstrate that our method can effectively improve the prediction performance
and obtain higher accuracy compared with other state-of-the-art methods. Furthermore, we propose a novel
missing data processing technique based on T-LSTM. According to the experimental results, this technique
can restore the characteristics of the original data well and improve the accuracy of traffic flow prediction.

INDEX TERMS Traffic flow prediction, missing data repair, temporal features, deep learning, LSTM.

(The associate editor coordinating the review of this manuscript and approving it for publication was Yanli Xu.)

I. INTRODUCTION
In the field of Intelligent Transportation Systems (ITS), traffic control and guidance systems are the core topics, and traffic flow prediction is the key to both. Accurate and real-time short-term traffic flow prediction can not only provide crucial travel information for individual travelers, business sectors, and government agencies, but also play an increasingly important role in easing traffic congestion, reducing carbon dioxide emissions, and improving travel safety.

During the past four decades, many researchers have been trying to provide reliable traffic flow prediction methods. However, due to the highly nonlinear and random characteristics of traffic flow, it is still a great challenge for traditional methods to make accurate predictions [1]–[5]. Existing parametric and nonparametric model-based methods mainly use linear models and shallow machine learning models to predict incoming traffic flow and cannot describe the nonlinearity and uncertainty well [6].

With the continued improvement of computing performance and the wide deployment of traffic sensors, data-driven Deep Neural Networks (DNN) have been widely applied in the field of traffic [7]. They can make full use of latent knowledge hidden in big traffic data to forecast traffic flow and deal with large historical datasets and complex nonlinear functions. Among the state-of-the-art methods, Recurrent Neural Networks (RNN), Stacked Autoencoders (SAE), and Deep Belief Networks (DBN) have good performances in traffic flow prediction of a single road section [8]–[10]. In particular, since the RNN model is a special approach for processing time series, it can capture the temporal characteristics of traffic flow well and is very suitable for traffic flow prediction. Other advanced methods such as Convolutional Neural Networks (CNN) or CNN-RNN models have better performances in traffic network prediction for their strong learning ability in spatial or spatial-temporal features. In particular, the framework combining CNN and RNN has become a standard research configuration for its consideration of the spatial-temporal characteristics of traffic flow.

However, all these studies neglect a factor that might have a strong impact on short-term traffic flow prediction: the temporal information itself. Traffic flow is commonly recognized to have a strong temporal characteristic, and a similar trend is demonstrated by daily traffic flow [11]. Specifically, traffic flow is heavy during commuting hours and relatively light in the early hours of each day. Conversely, when a traffic flow sample is obtained, its moment can be roughly, though not accurately, inferred. Therefore, if the temporal information and the traffic flow are considered simultaneously, deep neural networks may be capable of learning higher-level temporal representations and achieving better results. More importantly, trend modeling studies have found that residual traffic flow, as a kind of important trend information, can reflect the time-variant fluctuation, and that the prediction accuracy can be improved when the residual traffic flow is fed into the model [12]–[16]. We believe that the temporal information itself could also be used as a kind of trend information and should be considered in the prediction model because it is closely related to the time-variant fluctuation of traffic flow. Existing traffic flow forecasting methods based on deep learning take only traffic flow as the input to neural networks. Thus, these methods might not capture the temporal characteristics of traffic flow completely. Besides, most existing studies have not paid adequate attention to the processing of missing data and often use adjacent values to fill missing values, which hinders improvement of the prediction accuracy [17].

The main contributions of this paper are as follows:
1) For the first time, we propose a Temporal information enhancing Long Short-Term Memory neural network (T-LSTM) that combines recurrent time labels with recurrent neural networks, which makes the best use of the temporal features to improve the accuracy of short-term traffic flow prediction.
2) The performance of our proposed model has been evaluated against a variety of comparison models, which include the Gated Recurrent Unit (GRU), SAE, DBN, LSTM, Support Vector Machine (SVM), K-nearest neighbor (KNN), Feed Forward Neural Networks (FFNN), and the Autoregressive Integrated Moving Average model (ARIMA). Our model has achieved the best performance. The reason for not including CNN-related models is that the traffic flow data of a road network is currently not available to us, so traffic flow prediction of only a single road section is considered in this paper.
3) In view of the deficiencies in the processing of missing data, a new missing data repair technique is proposed based on the proposed T-LSTM model to maximally recover the characteristics of the raw data. To the best of our knowledge, it is the first time that an LSTM-based model is used for missing data repair.

II. RELATED WORK
Generally, traffic flow forecasting methods can be divided into two categories: parametric models and nonparametric models [18]. Parametric models refer to models whose structure is predetermined based on certain theoretical assumptions and whose parameters can be computed with empirical data [19]. Among parametric models, ARIMA is one of the most widely used. It was proposed in the 1970s to predict short-term freeway traffic data [20]. Then, scholars made some improvements on the ARIMA model and proposed a series of variant models such as Kohonen-ARIMA [21], subset ARIMA [22], the Autoregressive Moving Average model (ARMA) [23], and seasonal ARIMA [24]. In addition, the Kalman Filter is another commonly used parametric model. It has been successfully applied in traffic flow prediction and has exhibited a superior capability of conducting online learning [25]. Although the above parametric models improve the performance of traffic flow prediction, due to the nonlinearity and randomness of traffic flow, these relatively simple and inflexible models cannot accurately capture the characteristics of traffic flow [10].

As a result, researchers have begun to focus on nonparametric models, such as nonparametric regression [26], SVM [27], the Online Support Vector Machine (OL-SVM) [28], KNN [29], and Neural Networks (NN) [30]. Among the above models, NN has the best performance and is considered another popular model for traffic flow prediction due to its powerful ability in processing multidimensional data, flexible model structures, and strong generalization ability as well as adaptability [31]. However, due to the shallow structure of the aforementioned models, it is still a great challenge to make accurate traffic flow predictions.

Recently, with the resurgence of deep learning, neural networks with multilayer nonlinear structures have been widely used in pattern recognition, classification, and prediction [32]–[34]. Compared with traditional shallow structures, deep neural networks can use distributed and hierarchical feature representations to model the deep, complex nonlinear relationships of traffic flow. In 2014, Huang et al. employed a DBN with multitask learning for traffic flow prediction [35]. To achieve traffic flow forecasting for the next day, Li et al. proposed an advanced multi-objective particle swarm optimization algorithm to optimize some parameters in a DBN and enhance its multiple-step prediction ability [36]. Lv et al. proposed an SAE and demonstrated that the model is superior to FFNN, Random Walk (RW), SVM, and Radial Basis Function (RBF) networks [17]. These models all belong to the fully-connected structure, and there are no assumptions about the features in the fully-connected architecture. Thus, it is difficult for fully-connected neural networks to capture representative features from a dataset with plentiful characteristics [37].

In order to solve these issues, researchers proposed RNN- and CNN-based models, which can capture the nonlinearity and randomness of traffic flow more effectively and have become basic models to forecast traffic flow. LSTM and GRU, the variants of RNN, have superior capability for time series prediction with long temporal dependency and temporal feature learning ability. In 2015, Tian applied LSTM to short-term traffic flow prediction for the first time and proved that the model is superior to SVM, FFNN, and SAE [11].


Jia found that with the combined input of speed and weather information, LSTM has better prediction accuracy and outperforms DBN in capturing the temporal characteristics of traffic speed [38].

Another type of highly successful deep neural network is the CNN. The traffic flow information of a traffic network is first mapped into a series of images and then fed into a CNN [39]. CNN-based methods have strong spatial feature modeling ability and are widely used for traffic network prediction. However, CNN-based methods usually cannot map multiple types of information simultaneously into the images. Such information includes traffic, speed, density, and other factors important for traffic flow prediction. Therefore, the combination of convolutional and recurrent neural networks has become an important research direction. In such a model, the CNN is used to capture spatial features while the RNN is for temporal features. Wu et al. proposed a CNN-RNN model to improve prediction accuracy, which makes full use of the weekly/daily periodicity and spatial-temporal characteristics of traffic flow [37]. Duan combined CNN and RNN to predict urban traffic flow. Experimental results with real taxis' GPS trajectory data from Xi'an city show that the model can achieve higher prediction accuracy and shorter time consumption compared with existing methods [40], [41].

However, as we mentioned above, these studies neglect the temporal information, and the models used may therefore not effectively learn the temporal characteristics of traffic flow. In the process of training, the neural networks need to divide the continuous traffic data into training samples according to different inputs and outputs, resulting in the disruption of data that are originally continuous in time. In addition, the temporal information closely related to traffic flow has not been fed into the existing models. Thus, the models cannot learn the relationship between traffic flow and the corresponding temporal information and cannot capture the temporal characteristics of traffic flow adequately. Besides, existing studies do not pay enough attention to the processing of missing data and often use adjacent values to replace missing values approximately. Therefore, we propose the T-LSTM model, which makes the best use of the temporal characteristics to improve the accuracy of short-term traffic flow prediction. Furthermore, we propose a T-LSTM missing data repair method to achieve maximum recovery of the characteristics of traffic flow.

III. THE T-LSTM MODEL COMBINING RECURRENT NEURAL NETWORKS AND RECURRENT TIME LABEL
A. RECURRENT NEURAL NETWORKS
LSTM is a variant of RNN that overcomes the vanishing gradient problem of the RNN model. It exhibits a superior capability of modeling nonlinear time series problems in an effective fashion. The primary objectives of LSTM are to model long-term dependencies and determine the optimal input length via three multiplicative units [9].

The LSTM model is composed of the input layer, the recurrent layers whose basic unit is the memory block instead of the traditional neuron node, and the output layer. The memory block is a set of recurrently connected subnets. Each memory block contains one or more self-connected memory cells and three multiplicative cells: an input gate, an output gate, and a forgetting gate, which perform a continuous simulation of the write, read, and reset operations of a cell. As shown in Fig. 1, the forgetting gate f_t controls which information needs to be discarded from the state c_{t-1} at the previous moment. Thus, it can ignore irrelevant features and automatically determine the optimal input. The input gate i_t determines which state the unit needs to be updated with; it therefore provides the long-term memory ability. The output gate o_t filters the output based on the state of the unit. In Fig. 1, x represents the input vector, h represents the output vector, and W represents the weight matrix. The symbols ⊗ and ⊕ represent element-wise multiplication and element-wise addition, respectively. The expressions of the functions σ and tanh are given below.

FIGURE 1. The recurrent structure of LSTM. t represents the timestep, x represents the input vector, h represents the output vector, c represents the state vector, [·, ·] denotes vector concatenation, and W represents the weight matrix.

B. RECURRENT TIME LABEL
Traffic flow at the same moment of each day has similar characteristics, and similar M-shaped intra-day trends are maintained over consecutive days. From the short-term trend, the evolution of traffic flow is closely related to the time (e.g., traffic flow is heavy during commuting hours and relatively light in the early hours of each day). From the long-term trend, the traffic volume at the same moment each day varies within a certain range, as shown in Fig. 2.

In order to express the evolution of traffic flow at the same time more clearly, the following indicator is defined, assuming T samples per day and continued sampling for N days:

Y = \left[ y_i^t \right] = \begin{bmatrix} y_1^1 & y_1^2 & \cdots & y_1^T \\ y_2^1 & y_2^2 & \cdots & y_2^T \\ \vdots & \vdots & \ddots & \vdots \\ y_N^1 & y_N^2 & \cdots & y_N^T \end{bmatrix},    (1)

where i represents the ith day (i ∈ [1, N]) and t represents the index of the traffic flow at time t of a day (t ∈ [1, T]). So, y_i^t represents the traffic flow at time t of the ith day.


FIGURE 2. The traffic flow at 4:00 am, 8:00 am, 6:00 pm, and 11:00 pm from August 1 to 30, 2014. The time interval is 16 minutes and more details about the dataset are given in Section IV.

The traffic flow series and the average traffic flow at time t over N days can be written as

y^t = \left[ y_1^t, y_2^t, \cdots, y_N^t \right],    (2)

y_{\mathrm{Average}}^t = \frac{1}{N} \sum_{i=1}^{N} y_i^t,    (3)

where y_{\mathrm{Average}}^t reflects the average traffic flow at time t over N days. Then, the Mean Absolute Percentage Fluctuation (MAPF) at time t over N days is defined as

\mathrm{MAPF}^t = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| y_{\mathrm{Average}}^t - y_i^t \right|}{y_{\mathrm{Average}}^t} \times 100\%.    (4)

Here, MAPF^t denotes the average variation of traffic flow over N days at time t.

FIGURE 3. The range of the MAPF. The red lines represent the maximum and minimum of MAPF respectively. It is calculated based on the traffic flow within 24 hours of 30 consecutive days since August 1, 2014.

As an example, the MAPF calculated from August 1 to 30, 2014 is shown in Fig. 3. The MAPF at different times of day (over 24 hours) changes between 2.35% and 19.37% within 30 days, and the maximum does not exceed 20%. It can be clearly observed that the characteristics of the traffic flow at the same time are very similar. So, this can be viewed as evidence for the correlation between the temporal information and the traffic flow. Thus, the prediction accuracy may be improved via a comprehensive consideration of the temporal information and the traffic flow.
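For illustration, the indicator defined in Eqs. (1)-(4) can be computed with a few lines of NumPy; the sketch below is illustrative only, and the synthetic matrix Y stands in for the real detector data described in Section IV.

```python
import numpy as np

def mapf(Y):
    """Mean Absolute Percentage Fluctuation per Eqs. (2)-(4).

    Y is an N x T array arranged as in Eq. (1): N days of traffic flow with
    T samples per day. Returns a length-T vector of MAPF^t values in percent.
    """
    y_avg = Y.mean(axis=0)                                      # Eq. (3): average flow at time t
    return 100.0 * np.mean(np.abs(y_avg - Y) / y_avg, axis=0)   # Eq. (4)

# Synthetic example with the paper's setting: N = 30 days, T = 90 samples per day.
rng = np.random.default_rng(0)
Y = rng.uniform(20.0, 120.0, size=(30, 90))     # stands in for the real 16-minute flow data
print(mapf(Y).min(), mapf(Y).max())
```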
In this paper, we combine the time label and LSTM to fully explore the temporal characteristics of traffic flow and improve the accuracy of short-term traffic flow prediction. We pay sufficient attention to time information and add a time label to the traffic flow at each moment. Then an LSTM-based model is trained with the samples and the corresponding time labels. The model is named T-LSTM, an LSTM model enhanced by temporal information. When GRU is combined with the recurrent time label, the model is called T-GRU.

In our study, l^t is used to represent the time label at time t of each day and x_i^t to represent the traffic flow at time t on the ith day. So, the input x_t = [l^t, x_i^t] is a 2-D vector. The time labels change periodically according to the sampling time of each day. For example, if the sampling time interval is 16 minutes, 90 (24 × 60 ÷ 16) samples and 90 labels will be generated each day. The data of 00:16 each day is labeled with 1, that of 00:32 with 2, and so on. Thus, after a round of 24 hours, the traffic data at 00:00 each day is marked with 90. Finally, each x_t is a two-dimensional vector containing the time label and the traffic flow.
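The labeling scheme itself is straightforward to implement. The following sketch assumes a 1-D array of consecutive 16-minute flow values starting at 00:16 of some day and attaches the recurrent time label to each sample; the function name and the synthetic series are illustrative.

```python
import numpy as np

SAMPLES_PER_DAY = 90            # 24 * 60 / 16 samples at a 16-minute interval

def add_time_labels(flow):
    """Attach the recurrent time label to each 16-minute flow sample.

    flow: 1-D array of consecutive flow values assumed to start at 00:16.
    Returns an array whose rows are the 2-D inputs x_t = [l_t, flow_t],
    with labels cycling 1 (00:16), 2 (00:32), ..., 90 (00:00), 1, 2, ...
    """
    idx = np.arange(len(flow))
    labels = (idx % SAMPLES_PER_DAY) + 1
    return np.stack([labels, flow], axis=1)

flow = np.random.default_rng(1).uniform(20.0, 120.0, 3 * SAMPLES_PER_DAY)  # three synthetic days
x = add_time_labels(flow)
print(x.shape)    # (270, 2): each row is one [time label, traffic flow] pair
```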
Assuming that the input historical traffic flow sequence is denoted as x = (x_1, x_2, ..., x_t), the predicted traffic flow sequence h = (h_1, h_2, ..., h_t) is computed by the following equations:

f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right),    (5)
i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right),    (6)
\tilde{c}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right),    (7)
c_t = \left(f_t \otimes c_{t-1}\right) \oplus \left(i_t \otimes \tilde{c}_t\right),    (8)
o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right),    (9)
h_t = o_t \otimes \tanh(c_t),    (10)

where the W terms denote weight matrices and the b terms denote bias vectors; the other mathematical symbols are as defined above. The standard logistic sigmoid function σ and the hyperbolic tangent function tanh are defined as follows:

\sigma(x) = \frac{1}{1 + e^{-x}},    (11)
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}.    (12)
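For readers who prefer code to notation, a single step of Eqs. (5)-(12) can be written directly in NumPy as below. The randomly initialized weights are placeholders for illustration; in the experiments of Section IV the model is built and trained in TensorFlow rather than from hand-written cells.

```python
import numpy as np

def sigmoid(z):
    """Standard logistic sigmoid, Eq. (11)."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One recurrence step following Eqs. (5)-(10).

    x_t is the 2-D input [l_t, x_i_t]; h_prev and c_prev are the previous
    output and cell state; W and b hold one weight matrix and bias vector
    per gate ('f', 'i', 'c', 'o'), each W[.] of shape (hidden, hidden + input_dim).
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W['f'] @ z + b['f'])       # Eq. (5): forgetting gate
    i_t = sigmoid(W['i'] @ z + b['i'])       # Eq. (6): input gate
    c_tilde = np.tanh(W['c'] @ z + b['c'])   # Eq. (7): candidate state
    c_t = f_t * c_prev + i_t * c_tilde       # Eq. (8): element-wise products and addition
    o_t = sigmoid(W['o'] @ z + b['o'])       # Eq. (9): output gate
    h_t = o_t * np.tanh(c_t)                 # Eq. (10): output
    return h_t, c_t

hidden, input_dim = 16, 2                    # 16 units, 2-D input [time label, flow]
rng = np.random.default_rng(0)
W = {k: 0.1 * rng.standard_normal((hidden, hidden + input_dim)) for k in 'fico'}
b = {k: np.zeros(hidden) for k in 'fico'}
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(np.array([1.0, 0.35]), h, c, W, b)   # one step on a scaled sample
```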
IV. EXPERIMENTS
A. DATASET
The traffic detector data from Shibalidian Bridge to Hongyan Bridge on the East Fourth Ring Road in Beijing (March 1 to August 30, 2014) is selected to validate the T-LSTM. The original sampling time interval is 2 minutes, and each detector generates 720 pieces of data every day.


For convenience of processing, the data of the 31st of March, May, July, and August are deleted. So, there is a total of 129,600 (720 × 30 × 6) samples. The data include speed, flow, date, and density. The scientific computing library Pandas is used to remove duplicate and anomalous data. The missing data rate is 17% and the maximum number of consecutive missing records is 414. Note that we simply use the historical average value to replace the missing data in the traffic flow prediction experiment.

For the purpose of research and analysis, the Highway Capacity Manual suggests using 15 minutes as the short-term prediction interval [42]. However, the time interval of our original data is 2 minutes. Therefore, in this paper, the data are aggregated into time intervals of 16 minutes. So, there are 90 (720 × 2 ÷ 16) pieces of data and corresponding time labels for each day. The data of the first five months are used for training and the data of August are used for testing. Finally, the data are normalized to [0, 1] by the Min-Max Scaler normalization method in the Scikit-learn library.
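A possible rendering of this preprocessing pipeline is sketched below. The file name and column names are assumptions rather than details given in the paper, and fitting the scaler on the training months only is a common convention adopted here, not a step stated explicitly above.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical raw detector file with 2-minute records; names are assumptions.
raw = pd.read_csv('detector_flow.csv', parse_dates=['timestamp'])
raw = raw.drop_duplicates(subset='timestamp').set_index('timestamp').sort_index()

# Aggregate the 2-minute flow into 16-minute intervals (summing the 2-minute counts).
flow_16 = raw['flow'].resample('16min').sum()

# First five months (March-July 2014) for training, August 2014 for testing.
train = flow_16['2014-03-01':'2014-07-31']
test = flow_16['2014-08-01':'2014-08-30']

# Normalize to [0, 1]; the scaler is fitted on the training data only.
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train.to_numpy().reshape(-1, 1))
test_scaled = scaler.transform(test.to_numpy().reshape(-1, 1))
```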
B. EXPERIMENT DESIGN
The proposed T-LSTM model is implemented using TensorFlow and the Python language. The workstation used is configured with an Intel i7-4790 3.6 GHz CPU, 32 GB of memory, and an NVIDIA GTX 1080 Ti GPU.

1) TRAFFIC FLOW PREDICTION
The most notable difference between this experiment and existing experiments is that the T-LSTM is implemented to make the best use of the temporal characteristics to improve the prediction accuracy. Three LSTM layers are stacked so that the model is capable of learning higher-level temporal representations (see Fig. 4). The input feature x_t is a two-dimensional vector with the time label and the traffic flow. The timestep is set to 8 (i.e., 8 historical data points are used to predict the traffic flow at the next moment). Therefore, the input to the LSTM model is a matrix of 8 × 2. For simplicity, the number of neurons in each hidden layer is empirically set to the same value of 16. As shown in Fig. 4, the popular Rectified Linear Unit (ReLU) is applied as the activation function of the hidden layers and Sigmoid is used for the output layer.

FIGURE 4. The structure of the T-LSTM model. The three layers of LSTMs are stacked as the hidden layers and a one-layer fully connected layer is stacked as the output layer.

TABLE 1. Key Hyperparameters of T-LSTM.

Additionally, the other hyperparameters have been determined, including Loss, Optimizer, Batch_size, and Epochs, as shown in Table 1. Adaptive Moment Estimation (Adam) is used to optimize the neural networks, and it can calculate an adaptive learning rate for each parameter [43]. In practical applications, the Adam method works well. Compared with other adaptive learning rate algorithms, it has faster convergence and more effective learning, and it can correct problems found in other optimization methods. In addition, Mean Square Error (MSE) is the most commonly used regression loss function, which calculates the sum of the squares of the distances between the predicted values and the true values.
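A minimal tf.keras sketch of this configuration is given below. It follows the description above (three stacked 16-unit LSTM layers with ReLU, a one-layer fully connected Sigmoid output, Adam and MSE); the batch size and epoch count in the commented training call are placeholders, since the actual values are reported only in Table 1.

```python
import tensorflow as tf

TIMESTEPS, FEATURES, UNITS = 8, 2, 16   # 8 historical steps, [time label, flow], 16 neurons

model = tf.keras.Sequential([
    # Three stacked LSTM layers as the hidden layers (Fig. 4), ReLU activation as described.
    tf.keras.layers.LSTM(UNITS, activation='relu', return_sequences=True,
                         input_shape=(TIMESTEPS, FEATURES)),
    tf.keras.layers.LSTM(UNITS, activation='relu', return_sequences=True),
    tf.keras.layers.LSTM(UNITS, activation='relu'),
    # One fully connected output layer with a Sigmoid activation (flow is scaled to [0, 1]).
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(optimizer='adam', loss='mse')   # Adam optimizer and MSE loss

# Placeholder training call; batch size and epochs are given in Table 1, not reproduced here.
# model.fit(x_train, y_train, batch_size=64, epochs=50)
```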

2) MISSING DATA REPAIR
Since LSTM has a strong time series data processing capability and can predict the traffic status at the next moment quite well from historical data [10], we propose an LSTM-based missing data repair technique that can achieve maximum recovery of the characteristics of traffic flow.

The experiment on missing data repair is performed on the raw data with a time interval of 2 minutes. No algorithm can effectively use the missing data themselves, but valid data can be used to infer the missing values as much as possible. Therefore, we removed all discontinuous data and trained the model with real values. Note that the same T-LSTM model is used to repair missing data. Based on the traffic flow prediction experiment above, the timestep is set to 1 and the other hyperparameters remain unchanged. Thus, any two consecutive data points can be used to train the T-LSTM to infer missing data. Then, the trained model is used to repair the missing data, and the repaired data are finally used for short-term traffic flow prediction.
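The repair procedure can be sketched as the loop below: a T-LSTM trained with a timestep of 1 (repair_model, an assumed already-trained model) predicts the next 2-minute value from the previous [time label, flow] pair, and each inferred value is fed back so that long runs of consecutive gaps are filled step by step. Min-Max scaling is omitted here for brevity.

```python
import numpy as np

SAMPLES_PER_DAY = 720   # 2-minute sampling yields 720 records per day

def repair_series(flow, repair_model):
    """Fill NaN gaps in a 2-minute flow series with a timestep-1 T-LSTM.

    flow: 1-D array with np.nan at missing positions; repair_model is an
    assumed trained model mapping one [time label, flow] pair to the next
    flow value. Inferred values are reused, so consecutive gaps are filled
    one step at a time (scaling/inverse scaling omitted for brevity).
    """
    repaired = flow.copy()
    for t in range(1, len(repaired)):
        if np.isnan(repaired[t]) and not np.isnan(repaired[t - 1]):
            label = (t - 1) % SAMPLES_PER_DAY + 1           # recurrent label of the previous step
            x = np.array([[[label, repaired[t - 1]]]])      # shape (1, 1, 2): one timestep, two features
            repaired[t] = repair_model.predict(x, verbose=0).item()
    return repaired
```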


C. EXPERIMENTAL RESULTS AND ANALYSIS
As our model predicts the traffic flow at the next timestep, the evaluation criteria include accuracy metrics that compare the predicted traffic flow with the real traffic flow. So, the Root Mean Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE) are used to evaluate the performance of the model. They are defined by (13) and (14) respectively as follows:

\mathrm{RMSE} = \left[ \frac{1}{n} \sum_{i=1}^{n} \left( |f_i - p_i| \right)^2 \right]^{1/2},    (13)

\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|f_i - p_i|}{f_i},    (14)

where f is the observed value of the traffic flow, p is the predicted value, and n represents the number of samples.
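Both metrics translate directly into code; the helper functions below are a straightforward reading of Eqs. (13) and (14), with MAPE returned as a fraction (multiplied by 100 when reported as a percentage in the results).

```python
import numpy as np

def rmse(f, p):
    """Root Mean Square Error, Eq. (13); f = observed flow, p = predicted flow."""
    f, p = np.asarray(f, dtype=float), np.asarray(p, dtype=float)
    return np.sqrt(np.mean((f - p) ** 2))

def mape(f, p):
    """Mean Absolute Percentage Error, Eq. (14), returned as a fraction."""
    f, p = np.asarray(f, dtype=float), np.asarray(p, dtype=float)
    return np.mean(np.abs(f - p) / f)

print(rmse([100, 80, 60], [95, 85, 58]), 100 * mape([100, 80, 60], [95, 85, 58]))
```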

1) RESULTS OF TRAFFIC FLOW PREDICTION
In this subsection, we compare the proposed T-LSTM model with the existing models SAE, DBN, GRU, LSTM, SVM, KNN, FFNN, and ARIMA(1, 0, 1) in terms of effectiveness under the same conditions. Table 2 shows the prediction results of the different models and the corresponding input information. The prediction results for August 2014 demonstrate that T-LSTM has the highest prediction accuracy and that the MAPE is reduced to 6.09%. Obviously, adding the time labels can improve the prediction performance of LSTM. Compared with the LSTM without time labels, the RMSE and MAPE of T-LSTM decreased by 13.4 and 1.44%, respectively. More importantly, when the LSTM is replaced by GRU as the recurrent structure in T-LSTM, the prediction accuracy of T-GRU is also significantly improved. Thus, the results strongly demonstrate that the temporal information is critical for short-term traffic flow prediction and can effectively improve the prediction accuracy. As can be seen from the experimental results, the more complex LSTM and GRU show no significant improvement in prediction performance compared with the simple structure of FFNN. However, T-LSTM exhibits strong temporal feature learning ability, and the prediction performance is significantly improved when the temporal information is added. Fig. 5 shows randomly selected partial prediction results of T-LSTM.

TABLE 2. Comparison of the Results and the Input Information.

FIGURE 5. Prediction results of T-LSTM. The picture shows the forecast results of the T-LSTM model from August 1st to 4th, 2014, which are randomly selected.

Furthermore, according to the RMSE, it can be clearly found that the prediction results based on deep neural networks are better than those of classic models such as ARIMA, KNN, and SVM. A seemingly strange result is that the RMSE of the SVM is relatively low but its MAPE is pretty high. That is because SVM has poor prediction performance when traffic flow is low.

2) RESULTS OF MISSING DATA REPAIR
In the above experiment, the average of historical values is used to replace the missing data. In order to verify the effectiveness of the proposed missing data repair technique, the following experiments are conducted. As mentioned in the previous section, the raw valid data are used to train the T-LSTM first, and then the trained model is applied to infer the missing data. Finally, the repaired data are aggregated into time intervals of 16 minutes, and then the traffic flow prediction experiment is re-executed to obtain new results.

As can be seen from Table 3, after missing data repair by T-LSTM, the prediction performance of all the models, namely T-LSTM, T-GRU, SAE, DBN, FFNN, SVM, KNN, and ARIMA(1, 0, 1), has been improved to some extent. Specifically, the RMSE of T-LSTM, T-GRU, SAE, DBN, FFNN, KNN, and ARIMA has declined notably. Except for SVM, the MAPE of the other models has also been reduced. The reason for this anomaly might be that SVM is not very good at modeling when traffic flow is very low. From the overall prediction results, we can find that the data processed by the proposed technique can improve the accuracy of short-term traffic flow prediction. With its powerful high-dimensional data processing ability, T-LSTM can accurately infer missing data and restore the original characteristics of traffic flow.

TABLE 3. Prediction Results After T-LSTM Repair.

Moreover, the proposed T-LSTM based data repair technique can not only accurately infer random missing data but also effectively recover data with a large number of consecutive missing values. Fig. 6 shows that T-LSTM can accurately recover the evolution of traffic flow with only one piece of historical data.


When there is a large amount of missing data, T-LSTM can infer the first missing value based on the valid historical data and temporal information, and then use the inferred data and temporal information to continue to infer the next missing data.

FIGURE 6. Missing data repair results of T-LSTM. The time interval is 2 minutes and data from 12:00 on August 14, 2014 to 10:00 on August 15 were used as a test set.

V. CONCLUSION AND FUTURE WORK
In this paper, the recurrent time labels and recurrent neural networks are combined and a T-LSTM model is proposed for short-term traffic flow prediction. The addition of temporal information as input to the T-LSTM is effective in improving the accuracy of short-term traffic flow prediction. In experiments, it is evaluated against GRU, SAE, DBN, LSTM, SVM, KNN, FFNN, and ARIMA(1, 0, 1). The results show that temporal information is crucial for traffic flow prediction and can effectively improve the prediction performance of the LSTM and GRU models. Furthermore, for the first time, we propose a technique of missing data repair based on T-LSTM, and the results show that the data processed this way can notably improve the accuracy of short-term traffic flow prediction. Currently, we only forecast the traffic flow of a single road section without considering the traffic flow of the road network. In the future, we will further this research into predicting traffic flow of the road network and implement more comparative experiments as well.

REFERENCES
[1] Y.-J. Duan, Y.-S. Lv, J. Zhang, X.-L. Zhao, and F.-Y. Wang, "Deep learning for control: The state of the art and prospects," Acta Automatica Sinica, vol. 42, no. 5, pp. 643–654, May 2016.
[2] X. Zheng, W. Chen, P. Wang, D. Shen, S. Chen, X. Wang, Q. Zhang, and L. Yang, "Big data for social transportation," IEEE Trans. Intell. Transp. Syst., vol. 17, no. 3, pp. 620–630, Mar. 2016.
[3] X. Wang, X. Zheng, Q. Zhang, T. Wang, and D. Shen, "Crowdsourcing in ITS: The state of the work and the networking," IEEE Trans. Intell. Transp. Syst., vol. 17, no. 6, pp. 1596–1605, Jun. 2016.
[4] L. Li, Y. Lv, and F.-Y. Wang, "Traffic signal timing via deep reinforcement learning," IEEE/CAA J. Automatica Sinica, vol. 3, no. 3, pp. 247–254, Jul. 2016.
[5] Y. Lv, Y. Chen, X. Zhang, Y. Duan, and N. L. Li, "Social media based transportation research: The state of the work and the networking," IEEE/CAA J. Autom. Sinica, vol. 4, no. 1, pp. 19–26, Jan. 2017.
[6] R. Fu, Z. Zhang, and L. Li, "Using LSTM and GRU neural network methods for traffic flow prediction," in Proc. 31st Youth Academic Annu. Conf. Chin. Assoc. Automat. (YAC), Wuhan, China, Nov. 2016, pp. 324–328.
[7] P. Lingras, S. Sharma, and M. Zhong, "Prediction of recreational travel using genetically designed regression and time-delay neural network models," IEEE/CAA J. Autom. Sinica, vol. 1805, no. 1, pp. 16–24, Jan. 2002.
[8] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[9] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang, "Long short-term memory neural network for traffic speed prediction using remote microwave sensor data," Transp. Res. C, Emerg. Technol., vol. 54, pp. 187–197, May 2015.
[10] Y. Tian and L. Pan, "Predicting short-term traffic flow by long short-term memory recurrent neural network," in Proc. IEEE Int. Conf. Smart City/SocialCom/SustainCom (SmartCity), Chengdu, China, Dec. 2015, pp. 153–158.
[11] Y. Tian, K. Zhang, J. Li, X. Lin, and B. Yang, "LSTM-based traffic flow prediction with missing data," Neurocomputing, vol. 318, pp. 297–305, Nov. 2018.
[12] B. L. Smith, B. M. Williams, and R. K. Oswald, "Comparison of parametric and nonparametric models for traffic flow forecasting," Transp. Res. C, Emerg. Technol., vol. 10, no. 4, pp. 303–321, Aug. 2002.
[13] C. Chen, Y. Wang, L. Li, J. Hu, and Z. Zhang, "The retrieval of intra-day trend and its influence on traffic prediction," Transp. Res. C, Emerg. Technol., vol. 22, pp. 103–118, Jun. 2012.
[14] Z. Li, Y. Li, and L. Li, "A comparison of detrending models and multi-regime models for traffic flow prediction," IEEE Intell. Transp. Syst. Mag., vol. 6, no. 4, pp. 34–44, Oct. 2014.
[15] L. Li, X. Su, Y. Zhang, Y. Lin, and Z. Li, "Trend modeling for traffic time series analysis: An integrated study," IEEE Trans. Intell. Transp. Syst., vol. 16, no. 6, pp. 3430–3439, Dec. 2015.
[16] X. Dai, R. Fu, E. Zhao, Z. Zhang, Y. Lin, F.-Y. Wang, and L. Li, "DeepTrend 2.0: A light-weighted multi-scale traffic prediction model using detrending," Transp. Res. C, Emerg. Technol., vol. 103, pp. 142–157, Jun. 2019.
[17] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, "Traffic flow prediction with big data: A deep learning approach," IEEE Trans. Intell. Transp. Syst., vol. 16, no. 2, pp. 865–873, Apr. 2015.
[18] H. van Lint and C. van Hinsbergen, "Short-term traffic and travel time prediction models," Transp. Res. Circular, vol. 43, no. E-C168, pp. 22–41, Nov. 2012.
[19] E. I. Vlahogianni, M. G. Karlaftis, and J. C. Golias, "Short-term traffic forecasting: Where we are and where we're going," Transp. Res. C, Emerg. Technol., vol. 43, pp. 3–19, Jun. 2014.
[20] M. S. Ahmed and A. R. Cook, "Analysis of freeway traffic time-series data by using Box–Jenkins techniques," Transp. Res. Rec., vol. 773, no. 722, pp. 1–19, Jan. 1979.


[21] M. Van Der Voort, M. Dougherty, and S. Watson, "Combining Kohonen maps with ARIMA time series models to forecast traffic flow," Transp. Res. C, Emerg. Technol., vol. 4, no. 5, pp. 307–318, Oct. 1996.
[22] S. Lee and D. B. Fambro, "Application of subset autoregressive integrated moving average model for short-term freeway traffic volume forecasting," Transp. Res. Rec., vol. 1678, no. 1, pp. 179–188, Nov. 1999.
[23] K. Yiannis and P. Poulicos, "Forecasting traffic flow conditions in an urban network: Comparison of multivariate and univariate approaches," Transp. Res. Rec., vol. 1857, no. 1, pp. 74–87, Jan. 2003.
[24] B. M. Williams and L. A. Hoel, "Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results," J. Transp. Eng., vol. 129, no. 6, pp. 664–672, Nov. 2003.
[25] J. Zhang, F.-Y. Wang, K. Wang, W.-H. Lin, X. Xu, and C. Chen, "Data-driven intelligent transportation systems: A survey," IEEE Trans. Intell. Transp. Syst., vol. 12, no. 4, pp. 1624–1639, Dec. 2011.
[26] A. Rosenblad, "J. J. Faraway: Extending the linear model with R: Generalized linear, mixed effects and nonparametric regression models," Comput. Statist., vol. 24, no. 2, pp. 369–370, May 2009.
[27] Y. Zhang and Y. Liu, "Traffic forecasting using least squares support vector machines," Transportmetrica, vol. 5, no. 3, pp. 193–213, Jul. 2009.
[28] M. Castro-Neto, Y.-S. Jeong, M.-K. Jeong, and L. D. Han, "Online-SVR for short-term traffic flow prediction under typical and atypical traffic conditions," Expert Syst. Appl., vol. 36, no. 3, pp. 6164–6173, 2009.
[29] L. Zhang, Q. Liu, W. Yang, N. Wei, and D. Dong, "An improved K-nearest neighbor model for short-term traffic flow prediction," Procedia-Social Behav. Sci., vol. 96, pp. 653–662, Nov. 2013.
[30] H. Chang, Y. Lee, B. Yoon, and S. Baek, "Dynamic near-term traffic flow prediction: System-oriented approach based on past experiences," IET Intell. Transp. Syst., vol. 6, no. 3, pp. 292–305, Sep. 2012.
[31] M. G. Karlaftis and E. I. Vlahogianni, "Statistical methods versus neural networks in transportation research: Differences, similarities and some insights," Transp. Res. C, Emerg. Technol., vol. 19, no. 3, pp. 387–399, Jun. 2011.
[32] K. Mannepalli, P. N. Sastry, and M. Suman, "A novel adaptive fractional deep belief networks for speaker emotion recognition," Alexandria Eng. J., vol. 56, no. 4, pp. 485–497, Dec. 2017.
[33] L. Zhao, Y. Zhou, H. Lu, and H. Fujita, "Parallel computing method of deep belief networks and its application to traffic flow prediction," Knowl.-Based Syst., vol. 163, pp. 972–987, Jan. 2019.
[34] H. Lee, P. T. Pham, Y. Largman, and A. Y. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," in Proc. NIPS, Vancouver, BC, Canada, 2009, pp. 1096–1104.
[35] W. Huang, G. Song, H. Hong, and K. Xie, "Deep architecture for traffic flow prediction: Deep belief networks with multitask learning," IEEE Trans. Intell. Transp. Syst., vol. 15, no. 5, pp. 2191–2201, Oct. 2014.
[36] L. Li, L. Qin, X. Qu, J. Zhang, Y. Wang, and B. Ran, "Day-ahead traffic flow forecasting based on a deep belief network optimized by the multi-objective particle swarm algorithm," Knowl.-Based Syst., vol. 172, pp. 1–14, May 2019.
[37] Y. Wu, H. Tan, L. Qin, B. Ran, and Z. Jiang, "A hybrid deep learning based traffic flow prediction method and its understanding," Transp. Res. C, Emerg. Technol., vol. 90, pp. 166–180, May 2018.
[38] Y. Jia, J. Wu, M. Ben-Akiva, R. Seshadri, and Y. Du, "Rainfall-integrated traffic speed prediction using deep learning method," IET Intell. Transport Syst., vol. 11, no. 9, pp. 531–536, Nov. 2017.
[39] X. Ma, Z. Dai, Z. He, J. Ma, Y. Wang, and Y. Wang, "Learning traffic as images: A deep convolutional neural network for large-scale transportation network speed prediction," Sensors, vol. 17, no. 4, p. 818, Apr. 2017.
[40] Z. Duan, Y. Yang, K. Zhang, Y. Ni, and S. Bajgain, "Improved deep hybrid networks for urban traffic flow prediction using trajectory data," IEEE Access, vol. 6, pp. 31820–31827, 2018.
[41] Z.-T. Duan, K. Zhang, Y. Yang, Y.-Y. Ni, and S. Bajgain, "Taxi demand prediction based on CNN-LSTM-ResNet hybrid depth learning model," J. Transp. Syst. Eng. Inf. Technol., vol. 18, no. 4, pp. 215–223, Aug. 2018.
[42] A. D. May, N. Rouphail, L. Bloomberg, and F. Hall, "Freeway systems research beyond Highway Capacity Manual 2000," Transp. Res. Rec., vol. 1776, no. 1, pp. 1–9, Jan. 2001.
[43] D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," in Proc. ICLR, San Diego, CA, USA, 2015, pp. 1–13.

LUNTIAN MOU (M'12) received the B.S. degree in computer application from the Minzu University of China, Beijing, China, in 1999, and the Ph.D. degree in computer application from the University of Chinese Academy of Sciences, Beijing, in 2012. From 2012 to 2014, he was a Postdoctoral Research Fellow with the National Engineering Laboratory for Video Technology, Peking University. Since 2014, he has been an Assistant Professor with the College of Metropolitan Transportation, Beijing University of Technology. He is the first coauthor of two international standards and two China standards. His research interests include intelligent transportation, intelligent multimedia, machine learning, and artificial intelligence. He has undertaken or taken part in more than ten national technological research projects such as the NSFC key programs and general programs. Dr. Mou was a recipient of a certificate of Outstanding Contributor for the 15th Anniversary of AVS, in 2017, and two certificates of the Outstanding Contributor for the IEEE 1857 Standards, in 2013 and 2014.

PENGFEI ZHAO was born in Shijiazhuang, Hebei, China, in 1995. He received the B.S. degree in information and computing sciences from Shijiazhuang University, in 2018. He is currently pursuing the M.S. degree in pattern recognition and intelligent system with the Department of Information, Beijing University of Technology. His research interests include intelligent transportation, machine learning, computer vision, and pattern recognition.

HAITAO XIE was born in Zhangjiakou, Hebei, China, in 1993. She received the B.S. degree in network engineering from the Hebei Normal University of Science and Technology, in 2017. She is currently pursuing the M.S. degree in pattern recognition and intelligent system with the College of Metropolitan Transportation, Beijing University of Technology. Her research interests include intelligent transportation, machine learning, computer vision, and pattern recognition.

YANYAN CHEN received the Ph.D. degree in civil engineering from the Harbin Institute of Technology, Harbin, China, in 1997. Since 1999, she has been with the Beijing University of Technology. In 2004, she was with the London Imperial College, as a Visiting Professor. She was a Postdoctoral Research Fellow for the next two years. She is currently the Dean of the College of Metropolitan Transportation, Beijing University of Technology. She has undertaken nearly 30 research projects granted by national and provincial science funds, and has published more than 100 articles in related journals and six academic books. Her research interests include urban transportation planning and management, big data, and ITS. She is the Co-Chair of the Urban Transportation Committee in the China Highway Society.
