T-LSTM: A Long Short-Term Memory Neural Network Enhanced by Temporal Information for Traffic Flow Prediction
August 6, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2929692
ABSTRACT Short-term traffic flow prediction is one of the most important issues in the field of intelligent
transportation systems. It plays an important role in traffic information service and traffic guidance. However,
complex traffic systems are highly nonlinear and stochastic, making short-term traffic flow prediction
a challenging issue. Although long short-term memory (LSTM) has a good performance in traffic flow
prediction, the impact of temporal features on prediction has not been exploited by existing studies. In this
paper, a temporal information enhancing LSTM (T-LSTM) is proposed to predict traffic flow of a single road
section. In view of the similar characteristics of traffic flow at the same time each day, the model can improve
prediction accuracy by capturing the intrinsic correlation between traffic flow and temporal information.
The experimental results demonstrate that our method can effectively improve the prediction performance
and obtain higher accuracy compared with other state-of-the-art methods. Furthermore, we propose a novel
missing data processing technique based on T-LSTM. According to the experimental results, this technique
restores the characteristics of the original data well and improves the accuracy of traffic flow prediction.
INDEX TERMS Traffic flow prediction, missing data repair, temporal features, deep learning, LSTM.
and a similar trend is demonstrated by daily traffic flow [11]. Specifically, traffic flow is heavy during commuting hours and relatively light in the early hours of each day. Conversely, when a traffic flow sample is obtained, the time at which it was recorded can be inferred roughly, but not precisely. Therefore, if the temporal information and the traffic flow are considered simultaneously, deep neural networks may be capable of learning higher-level temporal representations and achieving better results. More importantly, trend modeling studies found that residual traffic flow, an important kind of trend information, can reflect the time-variant fluctuation, and that the prediction accuracy can be improved when the residual traffic flow is fed into the model [12]–[16]. We believe that the temporal information itself could also be used as a kind of trend information and should be considered in the prediction model because it is closely related to the time-variant fluctuation of traffic flow. Existing traffic flow forecasting methods based on deep learning take only traffic flow as the input to neural networks. Thus, these methods might not capture the temporal characteristics of traffic flow completely. Besides, most existing studies have not paid adequate attention to the processing of missing data and often use adjacent values to fill missing values, which hinders improvement of the prediction accuracy [17].
The main contributions of this paper are as follows:
1) For the first time, we propose a Temporal information enhancing Long Short-Term Memory neural network (T-LSTM) that combines recurrent time labels with recurrent neural networks, which makes the best use of the temporal features to improve the accuracy of short-term traffic flow prediction.
2) The performance of our proposed model has been evaluated against a variety of comparison models, which include Gated Recurrent Unit (GRU), SAE, DBN, LSTM, Support Vector Machine (SVM), K-nearest neighbor (KNN), Feed Forward Neural Networks (FFNN), and Autoregressive Integrated Moving Average model (ARIMA). Our model achieves the best performance among them. CNN related models are not included because the traffic flow data of a road network is currently not available to us, so traffic flow prediction of only a single road section is considered in this paper.
3) In view of the deficiencies in the processing of missing data, a new missing data repair technique is proposed based on the proposed T-LSTM model to maximally recover the characteristics of the raw data. To the best of our knowledge, it is the first time that an LSTM-based model is used for missing data repair.
II. RELATED WORK
Generally, traffic flow forecasting methods can be divided into two categories: parametric models and nonparametric models [18]. Parametric models refer to the models where the structure is predetermined based on certain theoretical assumptions and the parameters can be computed with empirical data [19]. Among parametric models, ARIMA is one of the most widely used. It was proposed in the 1970s to predict short-term freeway traffic data [20]. Then, scholars made some improvements on the ARIMA model and proposed a series of variant models such as Kohonen-ARIMA [21], subset ARIMA [22], Autoregressive Moving Average model (ARMA) [23], and seasonal ARIMA [24]. In addition, the Kalman Filter is another commonly used parametric model. It has been successfully applied in traffic flow prediction and has exhibited a superior capability of conducting online learning [25]. Although the above parametric models improve the performance of traffic flow prediction, due to the nonlinearity and randomness of traffic flow, these relatively simple and inflexible models cannot accurately capture the characteristics of traffic flow [10].
As a result, researchers have begun to focus on nonparametric models, such as nonparametric regression [26], SVM [27], Online Support Vector Machine (OL-SVM) [28], KNN [29], and Neural Networks (NN) [30]. Among the above models, NN has the best performance and is considered another popular model for traffic flow prediction due to its powerful ability in processing multidimensional data, flexible model structures, strong generalization ability as well as adaptability [31]. However, due to the shallow structure of the aforementioned models, it is still a great challenge to make accurate traffic flow prediction.
Recently, with the resurgence of deep learning, neural networks with multilayer nonlinear structures have been widely used in pattern recognition, classification, and prediction [32]–[34]. Compared with traditional shallow structures, deep neural networks can use distributed and hierarchical feature representation to model the deep complex nonlinear relationship of traffic flow. In 2014, Huang et al. employed a DBN with multitask learning for traffic flow prediction [35]. To achieve traffic flow forecasting for the next day, Li et al. proposed an advanced multi-objective particle swarm optimization algorithm to optimize some parameters in DBN and enhance its multiple step prediction ability [36]. Lv et al. proposed an SAE and demonstrated that the model is superior to FFNN, Random Walk (RW), SVM, and Radial Basis Function (RBF) [17]. These models all have a fully-connected structure, and a fully-connected architecture makes no assumptions about the features. Thus, it is difficult for the fully-connected neural networks to capture representative features from a dataset with plentiful characteristics [37].
In order to solve these issues, researchers proposed RNN and CNN based models, which can capture the nonlinearity and randomness of traffic flow more effectively and have become basic models to forecast traffic flow. LSTM and GRU, the variants of RNN, have superior capability for time series prediction with long temporal dependency and temporal features learning ability. In 2015, Tian applied LSTM to short-term traffic flow prediction for the first time and proved that the model is superior to SVM, FFNN, and SAE [11]. Jia found that with the combination input of speed and weather information, LSTM has better prediction accuracy
and August is deleted. So, there will be a total of 129,600 (720 × 30 × 6) samples. The data include speed, flow, date, and density. The scientific computing library Pandas is used to remove the duplicate and anomalous data. The missing data rate is 17% and the maximum number of consecutive missing data is 414. Note that we just use the historical average value to replace the missing data in the traffic flow prediction experiment.
For the purpose of research and analysis, the Highway Capacity Manual suggests using 15 minutes as the short-term prediction interval [42]. However, the time interval of our original data is 2 minutes. Therefore, in this paper, the data are aggregated into time intervals of 16 minutes. So, there are 90 (720 × 2 ÷ 16) pieces of data and corresponding time labels for each day. The data of the first five months are used for training and the data of August are used for testing. Finally, the data are normalized to [0, 1] by the Min-Max Scaler normalization method in the Scikit-learn library.
B. EXPERIMENT DESIGN
The proposed T-LSTM model is implemented using TensorFlow and the Python language. The workstation used is configured with an Intel i7-4790 3.6 GHz CPU, 32 GB of memory, and an NVIDIA GTX 1080 Ti GPU.
set to the same of 16. As shown in Fig. 4, the widely used Rectified Linear Unit (ReLU) is applied as the activation function of the hidden layers and Sigmoid is used for the output layer.
TABLE 1. Key Hyperparameters of T-LSTM.
Additionally, other hyperparameters have been determined, including Loss, Optimizer, Batch_size, and Epochs, as shown in Table 1. Adaptive Moment Estimation (Adam) is used to optimize the neural networks; it calculates an adaptive learning rate for each parameter [43]. In practical applications, the Adam method works well. Compared with other adaptive learning rate algorithms, it has faster convergence, more effective learning effects, and can correct problems in other optimization methods. In addition, Mean Square Error (MSE) is the most commonly used regression loss function, which calculates the mean of the squared differences between the predicted values and the true values.
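The preprocessing described above (aggregating 2-minute records into 16-minute intervals, pairing each interval with a recurrent time label, and scaling to [0, 1]) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, flow counts are assumed to be summed within each 16-minute window, the time label is assumed to be the index of the interval within the day (0 to 89), and the scaling is written out by hand rather than calling the Scikit-learn Min-Max Scaler that the experiments use.

```python
READINGS_PER_DAY = 720                          # 2-minute readings in one day
GROUP = 8                                       # 8 x 2 min = one 16-minute interval
INTERVALS_PER_DAY = READINGS_PER_DAY // GROUP   # 90 intervals per day


def aggregate(flows, group=GROUP):
    """Sum each run of `group` consecutive readings (assumed aggregation rule)."""
    usable = len(flows) - len(flows) % group    # drop an incomplete trailing window
    return [sum(flows[i:i + group]) for i in range(0, usable, group)]


def time_labels(n_intervals, per_day=INTERVALS_PER_DAY):
    """Hypothetical time label: position of each interval within its day."""
    return [i % per_day for i in range(n_intervals)]


def min_max_scale(values):
    """Scale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Two synthetic days of 2-minute counts (made-up numbers, not the paper's data).
raw = [i % 5 for i in range(2 * READINGS_PER_DAY)]
flow_16min = aggregate(raw)
labels = time_labels(len(flow_16min))
scaled = min_max_scale(flow_16min)
samples = list(zip(scaled, labels))             # (normalized flow, time label) pairs
```

Each element of `samples` pairs a normalized flow value with its time label, which is the kind of two-feature input that a model using both traffic flow and temporal information would consume.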
\[
\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|f_i - p_i|}{f_i}, \tag{14}
\]
where f is the observed value of the traffic flow while p is the
predicted value, and n represents the number of samples.
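Equation (14) translates directly into code. The sketch below is a plain-Python illustration, not the authors' evaluation script: the function names are hypothetical, and `mse` follows the standard mean-of-squared-errors definition of the MSE loss. MAPE is undefined when an observed value is zero, so the function rejects such inputs.

```python
def mape(observed, predicted):
    """Mean absolute percentage error as in Eq. (14), returned as a fraction."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(f == 0 for f in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return sum(abs(f - p) / f for f, p in zip(observed, predicted)) / len(observed)


def mse(observed, predicted):
    """Mean of the squared differences between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((f - p) ** 2 for f, p in zip(observed, predicted)) / len(observed)
```

For example, observed flows `[100, 200]` with predictions `[90, 220]` each miss by 10%, so `mape` returns approximately 0.1; multiply by 100 to report a percentage.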
TABLE 3. Prediction Results After T-LSTM Repair.
FIGURE 6. Missing data repair results of T-LSTM. The time interval is 2 minutes and data from 12:00 on August 14, 2014 to 10:00 on August 15 were used as a test set.
historical data and temporal information, and then use the inferred data and temporal information to continue to infer the next missing data.
V. CONCLUSION AND FUTURE WORK
In this paper, the recurrent time labels and the recurrent networks are combined and a T-LSTM model is proposed for short-term traffic flow prediction. The addition of temporal information as input to the T-LSTM is effective in improving the accuracy of short-term traffic flow prediction. In experiments, it is evaluated against GRU, SAE, DBN, LSTM, SVM, KNN, FFNN, and ARIMA(1, 0, 1). The results show that temporal information is crucial for traffic flow prediction and can effectively improve the prediction performance of the LSTM and GRU models. Furthermore, for the first time, we propose a technique of missing data repair based on T-LSTM, and the results show that the repaired data can notably improve the accuracy of short-term traffic flow prediction. Currently, we only forecast traffic flow of a single road section without considering traffic flow of the road network. In the future, we will extend this research to predicting traffic flow of the road network and implement more comparative experiments as well.
REFERENCES
[1] Y.-J. Duan, Y.-S. Lv, J. Zhang, X.-L. Zhao, and F.-Y. Wang, “Deep learning for control: The state of the art and prospects,” Acta Automatica Sinica, vol. 42, no. 5, pp. 643–654, May 2016.
[2] X. Zheng, W. Chen, P. Wang, D. Shen, S. Chen, X. Wang, Q. Zhang, and L. Yang, “Big data for social transportation,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 3, pp. 620–630, Mar. 2016.
[3] X. Wang, X. Zheng, Q. Zhang, T. Wang, and D. Shen, “Crowdsourcing in ITS: The state of the work and the networking,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 6, pp. 1596–1605, Jun. 2016.
[4] L. Li, Y. Lv, and F.-Y. Wang, “Traffic signal timing via deep reinforcement learning,” IEEE/CAA J. Automatica Sinica, vol. 3, no. 3, pp. 247–254, Jul. 2016.
[5] Y. Lv, Y. Chen, X. Zhang, Y. Duan, and N. L. Li, “Social media based transportation research: The state of the work and the networking,” IEEE/CAA J. Autom. Sinica, vol. 4, no. 1, pp. 19–26, Jan. 2017.
[6] R. Fu, Z. Zhang, and L. Li, “Using LSTM and GRU neural network methods for traffic flow prediction,” in Proc. 31st Youth Academic Annu. Conf. Chin. Assoc. Automat. (YAC), Wuhan, China, Nov. 2016, pp. 324–328.
[7] P. Lingras, S. Sharma, and M. Zhong, “Prediction of recreational travel using genetically designed regression and time-delay neural network models,” IEEE/CAA J. Autom. Sinica, vol. 1805, no. 1, pp. 16–24, Jan. 2002.
[8] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[9] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang, “Long short-term memory neural network for traffic speed prediction using remote microwave sensor data,” Transp. Res. C, Emerg. Technol., vol. 54, pp. 187–197, May 2015.
[10] Y. Tian and L. Pan, “Predicting short-term traffic flow by long short-term memory recurrent neural network,” in Proc. IEEE Int. Conf. Smart City/SocialCom/SustainCom (SmartCity), Chengdu, China, Dec. 2015, pp. 153–158.
[11] Y. Tian, K. Zhang, J. Li, X. Lin, and B. Yang, “LSTM-based traffic flow prediction with missing data,” Neurocomputing, vol. 318, pp. 297–305, Nov. 2018.
[12] B. L. Smith, B. M. Williams, and R. K. Oswald, “Comparison of parametric and nonparametric models for traffic flow forecasting,” Transp. Res. C, Emerg. Technol., vol. 10, no. 4, pp. 303–321, Aug. 2002.
[13] C. Chen, Y. Wang, L. Li, J. Hu, and Z. Zhang, “The retrieval of intra-day trend and its influence on traffic prediction,” Transp. Res. C, Emerg. Technol., vol. 22, pp. 103–118, Jun. 2012.
[14] Z. Li, Y. Li, and L. Li, “A comparison of detrending models and multi-regime models for traffic flow prediction,” IEEE Intell. Transp. Syst. Mag., vol. 6, no. 4, pp. 34–44, Oct. 2014.
[15] L. Li, X. Su, Y. Zhang, Y. Lin, and Z. Li, “Trend modeling for traffic time series analysis: An integrated study,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 6, pp. 3430–3439, Dec. 2015.
[16] X. Dai, R. Fu, E. Zhao, Z. Zhang, Y. Lin, F.-Y. Wang, and L. Li, “Deeptrend 2.0: A light-weighted multi-scale traffic prediction model using detrending,” Transp. Res. C, Emerg. Technol., vol. 103, pp. 142–157, Jun. 2019.
[17] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, “Traffic flow prediction with big data: A deep learning approach,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 2, pp. 865–873, Apr. 2015.
[18] H. van Lint and C. van Hinsbergen, “Short-term traffic and travel time prediction models,” Transp. Res. Circular, vol. 43, no. E-C168, pp. 22–41, Nov. 2012.
[19] E. I. Vlahogianni, M. G. Karlaftis, and J. C. Golias, “Short-term traffic forecasting: Where we are and where we’re going,” Transp. Res. C, Emerg. Technol., vol. 43, pp. 3–19, Jun. 2014.
[20] M. S. Ahmed and A. R. Cook, “Analysis of freeway traffic time-series data by using Box-Jenkins techniques,” Transp. Res. Rec., vol. 773, no. 722, pp. 1–19, Jan. 1979.
[21] M. Van Der Voort, M. Dougherty, and S. Watson, “Combining Kohonen maps with ARIMA time series models to forecast traffic flow,” Transp. Res. C, Emerg. Technol., vol. 4, no. 5, pp. 307–318, Oct. 1996.
[22] S. Lee and D. B. Fambro, “Application of subset autoregressive integrated moving average model for short-term freeway traffic volume forecasting,” Transp. Res. Rec., vol. 1678, no. 1, pp. 179–188, Nov. 1999.
[23] K. Yiannis and P. Poulicos, “Forecasting traffic flow conditions in an urban network: Comparison of multivariate and univariate approaches,” Transp. Res. Rec., vol. 1857, no. 1, pp. 74–87, Jan. 2003.
[24] B. M. Williams and L. A. Hoel, “Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results,” J. Transp. Eng., vol. 129, no. 6, pp. 664–672, Nov. 2003.
[25] J. Zhang, F.-Y. Wang, K. Wang, W.-H. Lin, X. Xu, and C. Chen, “Data-driven intelligent transportation systems: A survey,” IEEE Trans. Intell. Transp. Syst., vol. 12, no. 4, pp. 1624–1639, Dec. 2011.
[26] A. Rosenblad, “J. J. Faraway: Extending the linear model with R: Generalized linear, mixed effects and nonparametric regression models,” Comput. Statist., vol. 24, no. 2, pp. 369–370, May 2009.
[27] Y. Zhang and Y. Liu, “Traffic forecasting using least squares support vector machines,” Transportmetrica, vol. 5, no. 3, pp. 193–213, Jul. 2009.
[28] M. Castro-Neto, Y.-S. Jeong, M.-K. Jeong, and L. D. Han, “Online-SVR for short-term traffic flow prediction under typical and atypical traffic conditions,” Expert Syst. Appl., vol. 36, no. 3, pp. 6164–6173, 2009.
[29] L. Zhang, Q. Liu, W. Yang, N. Wei, and D. Dong, “An improved K-nearest neighbor model for short-term traffic flow prediction,” Procedia-Social Behav. Sci., vol. 96, pp. 653–662, Nov. 2013.
[30] H. Chang, Y. Lee, B. Yoon, and S. Baek, “Dynamic near-term traffic flow prediction: System-oriented approach based on past experiences,” IET Intell. Transp. Syst., vol. 6, no. 3, pp. 292–305, Sep. 2012.
[31] M. G. Karlaftis and E. I. Vlahogianni, “Statistical methods versus neural networks in transportation research: Differences, similarities and some insights,” Transp. Res. C, Emerg. Technol., vol. 19, no. 3, pp. 387–399, Jun. 2011.
[32] K. Mannepalli, P. N. Sastry, and M. Suman, “A novel adaptive fractional deep belief networks for speaker emotion recognition,” Alexandria Eng. J., vol. 56, no. 4, pp. 485–497, Dec. 2017.
[33] L. Zhao, Y. Zhou, H. Lu, and H. Fujita, “Parallel computing method of deep belief networks and its application to traffic flow prediction,” Knowl.-Based Syst., vol. 163, pp. 972–987, Jan. 2019.
[34] H. Lee, P. T. Pham, Y. Largman, and A. Y. Ng, “Unsupervised feature learning for audio classification using convolutional deep belief networks,” in Proc. NIPS, Vancouver, BC, Canada, 2009, pp. 1096–1104.
[35] W. Huang, G. Song, H. Hong, and K. Xie, “Deep architecture for traffic flow prediction: Deep belief networks with multitask learning,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 5, pp. 2191–2201, Oct. 2014.
[36] L. Li, L. Qin, X. Qu, J. Zhang, Y. Wang, and B. Ran, “Day-ahead traffic flow forecasting based on a deep belief network optimized by the multi-objective particle swarm algorithm,” Knowl.-Based Syst., vol. 172, pp. 1–14, May 2019.
[37] Y. Wu, H. Tan, L. Qin, B. Ran, and Z. Jiang, “A hybrid deep learning based traffic flow prediction method and its understanding,” Transp. Res. C, Emerg. Technol., vol. 90, pp. 166–180, May 2018.
[38] Y. Jia, J. Wu, M. Ben-Akiva, R. Seshadri, and Y. Du, “Rainfall-integrated traffic speed prediction using deep learning method,” IET Intell. Transport Syst., vol. 11, no. 9, pp. 531–536, Nov. 2017.
[39] X. Ma, Z. Dai, Z. He, J. Ma, Y. Wang, and Y. Wang, “Learning traffic as images: A deep convolutional neural network for large-scale transportation network speed prediction,” Sensors, vol. 17, no. 4, p. 818, Apr. 2017.
[40] Z. Duan, Y. Yang, K. Zhang, Y. Ni, and S. Bajgain, “Improved deep hybrid networks for urban traffic flow prediction using trajectory data,” IEEE Access, vol. 6, pp. 31820–31827, 2018.
[41] Z.-T. Duan, K. Zhang, Y. Yang, Y.-Y. Ni, and S. Bajgain, “Taxi demand prediction based on CNN-LSTM-ResNet hybrid depth learning model,” J. Transp. Syst. Eng. Inf. Technol., vol. 18, no. 4, pp. 215–223, Aug. 2018.
[42] A. D. May, N. Rouphail, L. Bloomberg, and F. Hall, “Freeway systems research beyond highway capacity manual 2000,” Transp. Res. Rec., vol. 1776, no. 1, pp. 1–9, Jan. 2001.
[43] D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” in Proc. ICLR, San Diego, CA, USA, 2015, pp. 1–13.
LUNTIAN MOU (M’12) received the B.S. degree in computer application from the Minzu University of China, Beijing, China, in 1999, and the Ph.D. degree in computer application from the University of Chinese Academy of Sciences, Beijing, in 2012. From 2012 to 2014, he was a Postdoctoral Research Fellow with the National Engineering Laboratory for Video Technology, Peking University. Since 2014, he has been an Assistant Professor with the College of Metropolitan Transportation, Beijing University of Technology. He is the first coauthor of two international standards and two China standards. His research interests include intelligent transportation, intelligent multimedia, machine learning, and artificial intelligence. He has undertaken or taken part in more than ten national technological research projects such as the NSFC key programs and general programs. Dr. Mou was a recipient of a certificate of Outstanding Contributor for the 15th Anniversary of AVS, in 2017, and two certificates of the Outstanding Contributor for the IEEE 1857 Standards, in 2013 and 2014.
PENGFEI ZHAO was born in Shijiazhuang, Hebei, China, in 1995. He received the B.S. degree in information and computing sciences from Shijiazhuang University, in 2018. He is currently pursuing the M.S. degree in pattern recognition and intelligent system with the Department of Information, Beijing University of Technology. His research interests include intelligent transportation, machine learning, computer vision, and pattern recognition.
HAITAO XIE was born in Zhangjiakou, Hebei, China, in 1993. She received the B.S. degree in network engineering from the Hebei Normal University of Science and Technology, in 2017. She is currently pursuing the M.S. degree in pattern recognition and intelligent system with the College of Metropolitan Transportation, Beijing University of Technology. Her research interests include intelligent transportation, machine learning, computer vision, and pattern recognition.
YANYAN CHEN received the Ph.D. degree in civil engineering from the Harbin Institute of Technology, Harbin, China, in 1997. Since 1999, she has been with the Beijing University of Technology. In 2004, she was with the London Imperial College, as a Visiting Professor. She was a Postdoctoral Research Fellow for the next two years. She is currently the Dean of the College of Metropolitan Transportation, Beijing University of Technology. She has undertaken nearly 30 research projects granted by national and provincial science funds, published more than 100 articles in related journals and six academic books. Her research interests include urban transportation planning and management, big data, and ITS. She is the Co-Chair of the Urban Transportation Committee in China Highway Society.