T-GCN A Temporal Graph Convolutional Network For Traffic Prediction
ZHAO et al.: T-GCN FOR TRAFFIC PREDICTION 3849
3850 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 21, NO. 9, SEPTEMBER 2020
better result on the real-world traffic dataset. The Kalman filter model predicts future traffic information based on the traffic state of the previous moment and the current moment. In 1984, Okutani and Stephanedes [7] used the Kalman filter to establish a traffic flow state prediction model. Subsequently, some studies [33], [34] used the Kalman filter to realize traffic prediction tasks.
The traditional parametric models have simple algorithms and convenient calculation. However, these models depend on the assumption that the system model is static, cannot reflect the nonlinearities and uncertainties of traffic data, and cannot overcome the interference of random events such as traffic accidents. The nonparametric models solve these problems well and only require enough historical data to learn the statistical regularity from traffic data automatically. The common nonparametric models include the k-nearest neighbor model [10], the support vector regression model [8], [9], [35], the fuzzy logic model [36], the Bayesian network model [11], the neural network model, and so on.
In recent years, with the rapid development of deep learning [37]–[39], deep neural network models have received attention because they can capture the dynamic characteristics of traffic data well and achieve the best results at present. According to whether or not spatial dependence is considered, models can be divided into two categories. Some methods consider temporal dependence only, e.g., Park and Rilett [40] used a feedforward neural network to implement traffic flow prediction tasks. Huang et al. [12] proposed a network architecture consisting of a deep belief network (DBN) and a regression model, and verified on multiple datasets that the network could capture random features from traffic data and that the model improved prediction accuracy in traffic forecasting. Qi et al. [41] presented the importance of congestion, proposed a unified definition of traffic congestion, and used the locality constraint distance metric learning method for traffic congestion detection tasks on the road. Subsequently, Qi et al. [42] proposed a robust hierarchical deep learning method for deep semantic feature extraction, which was used to better express the attributes of traffic congestion. In addition, since the recurrent neural network (RNN) and its variants, long short-term memory (LSTM) and the gated recurrent unit (GRU), can effectively use the self-circulation mechanism, they can learn temporal dependence well and achieve better prediction results [13], [43].
The models mentioned above take only the temporal features into account and ignore the spatial dependence, so the change of traffic data is not constrained by the urban road network and they cannot accurately predict the traffic information on the road. Making full use of the spatial and temporal dependences is the key to solving traffic forecasting problems. To better characterize spatial features, many studies have made improvements in this area. Lv et al. [44] proposed an SAE model to capture the spatio-temporal features from traffic data and realize short-term traffic flow prediction. Zhang et al. [14] proposed a deep learning model called ST-ResNet, which designed residual convolutional networks for each attribute based on the temporal closeness, period, and trend of crowd flows; the three networks and external factors were then dynamically aggregated to predict the inflow and outflow of crowds in each region of a city. Wu and Tan [15] designed a feature fusion architecture for short-term prediction by combining CNN and LSTM. A 1-dimensional CNN was used to capture spatial dependence and two LSTMs were used to mine the short-term variability and periodicity of traffic flow. Cao et al. [16] proposed an end-to-end model called ITRCN, which converted interactive network traffic into images, used CNN to capture interactive functions of traffic, used GRU to extract temporal features, and proved that the prediction error of this model is 14.3% and 13.0% lower than that of GRU and CNN, respectively. Ke et al. [45] proposed a new deep learning method called the fusion convolutional long short-term memory network (FCL-Net), taking spatial dependence, temporal dependence, and exogenous dependence into account for short-term passenger demand forecasting. Yu et al. [46] combined a Deep Convolutional Neural Network (DCNN) and LSTM to propose a model named SRCN; in this model, DCNN was used to capture spatial dependence, while LSTM was used to capture temporal dynamics. SRCN has been proved effective and superior via experiments on Beijing traffic network data.
Although the above methods introduced the CNN to model spatial dependence and made great progress in traffic forecasting tasks, the CNN is essentially suitable for Euclidean space, such as images and regular grids, and has limitations on traffic networks with complex topological structures, and thus cannot essentially characterize the spatial dependence. Therefore, this type of method also has certain defects. In recent years, the development of the graph convolutional network model [47], which can be used to capture structural features of graph networks, has provided a good solution for the above problem. Li et al. [48] proposed a DCRNN model, which captured the spatial features through random walks on graphs, and the temporal features through an encoder-decoder architecture.
Based on this background, in this research we propose a new neural network approach that can capture the complex temporal and spatial features from traffic data, and can then be used for traffic forecasting tasks based on an urban road network.
III. METHODOLOGY
A. Problem Definition
In this research, the goal of traffic forecasting is to predict the traffic information in a certain period of time based on the historical traffic information on the roads. In our method, the traffic information is a general concept which can be traffic speed, traffic flow, or traffic density. Without loss of generality, we use traffic speed as an example of traffic information in the experiment section.
Definition 1: road network $G$. We use an unweighted graph $G = (V, E)$ to describe the topological structure of the road network, and we treat each road as a node, where $V$ is a set of road nodes, $V = \{v_1, v_2, \cdots, v_N\}$, $N$ is the number of the nodes, and $E$ is a set of edges. The adjacency matrix $A$ is used to represent the connection between roads, $A \in R^{N \times N}$. The adjacency matrix contains only elements of 0 and 1.
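As a minimal illustration of Definition 1, the sketch below builds the 0/1 adjacency matrix of a small road graph in plain Python. The four-road network, the `links` list, and the helper name `build_adjacency` are hypothetical, and treating every link as two-directional (so that the matrix is symmetric) is an assumption, since the text does not state whether links are directed.

```python
def build_adjacency(num_roads, links):
    """Return the 0/1 adjacency matrix of an unweighted road graph."""
    adj = [[0] * num_roads for _ in range(num_roads)]
    for i, j in links:
        adj[i][j] = 1
        adj[j][i] = 1  # assumption: a link connects the two roads in both directions
    return adj


# Hypothetical toy network: road 0 joins roads 1 and 2, and road 2 joins road 3.
links = [(0, 1), (0, 2), (2, 3)]
A = build_adjacency(4, links)
for row in A:
    print(row)
```

Each row and column index stands for one road, which matches the convention of treating roads, not intersections, as graph nodes.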
The element is 0 if there is no link between two roads and 1 if there is a link.
Definition 2: feature matrix $X^{N \times P}$. We regard the traffic information on the road network as the attribute features of the nodes in the network, expressed as $X \in R^{N \times P}$, where $P$ represents the number of node attribute features (the length of the historical time series) and $X_t \in R^{N \times i}$ is used to represent the speed on each road at time $i$. Again, the node attribute features can be any traffic information such as traffic speed, traffic flow, and traffic density.
Thus, the problem of spatio-temporal traffic forecasting can be considered as learning the mapping function $f$ on the premise of road network topology $G$ and feature matrix $X$ and then calculating the traffic information in the next $T$ moments, as shown in equation 1:
$$[X_{t+1}, \cdots, X_{t+T}] = f\left(G; (X_{t-n}, \cdots, X_{t-1}, X_t)\right) \tag{1}$$
where $n$ is the length of the historical time series and $T$ is the length of the time series to be predicted.
B. Overview
In this section, we describe how to use the T-GCN model to realize the traffic forecasting task based on the urban roads. Specifically, the T-GCN model consists of two parts: the graph convolutional network and the gated recurrent unit. As shown in Figure 3, we first use the historical $n$ time series data as input, and the graph convolutional network is used to capture the topological structure of the urban road network to obtain the spatial features. Second, the obtained time series with spatial features are input into the gated recurrent unit model, and the dynamic change is obtained by information transmission between the units, to capture temporal features. Finally, we get results through the fully connected layer.
Fig. 3. Overview. We take the historical traffic information as input and obtain the final prediction result through the Graph Convolution Network and the Gated Recurrent Unit model.
C. Methodology
1) Spatial Dependence Modeling: Acquiring the complex spatial dependence is a key problem in traffic forecasting. The traditional convolutional neural network (CNN) can obtain local spatial features, but it can only be used in Euclidean space, such as images or a regular grid. An urban road network is in the form of a graph rather than a two-dimensional grid, which means the CNN model cannot reflect the complex topological structure of the urban road network and thus cannot accurately capture spatial dependence. Recently, generalizing the CNN to the graph convolutional network (GCN), which can handle arbitrary graph-structured data, has received widespread attention. The GCN model has been successfully used in many applications, including document classification [17], unsupervised learning [47] and image classification [49]. Given an adjacency matrix $A$ and the feature matrix $X$, the GCN model constructs a filter in the Fourier domain. The filter, acting on the nodes of a graph, captures spatial features between the nodes by its first-order neighborhood; the GCN model can then be built by stacking multiple convolutional layers, which can be expressed as:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} \theta^{(l)}\right) \tag{2}$$
where $\tilde{A} = A + I_N$ is the matrix with added self-connections, $I_N$ is the identity matrix, $\tilde{D}$ is the degree matrix, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, $H^{(l)}$ is the output of layer $l$, $\theta^{(l)}$ contains the parameters of that layer, and $\sigma(\cdot)$ represents the sigmoid function for a nonlinear model.
In this research, the 2-layer GCN model [47] is chosen to obtain spatial dependence, which can be expressed as:
$$f(X, A) = \sigma\left(\hat{A}\, \mathrm{ReLU}\left(\hat{A} X W_0\right) W_1\right) \tag{3}$$
where $\hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ denotes the pre-processing step, $W_0 \in R^{P \times H}$ represents the weight matrix from the input to the hidden layer, $P$ is the length of the feature matrix, $H$ is the number of hidden units, and $W_1 \in R^{H \times T}$ represents the weight matrix from the hidden to the output layer. $f(X, A) \in R^{N \times T}$ represents the output with the prediction length $T$, and ReLU(), standing for Rectified Linear Unit, is a frequently used activation layer in modern deep neural networks.
In summary, we use the GCN model [47] to learn spatial features from traffic data. As shown in Figure 4, assuming that node 1 is a central road, the GCN model can obtain the topological relationship between the central road and its surrounding roads, encode the topological structure of the road network and the attributes on the roads, and then obtain spatial dependence.
Fig. 4. Assuming that node 1 is a central road. (a) The blue nodes indicate the roads connected to the central road. (b) We obtain the spatial features by obtaining the topological relationship between road 1 and its surrounding roads.
2) Temporal Dependence Modeling: Acquiring the temporal dependence is another key problem in traffic forecasting. At present, the most widely used neural network model for processing sequence data is the recurrent neural network (RNN). However, due to defects such as vanishing and exploding gradients, the traditional recurrent neural
network has limitations for long-term prediction [50]. The LSTM model [51] and the GRU model [52] are variants of the recurrent neural network and have been proven to solve the above problems. The basic principles of the LSTM and GRU are roughly the same [53]. Both use a gating mechanism to memorize as much long-term information as possible and are equally effective for various tasks. However, due to its complex structure, LSTM has a longer training time, while the GRU model has a relatively simple structure, fewer parameters, and faster training. Therefore, we chose the GRU model to obtain temporal dependence from the traffic data. As shown in Figure 5, $h_{t-1}$ denotes the hidden state at time $t-1$; $x_t$ denotes the traffic information at time $t$; $r_t$ is the reset gate, which controls the degree to which the state information at the previous moment is ignored; $u_t$ is the update gate, which controls the degree to which the state information at the previous moment is brought into the current state; $c_t$ is the memory content stored at time $t$; and $h_t$ is the output state at time $t$. The GRU obtains the traffic information at time $t$ by taking the hidden state at time $t-1$ and the current traffic information as inputs. While capturing the traffic information at the current moment, the model still retains the changing trend of historical traffic information and has the ability to capture temporal dependence.
Fig. 5. The architecture of the Gated Recurrent Unit model.
3) Temporal Graph Convolutional Network: To capture the spatial and temporal dependences from traffic data at the same time, we propose a temporal graph convolutional network model (T-GCN) based on the graph convolutional network and gated recurrent unit. As shown in Figure 6, the left side is the process of spatio-temporal traffic prediction and the right side shows the specific structure of a T-GCN cell; $h_{t-1}$ denotes the output at time $t-1$, GC is the graph convolution process, $u_t$ and $r_t$ are the update gate and reset gate at time $t$, and $h_t$ denotes the output at time $t$.
Fig. 6. The overall process of spatio-temporal prediction. The right part represents the specific architecture of a T-GCN unit, and GC represents graph convolution.
The specific calculation process is shown below. $f(A, X_t)$ represents the graph convolution process and is defined in equation 3. The multiplication between gate and state terms is point-wise.
In summary, the T-GCN model can deal with the complex spatial dependence and temporal dynamics. On one hand, the graph convolutional network is used to capture the topological structure of the urban road network to obtain the spatial dependence. On the other hand, the gated recurrent unit is used to capture the dynamic variation of traffic information on the roads to obtain the temporal dependence and eventually to realize traffic prediction tasks.
4) Loss Function: In the training process, the goal is to minimize the error between the real traffic speed on the roads and the predicted value. We use $Y_t$ and $\hat{Y}_t$ to denote the real traffic speed and the predicted speed, respectively. The loss function of the T-GCN model is shown in equation 8. The first term is used to minimize the error between the real traffic speed and the prediction. The second term $L_{reg}$ is the L2 regularization term that helps to avoid overfitting, and $\lambda$ is a hyperparameter.
$$loss = \left\| Y_t - \hat{Y}_t \right\| + \lambda L_{reg} \tag{8}$$
IV. EXPERIMENTS
A. Data Description
In this section, we evaluate the prediction performance of the T-GCN model on two real-world datasets: the SZ-taxi dataset and the Los-loop dataset. Since both datasets are related to traffic speed, without loss of generality we use traffic speed as the traffic information in the experiment section.
(1) SZ-taxi. This dataset consists of the taxi trajectories of Shenzhen from Jan. 1 to Jan. 31, 2015. We selected 156 major roads of Luohu District as the study area. The experimental data mainly includes two parts. One is a 156 × 156 adjacency matrix, which describes the spatial relationship between roads. Each row represents one road and the values in the matrix represent the connectivity between the roads. The other is a feature matrix, which describes the speed changes over time on each road. Each row represents one road; each column is the traffic speed on the roads in a different time period. We aggregated the traffic speed on each road every 15 minutes.
(2) Los-loop. This dataset was collected from the highways of Los Angeles County in real time by loop detectors. We selected 207 sensors and their traffic speed from Mar. 1 to Mar. 7, 2012. We aggregated the traffic speed every 5 minutes. Similarly, the data consists of an adjacency matrix and a feature matrix. The adjacency matrix is calculated from the distance between sensors in the traffic network. Since the Los-loop dataset contained some missing data, we used the linear interpolation method to fill missing values.
In the experiments, the input data is normalized to the interval [0, 1]. In addition, 80% of the data is used as the training set and the remaining 20% is used as the test set.
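The normalization and the 80/20 split described in the data description, together with the input and output windows of equation 1, can be sketched as follows. This is an illustrative sketch rather than the authors' pipeline: the toy `speeds` values, the helper names, the use of a single global minimum and maximum, and the chronological (unshuffled) split are all assumptions, because the text specifies only the target interval [0, 1] and the 80%/20% proportions.

```python
def min_max_normalize(series):
    """Scale all values into [0, 1] with one global minimum and maximum (assumed)."""
    flat = [v for row in series for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # guard against a constant series
    return [[(v - lo) / span for v in row] for row in series]


def chronological_split(series, train_ratio=0.8):
    """Keep the first 80% of time steps for training and the rest for testing."""
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]


def sliding_windows(series, n_hist, horizon):
    """Pair n_hist past time steps with the next `horizon` steps (cf. equation 1)."""
    samples = []
    for start in range(len(series) - n_hist - horizon + 1):
        past = series[start:start + n_hist]
        future = series[start + n_hist:start + n_hist + horizon]
        samples.append((past, future))
    return samples


# Hypothetical speeds: 10 time steps (rows) for 2 roads (columns). The layout is
# time-major for convenience; the feature matrix in the text stores roads as rows.
speeds = [[30.0 + t, 60.0 - 2.0 * t] for t in range(10)]
normalized = min_max_normalize(speeds)
train, test = chronological_split(normalized)
train_samples = sliding_windows(train, n_hist=4, horizon=1)
print(len(train), len(test), len(train_samples))
```

With 15-minute aggregation, `horizon=1` would correspond to a 15-minute forecast and `horizon=4` to a 60-minute forecast.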
Fig. 7. Comparison of predicted performance under different hidden units in the training and test set based on SZ-taxi dataset. (a) Changes in RMSE and MAE in the training set. (b) Changes in Accuracy, $R^2$ and var in the training set. (c) Changes in RMSE and MAE in the test set. (d) Changes in Accuracy, $R^2$ and var in the test set.
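The hidden-unit count compared in Fig. 7 is the width of the state $h_t$ carried by each T-GCN cell. The sketch below shows one way such a cell could be wired in plain Python: a two-layer graph convolution in the form of equation 3 feeds GRU-style update and reset gates. It is an illustration under stated assumptions, not the authors' implementation: the gate formulas are the standard GRU form applied to the graph-convolved input, biases are omitted, weights are random rather than trained, and every name (`tgcn_step`, `init_params`, `gc_units`, the three-road graph) is hypothetical. A practical implementation would normally use an array or deep learning library.

```python
import math
import random


def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def map_entries(fn, m):
    """Apply a scalar function to every entry of a matrix."""
    return [[fn(v) for v in row] for row in m]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def normalized_adjacency(adj):
    """A_hat = D~^(-1/2) (A + I) D~^(-1/2): the pre-processing step of equation 3."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def graph_conv(a_hat, x, w0, w1):
    """Two-layer graph convolution sigma(A_hat ReLU(A_hat X W0) W1), as in equation 3."""
    hidden = map_entries(lambda v: max(v, 0.0), matmul(matmul(a_hat, x), w0))
    return map_entries(sigmoid, matmul(matmul(a_hat, hidden), w1))


def concat(a, b):
    """Concatenate two matrices along the feature (column) axis."""
    return [ra + rb for ra, rb in zip(a, b)]


def hadamard(a, b):
    """Point-wise product of two equally shaped matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def tgcn_step(a_hat, x_t, h_prev, p):
    """One assumed T-GCN step; x_t is N x 1 and h_prev is N x hidden_units."""
    g = graph_conv(a_hat, x_t, p["w0"], p["w1"])
    u = map_entries(sigmoid, matmul(concat(g, h_prev), p["wu"]))   # update gate u_t
    r = map_entries(sigmoid, matmul(concat(g, h_prev), p["wr"]))   # reset gate r_t
    c = map_entries(math.tanh, matmul(concat(g, hadamard(r, h_prev)), p["wc"]))
    # h_t = u_t * h_{t-1} + (1 - u_t) * c_t, with point-wise products
    return [[ui * hi + (1.0 - ui) * ci for ui, hi, ci in zip(ur, hr, cr)]
            for ur, hr, cr in zip(u, h_prev, c)]


def init_params(gc_units, hidden_units, seed=0):
    """Random, untrained weights for the assumed cell (biases omitted)."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    width = gc_units + hidden_units
    return {"w0": rand(1, gc_units), "w1": rand(gc_units, gc_units),
            "wu": rand(width, hidden_units), "wr": rand(width, hidden_units),
            "wc": rand(width, hidden_units)}


# Hypothetical 3-road path graph (0 - 1 - 2) with four normalized speed readings.
A_hat = normalized_adjacency([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
params = init_params(gc_units=4, hidden_units=8)
readings = [[[0.2], [0.5], [0.9]], [[0.3], [0.4], [0.8]],
            [[0.4], [0.4], [0.7]], [[0.5], [0.3], [0.6]]]
h = [[0.0] * 8 for _ in range(3)]
for x_t in readings:
    h = tgcn_step(A_hat, x_t, h, params)
print(len(h), len(h[0]))
```

Changing `hidden_units` here changes only the size of the gate weight matrices and of $h_t$, which is the quantity swept in the hidden-unit experiments.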
We predict the traffic speed of the next 15 minutes, 30 minutes, 45 minutes and 60 minutes.
B. Evaluation Metrics
We use five metrics to evaluate the prediction performance of the T-GCN model:
(1) Root Mean Squared Error (RMSE):
$$RMSE = \sqrt{\frac{1}{MN} \sum_{j=1}^{M} \sum_{i=1}^{N} \left(y_i^j - \hat{y}_i^j\right)^2} \tag{9}$$
(2) Mean Absolute Error (MAE):
$$MAE = \frac{1}{MN} \sum_{j=1}^{M} \sum_{i=1}^{N} \left|y_i^j - \hat{y}_i^j\right| \tag{10}$$
(3) Accuracy:
$$Accuracy = 1 - \frac{\left\|Y - \hat{Y}\right\|_F}{\left\|Y\right\|_F} \tag{11}$$
(4) Coefficient of Determination ($R^2$):
$$R^2 = 1 - \frac{\sum_{j=1}^{M} \sum_{i=1}^{N} \left(y_i^j - \hat{y}_i^j\right)^2}{\sum_{j=1}^{M} \sum_{i=1}^{N} \left(y_i^j - \bar{Y}\right)^2} \tag{12}$$
(5) Explained Variance Score (var):
$$var = 1 - \frac{Var\{Y - \hat{Y}\}}{Var\{Y\}} \tag{13}$$
where $y_i^j$ and $\hat{y}_i^j$ represent the real traffic information and the predicted one of the $j$th time sample in the $i$th road. $M$ is the number of time samples; $N$ is the number of roads; $Y$ and $\hat{Y}$ represent the set of $y_i^j$ and $\hat{y}_i^j$ respectively, and $\bar{Y}$ is the average of $Y$.
Specifically, RMSE and MAE are used to measure the prediction error: the smaller the value is, the better the prediction effect is. Accuracy is used to measure the prediction precision: the larger the value is, the better the prediction effect is. $R^2$ and var calculate the correlation coefficient, which measures the ability of the predicted result to represent the actual data: the larger the value is, the better the prediction effect is.
C. Choosing Model Parameters
In this section, we choose the relevant parameters of the T-GCN model.
(1) Hyperparameters
The hyperparameters of the T-GCN model mainly include the learning rate, batch size, training epoch, and the number of hidden units. In the experiment, we manually adjust and set the learning rate to 0.001, the batch size to 32, and the training epoch to 5000.
The number of hidden units is a very important parameter of the T-GCN model, as different numbers of hidden units may greatly affect the prediction precision. To choose the best value, we experiment with different numbers of hidden units and select the optimal value by comparing the predictions.
In our experiment, for the SZ-taxi dataset, we choose the number of hidden units from [8, 16, 32, 64, 100, 128] and analyze the change of prediction precision. As shown in Figure 7, the horizontal axis represents the number of hidden units and the vertical axis represents the change of different metrics. Figure 7(a) shows the results of RMSE and MAE for different hidden units in the training set. It can be seen that the error is the smallest when the number is 100. Figure 7(b) shows the variation of Accuracy, $R^2$, and var for different hidden units. Figures 7(c) and 7(d) show the results in the test set. Similarly, the results reach a maximum when the number is 100. In summary, the prediction results are better when the number is set to 100. When increasing the number of hidden units, the prediction precision first increases and then decreases. This is mainly because when the number of hidden units is larger than a certain degree, the model complexity and the computational difficulty are greatly increased and, as a result, overfitting on the training data occurs. Therefore, we set the number of hidden units to 100 in our experiments on the SZ-taxi dataset.
In the same way, the results of Los-loop are shown in Figures 8(a), 8(b), 8(c), and 8(d). It can be seen that when the number of hidden units is 64, the prediction precision is the highest, and the prediction error is the lowest.
(2) Training
For the input layer, the training dataset (80% of the overall data) is taken as input in the training process and the remaining data is used as input in the test process. The T-GCN model is trained using the Adam optimizer.
D. Experimental Results
We compare the performance of the T-GCN model with the following baseline methods:
(1) History Average model (HA) [2], which uses the average traffic information in the historical periods as the prediction.
Fig. 8. Comparison of predicted performance under different hidden units in the training and test set based on Los-loop dataset. (a) Changes in RMSE and MAE in the training set. (b) Changes in Accuracy, $R^2$ and var in the training set. (c) Changes in RMSE and MAE in the test set. (d) Changes in Accuracy, $R^2$ and var in the test set.
TABLE I
THE PREDICTION RESULTS OF THE T-GCN MODEL AND OTHER BASELINE METHODS ON SZ-TAXI AND LOS-LOOP DATASETS
(2) Autoregressive Integrated Moving Average model (ARIMA) [5], which fits a parametric model on the observed time series to predict future traffic data.
(3) Support Vector Regression model (SVR) [35], which uses historical data to train the model and obtain the relationship between the input and output, and then predicts the future traffic data with the trained model. The kernel function we use in this model is a linear kernel.
(4) Graph Convolutional Network model (GCN) [47]: see Section III-C.1 for details.
(5) Gated Recurrent Unit model (GRU) [52]: see Section III-C.2 for details.
Table I shows the performance of the T-GCN model and other baseline methods for the 15-minute, 30-minute, 45-minute and 60-minute forecasting tasks on the SZ-taxi and Los-loop datasets. ∗ means that the values are small enough to be negligible, indicating that the model's prediction effect is poor. It can be seen that the T-GCN model obtains the best prediction performance under almost all evaluation metrics for all prediction horizons, proving its effectiveness for spatio-temporal traffic forecasting tasks.
(1) High prediction precision. We can find that the neural network-based methods, including the T-GCN model and the GRU model, which emphasize the importance of modeling the temporal features, generally have better prediction precision than other baselines, such as the HA model, the ARIMA model and the SVR model. For example, for the 15-min traffic forecasting task, the RMSE errors of the T-GCN model and the GRU model are reduced by approximately 8.58% and 6.88% compared with the HA model, and the accuracies are approximately 4.15% and 3.44% higher than that of HA. The RMSE errors of the T-GCN model and the GRU model are approximately 45.77% and 32.97% lower than that of the ARIMA model, and the accuracies of these two models are improved by 63.54% and 62.42%. Compared with the SVR model, the RMSE errors of the T-GCN and the GRU models are reduced by 5.28% and 0.67%, and the accuracies are approximately 2.63% and 1.93% higher than that of the SVR model. This is mainly because methods such as HA, ARIMA, and SVR find it difficult to handle complex, nonstationary time series data. The lower prediction effect of the GCN model arises because the GCN considers the spatial features only and ignores that the traffic data is typical time series data. In addition, as a mature traffic forecasting method, the ARIMA's prediction precision is relatively lower than the HA, mainly because the ARIMA has difficulty dealing with long-term and nonstationary data
Fig. 9. Spatio-temporal prediction capability. (a) The RMSE of the T-GCN model is lower than that of the GCN model, which considers spatial features only, indicating the effectiveness of the T-GCN to capture spatial features. (b) The RMSE of the T-GCN model is lower than that of the GRU model, which considers temporal features only, indicating the effectiveness of the T-GCN to capture temporal features.
Fig. 10. Long-term prediction ability. (a) Under different prediction horizons, the change in RMSE and Accuracy of the T-GCN model. (b) Under different prediction horizons, the RMSE errors of the T-GCN model and all baseline methods.
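The RMSE and Accuracy values compared in Figs. 9 and 10, along with the other three metrics of Section IV-B, can be computed as in the following sketch. It follows equations 9 to 13 directly in plain Python; the function name `evaluate`, the toy matrices, and the use of the population variance (dividing by the number of entries) for the explained variance score are assumptions, since the text does not say which variance estimator is used.

```python
import math


def evaluate(y_true, y_pred):
    """Return RMSE, MAE, Accuracy, R^2 and var for two M x N matrices (lists of rows)."""
    real = [v for row in y_true for v in row]
    pred = [v for row in y_pred for v in row]
    count = len(real)
    errors = [r - p for r, p in zip(real, pred)]
    sq_err = sum(e * e for e in errors)

    rmse = math.sqrt(sq_err / count)                      # equation 9
    mae = sum(abs(e) for e in errors) / count             # equation 10
    # Frobenius norms reduce to square roots of summed squares (equation 11).
    accuracy = 1.0 - math.sqrt(sq_err) / math.sqrt(sum(r * r for r in real))
    mean_real = sum(real) / count
    ss_total = sum((r - mean_real) ** 2 for r in real)
    r2 = 1.0 - sq_err / ss_total                          # equation 12
    mean_err = sum(errors) / count
    var_err = sum((e - mean_err) ** 2 for e in errors) / count
    var_score = 1.0 - var_err / (ss_total / count)        # equation 13
    return {"rmse": rmse, "mae": mae, "accuracy": accuracy, "r2": r2, "var": var_score}


# Hypothetical speeds: M = 2 time samples (rows) on N = 2 roads (columns).
y_true = [[1.0, 2.0], [3.0, 4.0]]
y_pred = [[1.0, 2.0], [3.0, 5.0]]
scores = evaluate(y_true, y_pred)
print(scores)
```

Note that $R^2$ and var differ only when the prediction errors have a nonzero mean, as they do in this toy example, because the explained variance score removes the mean error before measuring its spread.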
Fig. 13. The visualization results for prediction horizon of 30 minutes.
Fig. 14. The visualization results for prediction horizon of 45 minutes.
(1) The T-GCN model predicts poorly at local minima/maxima. We speculate that the main cause is that the GCN model defines a smooth filter in the Fourier domain and captures spatial features by constantly moving the filter. This process leads to a small change in the overall prediction results, which makes peaks smoother.
(2) There is a certain error between the real traffic information and the prediction results caused by "zero taxi value". Zero taxi value is the phenomenon in which entries of the traffic feature matrix whose true value is not zero are set to zero because there are no taxis on the road.
(3) Regardless of the prediction horizons, the T-GCN model always achieves better results. The T-GCN model can capture
V. CONCLUSION
This research develops a novel neural network-based approach for traffic forecasting called T-GCN, which combines the GCN and the GRU. We use a graph network to model the urban road network, in which the nodes on the graph represent roads, the edges represent the connection relationships between roads, and the traffic information on the roads is described as the attribute of the nodes on the graph. On one hand, the GCN is used to capture the spatial topological structure of the graph to obtain the spatial dependence; on the other hand, the GRU model is introduced to capture the dynamic change of node attributes to obtain the temporal dependence. Eventually the T-GCN model is used to tackle spatio-temporal traffic forecasting tasks. When evaluated on two real-world traffic datasets and compared with the HA model, the ARIMA model, the SVR model, the GCN model, and the GRU model, the T-GCN model achieves a better performance under different prediction horizons. In addition, the perturbation analysis illustrates the robustness of our approach. In summary, the T-GCN model successfully captures the spatial and temporal features from traffic data, so it can be applied to other spatio-temporal tasks.
REFERENCES
[1] H. Huang, “Dynamic modeling of urban transportation networks and analysis of its travel behaviors,” Chin. J. Manage., vol. 2, no. 1, pp. 18–22, Jan. 2005.
[2] J. Liu and W. Guan, “A summary of traffic flow forecasting methods,” J. Highway Transp. Res. Develop., vol. 21, no. 3, pp. 82–85, Mar. 2004.
[3] J. Yuan and B. Fan, “Synthesis of short-term traffic flow forecasting research progress,” Urban Transp. China, vol. 10, no. 6, pp. 73–79, Jun. 2012.
[4] C. J. Dong, C. F. Shao, Z. Cheng-Xiang, and M. Meng, “Spatial and temporal characteristics for congested traffic on urban expressway,” J. Beijing Univ. Technol., vol. 38, no. 8, pp. 128–132, 2012.
[5] M. S. Ahmed and A. R. Cook, “Analysis of freeway traffic time-series data by using Box-Jenkins techniques,” Transp. Res. Rec., no. 722, pp. 1–9, 1979.
[6] M. M. Hamed, H. R. Al-Masaeid, and Z. M. B. Said, “Short-term prediction of traffic volume in Urban arterials,” J. Transp. Eng., vol. 121, no. 3, pp. 249–254, 1995.
[7] I. Okutani and Y. J. Stephanedes, “Dynamic prediction of traffic volume through Kalman filtering theory,” Transp. Res. B, Methodol., vol. 18, no. 1, pp. 1–11, 1984.
[8] C.-H. Wu, J.-M. Ho, and D. T. Lee, “Travel-time prediction with support vector regression,” IEEE Trans. Intell. Transp. Syst., vol. 5, no. 4, pp. 276–281, Dec. 2004.
[9] Z. S. Yao, C. F. Shao, and Y. L. Gao, “Research on methods of short-term traffic forecasting based on support vector regression,” J. Beijing Jiaotong Univ., vol. 30, no. 3, pp. 19–22, 2006.
[10] X. L. Zhang, H. E. Guo-Guang, and L. U. Hua-Pu, “Short-term traffic flow forecasting based on K-nearest neighbors non-parametric regression,” J. Syst. Eng., vol. 24, no. 2, pp. 178–183, Feb. 2009.
[11] S. Sun, C. Zhang, and G. Yu, “A Bayesian network approach to traffic flow forecasting,” IEEE Trans. Intell. Transp. Syst., vol. 7, no. 1, pp. 124–132, Mar. 2006.
[12] W. Huang, G. Song, H. Hong, and K. Xie, “Deep architecture for traffic flow prediction: Deep belief networks with multitask learning,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 5, pp. 2191–2201, Oct. 2014.
[13] R. Fu, Z. Zhang, and L. Li, “Using LSTM and GRU neural network methods for traffic flow prediction,” in Proc. 31st Youth Academic Annu. Conf. Chin. Assoc. Automat. (YAC), Wuhan, China, Nov. 2016, pp. 324–328.
[14] J. Zhang, Y. Zheng, and D. Qi, “Deep spatio-temporal residual networks for citywide crowd flows prediction,” in Proc. 31st AAAI Conf. Artif. Intell., San Francisco, CA, USA, Feb. 2017, pp. 1655–1661.
[15] Y. Wu and H. Tan, “Short-term traffic flow forecasting with spatial-temporal correlation in a hybrid deep learning framework,” Dec. 2016, arXiv:1612.01022. [Online]. Available: https://arxiv.org/abs/1612.01022
[16] X. Cao, Y. Zhong, Y. Zhou, J. Wang, C. Zhu, and W. Zhang, “Interactive temporal recurrent convolution network for traffic prediction in data centers,” IEEE Access, vol. 6, pp. 5276–5289, 2018.
[17] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Proc. Adv.
[30] S. Lee and D. Fambro, “Application of subset autoregressive integrated moving average model for short-term freeway traffic volume forecasting,” Transp. Res. Rec., J. Transp. Res. Board, vol. 1678, no. 1, pp. 179–188, 1999.
[31] B. M. Williams and L. A. Hoel, “Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results,” J. Transp. Eng., vol. 129, no. 6, pp. 664–672, Nov. 2003.
[32] M. Lippi, M. Bertini, and P. Frasconi, “Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning,” IEEE Trans. Intell. Transp. Syst., vol. 14, no. 2, pp. 871–882, Jun. 2013.
[33] C. P. I. J. van Hinsbergen, T. Schreiter, F. S. Zuurbier, J. W. C. van Lint, and H. J. van Zuylen, “Localized extended Kalman filter for scalable real-time traffic state estimation,” IEEE Trans. Intell. Transp. Syst., vol. 13, no. 1, pp. 385–394, Mar. 2012.
[34] L. L. Ojeda, A. Y. Kibangou, and C. C. de Wit, “Adaptive Kalman filtering for multi-step ahead traffic flow prediction,” in Proc. Amer. Control Conf., Washington, DC, USA, Jun. 2013, pp. 4724–4729.
[35] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” Statist. Comput., vol. 14, no. 3, pp. 199–222, Aug. 2004.
[36] H. Yin, S. C. Wong, J. Xu, and C. K. Wong, “Urban traffic flow prediction using a fuzzy-neural approach,” Transp. Res. C, Emerg. Technol., vol. 10, no. 2, pp. 85–98, 2002.
[37] D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
[38] D. Silver et al., “Mastering the game of go without human knowledge,” Nature, vol. 550, no. 7676, pp. 354–359, 2017.
[39] M. Moravčik, M. Schmid, N. Burch, V. Lisý, D. Morrill, N. Bard, T. Davis, K. Waugh, M. Johanson, and M. Bowling, “DeepStack: Expert-level artificial intelligence in heads-up no-limit poker,” Science, vol. 356, no. 6337, pp. 508–513, May 2017.
[40] D. Park and L. R. Rilett, “Forecasting freeway link travel times with a multilayer feedforward neural network,” Comput.-Aided Civil Infrastruct. Eng., vol. 14, no. 5, pp. 357–367, 1999.
[41] Q. Wang, J. Wan, and Y. Yuan, “Locality constraint distance metric learning for traffic congestion detection,” Pattern Recognit., vol. 75,
Neural Inf. Process. Syst., Jun. 2016, pp. 3844–3852. pp. 272–281, Mar. 2018.
[18] X.-Y. Xu, J. Liu, H.-Y. Li, and J.-Q. Hu, “Analysis of subway station [42] W. Qi, W. Jia, and L. Xuelong, “Robust hierarchical deep learning
capacity with the use of queueing theory,” Transp. Res. C, Emerg. for vehicular management,” IEEE Trans. Veh. Technol., vol. 68, no. 5,
Technol., vol. 38, no. 1, pp. 28–43, Jan. 2014. pp. 4148–4156, May 2019.
[19] P. Wei, Y. Cao, and D. Sun, “Total unimodularity and decomposition [43] J. W. C. Van Lint, S. P. Hoogendoorn, and H. J. van Zuylen, “Freeway
method for large-scale air traffic cell transmission model,” Transp. travel time prediction with state-space neural networks: Modeling state-
Res. B, Methodol., vol. 53, pp. 1–16, Jul. 2013. space dynamics with recurrent neural networks,” Transp. Res. Rec.,
[20] W. Qi, L. I. Li, H. U. Jianming, and B. Zou, “Traffic velocity distrib- vol. 1811, no. 1, pp. 30–39, Jan. 2002.
utions for different spacings,” J. Tsinghua Univ. Sci. Technol., vol. 51, [44] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, “Traffic flow prediction
no. 3, pp. 309–312, Mar. 2011. with big data: A deep learning approach,” IEEE Trans. Intell. Transp.
[21] F. F. Xu, Z. C. He, and Z. R. Sha, “Impacts of traffic management mea- Syst., vol. 16, no. 2, pp. 865–873, Apr. 2015.
sures on urban network microscopic fundamental diagram,” J. Transp. [45] J. Ke, H. Zheng, H. Yang, and X. M. Chen, “Short-term forecasting
Syst. Eng. Inf. Technol., vol. 13, no. 2, pp. 185–190, Apr. 2013. of passenger demand under on-demand ride services: A spatio-temporal
[22] E. I. Vlahogianni, “Computational intelligence and optimization for deep learning approach,” J. Transp. Res. C, Emerg. Technol., vol. 85,
transportation big data: Challenges and opportunities,” in Engineer- pp. 591–608, Dec. 2017.
ing and Applied Sciences Optimization, vol. 38, N. Lagaros and [46] H. Yu, Z. Wu, S. Wang, Y. Wang, and X. Ma, “Spatiotemporal recurrent
M. Papadrakakis, Eds. Cham, Switzerland: Springer, 2015. convolutional networks for traffic prediction in transportation networks,”
[23] Z. Shan, D. Zhao, and Y. Xia, “Urban road traffic speed estimation for Sensors, vol. 17, no. 7, p. 1501, Jun. 2017.
missing probe vehicle data based on multiple linear regression model,” in [47] T. N. Kipf and M. Welling, “Semi-supervised classification with
Proc. 16th Int. IEEE Conf. Intell. Transp. Syst., The Hague, Netherlands, graph convolutional networks,” Sep. 2016, arXiv:1609.02907.
Oct. 2013, pp. 118–123. [Online]. Available: https://arxiv.org/abs/1609.02907
[24] S. G. X. Xiangjie, “Short-term traffic volume intelligent hybrid fore- [48] Y. Li, R. Yu, C. Shahabi, and Y. Liu, “Diffusion convolutional
casting model and its application,” Syst. Eng.—Theory Pract., vol. 31, recurrent neural network: Data-driven traffic forecasting,” Jul. 2017,
no. 3, pp. 562–568, 2011. arXiv:1707.01926. [Online]. Available: https://arxiv.org/abs/1707.01926
[25] E. I. Vlahogianni, J. C. Golias, and M. G. Karlaftis, “Short-term traffic [49] J. Bruna, W. Zaremba, A. Szlam, and Y. Lecun, “Spectral networks and
forecasting: Overview of objectives and methods,” Transp. Rev., vol. 24, locally connected networks on graphs,” Dec. 2013, arXiv:1312.6203.
no. 5, pp. 533–557, Sep. 2004. [Online]. Available: https://arxiv.org/abs/1312.6203
[26] H. van Lint and C. van Hinsbergen, “Short-term traffic and travel [50] Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependencies
time prediction models,” Artif. Intell. Appl. Critical Transp., vol. 22, with gradient descent is difficult,” IEEE Trans. Neural Netw., vol. 5,
pp. 22–41, Nov. 2012. no. 2, pp. 157–166, Mar. 1994.
[27] H. Sun, C. Zhang, and B. Ran, “Interval prediction for traffic time series [51] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural
using local linear predictor,” in Proc. 7th Int. IEEE Conf. Intell. Transp. Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
Syst., Washington, DC, USA, Nov. 2004, pp. 410–415. [52] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio,
[28] G. Dudek, “Pattern-based local linear regression models for short-term “On the properties of neural machine translation: Encoder-decoder
load forecasting,” Electr. Power Syst. Res., vol. 130, pp. 139–147, approaches,” Sep. 2014, arXiv:1409.1259. [Online]. Available:
Jan. 2016. https://arxiv.org/abs/1409.1259
[29] M. van der Voort, M. Dougherty, and S. Watson, “Combining kohonen [53] J. Chung, C. Gulcehre, K. H. Cho, and Y. Bengio, “Empirical evaluation
maps with ARIMA time series models to forecast traffic flow,” Transp. of gated recurrent neural networks on sequence modeling,” Dec. 2014,
Res. C, Emerg. Technol., vol. 4, no. 5, pp. 307–318, 1996. arXiv:1412.3555. [Online]. Available: https://arxiv.org/abs/1412.3555
Ling Zhao received the Ph.D. degree from the School of Geosciences and Info-Physics, Central South University, Changsha, China. She is currently an Associate Professor with the School of Geosciences and Info-Physics, Central South University. She has published more than 10 peer-reviewed journal articles. Her research interests mainly concentrate in humanities and social science based on big geodata and artificial intelligence.
Yujiao Song received the B.S. degree in 2016. She is currently pursuing the master’s degree with the School of Geosciences and Info-Physics, Central South University. Her research interests include humanities and social science based on big geodata and artificial intelligence.
Chao Zhang received the Ph.D. degree in computer science from the University of Illinois at Urbana–Champaign. He is currently an Assistant Professor with the School of Computational Science and Engineering, Georgia Institute of Technology. He has published more than 50 papers in top-tier conferences and journals. His work has been honored by multiple awards, including the ACM SIGKDD Dissertation Runner-Up Award and the ECML/PKDD Best Student Paper Runner-Up Award. His research interests include data mining and machine learning. He is particularly interested in developing label-efficient and robust learning techniques, with applications in text mining and spatiotemporal data mining.
Yu Liu received the B.S., M.S., and Ph.D. degrees from Peking University in 1994, 1997, and 2003, respectively. He is currently a Professor with the Institute of Remote Sensing and Geographic Information System, Peking University. His research interest mainly concentrates in humanities and social science based on big geo-data.
Pu Wang received the B.S. degree in physics from the University of Science and Technology of China, Hefei, China, in 2005, and the Ph.D. degree in physics from the University of Notre Dame, Notre Dame, IN, USA, in 2010. From May 2010 to December 2011, he was a Post-Doctoral Researcher with the Department of Civil and Environmental Engineering, MIT. He is currently a Full Professor with the School of Traffic and Transportation Engineering, Central South University, Changsha, China. He has authored or coauthored several papers in international leading journals, such as Science, Nature Physics, and Nature Communications. His research interests include complex networks, traffic analysis, human dynamics, and data mining. He is a Guest Editor of the IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS.
Tao Lin received the M.S. and Ph.D. degrees in agricultural and biological engineering from the University of Illinois at Urbana–Champaign. He joined the Faculty of the Biosystems Engineering Department, Zhejiang University, by 100 Talents Program of Zhejiang University and serviced as a Full Research Professor. He has published more than ten peer-reviewed journal articles. His research focuses on agricultural big data and AI systems informatics and analytics, ranging from spatio-temporal analysis, GIS, optimization modeling analysis, to high performance cyberinfrastructure enabled decision support systems.
Min Deng is currently a Professor with the School of Geosciences and Info-Physics, Central South University, Changsha, China. His research interests mainly concentrate in humanities and social science based on big geodata and artificial intelligence.
Haifeng Li (M’15) received the master’s degree in transportation engineering from the South China University of Technology, Guangzhou, China, in 2005, and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2009. He was a Research Associate with the Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Hong Kong, in 2011, and a Visiting Scholar with the University of Illinois at Urbana–Champaign, Urbana, IL, USA, from 2013 to 2014. He is currently a Professor with the School of Geosciences and Info-Physics, Central South University, Changsha, China. He has authored more than 30 journal papers. His current research interests include geo/remote sensing big data, machine/deep learning, and artificial/brain-inspired intelligence. He is a reviewer for many journals.