A Multi-Index Fusion Clustering Strategy for Traffic Flow State Identification
INTELLIGENT TRANSPORTATION
Received October 22, 2019, accepted November 4, 2019, date of publication November 8, 2019, date of current version
November 27, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2952359
ABSTRACT This paper attempts to improve the identification accuracy of traffic flow states and to reveal
the impacts of different evaluation indices on the identification results. To this end, a multi-index fusion
clustering strategy is proposed in this research. Firstly, flow, velocity and occupancy were selected as the
evaluation indices. Then, the weights of the three indices were initialized by a group of experts. After that, the
objective function of a weight optimization model was set up to maximize the distance between projection
centers of samples under different traffic flow states and to minimize the projection variance between samples
under the same traffic flow state. The model was solved by the method of Lagrange multipliers, producing
the optimal weight combination. Then, the optimal weights were introduced to the fuzzy c-means (FCM)
clustering, forming the multi-index fusion clustering method. The results of example analysis show that our
method differentiated between traffic flow states more accurately than the original FCM clustering, raising the
traffic flow identification accuracy from 94.0% to 96.6%. This is because the improved method
retains most of the original features of the evaluation indices, which further facilitates the accurate clustering
of traffic flow states.
INDEX TERMS Identification of traffic flow states, index fusion, weight optimization, fuzzy c-means
(FCM) clustering.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ VOLUME 7, 2019
D. Bao: Multi-Index Fusion Clustering Strategy for Traffic Flow State Identification
the experts. This index can be computed by:

$$\omega = \sum_{i=1}^{3} \alpha_i q_i \tag{1}$$

where $q_i$ is the expert's level on background element $i$ (Table 2), and $\alpha_i$ is the degree of impact of background element $i$ (professional title: 0.5; work experience: 0.4; education: 0.1).

TABLE 2. Levels of expert background.

3) LIMITS OF WEIGHT INTERVALS
In general, the weights assigned by experts obey a Gaussian distribution. The greater the difference between the weights of an evaluation index, the less likely they are to fall into the same weight interval. The probability $p_i$ that the weight assigned by an expert to an evaluation index falls into weight interval $i$ can be expressed as:

$$p_i = \begin{cases} \dfrac{c_k - c_{k-i}}{\sum_{j=1}^{k-1} (c_k - c_j)} \times \dfrac{1-\omega}{2}, & 1 \le i \le k-1 \\ \omega, & i = k \\ \dfrac{c_{11+k-i} - c_k}{\sum_{j=k+1}^{10} (c_k - c_j)} \times \dfrac{1-\omega}{2}, & k+1 \le i \le 10 \end{cases} \tag{2}$$

where $\omega$ is the reliability index of the expert; $i$, $j$ and $k$ index the weight intervals; $c_k$, $c_j$, $c_{k-i}$ and $c_{11+k-i}$ are the medians of the corresponding weight intervals (Table 1).

That is to say, when an expert assigns an indicator weight to an interval, the probability that the weight belongs to this interval is given by the expert reliability index $\omega$, and the probabilities of it belonging to the other intervals follow from formula (2). The membership degree of each weight of an evaluation index to a weight interval can then be obtained by the weighted average of the initial weights given to the index by multiple experts:

$$P_i = \sum_{j=1}^{n} p_i / n \tag{3}$$

Then, the mean and standard deviation of the weight distribution of each evaluation index can be obtained respectively by:

$$m = E(P) = \sum_{i=1}^{10} (c_i \times P_i) \tag{4}$$

$$\sigma = \sqrt{D(P)} = \sqrt{\sum_{i=1}^{10} \left[ (c_i - E(P))^2 \right] \times P_i} \tag{5}$$

Finally, the lower limit $a$ and upper limit $b$ of the weight interval for each evaluation index can be derived by the three-sigma rule:

$$a = m - \sigma, \quad b = m + \sigma \tag{6}$$

where $P_i$ is the probability that a weight of an index belongs to weight interval $i$; $j$ indexes the experts and $n$ is the number of experts; $c_i$ is the median of weight interval $i$; $m$ and $\sigma$ are the mean and standard deviation of the weight distribution of each evaluation index, respectively.

4) WEIGHT OPTIMIZATION
The weights of the three evaluation indices, namely flow, velocity and occupancy, were optimized to form the optimal weight combination that can effectively identify traffic flow states.

Let $R = \{(x_i, y_i)\}_{i=1}^{n}$, $y_i \in \{0, 1\}$, be a set of road monitoring data, and let $X_i$, $\mu_i$ and $D_i$ be the set, mean vector and variance matrix of type $i$ samples, respectively. Suppose the data are projected onto a straight line $y = w^T x$. Then the projection centers of the two types of samples are $w^T \mu_0$ and $w^T \mu_1$, respectively, and the projection variances of the two types of samples are $w^T D_0 w$ and $w^T D_1 w$, respectively. According to the principle of optimal classification, the projection variance of samples of the same type ($w^T D_0 w + w^T D_1 w$) should be minimized, and the distance between the projection centers of different types of samples ($\| w^T \mu_0 - w^T \mu_1 \|_2^2$) should be maximized. Taking the initial weight intervals as the constraints, the evaluation index weights of traffic flow states can be optimized by:

$$\max J(w) = \frac{\| w^T \mu_0 - w^T \mu_1 \|_2^2}{w^T D_0 w + w^T D_1 w} = \frac{w^T (\mu_0 - \mu_1)(\mu_0 - \mu_1)^T w}{w^T (D_0 + D_1) w}$$

$$\text{s.t.} \quad 0 \le a_i \le w_i \le b_i, \quad \sum_{i=1}^{m} w_i = 1 \tag{7}$$

where $J(w)$ is the objective function of the weight optimization model; $w_i$ is the weight of evaluation index $i$ ($i = 1, 2, \ldots, m$); $m$ is the number of evaluation indices; $a_i$ and $b_i$ are the lower and upper limits of the interval of weight $w_i$.

Both the numerator and the denominator of the model are quadratic in the weight vector $w$. This means the solution of the objective function is independent of the length of $w$ and depends only on its direction. Without loss of generality, it is assumed that $w^T (D_0 + D_1) w = 1$, and the model was solved by the method of Lagrange multipliers. The optimal weight vector can be obtained by setting up a Lagrange function, taking the derivative with respect to $w$ and setting it to zero.

B. FCM CLUSTERING BASED ON THE OPTIMAL WEIGHTS
The optimized weights of evaluation indices were introduced to improve the traditional FCM clustering, such that the traffic flow states could be clustered more rationally, using the
evaluation indices. The key to the improvement is to replace the traditional Euclidean distance with the Euclidean distance based on the optimal weights. The objective function of the improved FCM clustering can be expressed as:

$$\min J_m^{\omega}(U, v_1, v_2, \cdots, v_c) = \sum_{j=1}^{n} \sum_{i=1}^{c} \mu_{ij}^{m} \left( d_{ij}^{(\omega)} \right)^2 \tag{8}$$

where $U$ is the membership matrix between the data points and the cluster centers; $v_i$ is fuzzy cluster center $i$; $\mu_{ij}$ ($\mu_{ij} \in [0, 1]$) is the membership degree of data point $j$ to cluster center $i$; $d_{ij}^{(\omega)}$ is the weighted Euclidean distance between fuzzy cluster center $i$ and data point $j$; $m \in [1, \infty)$ is a weighting exponent, which is positively correlated with the fuzziness of the clustering. The membership degrees satisfy

$$\sum_{i=1}^{c} \mu_{ij} = 1, \quad \forall j = 1, \cdots, n.$$

In the conventional fuzzy c-means clustering algorithm, all indices are usually weighted equally, and the differing contributions of the indices to traffic state clustering are not considered. The weight optimization model above is used to optimize the weights of the evaluation indices, which not only ensures the effectiveness of traffic state classification, but also accounts for the different effects of the evaluation indices in state clustering. The improved FCM clustering is implemented in the following steps:

Step 1. Initialization. Determine the number of classes $c$ by formula (9), i.e. select the value of $c$ that maximizes $L(c)$. Set the fuzzy coefficient $m$ to 2. Configure the threshold $\varepsilon$ for terminating the iteration. Define the maximum number of iterations $b_{max}$. Initialize the membership matrix with random numbers.

$$L(c) = \frac{\sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{m} \| v_i - \bar{x} \|^2 / (c - 1)}{\sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{m} \| x_j - v_i \|^2 / (n - c)} \tag{9}$$

Step 2. Computing fuzzy cluster centers. Set up the matrix of fuzzy cluster centers $V$ by formula (10):

$$v_i^{(b)} = \sum_{j=1}^{n} \left( \mu_{ij}^{(b)} \right)^m \cdot x_j \Big/ \sum_{j=1}^{n} \left( \mu_{ij}^{(b)} \right)^m, \quad i = 1, 2, \cdots, c \tag{10}$$

Step 3. Parameter optimization. Compute and update the fuzzy membership matrix $U^{(b+1)}$ by formula (11):

$$\mu_{ij}^{(b+1)} = \left[ \sum_{k=1}^{c} \left( (d_{ij}^{(\omega)})^{(b+1)} / (d_{kj}^{(\omega)})^{(b+1)} \right)^{2/(m-1)} \right]^{-1}, \quad k = 1, 2, \cdots, c \tag{11}$$

Step 4. Iterative solution. Select a suitable matrix norm to compare $U^{(b)}$ and $U^{(b+1)}$. If $\| U^{(b+1)} - U^{(b)} \| \le \varepsilon$, terminate the iteration; otherwise, set $b = b + 1$, return to Step 2, and execute Steps 2∼4 iteratively until reaching the maximum number of iterations $b_{max}$.

III. EXAMPLE VERIFICATION
This section uses actual traffic flow data to verify the effectiveness of our multi-index fusion clustering strategy for traffic flow state identification, which considers the optimal weights of evaluation indices. Following the workflow of our strategy, the example verification was carried out in three steps. Firstly, the intervals of the initial weights were computed and taken as the constraints of the weight optimization model. Next, the optimal weights of the evaluation indices were determined by the method of Lagrange multipliers. Finally, the FCM clustering based on optimal weights was applied to identify the traffic flow states.

A. DATA EXTRACTION
Considering data availability, the traffic flow data released by the Freeway Performance Measurement System, California, were selected as the data source. The sample data were collected at 2-min intervals from the lanes in multiple sections of Interstate 5 in January 2017. The data contain three indices of traffic flow states: flow, velocity and occupancy. Each lane has 22,320 sets of data. The main purpose of this paper is to identify traffic flow states effectively through cluster analysis. To facilitate cluster analysis, the data were first cleaned, by eliminating defective records and filling in missing ones, to address the defects in the original data. Then, each traffic flow state index was made dimensionless, so that the optimal weights could be calculated and the cluster analysis carried out through the model.

B. CALCULATION OF INITIAL WEIGHT INTERVALS
Based on the abovementioned calculation method for the limits of weight intervals, four experts on expressway traffic flow states were invited to give initial weights to the three evaluation indices. Then, the limits of weight intervals were computed by formulas (1)∼(6) and used to constrain the weight optimization model. The computation results are listed in Table 3 below.

TABLE 3. Limits of weight intervals.

C. CALCULATION OF OPTIMAL WEIGHTS
Under the constraint of initial weight intervals, a weight optimization model was established according to Subsection 2.1(4). Substituting the sample data into the objective function of the weight optimization model, i.e. formula (7), the optimal weights of flow, velocity and occupancy were obtained as w = (0.12, 0.16, 0.72).
The optimization results show that occupancy weighed much more heavily than velocity and flow. This phenomenon can be explained as follows: The objective function of the weight
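For reference, the Lagrange-multiplier step behind formula (7) can be written out as follows. This is a sketch of the unconstrained case only: the shorthand $S_b = (\mu_0 - \mu_1)(\mu_0 - \mu_1)^T$ and $S_w = D_0 + D_1$ is introduced here for brevity and is not notation from the paper, $S_w$ is assumed to be invertible, and the interval constraints $a_i \le w_i \le b_i$ are set aside.

```latex
\begin{aligned}
L(w, \lambda) &= w^T S_b w - \lambda \left( w^T S_w w - 1 \right) \\
\frac{\partial L}{\partial w} &= 2 S_b w - 2 \lambda S_w w = 0
  \;\Longrightarrow\; S_b w = \lambda S_w w \\
S_b w &= (\mu_0 - \mu_1) \underbrace{(\mu_0 - \mu_1)^T w}_{\text{scalar}}
  \;\Longrightarrow\; w \propto S_w^{-1} (\mu_0 - \mu_1)
\end{aligned}
```

The resulting direction can then be rescaled so that its components sum to 1. If a component falls outside its interval $[a_i, b_i]$, the bounded problem presumably has to be solved numerically instead.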
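The interval limits used as constraints can be illustrated with a short Python sketch of formulas (1) and (4) to (6). All inputs are hypothetical: the background levels `q`, the ten interval medians `c` (assumed here to be the midpoints of equal-width intervals on [0, 1], since Table 1 is not reproduced), and the averaged probabilities `P`. The limits are taken as one standard deviation either side of the mean, which is an interpretation of formula (6).

```python
import math

# Impact degrees of the background elements given for formula (1):
# professional title, work experience, education.
ALPHA = (0.5, 0.4, 0.1)


def reliability_index(q):
    """Formula (1): omega = sum(alpha_i * q_i) for one expert."""
    return sum(a * qi for a, qi in zip(ALPHA, q))


def weight_interval_limits(c, P):
    """Formulas (4)-(6): mean, standard deviation and interval limits."""
    m = sum(ci * Pi for ci, Pi in zip(c, P))                            # (4)
    sigma = math.sqrt(sum((ci - m) ** 2 * Pi for ci, Pi in zip(c, P)))  # (5)
    return m, sigma, m - sigma, m + sigma                               # (6)


# Hypothetical inputs, not values from the paper.
q = (0.8, 0.6, 1.0)
c = [0.05 + 0.1 * i for i in range(10)]
P = [0, 0, 0.25, 0.5, 0.25, 0, 0, 0, 0, 0]

omega = reliability_index(q)
m, sigma, lower, upper = weight_interval_limits(c, P)
print(f"omega={omega:.2f}, m={m:.3f}, sigma={sigma:.3f}, "
      f"interval=[{lower:.3f}, {upper:.3f}]")
```

With these made-up inputs the probability mass sits on three adjacent intervals, so the resulting weight interval is narrow. A wider spread of expert opinions would widen it.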
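Steps 1 to 4 of the improved FCM clustering can be sketched in plain Python as below. Several details are assumptions rather than the paper's specification: the weighted distance is taken as the square root of the weighted sum of squared differences, the matrix norm in Step 4 is the largest absolute membership change, the number of classes is passed in rather than selected by formula (9), and the names `weighted_distance` and `weighted_fcm` as well as the sample data are made up for illustration.

```python
import math
import random


def weighted_distance(x, v, w):
    """Weighted Euclidean distance (assumed form: sqrt(sum(w_k * diff_k^2)))."""
    return math.sqrt(sum(wk * (xk - vk) ** 2 for xk, vk, wk in zip(x, v, w)))


def weighted_fcm(data, w, c, m=2.0, eps=1e-5, b_max=100, seed=0):
    """Fuzzy c-means with index weights w; returns (centers, memberships).

    memberships[i][j] is the degree to which sample j belongs to cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])

    # Step 1: random membership matrix whose columns sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= col

    centers = []
    for _ in range(b_max):
        # Step 2: cluster centers as membership-weighted means, formula (10).
        centers = []
        for i in range(c):
            um = [u[i][j] ** m for j in range(n)]
            total = sum(um)
            centers.append([sum(um[j] * data[j][k] for j in range(n)) / total
                            for k in range(dim)])

        # Step 3: membership update, formula (11), using weighted distances.
        new_u = [[0.0] * n for _ in range(c)]
        for j in range(n):
            d = [weighted_distance(data[j], centers[i], w) for i in range(c)]
            zero = [i for i in range(c) if d[i] == 0.0]
            if zero:
                # Sample coincides with a center: split full membership there.
                for i in zero:
                    new_u[i][j] = 1.0 / len(zero)
                continue
            for i in range(c):
                new_u[i][j] = 1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0))
                                        for k in range(c))

        # Step 4: stop when the largest membership change is at most eps.
        change = max(abs(new_u[i][j] - u[i][j])
                     for i in range(c) for j in range(n))
        u = new_u
        if change <= eps:
            break
    return centers, u


# Made-up, already normalized samples: (flow, velocity, occupancy).
samples = [[0.10, 0.90, 0.05], [0.12, 0.88, 0.06], [0.11, 0.91, 0.04],
           [0.60, 0.30, 0.80], [0.62, 0.28, 0.82], [0.58, 0.31, 0.79]]
centers, u = weighted_fcm(samples, w=(0.12, 0.16, 0.72), c=2)
labels = [max(range(2), key=lambda i: u[i][j]) for j in range(len(samples))]
print(labels)
```

Passing equal weights, e.g. `w=(1/3, 1/3, 1/3)`, gives the conventional FCM up to a constant scaling of the distance, which makes the two variants easy to compare on the same data.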
TABLE 4. The cluster centers determined by FCM clustering.

Through the comparison between the clustering results of the two methods, it is learned that the FCM clustering without weight optimization output fuzzy results and had

Next, the results of the improved FCM clustering were evaluated to judge its effectiveness in identifying traffic flow states. As shown in Table 6, the improved FCM clustering raised the recognition accuracy of traffic flow states by 2.6 percentage points (from 94.0% to 96.6%) over the original FCM clustering. Thus, the improved method can effectively improve the identification of traffic flow states.
In the improved FCM clustering, the weights of flow, velocity and occupancy are initialized by experts and optimized based on the interrelationship between the
three indices. Therefore, the improved FCM clustering can not only characterize the features of different traffic flow states, but also retain most of the original features of the evaluation indices. This further facilitates the accurate clustering of traffic flow states.

IV. CONCLUSION
The identification of traffic flow states is the key to the prediction and control of traffic flows. This paper develops a multi-index fusion clustering method to achieve accurate identification of traffic flow states.
Firstly, the weights of evaluation indices were initialized by a group of experts. Under the initial weights, the weight optimization model was set up by the principle of optimal classification. The model was solved by the method of Lagrange multipliers, producing the optimal weights to differentiate between different traffic flow states. Then, the optimal weights were introduced to the FCM clustering, forming the multi-index fusion clustering method.
The results of example analysis show that the FCM clustering coupled with optimal weights can effectively identify traffic flow states, giving overall consideration to the information in multiple evaluation indices. This approach overcomes the one-sidedness of single-index identification and takes account of the differences between multiple evaluation indices in the identification process. It should be pointed out that the method introduces the optimized weights of the traffic condition evaluation indices into the calculation of the Euclidean distance in the clustering process, which is an improvement on the conventional fuzzy c-means clustering method. Compared with the conventional fuzzy c-means clustering method, this method can improve the traffic identification accuracy to some extent. However, compared with other machine learning algorithms, its accuracy improvement is not particularly pronounced. Therefore, more effective algorithms should be considered to further improve the identification accuracy.

DI BAO was born in Daqing, Heilongjiang, in 1982. She is currently pursuing the Ph.D. degree in enterprise management with the Harbin Institute of Technology. Her main research directions are financial management theory and practice, management control and corporate governance, and public finance. She has participated in the formulation by the Ministry of Industry and Information Technology of the measures for the Administration of Investment and Financing in Colleges and Universities, one project of the Humanities and Social Science Planning Fund of the Ministry of Education, one project of the Natural Science Foundation of the Ministry of Education, one project of the Provincial Social Science Planning Fund, one research project of the Provincial Department of Transportation, one project of the Science and Technology Department of Heilongjiang Province, and one horizontal project, and has published three EI articles.