Unsupervised Classification of Multivariate Time Series Using VPCA and Fuzzy Clustering With Spatial Weighted Matrix Distance

IEEE TRANSACTIONS ON CYBERNETICS, VOL. 50, NO. 3, MARCH 2020

Abstract—Due to high dimensionality and multiple variables, unsupervised classification of multivariate time series (MTS) involves more challenging problems than that of univariate ones. Unlike the vectorization of a feature matrix in traditional clustering algorithms, an unsupervised pattern recognition scheme based on matrix data is proposed for MTS samples in this paper. To reduce the computational load and time consumption, a novel variable-based principal component analysis (VPCA) is first devised for the dimensionality reduction of MTS samples. Afterward, a spatial weighted matrix distance-based fuzzy clustering (SWMDFC) algorithm is proposed to directly group MTS samples into clusters while preserving the structure of the data matrix. The spatial weighted matrix distance (SWMD) integrates the spatial dimensionality difference of the elements of the data into the distance between MTS pairs. In terms of the SWMD, the MTS samples are clustered without vectorization in the dimensionality-reduced feature matrix space. Finally, three open-access datasets are utilized for the validation of the proposed unsupervised classification scheme. The results show that the VPCA can capture more features of MTS data than principal component analysis (PCA) and 2-D PCA. Furthermore, the clustering performance of SWMDFC is superior to that of fuzzy c-means clustering algorithms based on the Euclidean distance or the image Euclidean distance.

Index Terms—Dimensionality reduction, fuzzy clustering, multivariate time series (MTS).

Manuscript received January 28, 2018; revised May 14, 2018 and November 10, 2018; accepted November 18, 2018. Date of publication December 11, 2018; date of current version January 21, 2020. This work was supported in part by the National Natural Science Foundation of China under Grant 61571302, Grant 61371145, and Grant 61671303, in part by the project of the Science and Technology Commission of Shanghai under Grant 18070503000, and in part by the Industry-Education-Research Project of Shanghai Normal University under Grant DCL201704. This paper was recommended by Associate Editor A. F. Skarmeta Gomez. (Corresponding author: Yonghong Tan.)

The authors are with the College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCYB.2018.2883388

2168-2267 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

DURING the recent few decades, there has been a tremendous growth of interest in the clustering of time series data due to an explosion of data and the rapid development of sensing technologies. Its applications involve a variety of fields, ranging from analysis of the stock market, pattern discovery in medical or biological signals, and feature monitoring of the environment to the tracing of dynamic objects [1]–[3]. The clustering of time series is helpful for discovering unknown interesting patterns that can be used in the classification, modeling, and prediction of systems. These days, there are generally three different ways to cluster time series, namely: 1) shape based; 2) feature based; and 3) model based [2]. Since shape-based approaches extract features in terms of the fluctuation shape of the raw time series, their performance may be affected by some challenging problems, such as noise, amplitude scaling, offset translation, longitudinal scaling, linear drift, and discontinuities [2], [3]. In model-based methods, clustering works indirectly on the model parameters of the time series [4]. Besides the time-consuming procedure of modeling, the model uncertainty of time series also leads to poor clustering results, especially when the clusters are close to each other [2], [5]. On the other hand, feature-based approaches first convert the raw time series data into lower dimensional feature data by techniques of dimensionality reduction, such as principal component analysis (PCA), nonlinear component analysis, the independent component correlation algorithm, and singular value decomposition. Then, the clustering algorithm is applied to the selected feature data. There is some evidence that feature-based approaches are more suitable for long time series than shape-based ones [2]. Furthermore, unlike model-based approaches, prior knowledge of the data is not necessary for the unsupervised classification of time series in feature-based methods. Therefore, with the increasing number of simultaneously observed variables, feature-based approaches are applied to more and more pattern recognition problems of multivariate time series (MTS) in complex systems.

Clustering of MTS data is an unsupervised classification process in which a set of objects is grouped into clusters so that the objects in the same cluster have high similarity but are very dissimilar to the objects in other clusters [6]. Compared to univariate time series, MTS samples are 2-D data. Dozens of variables in MTS samples usually lead to an exponential growth of data quantity, which makes traditional clustering algorithms inefficient or even infeasible. Hence, dimensionality reduction has become an essential step before clustering in the feature-based pattern recognition of MTS samples. Currently, among the variety of dimensionality reduction techniques, PCA and its variants are widely used to compress MTS into a feature space with lower dimensionality.
Authorized licensed use limited to: NANKAI UNIVERSITY. Downloaded on September 25,2024 at 09:28:07 UTC from IEEE Xplore. Restrictions apply.
HE AND TAN: UNSUPERVISED CLASSIFICATION OF MTS USING VPCA AND FUZZY CLUSTERING WITH SWMD 1097
Since most of the traditional clustering algorithms are vector-based approaches, the features of MTS are often converted into vectors before clustering [1], [2], which definitely breaks the inherent data structure. Moreover, the similarity comparison of MTS feature vectors with the Euclidean distance cannot find the relationship between variables [7]. Hence, some researchers devised similarity measures for MTS samples directly on the basis of the principal components after dimensionality reduction. According to the eigenvalues and the angles between the principal component subspaces, a weighted PCA similarity factor, SPCA, was proposed to measure the similarity of MTS pairs after PCA [8]. Ferreira et al. [9] also presented a new clustering method, CPT-M, based on SPCA for the pattern recognition of MTS in the electric power sector. To capture the similarity degree between datasets, Singhal and Seborg [10] proposed a modified SPCA that weighs each PC by its explained variance. They successfully applied the SPCA to the k-means clustering of MTS from batch fermentation and a continuous exothermic chemical reactor. Combining SPCA with the fuzzy clustering technique, Fontes and Pereira [11] developed a comprehensive method for pattern recognition associated with fault prediction in gas turbines. Nevertheless, SPCA-based pattern recognition approaches are unable to provide higher-order representations for non-Gaussian data [12]. Besides, Yang and Shahabi [13] devised an extended Frobenius norm, Eros, which measures the similarity between two MTS samples by comparing the corresponding weighted principal components. Zhu et al. [14] also applied Eros to analyze the multivariate emotional behavior of users in social networks. However, Eros does not satisfy the triangle inequality: it only considers the acute angle between the corresponding PCs by taking the absolute value of the inner product [13].

To extract efficient features from high-dimensional time series, some modified PCAs have also been devised over the past few years. Yoon et al. [15] proposed a family of unsupervised methods for feature subset selection from MTS based on common PCA (CPCA). Li [16] further developed a classification method (CCPCA) based on CPCA, which is a counterpart of the 2-D PCA (2DPCA) used on image data, for MTS [17], [18]. Nevertheless, there is still no evaluation criterion for selecting the appropriate common principal components of all MTS.

In addition, the discrete wavelet transform (DWT) is also used for feature extraction in the clustering of MTS [19]–[21]. After the DWT, a pair of MTS is compared with the Euclidean distance of the wavelet variance matrix, wavelet correlation matrix, or wavelet covariance matrix [19]–[21]. Although the relationship between the variables of an MTS sample is considered in these wavelet-based distances, the computation of the wavelet variance, correlation, or covariance matrix for every MTS sample will undoubtedly increase the complexity of the clustering procedure. To capture the dynamic behavior of time series in evolution over time, D'Urso et al. integrated fuzzy clustering with the autocorrelation function [22], autocovariance functions [23], and the copula function [24] to identify similar patterns of time series, improving the analysis accuracy by transforming a high-dimensional data space into a low-dimensional one with an equivalent structure while preserving most of the relevant information. However, these approaches mainly address the clustering of univariate time series, rather than MTS. To classify spatial units by using MTS, Disegna et al. [24] and Coppi et al. [28] developed a fuzzy clustering model based on a set of quantitative features observed at several time occasions. In those methods, different investigated objects with multiple variables at every sampling time construct one sampling matrix. MTS are identified by extracting their cross-sectional and longitudinal trajectory features over time. Instead of focusing on the data structure, the term spatial unit in the aforementioned method mainly stresses the physical or geological nature of the objects in the real world. It has been proved that this method is efficient for clustering multiple time series with a long continuous record into different groups. On account of the dissimilarity based on continuous time series trajectories and the objective functions constructed from feature homogeneity and spatial homogeneity, it is not suitable for clustering multiple batches of MTS without any physical or geological spatial difference into different groups.

Therefore, in order to reduce the dimensionality of MTS without breaking the data structure and to realize the similarity comparison of matrix data, we propose a novel unsupervised classification scheme for MTS using variable-based PCA (VPCA) and spatial weighted matrix distance-based fuzzy clustering (SWMDFC). A set of MTS samples is first transformed into a lower dimensional space by the VPCA. Without vectorizing the PC feature matrices of the samples, the MTS pairs are compared through a spatial weighted matrix distance (SWMD) in order to preserve the information of the inherent feature structure of the MTS data. Furthermore, three MTS datasets from the UCI Repository are chosen for the validation of the proposed scheme. The VPCA is compared with PCA and 2DPCA, while the SWMD is compared with the Euclidean distance and the image Euclidean distance (IMED) [29], which is a matrix distance for image data.

The rest of this paper is organized as follows. Section II presents the dimensionality reduction of MTS samples by VPCA. Then, the proposed SWMD and SWMDFC algorithms are described in Section III. Subsequently, the experiments and the corresponding analysis of the experimental results are presented in Section IV. Finally, Section V gives the conclusions of this paper.

II. DIMENSIONALITY REDUCTION OF MTS BASED ON VPCA

Suppose there are M MTS samples for pattern recognition. Each MTS sample consists of N uncorrelated variables with L observations. As a result, an MTS sample X_i (i = 1, 2, \ldots, M) can be defined by the following N \times L matrix:

    X_i = \begin{bmatrix} x_{11}^i & \cdots & x_{1L}^i \\ \vdots & \ddots & \vdots \\ x_{N1}^i & \cdots & x_{NL}^i \end{bmatrix}    (1)

where x_{nl}^i is the lth observation of the nth variable of the ith MTS sample.

A. Variable-Based Principal Component Analysis

These days, most of the dimensionality reduction methods are vector-based approaches. They cannot be directly applied to samples with matrix features without vectorization. However, the vectorization of a matrix sample may break the raw data structure and lead to the loss of useful structural information inherent between the variables of MTS data. Hence, we developed the VPCA for matrix samples. Besides reducing the computational load of pattern recognition, the main advantage of the VPCA is that it can be applied to the dimensionality reduction of MTS without vectorization.

Since each MTS can be regarded as one record of all the variables, applying PCA to every MTS sample separately actually finds the eigenvectors that maximize the variance of all variables in one record. However, only the maximum difference among all MTS samples is helpful for clustering them into groups. Hence, it is reasonable to combine the records of every variable from all of the samples into one matrix, so that the difference of the same variable across all of the MTS samples can be observed. In the VPCA, there are three main steps to realize the dimensionality reduction of MTS samples.

1) Reconstructing the N variable matrices V_n (n = 1, 2, \ldots, N). According to X_i, each V_n can be represented as the following M \times L matrix:

    V_n = \begin{bmatrix} v_{11}^n & \cdots & v_{1L}^n \\ \vdots & \ddots & \vdots \\ v_{M1}^n & \cdots & v_{ML}^n \end{bmatrix}    (2)

where v_{il}^n = x_{nl}^i, that is, each row vector v_i^n = x_n^i.

2) Reducing the dimensionality of every variable matrix V_n:
a) computing the variance matrix \tilde{V}_n = V_n - \bar{V}_n;
b) computing the covariance matrix \Sigma_n = \tilde{V}_n^T \tilde{V}_n;
c) calculating the eigenvectors and eigenvalues of the covariance matrix by \Sigma_n = U_n \Lambda_n U_n^T;
d) projecting each V_n by \tilde{U}_n = [u_{n1}, \ldots, u_{np_s}] to a space with lower dimensionality, that is, F_n = V_n \tilde{U}_n, where p_s = \min_{n=1,2,\ldots,N} \{p_n\} and p_n is determined by the same accumulative percentage of total variation Q. Then, F_n is expressed as

    F_n = \begin{bmatrix} f_{11}^n & \cdots & f_{1p_s}^n \\ \vdots & \ddots & \vdots \\ f_{M1}^n & \cdots & f_{Mp_s}^n \end{bmatrix}.    (3)

3) Reconstructing the M dimensionality-reduced MTS sample matrices Y_i from the F_n, i.e.,

    Y_i = \begin{bmatrix} Y_{11}^i & \cdots & Y_{1p_s}^i \\ \vdots & \ddots & \vdots \\ Y_{N1}^i & \cdots & Y_{Np_s}^i \end{bmatrix}    (4)

where Y_{np}^i = f_{ip}^n, p = 1, 2, \ldots, p_s, and Y_n^i = f_i^n is the nth row vector of Y_i.

Thus, all records of every variable are represented by the principal components in their own feature space, and each MTS sample X_i is transformed into a matrix Y_i \in R^{N \times p_s} by the VPCA. Furthermore, for MTS, N \geq 2 and p_s \geq 2. It should be noted that the feature vector of every variable of Y_i is obtained according to its variation distribution over all MTS samples in the corresponding variable eigenvector space. The dimensionality reduction procedure of the VPCA is shown in Fig. 1.

B. Complexity Analysis of VPCA

Suppose there are M samples of MTS for pattern recognition, where each MTS sample is an N \times L matrix, that is, N variables of length L. The time complexity for reconstructing the V_n or the Y_i is O(MN). In step 2, the time complexity for computing the covariance matrix is O(L^2 M), and that for the eigenvector analysis and for projecting V_n by \tilde{U}_n is O(L^3). Note that there are N variable matrices to be handled. Thus, the total time complexity of the VPCA is O(MN + N(L^2 M + L^3) + MN), namely, O(N L^2 M + N L^3). However, if each MTS sample is vectorized, the time complexity of PCA for M MTS samples is O(N^2 L^2 M + N^3 L^3). Moreover, due to the storage of the covariance matrix, the memory complexity of PCA is O(N^2 L^2), while that of the VPCA is only O(L^2). Thus, it can be seen that the VPCA not only realizes the dimensionality reduction without breaking the data structure but also has lower time complexity and smaller memory complexity than PCA applied after vectorization.

III. FUZZY CLUSTERING WITH SPATIAL-WEIGHTED MATRIX DISTANCE

It is known that fuzzy clustering is an unsupervised partition procedure in which each object can be grouped into any of the clusters with a certain degree of membership. Due to the uncertainty and complexity of real systems, MTS obtained from the real world may not be clearly defined, and they usually exist with vague boundaries. Hence, fuzzy clustering usually fits natural partitions better than crisp clustering [30]–[33]. However, traditional fuzzy clustering algorithms can only be applied to vector-based objects. In addition, without prior knowledge of class labeling, the pattern partition of time series mainly depends on pairwise similarity comparisons of sequences by means of a distance measure [5]. Although dozens of distance measures have been devised in the literature to reflect the underlying similarity of time series, the Euclidean distance is still widely used to compare the PC features of MTS pairs after dimensionality reduction [1], [34]. However, the Euclidean distance is a vector-based measure, and transforming the feature matrix of an MTS sample into a vector not only destroys the inherent data structure but also loses the information of the intrinsic relationship among the features.
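As a concrete illustration of the Section II procedure, the three VPCA steps can be sketched in NumPy. This is our reading of steps 1)-3), not the authors' implementation; the function name `vpca` and its threshold argument `Q` (the accumulative percentage of total variation) are our own labels.

```python
import numpy as np

def vpca(X, Q=0.95):
    """Sketch of VPCA. X: (M, N, L) array of M MTS samples with
    N variables of L observations each. Returns Y of shape (M, N, p_s)."""
    M, N, L = X.shape
    F_all, p_all = [], []
    for n in range(N):
        Vn = X[:, n, :]                       # step 1: variable matrix V_n (M x L)
        Vt = Vn - Vn.mean(axis=0)             # variance matrix V_n - Vbar_n
        Sigma = Vt.T @ Vt                     # covariance matrix (L x L)
        lam, U = np.linalg.eigh(Sigma)        # eigenvalues/eigenvectors
        idx = np.argsort(lam)[::-1]           # descending eigenvalue order
        lam, U = np.clip(lam[idx], 0, None), U[:, idx]
        ratio = np.cumsum(lam) / lam.sum()    # accumulative variation percentage
        p_all.append(int(np.searchsorted(ratio, Q) + 1))  # smallest p_n reaching Q
        F_all.append(Vn @ U)                  # step 2d: F_n = V_n * U_n
    p_s = min(p_all)                          # p_s = min_n p_n
    # step 3: rebuild each dimensionality-reduced sample Y_i (N x p_s)
    return np.stack([[F_all[n][i, :p_s] for n in range(N)] for i in range(M)])

rng = np.random.default_rng(0)
Y = vpca(rng.normal(size=(12, 3, 20)))
print(Y.shape)   # (12, 3, p_s) with p_s <= 20
```

The sketch projects V_n itself, as step 2d states, rather than the centered matrix; the two choices differ only by a shift common to all samples, so pairwise differences between feature matrices are unaffected.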
Therefore, taking the matrix structure of MTS features into account, it is necessary to develop a new fuzzy clustering with a matrix distance measure that involves the spatial information of every element of the matrix.

A. Spatial-Weighted Matrix Distance

Generally, an N \times L grayscale image sample can be regarded as a matrix. In order to recognize different image samples, a Euclidean distance for images, called the IMED, was proposed according to the spatial relationships of pixels [29]. Suppose there are M feature matrices Y_i (i = 1, 2, \ldots, M) of all MTS samples after VPCA, and let y_i be the feature vector corresponding to Y_i in a 1 \times Np_s dimensional Euclidean space formed by the Np_s eigenvectors u_{11}, \ldots, u_{N1}, \ldots, u_{1p_s}, \ldots, u_{Np_s}. Thus, Y_{np}^i (1 \leq i \leq M, 1 \leq n \leq N, 1 \leq p \leq p_s) in Y_i corresponds to y_{iu}, that is, the uth element in y_i with u = n + (p - 1)N. Inspired by the IMED [29], an SWMD between Y_i and Y_j can be devised as follows:

    d_M(Y_i, Y_j) = \sum_{u,w=1}^{Np_s} S_{uw} (y_{iu} - y_{ju})(y_{iw} - y_{jw}) = (y_i - y_j)^T S (y_i - y_j)    (5)

where y_{i,j} = [y_1, y_2, \ldots, y_{Np_s}]^T; S is the spatial distance matrix, which captures the intrinsic correlations between different coordinates in the 2-D N \times p_s space; moreover, S is positive definite, and S_{uw} is the element of S that represents the spatial dimensionality distance of y_{iu} and y_{jw} (corresponding to Y_{np}^i and Y_{n'p'}^j, respectively) in the feature matrix space.

Set the dimensionality distance of Y_{np}^i of Y_i and Y_{n'p'}^j of Y_j as

    d_D = \|p_u - p_w\|_2 = \sqrt{(n - n')^2 + (p - p')^2}    (6)

where p_u = [n, p] and p_w = [n', p'] are, respectively, the dimensionality vectors of Y_{np}^i and Y_{n'p'}^j. In addition, d_D \in [0, \sqrt{(N - 1)^2 + (p_s - 1)^2}]. S_{uw} can be designed as a monotonically decreasing function of d_D, i.e.,

    S_{uw} = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{d_D^2}{2\sigma^2}\right).    (7)

Taking all of the variables of an MTS sample into account, the variation range of d_D shrinks as p_s decreases. Accordingly, if \sigma remains unchanged, the range of S_{uw} will also become smaller. However, too small a range of S_{uw} is not efficient for identifying the spatial distance of Y_{np}^i and Y_{n'p'}^j in the feature matrix space. Besides, it can be seen from Fig. 2 that the range of S_{uw} increases as \sigma becomes smaller over the same range of d_D, that is, S_1 > S_2 > S_3 for the same d_D. Therefore, unlike setting \sigma uniformly to one as in the IMED [29], we adjust \sigma according to the dimensionality of the feature matrix and set \sigma = 1 - p_s^{-1}, that is, \sigma varies directly with p_s. Thus, the range of S_{uw} will not shrink greatly as p_s decreases. In this case, S_{uw} is improved as

    S_{uw} = \frac{1}{2\pi(1 - p_s^{-1})^2} \exp\left(-\frac{\|p_u - p_w\|_2^2}{2(1 - p_s^{-1})^2}\right).    (8)

S_{uw} can be regarded as a weight applied to the distance between two features of two MTS samples. It decreases as the position difference of the two elements increases, which directly integrates the spatial dimensionality difference with the Euclidean distance. The proposed SWMD between Y_i and Y_j can be further described as

    d_M(Y_i, Y_j) = \frac{1}{2\pi(1 - p_s^{-1})^2} \sum_{u,w=1}^{Np_s} \exp\left(-\frac{\|p_u - p_w\|_2^2}{2(1 - p_s^{-1})^2}\right) (y_{iu} - y_{ju})(y_{iw} - y_{jw}).    (9)

Hence, the SWMD measures both the value difference and the spatial dimensionality difference of the elements of two MTS feature matrices. It can be directly used to compare the similarity between two feature matrices without vectorization. Moreover, it can be easily verified that the SWMD satisfies the following three distance properties: 1) symmetry: d_M(Y_i, Y_j) = d_M(Y_j, Y_i); 2) positivity: d_M(Y_i, Y_j) \geq 0 for all Y_i and Y_j; and 3) reflexivity: d_M(Y_i, Y_j) = 0 iff Y_i = Y_j.

B. Implementation Procedure of SWMDFC

Since the SWMD measures the difference of two MTS feature matrices directly without vectorization, we can use the SWMD to compute the membership matrix of the MTS feature samples belonging to the K matrix centroids. Thus, a new fuzzy clustering algorithm for matrix data based on the proposed SWMD (SWMDFC) can be described as follows.
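The algorithm listing referred to above is not reproduced in this excerpt. The following is therefore a hedged sketch of how a fuzzy c-means loop driven by the SWMD of (8) and (9) could be organized: the spatial weight matrix is precomputed once, and the standard FCM membership and centroid updates are assumed in place of the omitted listing. All names (`swmd_matrix`, `swmdfc`, the fuzzifier `m`) are ours.

```python
import numpy as np

def swmd_matrix(N, p_s):
    # Spatial weight matrix S of (8); entry S_uw depends only on the grid
    # positions of elements u and w in the N x p_s feature matrix (p_s >= 2).
    sigma = 1.0 - 1.0 / p_s
    pos = np.array([[n, p] for p in range(p_s) for n in range(N)], float)  # u = n + p*N
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def swmd(Yi, Yj, S):
    # SWMD of (5)/(9) between two N x p_s feature matrices.
    d = (Yi - Yj).flatten(order='F')   # column-major flattening matches u = n + (p-1)N
    return float(d @ S @ d)

def swmdfc(Y, K, m=2.0, iters=100, seed=0):
    # Fuzzy c-means on matrix data with the SWMD; standard FCM updates assumed.
    M, N, p_s = Y.shape
    S = swmd_matrix(N, p_s)
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(K), size=M)     # membership matrix, rows sum to 1
    for _ in range(iters):
        W = U ** m
        C = np.einsum('mk,mnp->knp', W, Y) / W.sum(0)[:, None, None]  # matrix centroids
        D = np.array([[swmd(Y[i], C[k], S) for k in range(K)] for i in range(M)])
        D = np.maximum(D, 1e-12)              # guard against division by zero
        U = D ** (-1.0 / (m - 1))             # D is already a squared-form distance
        U /= U.sum(1, keepdims=True)
    return U.argmax(1), U

# two well-separated groups of 2x3 feature matrices
rng = np.random.default_rng(1)
Y = np.concatenate([rng.normal(0, 0.1, (10, 2, 3)), rng.normal(5, 0.1, (10, 2, 3))])
labels, U = swmdfc(Y, K=2)
print(labels[:10], labels[10:])
```

On synthetic data of this kind the memberships concentrate on the two groups; because the weight matrix S is positive definite, the computed SWMD is nonnegative, consistent with the positivity property stated above.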
TABLE I
DESCRIPTION OF DATASETS

TABLE II
CHARACTERISTICS OF COMPARED METHODS

TABLE III
MEAN VALUES OF RI OF FIVE SCHEMES, RESPECTIVELY, FOR THREE DATASETS

TABLE IV
MEAN VALUES OF ARI OF FIVE SCHEMES, RESPECTIVELY, FOR THREE DATASETS

External indices, which assess the agreement of a clustering with a prespecified structure, are usually used for the validation of a clustering scheme and for the comparison of different clustering algorithms. Therefore, we chose three common external clustering indices, that is, the Rand index (RI), the adjusted RI (ARI), and the Fowlkes and Mallows index (FM), to verify the proposed scheme in the pattern recognition of MTS. Suppose X and Y are, respectively, a prespecified partition and an algorithm-derived partition of a given dataset D = {d_1, d_2, \ldots, d_n}. The three external criteria applied to clustering validation, namely the RI, the ARI, and the FM index, are defined as in [36] and [37].
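The RI, ARI, and FM are standard pair-counting indices; a compact sketch of their definitions (ours, not the paper's code):

```python
from itertools import combinations

def pair_counts(x, y):
    # a: pairs together in both partitions; b: together in x only;
    # c: together in y only; d: apart in both.
    a = b = c = d = 0
    for i, j in combinations(range(len(x)), 2):
        sx, sy = x[i] == x[j], y[i] == y[j]
        if sx and sy: a += 1
        elif sx:      b += 1
        elif sy:      c += 1
        else:         d += 1
    return a, b, c, d

def rand_index(x, y):
    a, b, c, d = pair_counts(x, y)
    return (a + d) / (a + b + c + d)

def fowlkes_mallows(x, y):
    a, b, c, _ = pair_counts(x, y)
    return a / ((a + b) * (a + c)) ** 0.5 if a else 0.0

def adjusted_rand(x, y):
    # RI corrected for chance: (index - expected) / (max - expected).
    a, b, c, d = pair_counts(x, y)
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    max_index = 0.5 * ((a + b) + (a + c))
    return (a - expected) / (max_index - expected)

X = [0, 0, 1, 1]
Y1 = [0, 0, 1, 1]   # identical partition
Y2 = [0, 1, 0, 1]   # maximally disagreeing partition
print(rand_index(X, Y1), adjusted_rand(X, Y1), fowlkes_mallows(X, Y1))  # 1.0 1.0 1.0
print(rand_index(X, Y2), adjusted_rand(X, Y2), fowlkes_mallows(X, Y2))
```

All three indices reach 1 for identical partitions; the ARI, unlike the RI, can go negative when the agreement is below its chance level.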
TABLE V
MEAN VALUES OF FM OF FIVE SCHEMES, RESPECTIVELY, FOR THREE DATASETS

TABLE VI
MEAN VALUES OF FRI OF FIVE SCHEMES, RESPECTIVELY, FOR THREE DATASETS

TABLE VII
TIME CONSUMPTION OF FIVE SCHEMES, RESPECTIVELY, FOR THREE DATASETS

Fig. 3. Mean values of RI indices of ten subjects for the MHealth dataset.
Fig. 4. Mean values of ARI indices of ten subjects for the MHealth dataset.
Fig. 5. Mean values of FM indices of ten subjects for the MHealth dataset.
Fig. 6. Mean values of FRI indices of ten subjects for the MHealth dataset.

Since the eigenvectors used by 2DPCA are determined by the eigenvalues of the covariance matrices of all samples, these eigenvectors cannot comprehensively capture the distribution feature of all variables of all MTS samples. It can also be seen in Figs. 3–6 that the index curve of 2DPCA_FCM is above the index curve of PCA_FCM for most subject datasets, but is still lower than all of the index curves of VPCA_FCM, VPCA_IMED_FCM, and VPCA_SWMDFC.

Moreover, since both the IMED and the SWMD measure the Euclidean distance and the spatial distance of two feature matrices, the index curves of VPCA_IMED_FCM and VPCA_SWMDFC are close to each other. However, the index curve of VPCA_IMED_FCM presents prominent fluctuation, as shown in Figs. 3–6, which means that the clustering performance of VPCA_IMED_FCM is unstable. Comparatively, the VPCA_SWMDFC shows the highest index values for all subject data.
TABLE VIII
CLUSTERING RESULTS OF VPCA_FCM FOR SUBJECT 2

TABLE IX
CLUSTERING RESULTS OF VPCA_IMED_FCM FOR SUBJECT 2

Suppose the clustering accuracy for one class is r_a = (N_max / N_class) \times 100%, where N_class is the number of samples of a given class and N_max is the maximum number of samples of this class grouped into one cluster by the clustering algorithm. Taking subject 2 of MHealth as an example, the specific clustering results of a single run are shown in Tables VIII–X for VPCA_FCM, VPCA_IMED_FCM, and VPCA_SWMDFC, respectively. Each row of a table shows the clustering results of the same cluster obtained by the clustering algorithm. The total of each column, N_class, indicates the real number of samples belonging to class L_i, and each element in the column is the number of samples of L_i grouped into cluster C_k by the algorithm. As a result, the last row of each table indicates the clustering accuracy of the algorithm for every class. The results in those three tables show that both VPCA_FCM and VPCA_IMED_FCM yield r_a values lower than 60%; the r_a of class 12 for VPCA_IMED_FCM is even lower than 50%. Nevertheless, except for class 12, all values of r_a of the other 11 classes are higher than or equal to 75% for VPCA_SWMDFC. This further verifies that the feature matrices of MTS pairs can be efficiently compared by the SWMD without destroying the data structure.

E. Comparison of Time Consumption

In the time consumption experiment, we set the iteration number G_m of the fuzzy clustering to 100 and the cumulative percentage for selecting principal components to 95% in PCA, 2DPCA, and VPCA. Besides, the previously mentioned five schemes are each applied to each of the 12 datasets 100 times. The mean run time of the five schemes is recorded in Table VII, in which the last row indicates the mean values over the ten subjects of the MHealth dataset. It can be seen that the run time of every scheme increases as the number of samples M becomes larger.
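Returning to the per-class accuracy r_a used in Tables VIII–X, it can be read directly off a class-by-cluster count. A small sketch with hypothetical labels (ours, not data from the paper):

```python
import numpy as np

def per_class_accuracy(true_labels, clusters):
    # r_a = (N_max / N_class) * 100% per class: the largest fraction of a
    # class's samples that lands in a single cluster.
    true_labels = np.asarray(true_labels)
    clusters = np.asarray(clusters)
    out = {}
    for c in np.unique(true_labels):
        members = clusters[true_labels == c]          # cluster ids of this class
        _, counts = np.unique(members, return_counts=True)
        out[int(c)] = 100.0 * counts.max() / len(members)
    return out

true = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [2, 2, 2, 0, 1, 1, 0, 1]
print(per_class_accuracy(true, pred))  # {0: 75.0, 1: 75.0}
```

Note that r_a, unlike the external indices above, ignores how the remaining samples of a class are spread over the other clusters.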
TABLE X
CLUSTERING RESULTS OF VPCA_SWMDFC FOR SUBJECT 2

Since the time complexity of 2DPCA is O(NL^2 M) when M >> L, the run times of 2DPCA_FCM and VPCA_FCM are close to each other. On the other hand, the computation of the matrix distance, that is, the IMED [29] or the proposed SWMD, consumes relatively more time due to the integration of the spatial dimensionality difference of the data elements into the distance of two samples. With the increase of M, the run times of VPCA_IMED_FCM and VPCA_SWMDFC increase rapidly. Nevertheless, for the larger dataset, PenDigits, their time consumption is still lower than that of PCA_FCM. Since the time complexity of the covariance matrix in the PCA for MTS samples is O(N^2 L^2 M), larger than that of 2DPCA or VPCA, the run time of PCA_FCM is the largest of all schemes for the PenDigits dataset. Comparatively, VPCA_FCM spends the least time to realize the pattern partition of MTS samples, which further verifies that the VPCA can effectively reduce the computational cost for high-dimensional multivariate data.

V. CONCLUSION

With the rapid development of sensor and network technology, MTS data have been growing exponentially in the last few decades. Recognizing unknown features from massive MTS data can help us construct models and predict the future developing trends of systems. However, unsupervised classification of MTS samples is still a hard nut to crack due to its properties of high dimensionality and multiple variables [40]. Therefore, an unsupervised classification scheme of MTS is proposed in this paper by using VPCA and fuzzy clustering with an SWMD. The contributions of this paper can be summarized as follows.
1) A new VPCA method is developed for the dimensionality reduction of MTS samples.
2) An SWMD is devised by integrating the spatial dimensionality difference of the elements of a matrix pair into the Euclidean distance.
3) A new SWMDFC algorithm is developed for the clustering of MTS matrix data.

The prominent advantage of the VPCA_SWMDFC scheme is that it can be directly applied to MTS matrix data without vectorizing the data in the clustering process. Besides, to validate the proposed pattern recognition scheme, three open-access datasets (MHealth also includes ten subdatasets) are used in the clustering experiments. The results indicate that, through constructing eigenvectors on all samples of the same variable, more important features can be extracted by VPCA than by PCA or 2DPCA. The complexity of the VPCA is smaller than that of PCA based on the vectorization of the matrix sample.

In addition, the inherent structural information of the MTS data can be kept with the SWMD. From the comparison, it is known that the SWMDFC outperforms the fuzzy c-means clustering algorithms based on the Euclidean distance or the IMED in the pattern recognition of MTS. Since two matrix data can be compared with the proposed SWMD, more clustering algorithms can be designed for matrix data classification by using the SWMD. Moreover, because of the great diversity of real data, our future work will focus on further exploring the VPCA for MTS with different lengths or with nonlinear mappings.

REFERENCES

[1] T. W. Liao, "Clustering of time series data—A survey," Pattern Recognit., vol. 38, no. 11, pp. 1857–1874, 2005.
[2] S. Aghabozorgi, A. S. Shirkhorshidi, and T. Y. Wah, "Time-series clustering—A decade review," Inf. Syst., vol. 53, pp. 16–38, Oct./Nov. 2015.
[3] G. He et al., "Early classification on multivariate time series," Neurocomputing, vol. 149, pp. 777–787, Feb. 2015.
[4] P. D'Urso, L. De Giovanni, and R. Massari, "GARCH-based robust fuzzy clustering of time series," Fuzzy Sets Syst., vol. 305, pp. 1–28, Dec. 2016.
[5] A. Antonucci, R. De Rosa, A. Giusti, and F. Cuzzolin, "Robust classification of multivariate time series by imprecise hidden Markov models," Int. J. Approx. Reason., vol. 56, pp. 249–263, Jan. 2015.
[6] H. He and Y. Tan, "A two-stage genetic algorithm for automatic clustering," Neurocomputing, vol. 81, no. 1, pp. 49–59, 2012.
[7] J. Mei, M. Liu, Y.-F. Wang, and H. Gao, "Learning a Mahalanobis distance-based dynamic time warping measure for multivariate time series classification," IEEE Trans. Cybern., vol. 46, no. 6, pp. 1363–1374, Jun. 2016.
[8] A. Singhal and D. Seborg, "Clustering multivariate time-series data," J. Chemometrics, vol. 19, pp. 427–438, Jan. 2006.
[9] A. M. S. Ferreira, C. H. de Oliveira Fontes, C. A. M. T. Cavalcante, and J. E. S. Marambio, "Pattern recognition as a tool to support decision making in the management of the electric sector. Part II: A new method based on clustering of multivariate time series," Elect. Power Energy Syst., vol. 67, pp. 613–626, May 2015.
[10] A. Singhal and D. E. Seborg, "Clustering of multivariate time-series data," in Proc. Amer. Control Conf., vol. 5, 2002, pp. 3931–3936.
[11] C. H. Fontes and O. Pereira, "Pattern recognition in multivariate time series—A case study applied to fault detection in a gas turbine," Eng. Appl. Artif. Intell., vol. 49, pp. 10–18, Mar. 2016.
[12] J. F. Barragan, C. H. Fontes, and M. Embiruçu, "A wavelet-based clustering of multivariate time series using a multiscale SPCA approach," Comput. Ind. Eng., vol. 95, pp. 144–155, May 2016.
[13] K. Yang and C. Shahabi, "An efficient k nearest neighbor search for multivariate time series," Inf. Comput., vol. 205, no. 1, pp. 65–98, 2007.
[14] J. Zhu, B. Wang, and B. Wu, "Social network users clustering based on multivariate time series of emotional behavior," J. China Univ. Posts Telecommun., vol. 21, no. 2, pp. 21–31, 2014.
[15] H. Yoon, K. Yang, and C. Shahabi, "Feature subset selection and feature ranking for multivariate time series," IEEE Trans. Knowl. Data Eng., vol. 17, no. 9, pp. 1186–1198, Sep. 2005.
[16] H. Li, "Accurate and efficient classification based on common principal components analysis for multivariate time series," Neurocomputing, vol. 171, pp. 744–753, Jan. 2016.
[17] J. Yang, D. Zhang, A. F. Frangi, and J. Yang, "Two-dimensional PCA: A new approach to appearance-based face representation and recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 1, pp. 131–137, Jan. 2004.
[18] Q. Gao, L. Ma, Y. Liu, X. Gao, and F. Nie, "Angle 2DPCA: A new formulation for 2DPCA," IEEE Trans. Cybern., vol. 48, no. 5, pp. 1672–1678, May 2018.
[19] P. D'Urso, L. De Giovanni, E. A. Maharaj, and R. Massari, "Wavelet-based self-organizing maps for classifying multivariate time series," J. Chemometrics, vol. 28, no. 1, pp. 28–51, 2014.
[20] S. M. Barbosa, S. Gouveia, M. G. Scotto, and A. M. Alonso, "Wavelet-based clustering of sea level records," Math. Geosci., vol. 48, no. 2, pp. 149–162, 2016.
[21] P. D'Urso and E. A. Maharaj, "Wavelets-based clustering of multivariate time series," Fuzzy Sets Syst., vol. 193, no. 9, pp. 33–61, 2012.
[22] P. D'Urso and E. A. Maharaj, "Autocorrelation-based fuzzy clustering of time series," Fuzzy Sets Syst., vol. 160, no. 24, pp. 3565–3589, 2009.
[23] J. A. Vilar, B. Lafuente-Rego, and P. D'Urso, "Quantile autocovariances: A powerful tool for hard and soft partitional clustering of time series," Fuzzy Sets Syst., vol. 340, pp. 38–72, Jun. 2018.
[24] M. Disegna, P. D'Urso, and F. Durante, "Copula-based fuzzy clustering of spatial time series," Spatial Stat., vol. 21, pp. 209–225, Aug. 2017.
[25] P. D'Urso, L. De Giovanni, R. Massari, and D. Di Lallo, "Noise fuzzy clustering of time series by the autoregressive metric," Metron, vol. 71, no. 3, pp. 217–243, 2013.
[26] P. D'Urso, L. De Giovanni, and R. Massari, "Time series clustering by a robust autoregressive metric with application to air pollution," Chemometrics Intell. Lab. Syst., vol. 141, no. 15, pp. 107–124, 2015.
[27] P. D'Urso, "Fuzzy clustering for data time arrays with inlier and outlier time trajectories," IEEE Trans. Fuzzy Syst., vol. 13, no. 5, pp. 583–604, Oct. 2005.
[28] R. Coppi, P. D'Urso, and P. Giordani, "A fuzzy clustering model for multivariate spatial time series," J. Classification, vol. 27, no. 1, pp. 54–88, 2010.
[29] L. Wang, Y. Zhang, and J. Feng, "On the Euclidean distance of images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1334–1339, Aug. 2005.
[30] H. He and Y. Tan, "Automatic pattern recognition of ECG signals using entropy-based adaptive dimension reduction and clustering," Appl. Soft Comput., vol. 55, pp. 238–252, Jun. 2017.
[31] H. He, Y. Tan, and J. Huang, "Unsupervised classification of smartphone activities signals using wavelet packet transform and half-cosine fuzzy clustering," in Proc. IEEE Conf. Fuzzy Syst. (FUZZ-IEEE), Naples, Italy, Jul. 2017, pp. 1–6.
[32] R. Coppi and P. D'Urso, "Fuzzy unsupervised classification of multivariate time trajectories with the Shannon entropy regularization," Comput. Stat. Data Anal., vol. 50, no. 6, pp. 1452–1477, 2006.
[33] P. D'Urso, "Fuzzy C-means clustering models for multivariate time-varying data: Different approaches," Int. J. Uncertainty Fuzziness Knowl.-Based Syst., vol. 12, no. 3, pp. 287–326, 2004.
[34] X. Wang and K. Smith, "Characteristic-based clustering for time series data," Data Min. Knowl. Disc., vol. 13, no. 3, pp. 335–364, 2006.
[35] M. Lichman, UCI Machine Learning Repository, School Inf. Comput. Sci., Univ. California, Irvine, CA, USA, 2013.
[36] R. Xu and D. C. Wunsch, II, "Clustering algorithms in biomedical research: A review," IEEE Rev. Biomed. Eng., vol. 3, pp. 120–154, Oct. 2010.
[37] H. He and Y. Tan, "Pattern clustering of hysteresis time series with multivalued mapping using tensor decomposition," IEEE Trans. Syst., Man, Cybern., Syst., vol. 48, no. 6, pp. 993–1004, Jun. 2018.
[38] E. Hullermeier, M. Rifqi, S. Henzgen, and R. Senge, "Comparing fuzzy partitions: A generalization of the Rand index and related measures," IEEE Trans. Fuzzy Syst., vol. 20, no. 3, pp. 546–556, Jun. 2012.
[39] P. D'Urso, C. Cappelli, L. De Giovanni, and R. Massari, "Autoregressive metric-based trimmed fuzzy clustering with an application to PM10 time series," Chemometrics Intell. Lab. Syst., vol. 161, pp. 15–26, Feb. 2017.
[40] P. D'Urso, L. De Giovanni, and R. Massari, "Robust fuzzy clustering of multivariate time trajectories," Int. J. Approx. Reason., vol. 99, pp. 12–38, Aug. 2018.

Hong He (M'15) received the Ph.D. degree in control science and engineering from the East China University of Science and Technology, Shanghai, China, in 2008.
She was a Visiting Professor with the University of Edinburgh, Edinburgh, U.K., from 2013 to 2014, and with the University of California at San Diego, San Diego, CA, USA, in 2008. She is currently an Associate Professor with the College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai. She has published over 40 papers in peer-reviewed journals and referenced conferences. She also holds eight patents. Her current research interests include computational intelligence and its application in pattern recognition, signal processing, and system modeling.

Yonghong Tan (M'03) received the Ph.D. degree in electrical engineering from the University of Ghent, Ghent, Belgium, in 1996.
He was a Postdoctoral Fellow with Simon Fraser University, Vancouver, BC, Canada. He was a Visiting Professor with Colorado State University, Fort Collins, CO, USA; Concordia University, Montreal, QC, Canada; the Shibaura Institute of Technology, Tokyo, Japan; and the University of Windsor, Windsor, ON, Canada. He held professorships with the Guilin University of Electronic Technology, Guilin, China, and the University of Electronic Science and Technology of China, Chengdu, China. He is currently a Professor with the College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai, China. His current research interests include modeling and control of nonlinear systems, mechatronics, intelligent control, and signal processing.