A_Survey_on_Tensor_Techniques_and_Applications_in_Machine_Learning
A_Survey_on_Tensor_Techniques_and_Applications_in_Machine_Learning
ABSTRACT This survey gives a comprehensive overview of tensor techniques and applications in machine
learning. Tensor represents higher order statistics. Nowadays, many applications based on machine learning
algorithms require a large amount of structured high-dimensional input data. As the set of data increases,
the complexity of these algorithms increases exponentially with the increase of vector size. Some scientists
found that using tensors instead of the original input vectors can effectively solve these high-dimensional
problems. This survey introduces the basic knowledge of tensor, including tensor operations, tensor decom-
position, some tensor-based algorithms, and some applications of tensor in machine learning and deep
learning for those who are interested in learning tensors. The tensor decomposition is highlighted because it
can effectively extract structural features of data and many algorithms and applications are based on tensor
decomposition. The organizational framework of this paper is as follows. In part one, we introduce some
tensor basic operations, including tensor decomposition. In part two, applications of tensor in machine learn-
ing and deep learning, including regression, supervised classification, data preprocessing, and unsupervised
classification based on low rank tensor approximation algorithms are introduced detailly. Finally, we briefly
discuss urgent challenges, opportunities and prospects for tensor.
INDEX TERMS Machine learning, tensor decomposition, higher order statistics, data preprocessing,
classification.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
162950 VOLUME 7, 2019
Y. Ji et al.: Survey on Tensor Techniques and Applications in Machine Learning
of the sum of products of factor tensor or factor vector. decomposition, the Tensor train decomposition and Higher-
Tensor network decomposes the high-dimensional tensor into order singular value decomposition (also known as higher-
sparse factor matrices and low-order core tensor, which we order tensor decomposition) in Chapter C. In Chapter D,
call factors or blocks. In this way, we set the compression (that we give a detailed description of tensor train decomposition
is, distributed) representation of large-size data, enhancing and the related algorithms. In Chapter E, i.e., the last section
the advantage of interpretation and calculation. of the first part, we summarize the advantages and disadvan-
Tensor decomposition is regarded as a sub-tensor network tages of these decompositions and applications. In part two,
in this survey. That is to say, the decomposition of tensor we mainly describe tensor application algorithms in machine
can be used in the same way as the tensor network. We can learning and deep learning. In Chapter A, we introduce
divide the data into related and irrelevant parts by using the application of structured tensor in data preprocessing
tensor decomposition. High-dimensional big data can be including tensor completion and tensor dictionary learning.
compressed several times without breaking data correlation In Chapter B of this part, we introduce some applications of
by using tensor decomposition (tensor network). Moreover, tensor in classification, including algorithm innovation and
tensor decomposition can be used to reduce unknown param- data innovation. Then, we illustrate the application of tensor
eters, and then the exact solution can be obtained by alternate in regression, including tensor regression and multivariate
iterative algorithms. tensor regression, in Chapter C. In the last of part two,
We provide a general block diagram of the survey (see we explain the background of the tensor network and discuss
figure 1). The survey consists of two parts. In part one, its advantages, shortcomings, opportunities and challenges in
we first give the basic definition and notations of tensor detail.
in Chapter A. Then we introduce the basic operation of
tensor, and the block diagram of the network structure of II. PART ONE: TENSOR AND TENSOR OPERATION
tensor in Chapter B. Next, we begin to describe tensor A. TENSOR NOTATIONS
decomposition, including several famous decompositions A tensor can be seen as a generalization of multidimensional
such as the CP (regularization) decomposition, the Tucker arrays. For example, a scalar quantity can be considered as a
0-order tensor, a vector can be treated as a first-order tensor, such as pentagons or hexagons) to represent tensor, and the
and a matrix can be regarded as a second-order tensor. And a outgoing line of the node represents the index of a particular
third-order tensor looks like a cuboid (see figure 2). dimension (see figure 5 and figure 6). The Nth-order tensor
can be expressed in a similar way.
illustrate tensor slice and tensor fiber. 3) THE MODE-n PRODUCT OF A TENSOR AND A MATRIX
1 2 5 6
C= , (1) C = A ×nm B (5)
3 4 7 8
This is a 3rd-order tensor C ∈ R2×2×2 . For tensor where in ×nm , m means matrix, n means mode-n,
slices(matrices), we can get two matrices C(1, :, :) and A ∈ RI1 ×I2 ×I3 ×···IN means the Nth-order tensor, B ∈
C(2, :, :) when we fix the first dimension: 13 24 , 57 68 , which RJ ×In means the matrix. They yield a tensor C ∈
we usually call them horizontal slices. If we fix the second RI1 ×···×In−1 ×J ×In+1 ×···×IN with entries ci1 ,··· ,in−1 ,j,in+1 ,··· ,iN =
dimension, we can get another two matrices C(:, 1, :) and PIn
C(:, 2, :): 15 26 , 37 48 , which we usually call them lateral ai1 ,··· ,in−1 ,in ,in+1 ,··· ,iN bj,in .
slices. If we fix the third dimension, we in =1
can
stillget another
two matrices C(:, :, 1) and C(:, :, 2): 15 37 , 26 48 , which we
4) THE MODE-(a,b) PRODUCT(TENSOR CONTRACTION) OF
usually call them frontal slices.
A TENSOR AND ANOTHER TENSOR
For tensor fibers(vectors), we can get four vectors
C(1, 1, :), C(1, 2, :), C(2, 1, :), C(2, 2, :) when we fix the first
and the second indices: [ 1 2 ], [ 3 4 ], [ 5 6 ], [ 7 8 ]. If we fix the C = A ×(a,b) B (6)
first and the third indices, we can get another four vectors
C(1, :, 1), C(1, :, 2), C(2, :, 1), C(2, :, 2): [ 1 3 ], [ 2 4 ], [ 5 7 ], where A ∈ RI1 ×I2 ×I3 ×···IN means the Nth-order tensor, B ∈
[ 6 8 ]. If we fix the second and the third indices, we can RJ1 ×J2 ×J3 ×···JM means another tensor and here we should note
get another four vectors C(:, 1, 1), C(:, 1, 2), C(:, 2, 1), that Ia = Jb (a ∈ [1, N ], b ∈ [1, M ]). They yield a tensor
C(:, 2, 2): [ 1 5 ], [ 2 6 ], [ 3 7 ], [ 4 8 ]. C ∈ RI1 ×···×Ia−1 ×Ia+1 ×···IN ×J1 ×···Jb−1 ×Jb+1 ···×JM with entries
Ia
P
ci1 ,··· ,ia−1 ,ia+1 ,··· ,iN ,j1 ,··· ,jb−1 ,jb+1 ,··· ,jM = ai1 ,··· ,ia ,··· ,iN
B. TENSOR OPERATION ia =1
In this chapter, we begin to discuss some basic tensor bj1 ,··· ,jb−1 ,ia ,jb+1 ,··· ,jM . Note that it is also called tensor contrac-
operations. Tensor operations are similar to traditional linear tion because the dimension of the new tensor is the sum of the
algebras, but are richer and more meaningful than them. dimensions of the original two tensors minus the dimension
The same operation will also be applied to the following of the same size. We draw a picture to show tensor contrac-
tensor decomposition in Chapter C. In order to get a clearer tion(see figure 9). For convenience, when two tensors have
description of the formulas, we will give examples and graph- same size of one dimension, the () in the above formula is
ical instructions. Thirteen tensor calculation formulas will be usually omitted, that is C = A ×a,b B.
given.
N We first give the definition ofJ
the following operations:
means the Kronecker product, means the Khatri-Rao
product, ◦ means the outer product, ×n means the mode-n
product. Next, we introduce a few commonly used formulas.
C =A+B (2)
where A ∈ RI1 ×I2 ×I3 ×···IN , B ∈ RI1 ×I2 ×I3 ×···IN , C ∈
RI1 ×I2 ×I3 ×···IN , ci1 ,··· ,iN = ai1 ,··· ,iN + bi1 ,··· ,iN .
6) THE CONJUGATE TRANSPOSE OF A 3RD-ORDER TENSOR 10) THE MODE-n MATRICIZATION AND VECTORIZATION
The conjugate transpose of a 3rd-order tensor X ∈ RI1 ×I2 ×I3 OF THE TENSOR
is a tensor X ∗ ∈ RI2 ×I1 ×I3 obtained by conjugate transpos- In the previous section, we introduced the concepts of ten-
ing each of the frontal slices(fix the third order X (:, :, i3 )) sor slice and tensor fiber. Here we present two similar but
and then reversing the order of transposed frontal slices 2 different concepts, matricization and vectorization. Tensor
through I3 . We give a simple example to show(see formula 8): slice and tensor fiber take some specific elements of the
tensor to form a matrix or a vector, while matricization and
1 3 5 7
C = , vectorization are to matricize or vectorize all the elements.
2 4 6 8 We now give a formal definition.
1 3 2 4 The mode-n matricization of a tensor, Y ∈
C∗ = , (8)
5 7 6 8 RI1 ×I2 ×I3 ×···IN , is as follows:
7) THE OUTER PRODUCT OF A TENSOR AND ANOTHER mat(Y )n = mat(Y )n = Y mn ∈ RIn ×I1 ···In−1 In+1 ···IN
TENSOR Where for the matrix element (in j),
XN k−1
Y
C =A◦B (9) j=1+ (ik − 1)Jk with Jk = Im
k=1,k6 =n m=1,m6 =n
where A ∈ RI1 ×I2 ×I3 ×···IN and B ∈ RJ1 ×J2 ×J3 ×···JM .
(14)
They yield an (N+M)th-order tensor C with entries
ci1 ,··· ,iN ,j1 ,··· ,jM = ai1 ,··· ,iN bj1 ,··· ,jM . The mode-n vectorization of a tensor, Y ∈ RI1 ×I2 ×I3 ×···IN,
is as follows:
8) THE (RIGHT)KRONECKER PRODUCT OF TWO TENSORS
vec(Y )n = Y vn ∈ RIn I1 ···In−1 In+1 ···IN (15)
C = A ⊗R B (10) For the mode-n vectorization, we first perform mode-n matri-
where in ⊗R , R means right, A ∈ RI1 ×I2 ×I3 ×···IN and B ∈ cization and then stack the matrix in columns. Of course,
RJ1 ×J2 ×J3 ×···JN . They yield a tensor C ∈ RJ1 I1 ×···×JN IN with vectors and matrices can also be transformed into tensor.
entries ci1 j1 ,··· ,iN jN = ai1 ,··· ,iN bj1 ,··· ,jN , where iN jN = jN + We give examples of mode-1 matricization and vectorization
of a 3rd-order tensor(see formula 16).
(iN − 1)JN is called multi-indices. Note that for Kroneker
product, two tensors must have the same dimension. They 1
must not carry out the Kronecker product of the matrix and 5
the 3rd-order tensor, and must have a 3rd-order tensor and 2
another 3rd-order tensor. A simple example of a second-order 1 3 5 7 1 2 3 4 6
C= , ⇔ ⇔
matrix is provided, as follows: 2 4 6 8 5 6 7 8 3
7
1 2 5 6
4
C = ⊗R
3 4 7 8 8
1×5 1×6 2×5 2×6 (16)
1 × 7 1 × 8 2 × 7 2 × 8
= 3 × 5 3 × 6 4 × 5 4 × 6
(11)
11) THE TENSOR QUANTITATIVE PRODUCT
3×7 3×8 4×7 4×8
J1
X JN
X
In fact, c=A•B= ··· aj1 ,··· ,jN bj1 ,··· ,jN (17)
j1 =1 jN =1
C = A ⊗R B = B ⊗L A (12)
where A ∈ RJ1 ×···×JN , B ∈ RJ1 ×···×JN . Note that the require-
The right-most equation is called the left Kronecker product. ments for tensor quantitative product are too strict. Not only
the dimension of the two tensors should be the same, but also
9) THE RIGHT KHATRI-RAO PRODUCT OF MATRICES the size of the two tensors has to be the same. In this way, we
can further define the Frobenius norm of tensor.
C =A RB
kAkF = (A • A)1/2 (18)
= [a1 ⊗R b1 , a2 ⊗R b2 , · · · , aK ⊗R bK ] ∈ RIJ ×K (13)
product, similar to the quantitative product, the two tensors vectors and matrices (just change the dimension to 1 or 2 in
must have the same dimension and the same size. the formulas). Many researchers have also defined some new
operations, such as the strong Kronecker product(de Launey
13) THE TENSOR TRACE and Seberry [140]; Phan et al. [8]) and the mode-n Khatri-Rao
Similar to trace of the matrix, tensor also has a trace. product of tensors (Ballard et al.) [33]. Based on the Kroneker
(Gu, 2009) [162] proposed the concept of tensor trace. Let’s product, these two operations are just grouped into blocks to
first look at the concept of inner indices. If a tensor has the perform the Kroneker product operation.
same size for several dimensions, those same size dimensions This chapter mainly introduces basic calculation formulas
are called inner indices. For example, a tensor X ∈ RA×B×A commonly used by tensor. If you want to know more about
has two inner indices. Modes 1 and 3 are both size A. Then, many other formulas, please refer to (Kolda and Bader) [127].
we define the following concept of tensor trace:
R
C. TENSOR DECOMPOSITION
This chapter begins to discuss the knowledge of tensor
X
x = Trace(X ) = X (r, :, r) (20)
r=1
decomposition, which is similar but different from matrix
R decomposition. Tensor decomposition aims to reduce the
X
x = [x1 , x2 , · · · , xB ]T , xi = X (r, i, r) (21) computational complexity while ensuring the data structure,
r=1 so as to better deal with the data. Tensor decomposition
x = [tr(X1 ), · · · , tr(XB )]T , Xi ∈ RR×R (22) technology has been gradually used in data analysis and
processing. This chapter will focus on five main types of
Let’s give an example of the 3rd-order tensor that we have decomposition, i.e., the Canonical Polyadic(CP) decompo-
used before. sition, the Tucker decomposition, the MultiLinear Singular
1 2
5 6
Value(the higher-order SVD or HOSVD) decomposition,
C = , (23) the Hierarchical Tucker(HT) decomposition and the tensor-
3 4 7 8
train(TT) decomposition, respectively.
c = Trace(C) = [1 + 6, 3 + 8]T = [7, 11]T (24)
1 2 3 4
C1 = , C2 = (25) 1) THE CANONICAL POLYADIC(CP) DECOMPOSITION
5 6 7 8 Before introducing CP decomposition, we first introduce the
bidirectional component analysis, i.e., the constrained low-
14) THE TENSOR CONVOLUTION
rank matrix factorization.
Tensor also has convolution, which is similar to matrix con-
volution. For two Nth-order tensors A ∈ RI1 ×I2 ×I3 ×···IN and R
X
B ∈ RJ1 ×J2 ×J3 ×···JN . Their tensor convolution is as follows: C = 3ABT + E = λr ar bTr + E (27)
r=1
C =A∗B (26)
where 3 = diag(λ1 , · · · , λr ) is an diagonal matrix.
C ∈ R(I1 +J1 −1)×(I2 +J2 −1)×···×(IN +JN −1) , with entries C ∈ RI ×J is a known matrix (for example, known input data,
J1 P
P J2 JN
P etc.). E ∈ RI ×J is a noise matrix. A = [a1 , · · · , aR ] ∈
ck1 ,k2 ,··· ,kN = ··· bj1 ,··· ,jn ak1 −j1 ,··· ,kn −jn For a
j1 =1 j2 =1 jN =1 RI ×R , B = [b1 , · · · , bR ] ∈ RJ ×R are two unknown factor
simple and intuitive display, we use matrix convolution to matrices with ar ∈ RI , br ∈ RJ , r ∈ [1, R]. In fact, if the
illustrate (see figure 10). noise matrix is very small, it can be ignored and the upper
expression can be written as C ≈ 3ABT .
In fact, based on low rank matrix decomposition,
(Hitchcock [31]; Harshman, 1970 [110]) proposed the CP
decomposition of tensor. Before introducing the definition of
CP decomposition, we give the definition of rank-1 tensor.
If a tensor can be represented as follows:
Y = b1 ◦ b2 ◦ · · · bN (28)
FIGURE 10. A schematic diagram of the results of matrix convolution, where Y ∈ RI1 ×I2 ×I3 ×···IN , bn ∈ RIn , yi1 ,··· ,iN = b1i1 · · · bN
iN ,
with C11 = 0 × 1 = 0, C12 = 1 × 1 + 0 × 2 = 1, C22 = 1 × 2 + 2 ×
1 + 1 × 1 + 0 × 0 = 5, · · · .
then we call the tensor rank-1 tensor. In CP decomposition,
tensor is decomposed into the linear sum of these vectors.
CP decomposition is defined as follows:
15) SHORT SUMMARY R
X
The formulas for tensor operations described above are rela- Y≈ λr b1r ◦ b2r ◦ · · · bN
r = 3 ×1m B1 ×2m B2 · · · ×Nm BN
tively basic ones. Because tensor can be seen as a generaliza- r=1
tion of matrices and vectors, the above formulas also apply to (29)
Similar to the constrained low-rank matrix factorization that example: the factor matrices can be iterative updated as
we have just described, where λr = 3r,r,r,··· ,r , r ∈ [1, R]
are entries of the diagonal core tensor 3 ∈ RR×R×R×···R . Bn = Y mn [(BN R · · · Bn+1 R Bn−1 · · · B1 )T ]† (31)
Bn = [bn1 , bn2 , · · · , bnR ] ∈ RIn ×R are factor matrices. With the where Y mn represents the mode-n matricization of tensor Y ,
help of other formulas, CP decomposition has a lot of other † means the Moore-Penrose pseudo-inverse of the matrix.
similar expressions, among which we give two commonly We give an algorithm for the 4th-order tensor CP decompo-
used equations. Considering a special case, when all factor sition (see Algorithm 1).
matrices are the same, we call the CP decomposition a sym-
metric tensor decomposition, then Y ∈ RI ×I ×I ···I . Figure 11 Algorithm 1 The CP Decomposition Algorithm of a
shows CP decomposition of a 3rd-order tensor(see figure 11). 4th-Order Tensor
Input:
The 4th-order tensor Y ∈ RI ×J ×K ×L
Output:
Factor matrices A,B,C,D and the core tensor 3
1: Initialize A,B,C,D and CP rank R, where R ≤
min{IJ , JK , IK };
2: while the iteration threshold does not reach or the algo-
rithm has not converged do
3: A = Y m1 [(D R C R B)T ]† ;
4: Normalize column vectors of A to unit vector;
5: B = Y m2 [(D R C R A)T ]† ;
FIGURE 11. CP decomposition of a 3rd-order tensor, 6: Normalize column vectors of B to unit vector;
Y ≈ 3 ×1m B1 ×2m B2 ×3m B3 [3].
7: C = Y m3 [(D R B R A)T ]† ;
8: Normalize column vectors of C to unit vector;
CP Rank: Similar to matrix, tensor also has a rank. Since 9: D = Y m4 [(C R B R A)T ]† ;
it is a CP decomposition at this time, we call it CP rank. CP 10: Normalize column vectors of D to unit vector;
rank refers to the smallest R for which the CP decomposition 11: Save the value of the norms of the R column vectors
in the above formula holds exactly. We use rcp (Y ) to represent in the factor matrix C to the core tensor 3;
the CP rank. 12: end while
In practice, unlike traditional matrix decomposition, tensor 13: return Factor matrices A,B,C,D and the core tensor 3
usually have interference (such as noise or even data loss).
Therefore, it is usually difficult to find the exact solution
From the above algorithm, we can see that the key
of CP decomposition, so most of them are approximate
to calculate CP decomposition is to calculate Khatri-
solutions.
Rao product and the pseudo inverse of the matrices.
So the question comes, that how do we get tensor approx-
(Choi and Vishwanathan [63]; Karlsson et al. [79]) proposed
imate CP decomposition, or in other words, that how can we
the least-squares solution method of CP decomposition and
get the core tensor? The general approach is to first find the
the detailed derivation process can be referenced by them.
factor matrix Bn by minimizing an appropriate loss function.
(A.Vorobyov, 2005) [116] presents a loss function similar to
2) THE TUCKER DECOMPOSITION
the least square method.
The Tucker decomposition was first proposed by
J (B1 , · · · , BN ) = kY − 3 ×1m B1 · · · ×Nm BN k2F (30) (Tucker) [81], so it was named Tucker decomposition. Similar
to the CP decomposition, the Tucker decomposition also
Our goal is to minimize the loss function in the upper form, divides tensor into small size of core tensor and factor
and we use the alternating least square method, which means matrices, but what we need to pay attention to is that the core
iterative optimization by fixing the value of a variable other tensor here is not necessarily the diagonal tensor. We define
than one. That is to say, one of those N factor matrices the Tucker decomposition as follows:
Bn , is optimized separately at a time, keep the values of R1
X RN
X
other N-1 factor matrices unchanged (we first initialize all Y ≈ ··· ar1 r2 ···rN b1r1 ◦ b2r2 ◦ · · · bN
rN
N factor matrices, and optimize only B1 by gradient descent r1 =1 rN =1
while keep the initial values of B2 to BN unchanged). This = A ×1m B1 ×2m B2 · · · ×Nm BN
becomes a single variable loss function optimization prob- Y v1 = (BN ⊗R BN −1 · · · ⊗R B1 )Av1 (32)
lem. Then it continues to iterate until the iteration threshold
is reached or the algorithm has converged. The derivation where ar1 r2 ···rN are entries of the small size core tensor
is not given here. We give the results directly, and take the A ∈ RR1 ×R2 ···RN , Bn = [bn1 , bn2 , · · · , bnRn ] ∈ RIn ×Rn are factor
4th-order tensor as an example to write the following matrices, Y v1 is the mode-1 vectorization of the tensor Y ,
FIGURE 14. Because only B3 is the identity matrix, the graph is the
Tucker-1 decomposition model.
Z = X ⊗Y
FIGURE 15. Schematic diagram of HT decomposition of 5th-order tensor,
= (AX ⊗ AY ) ×1m (B1 ⊗ C1 ) · · · ×Nm (BN ⊗ CN ) (37) in which the core tensor is split into two small-size 3rd-order tensors
A12 , A345 , and the right core tensor is split into the factor matrix B3 and
2. The Hadamard product of the two tensors (the same sizes the 3rd-order core tensor of smaller size A45 . Finally, A12 and A45
continue to be decomposed into the last four factor matrices
and order): B1 , B2 , B3 , B4 . The diagram on the right is the HT tensor network
structure diagram with the core tensor A12345 in the original left image
Z = X ~ Y = (AX ⊗L AY )×1m (B1 C1 ) · · · ×1m (BN CN ) replaced by a connecting line.
(38)
z = X • Y = (vec(X )1 )T vec(Y )1
= (vec(AX )1 )T ⊗L ((B1 )T C1 ) ⊗L ((B2 )T C2 ) · · · ⊗L
((BN )T CN )vec(AY )1 (39)
In fact, the core idea is to replace the core tensor with smaller
dimension of tensors until the original tensor is decomposed
into factor matrices. Finally, the original tensor is decom-
posed into a case where several 3rd-order tensors and sev-
eral factor matrices are connected to each other. Here we
introduce the HT decomposition of the 5th-order and the
6th-order tensor. The higher order tensor HT decomposition
of the tensor network diagram can be drawn with a similar
example and for more details please refer to (Tobler [22];
Kressner et al. [23]).
After Tucker decomposition, although the size of the core
tensor is reduced, the dimension of the core tensor is still the
same as before. When the original tensor dimension is very
large (for example, greater than 10), we usually express it FIGURE 19. The truncated HOSVD decomposition of a 3rd-order
with the distributed tensor network similar to the HT decom- tensor [127].
position. That is, the dimension of core tensor is not limited
to the 3rd order. According to the actually need, it can be
4th or 5th order (see figure 18). In fact, the orthogonal constraints of tensors and the
constraints of matrix SVD decomposition are very simi-
lar. Similar to the truncated SVD decomposition of the
matrix, the tensor also has a truncated HOSVD decomposi-
tion (see figure 19).
The first step in finding the solution of HOSVD decom-
position is to first perform the mode-n matricization of the
original input tensor and then use a truncated or randomized
SVD to find the factor matrices(see equation 157)
X mn = Un Sn VnT = [Un1 , Un2 ][Sn1 , 0][Vn1
T
, Vn2
T
] (44)
When the factor matrix is obtained, the core tensor can be
decomposed using the following formula:
FIGURE 18. The blue rectangles represent the core tensors and the red
circles represent the factor matrices. The diagram on the left is an A = X ×1m BT1 ×2m BT2 · · · ×Nm BTN (45)
18th-order tensor HT decomposition tensor network diagram, in which
the 4th-order small size core tensors are connected to each other. The where X ∈ RI1 ×I2 ···IN
is the input tensor, A ∈ RR1 ×R2 ···RN
diagram on the right is a 20th-order tensor HT decomposition tensor
network diagram, in which the 5th-order small size core tensors are
is the core tensor, and Bn ∈ RIn ×Rn are the fac-
connected to each other. tor matrices. See Algorithm 2 for details and refer to
(Vannieuwenhoven et al. [101]; Halko et al. [96]).
After performing the mode-n matricization of the tensor, Compared with the truncated SVD decomposition of the
if the tensor size is too large, we can also obtain the factor standard matrix, the tensor HOSVD decomposition does not
matrices by matrix partitioning, as follows: produce the best multiple linear rank, but only the weak linear
rank approximation (De Lathauwer et al. ) [73]:
X mn = [X1n , X2n , · · · , XMn ] √
T
= Un Sn [V1n , V2n T
, · · · , VMn
T
] (46) kX −A×1m B1 ×2m B2 · · ·×Nm BN k ≤ N kX − b X Prefect k (47)
where we divide the resulting matrix (called the unfolded where b X Prefect is the best approximation for X .
matrix) X mn into M parts. Then we use the eigenvalue decom- In order to find an accurate approximation of Tucker
M decomposition, researchers have extended the alternating
position X mn X Tmn = Un (Sn )2 UnT = T , U =
P
Xmn Xmn n least squares method to the higher-order orthogonal iterations
m=1
(Jeon et al. [56]; Austin et al. [138];Constantine et al. [103];
[Un1 , Un2 ], Bn = Un1 . And we can get Vmn = Xmn
T U (S )−1 .
n n
De Lathauwer et al. [74]). For details, please refer to
Thus, computational complexity and computational memory
Algorithm 4.
will be decreased and the efficiency will be improved to
some extent by matrix partitioning. At the same time, it also
Algorithm 4 The Higher-Order Orthogonal Iterations
alleviates the curse of dimension problem.
(Austin et al. [138]; De Lathauwer et al. [74])
Some researchers proposed a random SVD decomposi-
Input:
tion algorithm for matrices with large size and low rank.
The Nth-order input tensor X ∈ RI1 ×I2 ···IN decomposed
(Halko et al.) [96] reduced the original input matrix to a
by Tucker.
small size matrix by random sketching, i.e., by multiplying
Output:
a random sampling matrix (see Algorithm 3).
the core tensor A and factor orthogonal matrices
Bn ,BTn Bn = IRn
Algorithm 3 The Random SVD Decomposition Algorithm
1: Initialize all parameters via the Truncated HOSVD by
for Large-Size and Low Rank Matrices (Halko et al.) [96]
Algorithm 2;
Input: 2: while the cost function kX − A ×1m B1 · · · ×Nm BN k2F
The large-size and low rank matrix X ∈ RI ×J , estimated does not reach convergence do
rank R, oversampling parameter P, overestimated rank 3: for n=1 to N do
R = R + P, exponent of the power method q (q=0
b
4: Y ← X ×(p6=n)m (BTp );
or 1)
5: Z ← Y mn (Y mn )T ∈ RR×R ;
Output:
6: Bn ← leading Rn eigenvectors of Z;
the SVD of X, orthogonal matrix U ∈ RI ×R , diagonal
b
7: end for
matrix S ∈ RR×R and V ∈ RJ ×R
b b b
8: A ← Y ×Nm (BTN );
1: Initialize a random Gaussian matrix W ∈ RJ ×R ;
b
9: end while
2: Calculate sample matrix Y = (XX T )q XW ∈ RI ×R ;
b
10: return the core tensor A and factor matrices Bn
3: Compute the QR decomposition of the sample matrix
Y = QR;
4: Calculate the small-size matrix A = QT X ∈ RR×J ;
b When the size of the original tensor is too large(too many
5: Compute the SVD of the small-size matrix A = U b SV T ; elements), it will result in insufficient memory, and finally
6: Calculate the orthogonal matrix U = QU ;
b the computational complexity may also increase. In this case,
7: return orthogonal matrices U ∈ RI ×R , diagonal matrix
b the operation can be simplified in the form of a matrix prod-
S ∈ RR×R and V ∈ RJ ×R
b b b uct. Simply put, the mode-n product of the tensor and the
matrix is converted into the product of the general matrix to
simplify the operation and reduce the memory (see figure 20).
The advantage of using the overestimated rank of the
For the large size tensor, another way to simplify the
matrix is that it can achieve a more accurate approximation
operation is to use the blocking method. It simply divides
of the matrix. (Chen et al.) [129] improved the approxima-
the original tensor and the factor matrix into blocks, and then
tion of SVD decomposition by integrating multiple random
performs the mode-n product between the small size matrix
sketches, that is, multiplying the input matrix X by a set of
and the small size tensor (see figure 21).
random Gaussian matrices. (Halko et al.) [96] used a special
As seen from the figure 21, we divide the input tensor X
sampling matrix to greatly reduce the execution time of the
into small pieces X (x1 ,x2 ,··· ,xN ) . Similarly, we divide the factor
algorithm while reducing complexity. However, for a matrix
matrix BTn into B(xn ,bn ) . The tensor An remained by the mode-
with a slow singular value decay, this method will result in a
n product of the matrix and the tensor is equal to:
lower accuracy of SVD.
Many researchers developed a variety of different Xn
X
algorithms to solve the HOSVD decomposition. For An(x1 ,x2 ,··· ,bn ,··· ,xN ) = X (x1 ,x2 ,··· ,bn ,··· ,xN ) ×nm (Bn (xn , bn ))T
details, please refer to (Vannieuwenhoven et al. [101]; xn =1
Austin et al. [138]; Constantine et al. [103]). (48)
FIGURE 24. The schematic diagram of TCA is similar to MCA. It is noted that R1 , R2 and R3 are selected appropriately, and then
four new tensors A, B, C , D are formed. The rightmost is the equivalent Tucker decomposition diagram. For detailed derivation,
please refer to formula 52.
FIGURE 25. A simple tensor network diagram of TCA and Tucker decomposition. Here is a schematic
diagram of the conversion of TCA and Tucker decomposition [3].
2. (Caiafa and Cichocki) [17] proposed a Fiber Sampling For an Nth-order tensor, the formula for FSTD is as follows
Tucker Decomposition that operates directly on the input (Caiafa and Cichocki, 2015) [17]:
matrix, but with the premise that it is based on low rank
X = A ×1m B1 ×2m B2 × · · · ×Nm BN
Tucker decomposition. Since tensor usually has a good low-
rank Tucker decomposition, FSTD algorithm is often used. = W ×1m C 1m1 ×2m C 2m2 · · · ×Nm C N
mN (53)
Figure 24 and 25 shows the TCA by FSTD algorithm. For a 3rd-order tensor, the four cross tensors of the above
We can see from figure 24 and 25 that the FSTD algorithm FSTD(W , B, C, D) can be obtained by random projection
first finds a suitable cross tensor from the original input ten- (see formula 51), as follows:
sor, and then changes the size of the core tensor. Specifically
by the formula: W = X ×1m B1 ×2m B2 ×3m B3 ∈ RR1 ×R2 ×R3
B = X ×2m B2 ×3m B3 ∈ RI1 ×R2 ×R3
X = A ×1m B1 ×2m B2 ×3m B3
C = X ×1m B1 ×3m B3 ∈ RR1 ×I2 ×R3
= W ×1m Bm1 ×2m C m2 ×3m Dm3 (52)
D = X ×1m B1 ×2m B2 ∈ RR1 ×R2 ×I3 (54)
where the first equation is the standard Tucker decomposi-
where Bn ∈ RRn ×In are the projection matrices.
tion. In the second equation, where Bm1 ∈ RI1 ×R2 R3 , C m2 ∈
† †
RI2 ×R1 R3 , Dm3 ∈ RI3 ×R1 R2 , W = A ×1m Am1 ×2m Am2 ×3m 7) THE TENSOR TRAIN AND TENSOR CHAIN
† † †
Am3 ∈ RR2 R3 ×R1 R3 ×R1 R2 . Note that Bm1 Am1 = B1 , C m2 Am2 = DECOMPOSITION
†
B2 , Dm3 Am3 = B3 . The above is for the 3rd-order tensor, and CP decomposition is a special case of Tucker decomposi-
when the dimension becomes 2(the matrix), it is easy to see tion. The core tensor of Tucker decomposition is further
that the TCA degenerates into MCA. decomposed into hierarchical tree structure and becomes
FIGURE 26. The TT and TC decomposition for a large size vector. Figure (a) first reorganizes the vector into a
suitable Nth-order tensor, Y ∈ R I1 ×I2 ···×IN ← y ∈ R I , I = I1 I2 · · · IN , and then TT and TC decomposition are
performed on the Nth-order tensor. Figure (a) is TT decomposition, and Figure (b) is TC decomposition. Please
refer to formula 55 for the TT decomposition of Nth-order tensor.
HT decomposition. The Tensor Chain(TC) decomposition In figure 26 and figure 27, we first transform the large
is a special case of HT decomposition. The core tensor is size vector and matrix into the Nth-order and 2Nth-order
in series and aligned, i.e., every core tensor has the same small size tensor, respectively. Then we decompose them by
dimension, and at the same time, all the factor matrices TT or TC. We can see that the only difference between TT
are unit matrices. The advantage of having the same form decomposition and TC decomposition is that TC decomposi-
of core tensor and unit matrix is that it can significantly tion connects the first core tensor and the last core tensor with
reduce the amount of computation, facilitate subsequent opti- a single line RN .
mization, and so on. The Tensor Train(TT) decomposition Then we give a concrete mathematical expression of TT
is also a special case of HT decomposition. (Oseledet [60] decomposition of an Nth-order tensor Y ∈ RI1 ×I2 ×I3 ×···IN .
and Oseledet and Tyrtyshnikov [61]) first put forward the
concept of TT decomposition. The only difference between Y = A1 ×3,1 A2 · · · ×3,1 AN (55)
TT decomposition and TC decomposition is that the dimen- where An ∈ RRn−1 ×In ×Rn , R0 = RN = 0, n = 1, 2, · · · , N
sion of the first and the Nth core tensor is one less than
R1 ,R2 ,··· ,RN −1
the dimension of the intermediate N-2 core tensors in TT X
decomposition. In different domains, TT decomposition has yi1 ,i2 ,··· ,iN = a11,i1 ,r1 a2r1 ,i2 ,r2
r1 ,r2 ,··· ,rN −1 =1
different names. Generally speaking, in the field of physics,
−1
when we refer to the Tensor Chain(TC) decomposition as · · · aN N
rN −2 ,iN −1 ,rN −1 arN −1 ,iN ,1 (56)
the Matrix Product State (MPS) decomposition with periodic
where yi1 ,i2 ,··· ,iN and anrn−1 ,in ,rn are entries of Y and An ,
boundary conditions(PBC), we also refer to the TT decompo-
respectively.
sition as the Matrix Product State (MPS) decomposition with
the Open Boundary Conditions. Before we give the concrete R1 ,R2 ,··· ,RN −1
,rN −1 rN −1 ,1
◦ ar21 ,r2 ◦ · · · ◦ aNN−1
r
a1,r
X
1 −2
expression, we draw a picture to give an intuitive explanation Y = 1 aN
of the TT decomposition and the TC decomposition (see r1 ,r2 ,··· ,rN −1 =1
figure 26 and figure 27). (57)
FIGURE 27. The TT and TC decomposition for a large size matrix. Figure (a) first reorganizes the matrix into a suitable
2Nth-order tensor, Y ∈ R I1 ×J1 ···×IN ×JN ← Y ∈ R I×J , I = I1 I2 · · · IN , J = J1 J2 · · · IN , and then TT and TC decomposition are
performed on the 2Nth-order tensor. Figure (a) is TT decomposition, and Figure (b) is TC decomposition. Please refer to
formula 58 for the TT decomposition of 2Nth-order tensor.
r ,r
where ann−1 n = An (rn−1 , :, rn ) ∈ RIn are tensor Similarly, the 3rd-order large-size tensor or higher-order
fiber(vectors). large-size tensor can be decomposed by TT in a similar
The above three formulas are TT decomposition for- way (by decomposing them into 3Nth-order or higher
mula corresponding to the large-size vector decomposed into tensor.)
Nth-order tensors (that is, figure 26). Similar to the TT Here we no longer give the mathematical expression of
decomposition for the Nth-order tensor, the TT decomposi- the TC decomposition, because there is almost no differ-
tion for the 2Nth-order tensor (see figure 27) is as follows: ence between the TT decomposition and TC decomposition
(mainly the first and last core tensors have a dimension with
Y = A1 ×4,1 A2 · · · ×4,1 AN (58)
a size of Rn ).
where An ∈ RRn−1 ×In ×Jn ×Rn , R = R = 0, n = 1, 2, · · · , N
0 N Here we give three common methods. The first is the
R1 ,R2 ,··· ,RN −1
X product form between the core tensor contractions, the second
yi1 ,i2 ,··· ,iN = a11,i1 ,j1 ,r1 a2r1 ,i2 ,j2 ,r2 is the expression between the scalars and the third is the
r1 ,r2 ,··· ,rN −1 =1 outer product of tensor slice or the outer product of tensor
−1 fiber. There are some other mathematical expressions for
· · · aN N
rN −2 ,iN −1 ,jN −1 ,rN −1 arN −1 ,iN ,jN ,1 (59)
other uses, such as, the TT decomposition can be calculated
where n
yi1 ,i2 ,··· ,iN and arn−1 ,in ,jn ,rn are entries of Y and An , by performing the mode-n matricization of the core tensor
respectively. and then we can use the strong Kronecker product or tensor
R1 ,R2 ,··· ,RN −1 slices to calculate. Those who are interested can refer to
,rN −1 rN −1 ,1
A11,r1 ◦ Ar21 ,r2 ◦ · · · ◦ ANN−1
r
X
Y = −2
AN (Cichocki et al.) [3].
r1 ,r2 ,··· ,rN −1 =1 Similar to the CP rank,we define the TT rank.
(60)
r ,r rTT (Y ) = (R1 , R2 , · · · , RN −1 ),
where Ann−1 n= An (rn−1 , :, :, rn ) ∈ RIn ×Jn are tensor
slice(matrices). Rn = rank(Y mcn ) = r(Y mcn ) (61)
D. THE NATURE AND ALGORITHM OF TT Algorithm 5 The Quantitative Product of Two Tensors
DECOMPOSITION Expressed in the Form of TT Decomposition
1) BASIC OPERATIONS IN TT DECOMPOSITION Input:
If large-size tensors are given in the form of TT decom- The two Nth-order tensors X = X 1 ×3,1 X 2 · · · ×3,1
position, then many calculations can be performed on the X N ∈ RI1 ×I2 ×I3 ×···IN , Y = Y 1 ×3,1 Y 2 · · · ×3,1
small-size core tensors. By performing operations on small- Y N ∈ RI1 ×I2 ×I3 ×···IN , where X n ∈ RRn−1 ×In ×Rn , Y n ∈
size core tensors, the unknown parameters can be reduced RQn−1 ×In ×Qn ,R0 = Q0 = RN = QN = 1.
effectively, and the operations can be simplified to achieve Output:
the effect of the optimization algorithm. the quantitative product of the two tensors
Consider two Nth-order tensors in TT decomposition: Initialize A0 = 1;
for n=1 to N do
X = X 1 ×3,1 X 2 · · · ×3,1 X N ∈ RI1 ×I2 ×I3 ×···IN (Z n )m1 = An−1 (Y n )m1 ∈ RQn−1 ×Qn ;
Y = Y 1 ×3,1 Y 2 · · · ×3,1 Y N ∈ RI1 ×I2 ×I3 ×···IN (64) An = ((X n )mc2 )T (Z n )mc2 ∈ RRn ×Qn ;
end for
where the core tensors X n ∈ RRn−1 ×In ×Rn , Y n ∈ RQn−1 ×In ×Qn
return AN = X • Y ∈ R
and their TT ranks are rTT (X ) = (R1 , · · · , RN −1 ) and
rTT (Y ) = (Q1 , · · · , QN −1 ), respectively. Note that the size
and dimension of two tensors are the same. Their operations
have the following properties:
1. the Hadamard product of two tensors:
Z = X ~ Y = Z 1 ×3,1 Z 2 · · · ×3,1 Z N (65)
We can use the tensor slice to represent the core tensor Z.
Z (in n ) = X (in n ) ⊗L Y n(in ) , n = 1, · · · , N , in = 1, · · · , In FIGURE 30. The multiplication of large-size matrix and vector.
Ax ≈ y , A ∈ R I×J , X ∈ R J1 ×J2 ···×JN ← x ∈ R J , J = J1 J2 · · · JN , Y ∈
(66) R I1 ×I2 ···×IN ← y ∈ R I , I = I1 I2 · · · IN [3].
(i )
where Z n ∈ RRn−1 Qn−1 ×In ×Rn Qn is the core tensor and Z n n ∈
(i ) (i )
RRn−1 Qn−1 ×Rn Qn , X n n ∈ RRn−1 ×Rn , Y n n ∈ RQn−1 ×Qn is the
tensor slice (fix the second dimension in to get). are decomposed in TT. We give an intuitive picture to show
2. the sum of two tensors: it(see figure 30).
As we can see from the figure 30, An ∈ RAn−1 ×In ×Jn ×An ,
Z =X +Y (67) X n ∈ RRn−1 ×Jn ×Rn , Y n ∈ RQn−1 ×In ×Qn . If starting from the
where its TT rank rTT (Z ) = rTT (X ) + rTT (Y ) = (R1 + Q1 , form of the outer product of the TT decomposition, it is as
R2 + Q2 , · · · , RN + QN ), similar to the previous one, we can follows:
A1 ,A2 ,··· ,AN −1
still use tensor slice to represent Z. ,aN −1 aN −1 ,1
◦ Aa21 ,a2 ◦ · · · ◦ ANN−1
a
A1,a
X
1 −2
" # A= 1 AN
(i )
Xnn 0 a1 ,a2 ,··· ,aN −1 =1
(i ) , n = 2, 3, 4, · · · , N − 1
(in )
Zn = (68)
0 Y nn R1 ,R2 ,··· ,RN −1
,rN −1 rN −1 ,1
x11,r1 ◦ x2r1 ,r2 ◦ · · · ◦ xNN−1
r
X
−2
Note that the tensor slices of the first and last core tensors are X = xN
r1 ,r2 ,··· ,rN −1 =1
as follows:
" # R1 ,R2 ,··· ,RN −1
(i ) ,rN −1 rN −1 ,1
XnN ◦ yr21 ,r2 ◦ · · · ◦ yNN−1
r
y1,r
h i X
1 −2
Yn , Zn =
(1) (1) (1) (iN ) Y = yN
Zn = Xn (i ) (69) 1
Y nN r1 ,r2 ,··· ,rN −1 =1
2) TT DECOMPOSITION SOLUTION
FIGURE 32. SVD-based TT algorithm (TT-SVD) [40] for a 4th-order tensor
The solution of TT decomposition is similar to the solution X ∈ R I1 ×I2 ×I3 ×I4 . First,we perform the mode-n matricization of the
of the truncated HOSVD algorithm mentioned above (see tensor X , here we perform the mode-1 matricization for convenience.
Then we perform the SVD decomposition and execute algorithm 6 step by
algorithm 2), and the following constraints need to be met: step.
N
X −1 In
X
Y kl2 )2 ≤
(kY − b (σk (Y mcn ))2 (74)
n=1 k=Rn +1
FIGURE 34. Restricted Tucker-1 decomposition(RT1D) [10] for a 4th-order tensor X ∈ R I1 ×I2 ×I3 ×I4 and a 5th-order tensor
X ∈ R I1 ×I2 ×I3 ×I4 ×I5 . Similar to TT-SVD and LRMD, we first convert the original tensor into a new 3rd-order tensor, next
perform the Tucker-1 decomposition, and then follow the algorithm 8 step by step. On the left is a schematic diagram of
the 4th-order tensor and on the right is a schematic diagram of the 5th-order tensor.
It is noted that the algorithm 9 actually performs the we need from the correlation. At the same time, the biggest
Nth cannonical matricization of the core tensor and then feature of tensor decomposition is that the increase of dimen-
performs a low rank matrix approximation (SVD and QR). sion will lead to the non-uniqueness of decomposition. So we
We noticed that in the process of calculating the low rank usually want to get an approximate solution of it instead of an
matrix decomposition, the size of matrix will become smaller exact solution, so that don’t waste too much computation time
and smaller because of continuous iterative optimization, and can get a good approximation of the original data.
so the complexity will be continuously reduced in the process Due to the limited space of this survey, there are some new
of performing decomposition. By TT Truncation, TT rank tensor decompositions that are not covered in detail in this
can be reduced to the utmost extent and the correspond- survey, such as t-svd(Zhang and Aeron) [165], tensor ring
ing approximate tensor can be found, which greatly reduces decomposition(Zhao et al.) [109]. The above introduction
the computational complexity and improves the efficiency is several important tensor decompositions in this survey,
for future data processing, mathematical operations, and so and has important applications in part two. At the same
on. Of course, some researchers have developed a similar time, some of these decomposition algorithms have their own
method for the HT decomposition. For details, please refer to advantages or limitations.
(Kressner and Tobler) [25]. For CP decomposition, due to its particularity, if a certain
constraint condition is imposed on the factor matrices or core
tensor, an accurate solution can be obtained. The constraint
E. BRIEF SUMMARY FOR PART ONE is mainly determined according to the required environment.
Part one mainly introduced the basic knowledge about tensor, The advantage is that it can extract the structured infor-
including the definition of tensor, the operation of tensor, and mation of the data, which helps better extract and process
the concept of tensor decomposition. As a new technique, ten- the required data, and improves the accuracy of the appli-
sor decomposition can reduce the computational complexity cation in the future. For the Tucker decomposition, since
and memory by decomposing the tensor into lower-order ten- the decomposition is general, the solution is usually more,
sors, matrices, and vectors. At the same time, it can preserve so it is usually considered to impose a constraint term,
the data structure, effectively reduce the dimension, avoid the such as the orthogonal constraint we mentioned above. Then
curse of dimension problems, and extract the important parts the Tucker decomposition becomes HOSVD decomposition.
5: end for
6: for n=N to 2 do
7: [U n , 3n , VnT ] = truncated − svd(Y nmc1 , a),
RP
n−1
8: find the smallest rank b Rn−1 such that αr2 ≤
i>b
Rn−1
RP
n−1
a2 kαk1 = a2 ( | αi |)2 ;
i=1
n−1 n−1 b n b n
Replace b
Y mc2 = b Y mc2 U 3 ∈ RRn−2 In−1 ×Rn−1 and
b b
9:
n Rn−1 ×Inb
bnT ∈ Rb Rn ;
Y mc1 = V
b
n n
10: Reshape Y = Y mc2 .reshape([b
b b Rn−1 , In , b
Rn ]);
11: end for
1 2
12: return Approximate tensor b Y = b Y ×3,1 b Y · · · ×3,1
N n
Y ∈ RI1 ×I2 ×I3 ×···IN , where b Y ∈ RRn−1 ×In ×Rn , b R0 =
b b b
RN = 1
b
TT decomposition, etc. For example, (Zhou et al.) [55] pro- The nonlinear function of the above formula can be mod-
posed the rank-1 and CP decomposition. Then the formula eled by a Gaussian process, as follows:
becomes:
f (X ) ∼ GP(m(X ), k(X , e
X )|θ) (81)
1 2 N T
y = w ◦ w ◦ ···w •X +b+a c
where m(X ) is the mean function, k(X , e
X ) is the kernel func-
y = 3 ×1m W1 ×2m W2 · · · ×Nm WN • X + b + aT c (78)
tion and θ is the associated hyperparameter. For the sake of
simplicity, we use the standard Gaussian process m(X ) = 0.
Tensor regression of the Tucker decomposition form is
For the kernel function, we use the product probability kernel:
similar. For details, please refer to (Hoff et al. [102];
Yu et al. [113]). The general tensor regression is attributed N X X
e
Y x|n )]
D[p(x|n )||q(e
to solving the following minimization problem: k(X , e
X ) = α2 exp( ) (82)
−2βn2
n=1
N
where α represents the amplitude parameter and β repre-
X
L(a, b, W ) = arg min yi − yi )2 ,
(b i = 1, · · · , N (79)
a,b,W
P px
i=1 sents the scale parameter, D(p||q) = x=1 p(x)log q(x) =
R px X
xp(x)log q(x) dx means the KL divergence, p(x|n )
whereb yi = W • X i + b + aT c represents the predicted value
means the Gaussian distribution of vector variable x =
corresponding to the ith tensor sample, X i represents the ith
[x1 , ·P
· · , xId ], the mean vector and the covariance matrix are
tensor sample, and yi represents the true value of the ith tensor n
µn , , respectively. Note that the mean vector and the
sample.
covariance matrix are determined from the mode-n matri-
We give the following general algorithm for tensor regres-
cization X mn of X by treating each X mn as a probability model
sion (see algorithm 10).
with In number of variables and I1 ×· · ·×In−1 ×In+1 · · ·×IN
number of observations.
2) TENSOR VARIABLE GAUSSIAN PROCESS REGRESSION
When we have determined the parameters from the training
Tensor variable Gaussian process regression is similar to what set, the purpose of the tensor Gaussian process regression is
we have introduced in the previous section. The same thing is to infer the probability distribution of the output for the test
that the input X i ∈ RI1 ×I2 ···IN is still an Nth-order tensor and point X test , i.e.:
the output yi is a scalar, and the difference is that the input here
is subject to Gaussian distribution. (Hou et al.) [90] assumed p(ytest |X test , X, y, θ, σ 2 ) (83)
that the output consists of a nonlinear function with respect
to input X and Gaussian noise i ∼ N (0, σ 2 ), as follows: where X = [X 1 , X 2 , · · · , X N ]T ∈ RN ×I1 ×I2 ···IN means com-
bining all sample tensors, and y = [y1 , y2 , · · · , yN ]T ∈ RN .
yi = f (X i ) + i i = 1, · · · , N (80) But actually we only need to know the distribution of f (X test )
Algorithm 10 Tensor Regression Algorithm (Hoff) [102] used the residual mean squared error to mea-
(Zhou et al.) [55] sure the error between the true value and the prediction value:
Input: PN
N Nth-order sample data tensors X i ∈ RI1 ×I2 ···IN , i = ||Y i − AX i BT ||2F
(A, B) = arg min i=1 (87)
1, · · · , N and its true value yi , a vector-valued covariate A,B n
c.;
Output: By deriving the above formula, we finally get:
a, b, W ; X X
N A=( Y i B(X i )T )( X i BT B(X i )T )−1
yi − yi )2 ;
P
1: Initialize W = 0, solve (a, b) = mina,b,W (b X X
i=1 B = ( (Y i )T AX i )( (X i )T AT AX i )−1 (88)
2: Initialize the factor matrices Wn for n = 1, · · · , N and
core tensor 3 for CP decomposition or initialize the Similarly, we can get A and B respectively by alternating
factor vectosr for rank-1 decomposition, other decompo- least squares.
sition is similar; We further extend to generalized tensor regression as
3: while the number of iterations is not reached or there is follows:
no convergence do
4: for n=1 to N do Y i = X i ×1m W1 ×2m W2 · · · ×nm WN + E (89)
5: solve Wn = minWn L(a, b, 3, W1 , · · · , Wn−1 ,
where Wn ∈ RJn ×In are coefficient matrices (factor matrices)
Wn+1 , · · · , WN );
and X i ∈ RI1 ×I2 ···×IN , Y i ∈ RJ1 ×J2 ···×JN are input and output
6: end for
tensors, respectively. E ∈ RJ1 ×J2 ···×JN is a Noise tensor.
7: solve 3 = min3 L(a, b, 3, W1 , · · · , WN );
N
Note that there is a property between the mode-n product
yi − yi )2 ; and the Kronecker product, as follows:
P
8: (a, b) = mina,b,W (b
i=1
9: end while Z = X ×1m W1 ×2m W2 · · · ×nm WN
Z mn = Wn X mn (WN ⊗R · · · ⊗R Wn+1
⊗R Wn−1 · · · W1 )T
according to the expression. So it finally turns to solve the
following expression: Z v1 = (WN ⊗R · · · ⊗R W1 )X v1 (90)
p(f (X test )|X test , X, y, θ, σ 2 ) (84) Therefore, we only need to adopt the mode-n matricization
on both sides of the formula 89 to get the solution:
Here we omit the complicated calculations and give the
results directly. It is noted that the test samples are also subject Y mn = Wn e
X mn + E mn (91)
to the Gaussian distribution, and the probability properties of
the distribution is accorded to Bayesian conditions. We get: where mat(e X )n = mat(X )n (WN ⊗R · · · ⊗ Wn+1 ⊗R
Wn−1 · · · W1 )T . Then through formula 88 we finally get:
p(f (X test )|X test , X, y, θ, σ 2 ) ∼ N (µtest , σtest
2
) (85)
i
X X i i
Wn = ( X mn )T ) (
Y imn (e X mn )T )−1 (92)
X mn (e
e
where µtest = k(X test , X)T (K + σ 2 I )−1 y and σtest2 =
k(X , X ) − k(X , X) (K + σ I ) k(X , X).
test test test T 2 −1 test
Finally, we give the specific algorithm of the whole gener-
Tensor variable Gaussian process regression is generally alized tensor regression (see algorithm 11).
used to deal with noise-bearing and Gaussian-distributed
data. It has certain limitations, and this method is compu-
tationally expensive. Without using tensor decomposition, Algorithm 11 Generalized Tensor Regression (Hoff) [102]
the amount of parameter data is very large. Thus, the amount Input:
of calculation will also increase exponentially. N Nth-order sample data tensors X i ∈ RI1 ×I2 ···IN , i =
1, · · · , N and output tensor Y i ∈ RJ1 ×J2 ,;
3) GENERALIZED TENSOR REGRESSION Output:
Now we introduce a more general case where both input Wn , n = 1, · · · , N ;
and output are tensors. We start with a simple second-order 1: Initialize W n as random matrices;
matrix. A second-order matrix regression is as follows: 2: while the number of iterations is not reached or there is
no convergence do
Y i = AX i BT + E (86) 3: for n=1 to N do
where X i ∈ RI1 ×I2 , Y i ∈ RJ1 ×J2 , i = 1, · · · , N are N input 4: Calculate Wn by formula 92;
sample matrices and corresponding output sample matrices. 5: end for
A ∈ RJ1 ×I1 and B ∈ RJ2 ×I2 are unknown coefficient matrices. 6: return Wn ;
7: end while
E ∈ RJ1 ×J2 is a noise matrix with mean-zero.
N +1
size B1 × B2 × · · · × BN , where Bn = (In )In −1−1 , C is an As can be seen from figure 37, the purpose of the SVM is
N 2 th-order tensor with size (I1 + 1) × · · · × (I1 + 1) × (I2 + to find a hyperplane wT x + b = 0, x = [x1 , x2 , · · · , xm ] to
1) × · · · × (I2 + 1) · · · × (IN + 1) · · · × (IN + 1), and V (xn ) distinguish between the two classes. We give the two types
is the Vandermonde vector of xn : of labels +1 and −1 respectively. Where the distance from a
point x to the hyperplane in the sample space is:
T
xn T (xn ⊗ xn )T · · · (xn ⊗ · · · ⊗ xn )T |wT x + b|
V (xn ) = 1
d= (105)
(104) kwk
As shown in figure 37, the point closest to the hyperplane
This model can be generalized to multidimensional ten- is called the support vector, and the sum of the distances
sors. Similar to the scalar form of multivariate polynomial of the two heterogeneous support vectors to the hyperplane
regression, the multivariate polynomial regression in the form is:
of vector (tensor) also has an exponential rise in complexity 2
γ = (106)
as the variable n increases. Similarly, we can reduce the coef- kwk
ficient tensor from N 2 th-order to Nth-order. We can also use
CP decomposition, Tucker decomposition, or TT decompo-
sition to get a truncated model. Please refer to (Stoudenmire
andSchwab,2016 [30]; Cohen andShashua, 2016 [94]) for
details.
m
Later researchers (Zhao et al.) [144] converted the above where F(W , b) 1 T
P
= 2 tr(W W ) + C max(0, 1 −
constraints into the following formula: j=1
yj [tr(W T Xj ) + b]), G(S) = λkW k∗ . Due to the complexity
kwk2 γ of the SMM solution, please refer to (Luo et al.) [80] for
min + ξTξ
w,b,ξj 2 2 details.
s.t. yj (wT xj C b) = 1 − ξj , ξj ≥ 0, j = 1, 2 · · · , M .
c: THE SUPPORT TENSOR MACHINE(STM)
(110)
If we further extend the matrix to tensor, we will get the
where ξ = [ξ1 , ξ2 , · · · , ξM ] ∈ RM . Note that formula 110 Support Tensor Machine(STM). In general, STM currently
has two major differences compared to formula 108. 1: in have five constraint expressions, we first give the original
order to facilitate the calculation, the above constraint is constraint expression:
changed from inequality to equality. 2: the loss function in M
formula 110 is the mean square loss. The benefit of this kW k2 X
max +C ξj
modification is that the solution will be easier. Generally, w,b,ξj 2
j=1
the solution is developed by Lagrangian multiplier method. s.t. yj (W • X j + b) ≥ 1 − ξj ξj ≥ 0, j = 1, 2 · · · , M .
We do not repeated derivation here. For details, please refer to
(114)
(Corts and Vapnik) [18].
Here we usually choose to decompose the coefficient
b: THE SUPPORT MATRIX MACHINE(SMM) tensor W , and the researchers give four solutions in total.
If we extend the input sample from vector to second- (Tao et al.) [27] proposed to decompose the coefficient
order tensor (matrix), we will get the Support Matrix tensor into the form of the rank-one vector outer prod-
Machine(SMM). (Luo) [80] proposed the concept of the uct, i.e., W = w1 ◦ w2 ◦ · · · wN (see formula 28).
Support Matrix Machine. We consider a matrix sample Xa ∈ (Kotsia et al.) [58] performed CP decomposition on the coef-
RI ×J , a = 1, 2, · · · , m. The hinge loss function are replaced R
λr w1 ◦ w2 ◦ · · · wN (see for-
P
in SMM. The following constraint formula is obtained: ficient tensor, i.e., W =
r=1
mula 29). (Kotsia and Patras) [59] performed Tucker decom-
M
1 X position on the coefficient tensor, i.e., W = A ×1m W1 ×2m
min tr(W T W ) + C max(0, 1 − yj [tr(W T Xj ) + b])
W ,b,ξj 2 W2 · · · ×Nm WN (see formula 108). (Wang et al.) [155] per-
j=1
formed TT decomposition on the coefficient tensor, i.e., W =
+ λkW k∗
W 1 ×3,1 W 2 · · · ×3,1 W N (see formula 55). Substituting these
s.t. yj [tr(W T Xj ) + b] ≥ 1 − max(0, 1 − yj [tr(W T Xj ) + b]). three decompositions will result in three forms of STM.
(111) In general, the solution of STM is similar to the solution of
CP decomposition. The central idea is based on the alternat-
where kW k∗ (we usually call it the nuclear norm) represents ing least squares method, that is, N-1 other optimization items
the sum of all singular values of the matrix W, C and λ are fixed first, and only one item is updated at a time. For
are coefficient. In fact, we get the following properties after example, if we use the form of the rank-one decomposition
performing the mode-1 vectorization of the matrix w = for coefficient tensor, then the constraint expression becomes
vec(W T )1 . as follows (see algorithm 12):
Substituting the formula 136 into the formula 135 returns s.t. ×(i6=m)v wi ) + b) ≥ 1 − ξj ,
yj (wTm (X j
the constraint expression of the original SVM. Note that i = 1, 2 · · · , n − 1, n + 1, · · · , N .j = 1, 2 · · · , M .
in order to protect the data structure from being destroyed, (115)
we generally do not perform the mode-n vectorization of
the matrix and convert it into a traditional SVM. So we where α = k ◦N 2
i=1,i6 =m wi k .
give the optimization problem directly in the form of a Then the label of a test sample, X test , can be predicted as
matrix. According to (Goldstein et al.) [39], they further follows:
converted the above constraints into the following augmented
Lagrangian function form: y = sign(X test ×1v w1 · · · ×Nv wN + b) (116)
L(W , b, S, λ) = F(W , b) + G(S) + tr[3T (S − W )] However, the above-mentioned alternating least squares
a iteration method usually needs a lot of time and com-
+ kS − W k2F , a is hyperparameter putational memory, and only obtian a local optimal solu-
2
(113) tion. So many researchers proposed other algorithms.
A very simple update formula can be used based on the actual application. In practice, it usually takes only one sample to achieve very accurate results, so the update formulas are as follows:

W = W + α(x y^T - x1 y1^T)
a = a + α(x - x1)
b = b + α(y - y1)    (123)

where α ∈ [0, 1] is the learning rate, x1 is the updated value of the visible layer variable x obtained by the first back-propagation from the hidden layer y, and y1 is the first update of the hidden layer obtained by forward-propagating x1 again. If the chain is run k (k > 1) times, we only need to replace x1 in the above formula by xk (the value of the visible layer variable obtained by the kth back-propagation). For details, please refer to (Hinton) [35].
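The following minimal sketch implements the one-step update of formula 123 for a binary RBM; it is our own illustration (mean-field activations are used instead of stochastic binary samples for brevity), not the exact procedure of (Hinton) [35]:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def cd1_update(W, a, b, x, alpha=0.1):
    """One CD-1 step (formula 123): compare the statistics of (x, y) with
    those of the one-step reconstruction (x1, y1)."""
    y = sigmoid(W.T @ x + b)     # hidden activation driven by the data
    x1 = sigmoid(W @ y + a)      # first back-propagation to the visible layer
    y1 = sigmoid(W.T @ x1 + b)   # forward propagation of x1 again
    W += alpha * (np.outer(x, y) - np.outer(x1, y1))
    a += alpha * (x - x1)
    b += alpha * (y - y1)
    return W, a, b
```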
If we increase the number of layers, the traditional RBM becomes higher-dimensional, which we call a High-order Restricted Boltzmann Machine (HORBM). For example, for three sets of variables a ∈ R^I, b ∈ R^J, c ∈ R^K, the energy function can be represented as (see figure 39):

E(a, b, c) = - Σ(i,j,k=1)^(I,J,K) w_ijk a_i b_j c_k - d^T a - e^T b - f^T c
           = - W ×1v a ×2v b ×3v c - d^T a - e^T b - f^T c    (124)

where a ∈ R^I and b ∈ R^J are two input variables, which can be understood as two visible layers, c ∈ R^K is a hidden layer variable, and d, e, f are the biases of the three variables.

FIGURE 39. Schematic diagram of the energy function of the three sets of variables. The middle is the weight tensor, the above is the variable a, the lower is the variable b, and the right is the variable c.

Note that the input of the visible, hidden or additional layer of the RBM is a vector. If the input becomes a tensor, we call the model a Tensor-variate Restricted Boltzmann Machine (TvRBM) (Nguyen et al.) [126]. We assume that the visible layer variable is X ∈ R^(I1×I2×···×IN) and the hidden layer variable is y ∈ R^J, so the weight tensor is W ∈ R^(I1×I2×···×IN×J). Then the energy function can be similarly expressed as follows:

E(X, y) = A • X - b^T y - W • (X ∘ y)    (125)

where A ∈ R^(I1×I2···×IN) and b ∈ R^J are the biases of the visible and hidden layers, respectively. Similarly, the hidden layer variable y = [y1, ···, yJ]^T can be expressed as

yj = σ(X • W(:, ···, :, j) + bj),   j = 1, 2, ···, J    (126)

A major problem is that, as the input tensor dimension increases, the number of weight tensor elements multiplies. We usually use a low-rank tensor decomposition to solve this problem. For example, if we perform a CP decomposition on the weight tensor,

W ≈ Λ ×1m W1 ×2m W2 ··· ×Nm WN ×(N+1)m WN+1    (127)

where Wn ∈ R^(In×R), n = 1, ···, N, and WN+1 ∈ R^(J×R) are factor matrices and Λ is the diagonal core tensor, then the number of elements is reduced from the original J Π(n=1)^N In to R(J + Σ(n=1)^N In + 1).

More simply, if the weight tensor can be expressed as a rank-one vector outer product,

W = w1 ∘ w2 ∘ ··· ∘ wN ∘ wN+1    (128)

where wn ∈ R^(In), n = 1, ···, N, and wN+1 ∈ R^J, then the number of elements is reduced from the original J Π(n=1)^N In to J + Σ(n=1)^N In.

Finally, we introduce the latent conditional high-order Boltzmann machine (CHBM). (Huang et al.) [151] proposed the latent conditional high-order Boltzmann machine for classification. The algorithm is similar to the high-order Boltzmann machine with three sets of variables that we just mentioned. However, in the CHBM the input data are N pairs of sample features xi ∈ R^I, yi ∈ R^J, i = 1, ···, N, and z is the relationship label of (xi, yi), where z = [z1, z2]. For each sample, if x and y are matched then z = [1, 0], otherwise z = [0, 1] (''one-hot'' encoding). The authors then add another set of binary-valued latent variables to the hidden layer. The entire structure is shown in figure 40, where h denotes the intrinsic relationship between x and y, and h and z are connected by a weight matrix U. Its energy function is as follows:

E(x, y, h, z) = W ×1v x ×2v y ×3v h - h^T U z - a^T x - b^T y - c^T h - d^T z    (129)

where a, b, c, d are the biases of x, y, h, z, respectively.

Then the values of hk and of zt, t = {1, 2} (also known as the activation conditional probabilities) are

hk = p(hk | x, y) = σ( Σ(i,j)^(I,J) w_ijk x_i y_j + ck ),   k = 1, ···, K
zt = p(zt | x, y, h) = σ( dt + Σ(k=1)^K hk Ukt ),   t = {1, 2}    (130)
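A short NumPy sketch of the conditional activations in formula 130 (our own illustration, not the implementation of [151]); the 3rd-order weight tensor W, the matrix U and the biases are random placeholders:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def chbm_activations(W, U, c, d, x, y):
    """Formula 130: W has shape (I, J, K), U has shape (K, 2)."""
    h = sigmoid(np.einsum('ijk,i,j->k', W, x, y) + c)   # h_k = sigma(sum_ij w_ijk x_i y_j + c_k)
    z = sigmoid(d + h @ U)                               # z_t = sigma(d_t + sum_k h_k U_kt)
    return h, z

# toy usage
rng = np.random.default_rng(0)
I, J, K = 6, 5, 4
h, z = chbm_activations(rng.standard_normal((I, J, K)), rng.standard_normal((K, 2)),
                        np.zeros(K), np.zeros(2),
                        rng.standard_normal(I), rng.standard_normal(J))
```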
FIGURE 40. Schematic diagram of the energy function of the four sets of variables. The middle is the weight tensor, the above is the variable x, the lower is the variable y, the right is the hidden layer variable h, and the far right is the label z; z and h are connected by a weight matrix U.

In fact, the model is a two-layer RBM. The first layer is a ternary RBM (x, y, h), and the second layer is a traditional binary RBM (h, z). For the 3rd-order tensor W of the first layer, we can use the CP decomposition to solve it.

3) POLYNOMIAL CLASSIFIER ALGORITHM BASED ON TENSOR TT DECOMPOSITION
Polynomial classifiers are often used for classification because of their ability to generate complex surfaces and their good fit to raw data. However, in high-dimensional spaces the multivariate polynomial can only use some specific kernels in the support vector machine, and the kernel function has to be mapped to the high-dimensional space for processing, which increases the difficulty of data processing. In order to enable the polynomial classifier to handle high-dimensional problems, (Chen et al.) [161] simplified the operation by writing the polynomial as a tensor inner product in the TT format, and proposed two algorithms.

First we give the definition of a pure-power-n polynomial. Given a vector n = (n1, n2, ···, nm), if in a polynomial f with m variables the highest power of each variable xi is ni, i = 1, 2, ···, m, then the polynomial f is called a pure-power-n polynomial.

Example 1: The polynomial f = 1 + x1 + 3x2³ + 2x3 + 4x3² - 2x2x3² - 5x1x2x3 is a pure-power-n polynomial with n = (1, 3, 2).

A pure-power-n polynomial can be expressed equivalently by the mode-n products of a tensor A ∈ R^((n1+1)×(n2+1)×···×(nm+1)) with the vectors:

f = A ×1v v(x1)^T ×2v v(x2)^T ··· ×mv v(xm)^T    (131)

where the v(xi) are Vandermonde vectors:

v(xi) = (1, xi, xi², ···, xi^(ni))^T,   i = 1, 2, ···, m    (132)

Example 2: For the polynomial f in example 1, since n = (1, 3, 2), we have v(x1) = (1, x1)^T, v(x2) = (1, x2, x2², x2³)^T, and v(x3) = (1, x3, x3²)^T.
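The equivalence in formula 131 can be checked numerically. The short NumPy script below (our own illustration) builds the coefficient tensor of the polynomial in Example 1, with entry A[i1, i2, i3] holding the coefficient of x1^i1 x2^i2 x3^i3, and compares the tensor contraction against direct evaluation:

```python
import numpy as np

def vandermonde(x, n):
    """v(x) = (1, x, ..., x**n)^T as in formula 132."""
    return np.array([x ** k for k in range(n + 1)])

def poly_from_tensor(A, xs, n):
    """Evaluate formula 131 by contracting A with one Vandermonde vector per mode."""
    val = A
    for x, ni in zip(xs, n):
        val = np.tensordot(val, vandermonde(x, ni), axes=([0], [0]))
    return float(val)

# coefficient tensor of f = 1 + x1 + 3*x2**3 + 2*x3 + 4*x3**2 - 2*x2*x3**2 - 5*x1*x2*x3
n = (1, 3, 2)
A = np.zeros((2, 4, 3))
A[0, 0, 0], A[1, 0, 0], A[0, 3, 0] = 1, 1, 3
A[0, 0, 1], A[0, 0, 2], A[0, 1, 2], A[1, 1, 1] = 2, 4, -2, -5

x1, x2, x3 = 0.5, -1.0, 2.0
direct = 1 + x1 + 3 * x2**3 + 2 * x3 + 4 * x3**2 - 2 * x2 * x3**2 - 5 * x1 * x2 * x3
assert np.isclose(poly_from_tensor(A, (x1, x2, x3), n), direct)
```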
With the Vandermonde vectors in place, the polynomial value of a sample xi can be written as a tensor inner product,

f = T(xi) • A    (134)

Example 3: Here we consider a binary polynomial for the sake of simplicity. Assume f = 2 + 3x1 - x2 + 2x1² + 4x1x2 - 2x1²x2 + 7x2². We get n = (2, 2), v(x1) = (1, x1, x1²)^T, v(x2) = (1, x2, x2²)^T; then, according to formulas 9 and 17, both T(x) and A are 2nd-order tensors (matrices):

T(x) = [ 1     x2      x2²
         x1    x1x2    x1x2²
         x1²   x1²x2   x1²x2² ],
A = [ 2   -1   7
      3    4   0
      2   -2   0 ]    (135)

Similar to the idea of the SVM, polynomial classification looks for a hyperplane that separates the two types of examples. Its ultimate goal is to find a coefficient tensor A such that

yi (T(xi) • A) > 0,   i = 1, 2, ···, N    (136)

Considering the TT decomposition of the coefficient tensor A, A = A1 ×3,1 A2 ··· ×3,1 Am, the polynomial in formula 134 has the following further properties:

f = T(xi) • A = A ×1v v(x1)^T ×2v v(x2)^T ··· ×mv v(xm)^T
  = (A1 ×2v v(x1)^T) ··· (Am ×2v v(xm)^T)
  = Aj ×1v pj(x) ×2v v(xj)^T ×3v qj(x)^T
  = (qj(x)^T ⊗L v(xj)^T ⊗L pj(x)) vec(Aj)2,   for any j = 1, 2, ···, m    (137)

where p1(x) = 1, pj(x) = Π(k=1)^(j-1) (Ak ×2v v(xk)^T) for j ≥ 2, qm(x) = 1, qj(x) = Π(k=j+1)^(m) (Ak ×2v v(xk)^T) for j < m, and vec(Aj)2 denotes the mode-2 vectorization of the tensor (see formula 15).

Example 4: For the polynomial f in example 3, according to formula 137, T(x) • A = (q2(x)^T ⊗L v(x2)^T ⊗L p2(x)) vec(A2)2, taking j = 2. Then we get

q2(x) = 1,
v(x2) = (1, x2, x2²)^T,
p2(x) = A1 ×2v v(x1)^T,
v(x1) = (1, x1, x1²)^T    (138)
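A hedged sketch of the TT-format evaluation in the second line of formula 137: each core is contracted with its Vandermonde vector and the resulting small matrices are multiplied left to right. The cores below are random placeholders, not a trained classifier:

```python
import numpy as np

def tt_poly_eval(cores, xs, n):
    """f = (A1 x2v v(x1)^T)(A2 x2v v(x2)^T)...(Am x2v v(xm)^T), cf. formula 137."""
    G = np.ones((1, 1))
    for core, x, nk in zip(cores, xs, n):            # core shape: (R_{k-1}, n_k + 1, R_k)
        v = np.array([x ** j for j in range(nk + 1)])
        G = G @ np.einsum('rjs,j->rs', core, v)      # contract the middle (power) mode
    return float(G[0, 0])

# random TT cores for m = 3 variables, n = (1, 3, 2), TT ranks (1, 2, 2, 1)
rng = np.random.default_rng(0)
n, ranks = (1, 3, 2), (1, 2, 2, 1)
cores = [rng.standard_normal((ranks[k], n[k] + 1, ranks[k + 1])) for k in range(3)]

# check against contracting the full coefficient tensor as in formula 131
A_full = np.einsum('aib,bjc,ckd->ijk', *cores)       # boundary TT ranks are 1
val = A_full
for x, nk in zip((0.5, -1.0, 2.0), n):
    val = np.tensordot(val, np.array([x ** j for j in range(nk + 1)]), axes=([0], [0]))
assert np.isclose(tt_poly_eval(cores, (0.5, -1.0, 2.0), n), float(val))
```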
(Chen et al.) [161] proposed two loss functions, a least squares loss and a logistic loss:

J(A) = (1/N) Σ(i=1)^N (T(xi) • A - yi)²

J(A) = -(1/N) Σ(i=1)^N [ ((1 + yi)/2) ln(gA(xi)) + ((1 - yi)/2) ln(1 - gA(xi)) ]    (139)

where the first formula is the least squares loss function and the second is the logistic loss function.

Algorithm 13 The Improved Least Squares Method for TT Decomposition (Chen et al.) [161]
Input:
  Loss function Jloss(A) and an initial guess for the TT decomposition of the Nth-order tensor A = A1 ×3,1 A2 ··· ×3,1 Am, An ∈ R^(Rn-1×In×Rn);
Output:
  Â in the TT format, Â = argmin Jloss(A), Â = Â1 ×3,1 Â2 ··· ×3,1 Âm;
1: if the required number of iterations is not reached then
2:   for n = 1 to N do
3:     solve: Ãn = arg min_Ãn J(Â1, ···, Ân-1, Ãn, An+1, ···, Am)
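For completeness, the two losses of formula 139 can be coded as follows; note that gA(x) is not spelled out in the excerpt above, so we assume the common choice gA(x) = sigmoid(T(x) • A):

```python
import numpy as np

def least_squares_loss(scores, y):
    """First form of formula 139; scores[i] = T(x_i) . A, labels y in {-1, +1}."""
    return np.mean((scores - y) ** 2)

def logistic_loss(scores, y):
    """Second form of formula 139, assuming g_A(x) = sigmoid(T(x) . A)."""
    g = 1.0 / (1.0 + np.exp(-scores))
    return -np.mean((1 + y) / 2 * np.log(g) + (1 - y) / 2 * np.log(1 - g))
```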
FIGURE 42. Feature tensor generation example (n=8). We assume that the original image (800 × 800) is divided into 8 × 8 blocks, and each block is 100 × 100. Then we perform the DCT transformation. Finally it is encoded into a feature tensor of size 8 × 8 × 100.

where Bn ∈ R^(In×Rn) are called transfer matrices and d is a constant set according to need. In fact, we can see that this is similar to the optimal solution of the Tucker (HOSVD) decomposition with constraints: the Bn are actually the factor matrices, and X is the core tensor in the Tucker decomposition. But note that here In ≤ Rn.

However, depending on the image itself, it can be seen as a matrix. (He et al.) [146] proposed the solution to the above problem. First, the mode-n matricization of the tensor is used to convert the above optimization problem equivalently. Note that, since the inputs Xa ∈ R^(R1×R2), a = 1, 2, ···, N, are second-order tensors, according to the definition of the mode-n matricization the optimization problem can be written as

J(A, B) = arg min(A,B) trace(B^T (Λ^i_A - W^i_A) B)
s.t. trace(B^T (Λ^p_A - W^p_A) B) = d/2    (152)

For convenience, A^T = B1, B^T = B2, and

Λ^i_A = Σ(a=1)^N λ^i_aa Xa^T A A^T Xa,    W^i_A = Σ(a=1)^N Σ(b=1)^N W^i_ab Xa^T A A^T Xa,
Λ^p_A = Σ(a=1)^N λ^p_aa Xa^T A A^T Xa,    W^p_A = Σ(a=1)^N Σ(b=1)^N W^p_ab Xa^T A A^T Xa    (153)

Using the idea of alternating least squares: when A is fixed, B consists of the I1 generalized eigenvectors that correspond to the first I1 largest eigenvalues and satisfy the equation (Λ^p_A - W^p_A) b = c (Λ^i_A - W^i_A) b; when B is fixed, A consists of the I2 generalized eigenvectors that correspond to the first I2 largest eigenvalues and satisfy the equation (Λ^p_B - W^p_B) a = c (Λ^i_B - W^i_B) a.
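Each half-step of this alternating scheme is a symmetric generalized eigenvalue problem. A minimal SciPy sketch is given below, assuming the right-hand-side matrix is positive definite (otherwise a small regularizer or scipy.linalg.eig is needed); it is our own illustration, not the implementation of [143] or [146]:

```python
import numpy as np
from scipy.linalg import eigh

def update_B(Lp_A, Wp_A, Li_A, Wi_A, I1):
    """Keep the I1 generalized eigenvectors of
    (Lp_A - Wp_A) b = c (Li_A - Wi_A) b with the largest eigenvalues c."""
    c, V = eigh(Lp_A - Wp_A, Li_A - Wi_A)   # symmetric-definite generalized problem
    return V[:, np.argsort(c)[::-1][:I1]]   # columns are the top-I1 eigenvectors

# toy usage with symmetric matrices (right-hand side made positive definite)
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6)); Lp = M @ M.T
M = rng.standard_normal((6, 6)); Wp = 0.1 * (M + M.T)
Li, Wi = 6.0 * np.eye(6), np.zeros((6, 6))
B = update_B(Lp, Wp, Li, Wi, I1=3)          # a 6 x 3 transfer matrix
```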
According to the above analysis, we present the graph embedding algorithm based on second-order tensors (matrices) (see algorithm 15).

Algorithm 15 2nd-Order Tensor-Based Graph Embedding Algorithm (Hu et al.) [143]
Input:
  Input tensor sample set Xa ∈ R^(R1×R2), a = 1, 2, ···, N;
Output:
  Transfer matrices (factor matrices) A^T, B^T;
1: Initialize A: take the first I1 columns of the identity matrix I ∈ R^(R1×R1) as the matrix A;
2: Initialize the weight coefficients W^i_ab and W^p_ab according to (Weiming Hu, 2017);
3: for k = 1 to n do
4:   Calculate Λ^i_A, W^i_A, Λ^p_A, W^p_A by formula 153;
5:   Calculate B by using the properties of generalized eigenvectors: (Λ^p_A - W^p_A) b = c (Λ^i_A - W^i_A) b;
6:   Calculate Λ^i_B, W^i_B, Λ^p_B, W^p_B by formula 153, exchanging A and B;
7:   Update A by using the properties of generalized eigenvectors: (Λ^p_B - W^p_B) a = c (Λ^i_B - W^i_B) a;
8: end for
9: return Transfer matrices (factor matrices) A^T, B^T;

Tensors can also combine various high-dimensional features to improve the classification accuracy; therefore, a tensor-based feature fusion technique is proposed for classification processing. Finally, by using the tensor to separate the target from the background pattern, the discriminant space can be effectively learned, thereby effectively detecting the target in the picture.

C. APPLICATION OF TENSOR IN DATA PREPROCESSING
1) TENSOR DICTIONARY LEARNING
Dictionary learning refers to finding a sparse representation of the original data while ensuring the structure and non-distortion of the data, thereby achieving data compression and ultimately reducing the computational complexity (see figure 44). General dictionary learning boils down to the following optimization problem:

min(A, xi) Σ(i=1)^N ||yi - A xi||2² + λ Σ(i=1)^N ||xi||1,   i = 1, ···, N    (154)

where A ∈ R^(J×I) is the dictionary matrix, yi ∈ R^J, i = 1, ···, N, are the N raw data vectors, and xi ∈ R^I are their sparse representations.
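As a small illustration of the sparse-coding half of formula 154 (with the dictionary A held fixed), the following ISTA sketch alternates a gradient step on the quadratic term with soft-thresholding; it is our own toy solver, not the method used in the cited dictionary-learning works:

```python
import numpy as np

def sparse_code_ista(Y, A, lam=0.1, n_iter=200):
    """min_X sum_i ||y_i - A x_i||_2^2 + lam * ||x_i||_1, columns of Y are the y_i."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    X = np.zeros((A.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        G = X - 2.0 * (A.T @ (A @ X - Y)) / L        # gradient step on the quadratic term
        X = np.sign(G) * np.maximum(np.abs(G) - lam / L, 0.0)   # soft-thresholding
    return X
```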
However, a more general sparse matrix is a low-rank separation structure matrix, which is the sum of KS matrices, as follows:

D = Σ(i=1)^I B^i_N ⊗R B^i_(N-1) ··· ⊗R B^i_1    (157)

Consider another property: let D = B2 ⊗R B1. For the elements of D we can reconstitute the form of a vector outer product, D^r = vec(B1)1 ∘ vec(B2)1, and then convert equation 157 equivalently into the following:

D^r = Σ(i=1)^I (B^i_1)v1 ∘ (B^i_2)v1 ··· ∘ (B^i_N)v1    (158)

So we can use this structure as a regularization term. Finally, we get the optimization expression for tensor dictionary learning as follows:

min(D, X) (1/2) ||Y - DX||F² + λ (1/N) Σ(n=1)^N ||D^r_mn||*    (159)

where D = Σ(i=1)^I B^i_N ⊗R B^i_(N-1) ··· ⊗R B^i_1 and ||D^r_mn||* is the nuclear norm of the matrix obtained by the mode-n matricization of the tensor D^r. The problem is generally solved by the Lagrangian multiplier method; since the solution process is rather complicated, it is omitted here. For details, please refer to (Ghassemi et al.) [88].

2) TENSOR COMPLETION FOR DATA PROCESSING
In data processing there are sometimes missing values in the data. There are many ways to complete the missing data; the popular ones are matrix estimation and matrix completion. If the input data is a tensor, we speak of tensor estimation and tensor completion. The two are similar: both require solving a corresponding minimum-constraint problem. However, tensor estimation mainly minimizes the mean square error between the estimated and original values; here we mainly introduce tensor completion. General tensor completion seeks the optimal solution of the following expression:

min_Y ||(X - Y) ⊛ Ĩ||F²    (160)

where X ∈ R^(I1×I2×···×IN) is a tensor with missing values, Y ∈ R^(I1×I2×···×IN) is the reconstruction tensor, ⊛ is the element-wise product (see formula 19), and Ĩ ∈ R^(I1×I2×···×IN) indexes the missing values in X. The entries of Ĩ are as follows:

ĩ_(i1 i2 ··· iN) = 0 if x_(i1 i2 ··· iN) is missing, and 1 otherwise    (161)

The first step in such problems is usually to find a low-rank approximation of the original tensor. The conventional method uses one of the tensor decompositions introduced in part one, such as the CP, HOSVD or TT decomposition.

(Peng et al.) [68] used the HOSVD decomposition, but they did not use the traditional truncated-SVD algorithm (see algorithm 2): traditional algorithms need to initialize the approximate rank of the given tensor and the factor matrices first, which requires a lot of pre-calculation. So they proposed an adaptive algorithm to obtain the low-rank approximation of tensors.

First they set an error parameter α ∈ [0, 1]. Then, similarly to the truncated SVD, an SVD is performed on the mode-k matricization of the core tensor, Λmk = Uk Sk Vk^T, k = 1, 2, ···, N, where Sk is a diagonal matrix with nonzero entries sjj, j = 1, ···, K, K = rank(Λmk). The optimal rank can be obtained by

Rk = min Rk   s.t.   R0 < Rk < Ik   and   ( Σ(j=Rk+1)^K sjj ) / ( Σ(j=1)^K sjj ) < α    (162)

where R0 is a lower bound on the predefined rank, which prevents the rank from being too small. The detailed process is shown in algorithm 16.

Algorithm 16 The Adaptive HOSVD Decomposition of the Tensor (Peng et al.) [68]
Input:
  The Nth-order data tensor X ∈ R^(I1×I2···IN), error parameter α ∈ [0, 1], R0;
Output:
  The core tensor A ∈ R^(R1×R2···RN) and the factor matrices Bn ∈ R^(In×Rn);
1: A^0 ← X;
2: for n = 1 to N do
3:   [Un, Sn, Vn^T] = SVD(A^(n-1)_mn), and then compute the rank Rn by formula 90;
4:   Select the first Rn column vectors of Un for the factor matrix Bn, Bn = Un(:, 1:Rn) = [u1, u2, ···, u_Rn];
5:   A^n_mn = Sn(1:Rn, 1:Rn) Vn(:, 1:Rn)^T;
6: end for
7: A = A^N;
8: return the core tensor A and the factor matrices Bn
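A brief sketch of the rank-selection rule (formula 162) and the per-mode truncation used in algorithm 16; the chaining of the unfolded core across modes is omitted here, and the code is our own paraphrase rather than the authors' implementation:

```python
import numpy as np

def adaptive_rank(s, alpha, R0):
    """Smallest rank R with R0 < R < len(s) whose discarded singular values
    carry less than an alpha fraction of the total energy (formula 162)."""
    total = s.sum()
    for R in range(R0 + 1, len(s)):
        if s[R:].sum() / total < alpha:
            return R
    return len(s)

def adaptive_hosvd_step(A_unf, alpha, R0):
    """One loop iteration of algorithm 16 applied to a mode-n unfolding A_unf."""
    U, s, Vt = np.linalg.svd(A_unf, full_matrices=False)
    R = adaptive_rank(s, alpha, R0)
    B = U[:, :R]                               # factor matrix B_n
    core_unf = np.diag(s[:R]) @ Vt[:R, :]      # truncated unfolding S_n V_n^T
    return B, core_unf
```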
When the improved HOSVD decomposition algorithm is completed, we obtain the factor matrices Bn, and low-rank approximations of the original tensor X are obtained from the mode-n products of the original input tensor with the factor matrices, as follows:

Z_i = X ×1m (B1 B1^T) ×2m (B2 B2^T) ··· ×im (Bi Bi^T)    (163)

where i = 1, ···, N, so we get N low-rank approximate solutions of X: Z_1, Z_2, ···, Z_N. We take the average of these N tensors as the best approximation of the original tensor X, X ≈ (1/N) Σ(i=1)^N Z_i.
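The projections of formula 163 and their average can be written compactly with mode-n products; the sketch below is our own NumPy illustration:

```python
import numpy as np

def mode_product(T, M, mode):
    """Mode-n product T x_n M for a dense NumPy array."""
    Tm = np.moveaxis(T, mode, 0)                 # bring mode n to the front
    out = np.tensordot(M, Tm, axes=([1], [0]))   # multiply the mode-n fibres by M
    return np.moveaxis(out, 0, mode)

def averaged_low_rank_approx(X, Bs):
    """Z_i = X x1 (B1 B1^T) ... xi (Bi Bi^T) for i = 1..N, then average (formula 163)."""
    Zs, Z = [], X
    for i, B in enumerate(Bs):
        Z = mode_product(Z, B @ B.T, i)
        Zs.append(Z)
    return sum(Zs) / len(Zs)
```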
After the previous steps, we first perform a zero-compensation operation for the missing data of X to get the filled tensor X̂, and then we obtain its approximate solution:

D = (1/N) Σ(i=1)^N Ẑ_i    (164)

Finally, the missing values are updated with the following formula:

X̂ = X ⊛ Ĩ + D ⊛ (¬Ĩ)    (165)

where ¬ is the Boolean NOT operator (i.e., 0 ← 1, 1 ← 0). The entire tensor completion algorithm is shown in algorithm 17.

Algorithm 17 Tensor Completion (Zisen Fang, 2018)
Input:
  The Nth-order data tensor X ∈ R^(I1×I2···IN) with missing values, Ĩ, error parameter α ∈ [0, 1], R0, and the required number of iterations L;
Output:
  The Nth-order tensor X̂ obtained after the missing values are completed, the core tensor A ∈ R^(R1×R2···RN) and the factor matrices Bn ∈ R^(In×Rn);
1: X̂^0 ← X ⊛ Ĩ, D^0 = 0;
2: for n = 1 to L do
3:   Obtain the core tensor A and factor matrices Bn by applying algorithm 16 to X̂^(n-1);
4:   for k = 1 to N do
5:     Z_k = X ×1m (B1 B1^T) ×2m (B2 B2^T) ··· ×km (Bk Bk^T);
6:   end for
7:   D^n = (1/N) Σ(k=1)^N Ẑ_k;
8:   X̂^n = X ⊛ Ĩ + D^n ⊛ (¬Ĩ);
9: end for
10: X̂ ← X̂^L;
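A compact sketch of the completion loop in algorithm 17: the low-rank step (algorithm 16 plus the averaging of formula 164) is abstracted into a caller-supplied function low_rank_approx, so the code only shows the masked update of formula 165:

```python
import numpy as np

def complete_tensor(X, mask, low_rank_approx, L=10):
    """mask has 1 at observed entries and 0 at missing ones (the tensor I~)."""
    X_hat = X * mask                        # zero-compensation of the missing entries
    for _ in range(L):
        D = low_rank_approx(X_hat)          # e.g. averaged adaptive-HOSVD projections
        X_hat = X * mask + D * (1 - mask)   # formula 165: keep observed values, fill the rest
    return X_hat
```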
3) DISCUSSION AND COMPARISON
This section focused on two common kinds of data preprocessing, dimensionality reduction and data completion. For dictionary learning, we introduced a tensor model based on the Tucker decomposition, which can be solved easily thanks to the properties of the Tucker decomposition. At the same time, we also introduced the latest tensor completion algorithm, based on the improved HOSVD decomposition, for completing missing data values.

D. BRIEF SUMMARY FOR PART TWO
Part two introduced the applications of tensor algorithms, including data preprocessing, data classification and data prediction (regression). We can see from part two that, in order to solve high-dimensional problems, more and more researchers have begun to develop tensor-based algorithms. The biggest feature of a tensor algorithm is that it can effectively use the data structure to extract useful information. At the same time, tensor decomposition is used to reduce the number of unknown parameters and the size of the original tensor. Finally, the original problem is transformed into a sequence of single-variable optimization problems by the alternating least squares algorithm. Tensor-based algorithms not only preserve the interrelationships among the data features, but also improve the accuracy.

IV. CHALLENGES AND PROSPECTS
A. CHALLENGES
As a technology that has emerged in recent years, tensor methods are gradually being applied to various fields, such as medicine, biology, computer vision and machine learning. But at the same time they also face many challenges.

For example, existing tensor-based tracking algorithms cannot completely detect the intrinsic local geometry and discriminant structure of the image blocks in tensor form. As a result, they often ignore the influence of the background, suffer interference from the background area, and lose accuracy in target tracking.

Whether for classification problems in machine learning or in deep learning, tensor decomposition also requires more parameters, and in order to improve the accuracy a large number of samples are needed. Without a better training algorithm, a large number of parameters will cause slow convergence or even no convergence. At the same time, how to obtain a huge amount of data is also a very important issue. Due to limited samples, researchers often choose to experiment on simulated data; after all, simulated data differ from actual data, so the accuracy is not fully guaranteed when the methods are applied to actual high-dimensional data.

For the traditional tensor decompositions introduced in part one, such as the Tucker decomposition and the CP decomposition, they all decompose the input tensor into multiple low-order factors. However, due to noise from illumination, occlusion or other practical conditions, they are prone to deviations. Thus the accuracy of the decomposition decreases, which means that the robustness of these decomposition algorithms is relatively poor.

Moreover, for data processing, some general tensor algorithms often directly decompose the input features into multiple dimensions, which excessively considers the combination of these features with other useless features. So it is a huge challenge to accurately extract the useful information in the decomposition and abandon the useless combinations of features.

The last big problem concerns tensor decomposition algorithms themselves. When it comes to tensor decomposition, it is indispensable to talk about alternating least squares, which obtains the factors of the tensor decomposition by iteratively updating a single core at a time. However, these algorithms share a common problem, namely initialization. In deep learning and machine learning, if the weight initialization is not appropriate, it will cause long
convergence time or even non-convergence. Therefore, how to effectively initialize the tensor rank and the factor matrices is a huge challenge.

B. PROSPECTS
For the above problems, we propose the following research directions.

1. For target detection and image tracking, can we find a tensor-based algorithm that can capture the features between the background image and the target image?
In many cases, we need to extract a target image from an image, and dynamic video makes this even more difficult. How well the characteristics of the target and the background are grasped and distinguished affects the accuracy of target tracking. Therefore, we urgently need to develop a tensor-based tracking algorithm that can capture the local geometric structural relationship and the discriminant relationship between background and target image blocks.

2. How to optimize the learning algorithm or avoid saddle points?
How to improve the traditional gradient descent algorithm, or how to avoid saddle points, becomes an urgent requirement for tensor-based deep learning. For deep learning, when the dimension becomes higher, the first problem we think of is the increase in computational complexity and computation time. In general, we use tensor decomposition for dimensionality reduction, but some new problems are inevitably generated. A common problem with gradient descent is that it tends to fall into local minima; as the dimension rises, such problems become more widespread, and we still need improved algorithms to prevent the network from falling into local minima. Saddle points also arise due to the high dimension, which makes the problem non-convex. Therefore, the learning update algorithms urgently need to be improved, otherwise the accuracy cannot be improved.

3. Can the non-convex problem of the weight optimization process be transformed into a convex optimization problem?
As the dimension increases, the objective function in general becomes non-convex, which leads to a non-convex optimization problem. A non-convex optimization problem is usually difficult to solve, so we always want to find an equivalent convex optimization problem. Can we use effective tensor decompositions or other algorithms to transform non-convex objective functions into convex ones and optimize them?

4. How to reduce the required samples and the convergence time while ensuring accuracy?
Some researchers tried to convert the original tensor problem into a traditional vector problem, which not only destroys the original data structure but also greatly increases the number of parameters. The current method for tensor data is tensor decomposition, which directly converts the data tensor into factor matrices and a core tensor. Some researchers have reduced the parameters by tensor contraction calculations (Kossaifi et al.) [64]. However, whether tensor decomposition or tensor contraction is used to reduce the unknown parameters, the number of parameters that actually need to be solved for is still larger than in ordinary problems. Therefore, for the required sample data, one option is to use simulation data, and the other is to fill in the sample values that are missing in reality by tensor completion. However, both methods have accuracy problems, which also affect the training of the later models.

5. Is it possible to improve low-rank tensor decomposition algorithms?
We mentioned in the last section that tensor decomposition algorithms face the problem of initializing the factor matrices, the core tensor and the tensor rank. For the factor matrices and core tensor, we tend to use the usual random Gaussian variables to initialize the parameters. According to the structural characteristics of the tensor, or its mode-n matricization and vectorization, we can add some additional prior information to the factor matrices and the core tensor. So can we apply some constraints to the factor matrices and core tensor to better exploit their properties and initialize them effectively? Or can we improve the alternating least squares algorithm so that it can find the characteristics of the original input tensor and initialize itself automatically? For the tensor rank, we have just introduced a new improved algorithm in the part-two tensor completion algorithm (see algorithm 17).

V. CONCLUSION
This survey focuses on the basics of tensors, including tensor definitions, tensor operations, tensor decomposition, and low-rank tensor-based algorithms. At the same time, we also describe the application of tensor decomposition in various fields and introduce some applications of tensor algorithms in the field of machine learning and deep learning. Finally, we discuss the opportunities and challenges of tensors.

REFERENCES
[1] A. Bibi and B. Ghanem, ''High order tensor formulation for convolutional sparse coding,'' in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 1790–1798.
[2] A. Cichocki, ''Tensor decompositions: A new concept in brain data analysis?'' 2013, arXiv:1305.0395. [Online]. Available: https://arxiv.org/abs/1305.0395
[3] A. Cichocki, N. Lee, I. Oseledets, A.-H. Phan, Q. Zhao, and D. P. Mandic, ''Tensor networks for dimensionality reduction and large-scale optimization: Part 1 low-rank tensor decompositions,'' Found. Trends Mach. Learn., vol. 9, nos. 4–5, pp. 249–429, 2016.
[4] A. Cichocki, D. Mandic, L. De Lathauwer, G. Zhou, Q. Zhao, C. Caiafa, and H. A. Phan, ''Tensor decompositions for signal processing applications: From two-way to multiway component analysis,'' IEEE Signal Process. Mag., vol. 32, no. 2, pp. 145–163, Mar. 2015.
[5] A. Cichocki, R. Zdunek, A. H. Phan, and S.-I. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation. Chichester, U.K.: Wiley, 2009.
[6] A. Cichocki, A.-H. Phan, Q. Zhao, N. Lee, and I. Oseledets, ''Tensor networks for dimensionality reduction and large-scale optimization: Part 2 applications and future perspectives,'' Found. Trends Mach. Learn., vol. 9, no. 6, pp. 431–673, 2017.
[7] A. Desai, M. Ghashami, and J. M. Phillips, ''Improved practical matrix sketching with guarantees,'' IEEE Trans. Knowl. Data Eng., vol. 28, no. 7, pp. 1678–1690, Jul. 2016.
[8] A.-H. Phan, A. Cichocki, P. Tichavský, D. Mandic, and K. Matsuoka, [32] F. Verstraete, V. Murg, and J. I. Cirac, ‘‘Matrix product states, projected
‘‘On revealing replicating structures in multiway data: A novel tensor entangled pair states, and variational renormalization group methods for
decomposition approach,’’ in Proc. 10th Int. Conf. LVA/ICA. Berlin, quantum spin systems,’’ Adv. Phys., vol. 57, no. 2, pp. 143–224, 2008.
Germany: Springer, Mar. 2012, pp. 297–305. [33] G. Ballard, N. Knight, and K. Rouse, ‘‘Communication lower bounds for
[9] A. H. Phan and A. Cichocki, ‘‘Extended HALS algorithm for nonnegative matricized tensor times Khatri-Rao product,’’ in Proc. IEEE Int. Parallel
Tucker decomposition and its applications for multiway analysis and Distrib. Process. Symp. (IPDPS), Vancouver, BC, Canada, May 2018,
classification,’’ Neurocomputing, vol. 74, no. 11, pp. 1956–1969, 2011. pp. 557–567.
[10] A.-H. Phan, A. Cichocki, A. Uschmajew, P. Tichavsky, G. Luta, and [34] G. Chabriel, M. Kleinsteuber, E. Moreau, H. Shen, P. Tichavsky, and
D. Mandic, ‘‘Tensor networks for latent variable analysis. Part I: A. Yeredor, ‘‘Joint matrices decompositions and blind source separation:
Algorithms for tensor train decomposition,’’ 2016, arXiv:1609.09230. A survey of methods, identification, and applications,’’ IEEE Signal
[Online]. Available: https://arxiv.org/abs/1609.09230 Process. Mag., vol. 31, no. 3, pp. 34–43, May 2014.
[11] A. Hyvärinen, ‘‘Independent component analysis: Recent advances,’’ [35] G. E. Hinton, ‘‘Training products of experts by minimizing contrastive
Philos. Trans. Roy. Soc. A, Math., Phys. Eng. Sci., vol. 371, no. 1984, divergence,’’ Neural Comput., vol. 14, no. 8, pp. 1771–1800, 2002.
2013, Art. no. 20110534. [36] G. Evenbly and G. Vidal, ‘‘Algorithms for entanglement renormal-
[12] A. Kolbeinsson, J. Kossaifi, Y. Panagakis, A. Bulat, A. Anandkumar, ization,’’ Phys. Rev. B, Condens. Matter, vol. 79, no. 14, 2009,
I. Tzoulaki, and P. Matthews, ‘‘Robust deep networks with randomized Art. no. 144108.
tensor regression layers,’’ 2019, arXiv:1902.10758. [Online]. Available: [37] G. Hu, Y. Hua, Y. Yuan, Z. Zhang, Z. Lu, S. S. Mukherjee,
https://arxiv.org/abs/1902.10758 T. M. Hospedales, N. M. Robertson, and Y. Yang, ‘‘Attribute-enhanced
[13] A. Tjandra, S. Sakti, and S. Nakamura, ‘‘Tensor decomposition for com- face recognition with neural tensor fusion networks,’’ in Proc. IEEE Int.
pressing recurrent neural network,’’ in Proc. Int. Joint Conf. Neural Netw. Conf. Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 3764–3773.
(IJCNN), Rio de Janeiro, Brazil, Jul. 2018, pp. 1–8. [38] G. Lechuga, L. Le Brusquet, V. Perlbarg, L. Puybasset, D. Galanaud, and
[14] B. Jiang, F. Yang, and S. Zhang, ‘‘Tensor and its Tucker core: The invari- A. Tenenhaus, ‘‘Discriminant analysis for multiway data,’’ in Proc. Int.
ance relationships,’’ Jan. 2016, arXiv:1601.01469. [Online]. Available: Conf. Partial Least Squares Related Methods, in Springer Proceedings in
https://arxiv.org/abs/1601.01469 Mathematics and Statistics, 2015, pp. 115–126.
[15] B. Mao, Z. M. Fadlullah, F. Tang, N. Kato, O. Akashi, T. Inoue, and [39] T. Goldstein, B. Odonoghue, and S. Setzer, ‘‘Fast alternating direction
K. Mizutani, ‘‘A tensor based deep learning technique for intelligent optimization methods,’’ CAM Rep., 2012, pp. 12–35.
packet routing,’’ in Proc. IEEE Global Commun. Conf. (GLOBECOM), [40] G. Vidal, ‘‘Efficient classical simulation of slightly entangled quantum
Singapore, Dec. 2017, pp. 1–6. computations,’’ Phys. Rev. Lett., vol. 91, no. 14, 2003, Art. no. 147902.
[16] B. Khoromskij and A. Veit, ‘‘Efficient computation of highly oscillatory [41] G. Zhou and A. Cichocki, ‘‘Fast and unique Tucker decompositions via
integrals by using QTT tensor approximation,’’ Comput. Methods Appl. multiway blind source separation,’’ Bull. Polish Acad. Sci., vol. 60, no. 3,
Math., vol. 16, no. 1, pp. 145–159, 2016. pp. 389–407, 2012.
[17] C. F. Caiafa and A. Cichocki, ‘‘Stable, robust, and super fast recon- [42] G. Zhou and A. Cichocki, ‘‘Canonical polyadic decomposition based on a
struction of tensors using multi-way projections,’’ IEEE Trans. Signal single mode blind source separation,’’ IEEE Signal Process. Lett., vol. 19,
Process., vol. 63, no. 3, pp. 780–793, Feb. 2015. no. 8, pp. 523–526, Aug. 2012.
[18] C. Cortes and V. Vapnik, ‘‘Support-vector networks,’’ Mach. Learn., [43] G. Zhou, A. Cichocki, Y. Zhang, and D. P. Mandic, ‘‘Group component
vol. 20, no. 3, pp. 273–297, 1995. analysis for multiblock data: Common and individual feature extraction,’’
[19] C. M. Crainiceanu, B. S. Caffo, S. Luo, V. M. Zipunnikov, and IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 11, pp. 2426–2439,
N. M. Punjabi, ‘‘Population value decomposition, a framework for the Nov. 2016.
analysis of image populations,’’ J. Amer. Statist. Assoc., vol. 106, no. 495, [44] G. Zhou, Q. Zhao, Y. Zhang, T. Adalı, S. Xie, and A. Cichocki, ‘‘Linked
pp. 775–790, 2011. component analysis from matrices to high-order tensors: Applications to
[20] C. Lu, J. Feng, Y. Chen, W. Liu, Z. Lin, and S. Yan, ‘‘Tensor robust biomedical data,’’ Proc. IEEE, vol. 104, no. 2, pp. 310–331, Feb. 2016.
principal component analysis with a new tensor nuclear norm,’’ IEEE [45] H. Chen, Q. Ren, and Y. Zhang, ‘‘A hierarchical support tensor machine
Trans. Pattern Anal. Mach. Intell., to be published. structure for target detection on high-resolution remote sensing images,’’
[21] C. Peng, L. Zou, and D.-S. Huang, ‘‘Discovery of relationships between in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Fort Worth,
long non-coding RNAs and genes in human diseases based on tensor TX, USA, Jul. 2017, pp. 594–597.
completion,’’ IEEE Access, vol. 6, pp. 59152–59162, 2018. [46] H. Fanaee-T and J. Gama, ‘‘Tensor-based anomaly detection: An interdis-
[22] C. Tobler, ‘‘Low-rank tensor methods for linear systems and eigenvalue ciplinary survey,’’ Knowl.-Based Syst., vol. 98, pp. 130–147, Apr. 2016.
problems,’’ M.S. thesis, ETH Zürich, Zürich, Switzerland, 2012. [47] H. Imtia and A. D. Sarwate, ‘‘Improved algorithms for differentially pri-
[23] D. Kressner, M. Steinlechner, and A. Uschmajew, ‘‘Low-rank tensor vate orthogonal tensor decomposition,’’ in Proc. IEEE Int. Conf. Acoust.,
methods with subspace correction for symmetric eigenvalue problems,’’ Speech Signal Process. (ICASSP), Calgary, AB, Canada, Apr. 2018,
SIAM J. Sci. Comput., vol. 36, no. 5, pp. A2346–A2368, 2014. pp. 2201–2205.
[24] D. Kressner, M. Steinlechner, and B. Vandereycken, ‘‘Low-rank tensor [48] H. Lu, L. Zhang, Z. Cao, W. Wei, K. Xian, C. Shen, and
completion by Riemannian optimization,’’ BIT Numer. Math., vol. 54, A. van den Hengel, ‘‘When unsupervised domain adaptation meets
no. 2, pp. 447–468, 2014. tensor representations,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV),
[25] D. Kressner and C. Tobler, ‘‘Algorithm 941: Htucker—A MATLAB Venice, Italy, Oct. 2017, pp. 599–608.
toolbox for tensors in hierarchical Tucker format,’’ ACM Trans. Math. [49] H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, ‘‘A survey of
Softw., vol. 40, no. 3, 2014, Art. no. 22. multilinear subspace learning for tensor data,’’ Pattern Recognit., vol. 44,
[26] D. Kressner and A. Uschmajew, ‘‘On low-rank approximability of solu- no. 7, pp. 1540–1551, 2011.
tions to high-dimensional operator equations and eigenvalue problems,’’ [50] H. Matsueda, ‘‘Analytic optimization of a MERA network and its rele-
Linear Algebra Appl., vol. 493, pp. 556–572, Mar. 2016. vance to quantum integrability and wavelet,’’ 2016, arXiv:1608.02205.
[27] D. Tao, X. Li, W. Hu, S. Maybank, and X. Wu, ‘‘Supervised tensor [Online]. Available: https://arxiv.org/abs/1608.02205
learning,’’ in Proc. 5th IEEE Int. Conf. Data Mining (ICDM), Nov. 2005, [51] H. Yang, J. Su, Y. Zou, B. Yu, and E. F. Y. Young, ‘‘Layout hotspot
pp. 8–16. detection with feature tensor generation and deep biased learning,’’ in
[28] D. Wang, H. Shen, and Y. Truong, ‘‘Efficient dimension reduction Proc. 54th ACM/EDAC/IEEE Design Autom. Conf. (DAC), Austin, TX,
for high-dimensional matrix-valued data,’’ Neurocomputing, vol. 190, USA, 2017, pp. 1–6.
pp. 25–34, May 2016. [52] H. Wang, Q. Wu, L. Shi, Y. Yu, and N. Ahuja, ‘‘Out-of-core tensor
[29] E. Corona, A. Rahimian, and D. Zorin, ‘‘A tensor-train accelerated approximation of multi-dimensional matrices of visual data,’’ ACM Trans.
solver for integral equations in complex geometries,’’ Nov. 2015, Graph., vol. 24, no. 3, pp. 527–535, 2005.
arXiv:1511.06029. [Online]. Available: https://arxiv.org/abs/1511.06029 [53] H. Wang, D. Huang, Y. Wang, and H. Yang, ‘‘Facial aging simulation via
[30] E. M. Stoudenmire and D. J. Schwab, ‘‘Supervised learning with tensor completion and metric learning,’’ IET Comput. Vis., vol. 11, no. 1,
quantum-inspired tensor networks,’’ 2016, arXiv:1605.05775. [Online]. pp. 78–86, Feb. 2017.
Available: https://arxiv.org/abs/1605.05775 [54] H. Zhao, Z. Wei, and H. Yan, ‘‘Detection of correlated co-clusters in
[31] F. L. Hitchcock, ‘‘Multiple invariants and generalized rank of a p-way tensor data based on the slice-wise factorization,’’ in Proc. Int. Conf.
matrix or tensor,’’ J. Math. Phys., vol. 7, pp. 39–79, Apr. 1928. Mach. Learn. (ICMLC), Ningbo, China, Jul. 2017, pp. 182–188.
[55] H. Zhou, L. Li, and H. Zhu, ‘‘Tensor regression with applications in [77] L. Grasedyck, D. Kressner, and C. Tobler, ‘‘A literature survey of low-
neuroimaging data analysis,’’ J. Amer. Stat. Assoc., vol. 108, no. 502, rank tensor approximation techniques,’’ GAMM-Mitteilungen, vol. 36,
pp. 540–552, 2013. no. 1, pp. 53–78, 2013.
[56] I. Jeon, E. E. Papalexakis, C. Faloutsos, L. Sael, and U. Kang, ‘‘Mining [78] L. Grasedyck, ‘‘Hierarchical singular value decomposition of tensors,’’
billion-scale tensors: Algorithms and discoveries,’’ VLDB J., vol. 25, SIAM J. Matrix Anal. Appl., vol. 31, no. 4, pp. 2029–2054, 2010.
no. 4, pp. 519–544, 2016. [79] L. Karlsson, D. Kressner, and A. Uschmajew, ‘‘Parallel algorithms
[57] I. Kisil, G. G. Calvi, A. Cichocki, and D. P. Mandic, ‘‘Common and for tensor completion in the CP format,’’ Parallel Comput., vol. 57,
individual feature extraction using tensor decompositions: A remedy pp. 222–234, Sep. 2016.
for the curse of dimensionality?’’ in Proc. IEEE Int. Conf. Acoust., [80] L. Luo, Y. Xie, Z. Zhang, and W.-J. Li, ‘‘Support matrix machines,’’ in
Speech Signal Process. (ICASSP), Calgary, AB, Canada, Apr. 2018, Proc. Int. Conf. Mach. Learn. (ICML), 2015, pp. 938–947.
pp. 6299–6303. [81] L. R. Tucker, ‘‘Implications of factor analysis of three-way matrices for
[58] I. Kotsia, W. Guo, and I. Patras, ‘‘Higher rank support tensor machines measurement of change,’’ in Problems Measuring Change, C. W. Harris,
for visual recognition,’’ Pattern Recognit., vol. 45, no. 12, pp. 4192–4203, Ed. Madison, WI, USA: Univ. Wisconsin Press, 1963, pp. 122–137.
2012. [82] L. Sorber, I. Domanov, M. Van Barel, and L. De Lathauwer, ‘‘Exact line
[59] I. Kotsia and I. Patras, ‘‘Support Tucker machines,’’ in Proc. IEEE Conf. and plane search for tensor optimization,’’ Comput. Optim. Appl., vol. 63,
Comput. Vis. Pattern Recognit., Jun. 2011, pp. 633–640. no. 1, pp. 121–142, 2016.
[60] I. V. Oseledets, ‘‘Tensor-train decomposition,’’ SIAM J. Sci. Comput., [83] L. Yuan, Q. Zhao, and J. Cao, ‘‘High-order tensor completion for
vol. 33, no. 5, pp. 2295–2317, 2011. data recovery via sparse tensor-train optimization,’’ in Proc. IEEE Int.
[61] I. V. Oseledets and E. E. Tyrtyshnikov, ‘‘Breaking the curse of dimension- Conf. Acoust., Speech Signal Process. (ICASSP), Calgary, AB, Canada,
ality, or how to use SVD in many dimensions,’’ SIAM J. Sci. Comput., Apr. 2018, pp. 1258–1262.
vol. 31, no. 5, pp. 3744–3759, 2009. [84] L. Zhai, Y. Zhang, H. Lv, S. Fu, and H. Yu, ‘‘Multiscale tensor dictionary
[62] J. A. Tropp, A. Yurtsever, M. Udell, and V. Cevher, ‘‘Randomized single- learning approach for multispectral image denoising,’’ IEEE Access,
view algorithms for low-rank matrix approximation,’’ Tech. Rep., 2016. vol. 6, pp. 51898–51910, 2018.
[63] J. H. Choi and S. Vishwanathan, ‘‘DFacTo: Distributed factorization of [85] M. Bebendorf, C. Kuske, and R. Venn, ‘‘Wideband nested cross approx-
tensors,’’ in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 1296–1304. imation for Helmholtz problems,’’ Numerische Mathematik, vol. 130,
[64] J. Kossaifi, A. Khanna, Z. Lipton, T. Furlanello, and A. Anandkumar, no. 1, pp. 1–34, 2015.
‘‘Tensor contraction layers for parsimonious deep nets,’’ in Proc. IEEE [86] M. Bachmayr, R. Schneider, and A. Uschmajew, ‘‘Tensor networks and
Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Honolulu, hierarchical tensors for the solution of high-dimensional partial differen-
HI, USA, Jul. 2017, pp. 1940–1946. tial equations,’’ Found. Comput. Math., vol. 16, no. 6, pp. 1423–1472,
[65] J. Virta and K. Nordhausen, ‘‘Blind source separation for nonstationary 2016.
tensor-valued time series,’’ in Proc. IEEE 27th Int. Workshop Mach. [87] M. Espig, M. Schuster, A. Killaitis, N. Waldren, P. Whnert, S. Handschuh,
Learn. Signal Process. (MLSP), Tokyo, Japan, Sep. 2017, pp. 1–6. and H. Auer, ‘‘TensorCalculus library,’’ Tech. Rep., 2012.
[66] K. Batselier and N. Wong, ‘‘A constructive arbitrary-degree Kronecker [88] M. Ghassemi, Z. Shakeri, A. D. Sarwate, and W. U. Bajwa, ‘‘STARK:
product decomposition of tensors,’’ 2015, arXiv:1507.08805. [Online]. Structured dictionary learning through rank-one tensor recovery,’’ in
Available: https://arxiv.org/abs/1507.08805 Proc. IEEE 7th Int. Workshop Comput. Adv. Multi-Sensor Adapt. Pro-
[67] K. Batselier, H. Liu, and N. Wong, ‘‘A constructive algorithm for decom- cess. (CAMSAP), Curacao, Netherlands Antilles, Dec. 2017, pp. 1–5.
posing a tensor into a finite sum of orthonormal rank-1 terms,’’ SIAM J. [89] M. Hou, ‘‘Tensor-based regression models and applications,’’
Matrix Anal. Appl., vol. 36, no. 3, pp. 1315–1337, 2015. Ph.D. dissertation, Laval Univ., Quebec City, QC, Canada, 2017.
[68] K.-Y. Peng, S.-Y. Fu, Y.-P. Liu, and W.-C. Hsu, ‘‘Adaptive runtime [90] M. Hou, Y. Wang, and B. Chaib-draa, ‘‘Online local Gaussian pro-
exploiting sparsity in tensor of deep learning neural network on het- cess for tensor-variate regression: Application to fast reconstruction of
erogeneous systems,’’ in Proc. Int. Conf. Embedded Comput. Syst., limb movements from brain signal,’’ in Proc. IEEE Int. Conf. Acoust.,
Archit., Modeling, Simulation (SAMOS), Pythagorion, Greece, Jul. 2017, Speech Signal Process. (ICASSP), Brisbane, QLD, Australia, Apr. 2015,
pp. 105–112. pp. 5490–5494.
[69] K. Makantasis, A. D. Doulamis, N. D. Doulamis, and A. Nikitakis, [91] M. Hou, Q. Zhao, B. Chaib-Draa, and A. Cichocki, ‘‘Common and dis-
‘‘Tensor-based classification models for hyperspectral data analysis,’’ criminative subspace kernel-based multiblock tensor partial least squares
IEEE Trans. Geosci. Remote Sens., vol. 56, no. 12, pp. 6884–6898, regression,’’ in Proc. 13th AAAI Conf. Artif. Intell., 2016, pp. 1673–1679.
Dec. 2018. [92] M. Steinlechner, ‘‘Riemannian optimization for solving high-dimensional
[70] K. Makantasis, A. Doulamis, N. Doulamis, A. Nikitakis, and problems with low-rank tensor structure,’’ Ph.D. dissertation, 2016.
A. Voulodimos, ‘‘Tensor-based nonlinear classifier for high-order [93] M. W. Mahoney, M. Maggioni, and P. Drineas, ‘‘Tensor-CUR decompo-
data analysis,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. sitions for tensor-based data,’’ SIAM J. Matrix Anal. Appl., vol. 30, no. 3,
(ICASSP), Calgary, AB, Canada, Apr. 2018, pp. 2221–2225. pp. 957–987, 2008.
[71] K. Naskovska and M. Haardt, ‘‘Extension of the semi-algebraic frame- [94] N. Cohen and A. Shashua, ‘‘Inductive bias of deep convolutional net-
work for approximate CP decompositions via simultaneous matrix diag- works through pooling geometry,’’ 2016, arXiv:1605.06743. [Online].
onalization to the efficient calculation of coupled CP decompositions,’’ Available: https://arxiv.org/abs/1605.06743
in Proc. 50th Asilomar Conf. Signals, Syst. Comput., Pacific Grove, CA, [95] N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang,
USA, Nov. 2016, pp. 1728–1732. E. E. Papalexakis, and C. Faloutsos, ‘‘Tensor decomposition for
[72] T. Levi-Civita, The Absolute Differential Calculus. London, U.K.: Blackie signal processing and machine learning,’’ IEEE Trans. Signal Process.,
and Son, 1927. vol. 65, no. 13, pp. 3551–3582, Jal. 2017.
[73] L. De Lathauwer, B. De Moor, and J. Vandewalle, ‘‘A multilinear sin- [96] N. Halko, P. G. Martinsson, and J. A. Tropp, ‘‘Finding structure with ran-
gular value decomposition,’’ SIAM J. Matrix Anal. Appl., vol. 21, no. 4, domness: Probabilistic algorithms for constructing approximate matrix
pp. 1253–1278, 2000. decompositions,’’ SIAM Rev., vol. 53, no. 2, pp. 217–288, 2011.
[74] L. De Lathauwer, B. De Moor, and J. Vandewalle, ‘‘On the best rank-1 [97] N. H. Nguyen, P. Drineas, and T. D. Tran, ‘‘Tensor sparsification via a
and rank-(R1 ,R2 ,. . .,RN ) approximation of higher-order tensors,’’ SIAM bound on the spectral norm of random tensors,’’ 2015, arXiv:1005.4732.
J. Matrix Anal. Appl., vol. 21, no. 4, pp. 1324–1342, 2000. [Online]. Available: https://arxiv.org/abs/1005.4732
[75] L. Geng, X. Nie, S. Niu, Y. Yin, and J. Lin, ‘‘Structural compact core [98] N. Kargas and N. D. Sidiropoulos, ‘‘Completing a joint PMF from pro-
tensor dictionary learning for multispec-tral remote sensing image deblur- jections: A low-rank coupled tensor factorization approach,’’ in Proc. Inf.
ring,’’ in Proc. 25th IEEE Int. Conf. Image Process. (ICIP), Athens, Theory Appl. Workshop (ITA), San Diego, CA, USA, Feb. 2017, pp. 1–6.
Greece, Oct. 2018, pp. 2865–2869. [99] N. Lee and A. Cichocki, ‘‘Fundamental tensor operations for large-
[76] L. Albera, H. Becker, A. Karfoul, R. Gribonval, A. Kachenoura, scale data analysis using tensor network formats,’’ Multidimensional Syst.
S. Bensaid, L. Senhadji, A. Hernandez, and I. Merlet, ‘‘Localization of Signal Process., vol. 29, no. 3, pp. 921–960, 2018.
spatially distributed brain sources after a tensor-based preprocessing of [100] N. Schuch, I. Cirac, and D. Pérez-García, ‘‘PEPS as ground states:
interictal epileptic EEG data,’’ in Proc. 37th Annu. Int. Conf. IEEE Eng. Degeneracy and topology,’’ Ann. Phys., vol. 325, no. 10, pp. 2153–2192,
Med. Biol. Soc. (EMBC), Milan, Italy, Aug. 2015, pp. 6995–6998. 2010.
[101] N. Vannieuwenhoven, R. Vandebril, and K. Meerbergen, ‘‘A new trunca- [124] S. Yang, M. Wang, Z. Feng, Z. Liu, and R. Li, ‘‘Deep sparse tensor
tion strategy for the higher-order singular value decomposition,’’ SIAM J. filtering network for synthetic aperture radar images classification,’’
Sci. Comput., vol. 34, no. 2, pp. A1027–A1052, 2012. IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 8, pp. 3919–3924,
[102] P. D. Hoff, ‘‘Multilinear tensor regression for longitudinal relational Aug. 2018.
data,’’ Ann. Appl. Statist., vol. 9, no. 3, pp. 1169–1193, 2015. [125] T. D. Pham and H. Yan, ‘‘Tensor decomposition of gait dynamics
[103] P. G. Constantine, D. F. Gleich, Y. Hou, and J. Templeton, ‘‘Model in Parkinson’s disease,’’ IEEE Trans. Biomed. Eng., vol. 65, no. 8,
reduction with mapreduce-enabled tall and skinny singular value decom- pp. 1820–1827, Aug. 2018.
position,’’ SIAM J. Sci. Comput., vol. 36, no. 5, pp. S166–S191, 2014. [126] T. D. Nguyen, T. Tran, D. Phung, and S. Venkatesh, ‘‘Tensor-variate
[104] P. M. Kroonenberg, Applied Multiway Data Analysis. New York, NY, restricted Boltzmann machines,’’ in Proc. AAAI, 2015, pp. 2887–2893.
USA: Wiley, 2008. [127] T. G. Kolda and B. W. Bader, ‘‘Tensor decompositions and applications,’’
[105] Q. Li, G. An, and Q. Ruan, ‘‘3D facial expression recognition using SIAM Rev., vol. 51, no. 3, pp. 455–500, 2009.
orthogonal tensor marginal Fisher analysis on geometric maps,’’ in Proc. [128] T.-X. Jiang, T.-Z. Huang, X.-L. Zhao, L.-J. Deng, and Y. Wang,
Int. Conf. Wavelet Anal. Pattern Recognit. (ICWAPR), Ningbo, China, ‘‘A novel tensor-based video rain streaks removal approach via utilizing
Jul. 2017, pp. 65–71. discriminatively intrinsic priors,’’ in Proc. IEEE Conf. Comput. Vis. Pat-
[106] Q. Li and G. Tang, ‘‘Convex and nonconvex geometries of symmetric ten- tern Recognit. (CVPR), Honolulu, HI, USA, Jul. 2017, pp. 2818–2827.
sor factorization,’’ in Proc. 51st Asilomar Conf. Signals, Syst., Comput., [129] T.-L. Chen, D. D. Chang, S.-Y. Huang, H. Chen, C. Lin, and
Pacific Grove, CA, USA, Oct./Nov. 2017, pp. 305–309. W. Wang, ‘‘Integrating multiple random sketches for singular value
[107] Q. Shi, Y.-M. Cheung, Q. Zhao, and H. Lu, ‘‘Feature extraction for incom- decomposition,’’ 2016, arXiv:1608.08285. [Online]. Available:
plete data via low-rank tensor decomposition with feature regularization,’’ https://arxiv.org/abs/1608.08285
IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 6, pp. 1803–1817, [130] T. Wu, A. R. Benson, and D. F. Gleich, ‘‘General tensor spectral co-
Jun. 2019. clustering for higher-order data,’’ 2016, arXiv:1603.00395. [Online].
[108] Q. Zhang, L. T. Yang, Z. Chen, and P. Li, ‘‘A tensor-train deep compu- Available: https://arxiv.org/abs/1603.00395
tation model for industry informatics big data feature learning,’’ IEEE [131] T. Yokota, N. Lee, and A. Cichocki, ‘‘Robust multilinear tensor rank esti-
Trans. Ind. Informat., vol. 14, no. 7, pp. 3197–3204, Jul. 2018. mation using higher order singular value decomposition and information
[109] Q. Zhao, G. Zhou, S. Xie, L. Zhang, and A. Cichocki, ‘‘Tensor criteria,’’ IEEE Trans. Signal Process., vol. 65, no. 5, pp. 1196–1206,
ring decomposition,’’ 2016, arXiv:1606.05535. [Online]. Available: Mar. 2017.
https://arxiv.org/abs/1606.05535 [132] V. A. Kazeev, B. N. Khoromskij, and E. E. Tyrtyshnikov, ‘‘Multilevel
[110] R. A. Harshman, ‘‘Foundations of the PARAFAC procedure: Models Toeplitz matrices generated by tensor-structured vectors and convolution
and conditions for an ‘explanatory’ multimodal factor analysis,’’ UCLA with logarithmic complexity,’’ SIAM J. Sci. Comput., vol. 35, no. 3,
Working Papers in Phonetics, Tech. Rep., 1970, pp. 1–84, vol. 16. pp. A1511–A1536, 2013.
[111] R. Kountchev and R. Kountcheva, ‘‘Truncated hierarchical SVD for [133] V. Chandola, A. Banerjee, and V. Kumar, ‘‘Anomaly detection: A survey,’’
image sequences, represented as third order tensor,’’ in Proc. 8th Int. Conf. ACM Comput. Surv., vol. 41, no. 3, 2009, Art. no. 15.
Inf. Technol. (ICIT), Amman, Jordan, May 2017, pp. 166–173.
[134] V. de Silva and L.-H. Lim, ‘‘Tensor rank and the ill-posedness of the best
[112] R. Orús, ‘‘A practical introduction to tensor networks: Matrix prod- low-rank approximation problem,’’ SIAM J. Matrix Anal. Appl., vol. 30,
uct states and projected entangled pair states,’’ Ann. Phys., vol. 349, no. 3, pp. 1084–1127, 2008.
pp. 117–158, Oct. 2014.
[135] V. Giovannetti, S. Montangero, and R. Fazio, ‘‘Quantum multiscale entan-
[113] R. Yu and Y. Liu, ‘‘Learning from multiway data: Simple and efficient
glement renormalization ansatz channels,’’ Phys. Rev. Lett., vol. 101,
tensor regression,’’ in Proc. 33rd Int. Conf. Mach. Learn. (ICML), 2016,
no. 18, 2008, Art. no. 180503.
pp. 373–381.
[136] V. Kuleshov, A. Chaganty, and P. Liang, ‘‘Tensor factorization via
[114] R. Zhao and Q. Wang, ‘‘Learning separable dictionaries for sparse tensor
matrix factorization,’’ in Proc. 18th Int. Conf. Artif. Intell. Statist., 2015,
representation: An online approach,’’ IEEE Trans. Circuits Syst. II, Exp.
pp. 507–516.
Briefs, vol. 66, no. 3, pp. 502–506, Mar. 2019.
[137] V. Tresp, C. Esteban, Y. Yang, S. Baier, and D. Krompaß, ‘‘Learning
[115] R. Zdunek and K. Fonal, ‘‘Randomized nonnegative tensor factorization
with memory embeddings,’’ 2015, arXiv:1511.07972. [Online]. Avail-
for feature extraction from high-dimensional signals,’’ in Proc. 25th
able: https://arxiv.org/abs/1511.07972
Int. Conf. Syst., Signals Image Process. (IWSSIP), Maribor, Slovenia,
Jun. 2018, pp. 1–5. [138] W. Austin, G. Ballard, and T. G. Kolda, ‘‘Parallel tensor compression for
[116] S. A. Vorobyov, Y. Rong, N. D. Sidiropoulos, and A. B. Gershman, large-scale scientific data,’’ 2015, arXiv:1510.06689. [Online]. Available:
‘‘Robust iterative fitting of multilinear models,’’ IEEE Trans. Signal https://arxiv.org/abs/1510.06689
Process., vol. 53, no. 8, pp. 2678–2689, Aug. 2005. [139] W. Chu and Z. Ghahramani, ‘‘Probabilistic models for incomplete multi-
[117] S. Chen and S. A. Billings, ‘‘Representations of non-linear systems: dimensional arrays,’’ in Proc. 12th Int. Conf. Artif. Intell. Statist., vol. 5,
The NARMAX model,’’ Int. J. Control, vol. 49, no. 3, pp. 1013–1032, 2009, pp. 89–96.
1989. [140] W. de Launey and J. Seberry, ‘‘The strong Kronecker product,’’ J. Com-
[118] S. E. Sofuoglu and S. Aviyente, ‘‘A two-stage approach to robust tensor binat. Theory, Ser. A, vol. 66, no. 2, pp. 192–213, 1994.
decomposition,’’ in Proc. IEEE Stat. Signal Process. Workshop (SSP), [141] W. Guo, I. Kotsia, and I. Patras, ‘‘Tensor learning for regression,’’ IEEE
Freiburg, Germany, Jun. 2018, pp. 831–835. Trans. Image Process., vol. 21, no. 2, pp. 816–827, Feb. 2012.
[119] S. Han and P. Woodford, ‘‘Comparison of dimension reduction methods [142] W. Hackbusch and S. Kühn, ‘‘A new scheme for the tensor representa-
using polarimetric SAR images for tensor-based feature extraction,’’ in tion,’’ J. Fourier Anal. Appl., vol. 15, no. 5, pp. 706–722, Oct. 2009.
Proc. 12th Eur. Conf. Synth. Aperture Radar (EUSAR), Aachen, Germany, [143] W. Hu, J. Gao, J. Xing, C. Zhang, and S. Maybank, ‘‘Semi-supervised
Jun. 2018, pp. 1–6. tensor-based graph embedding learning and its application to visual
[120] S. Kallam, S. M. Basha, D. S. Rajput, R. Patan, B. Balamurugan, and discriminant tracking,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 39,
S. A. K. Basha, ‘‘Evaluating the performance of deep learning tech- no. 1, pp. 172–188, Jan. 2017.
niques on classification using tensor flow application,’’ in Proc. Int. [144] X. Zhao, H. Shi, M. Lv, and L. Jing, ‘‘Least squares twin support
Conf. Adv. Comput. Commun. Eng. (ICACCE), Paris, France, Jun. 2018, tensor machine for classification,’’ J. Inf. Comput. Sci., vol. 11, no. 12,
pp. 331–335. pp. 4175–4189, 2014.
[121] S. K. Biswas and P. Milanfar, ‘‘Linear support tensor machine with LSK [145] X. Deng, P. Jiang, X. Peng, and C. Mi, ‘‘An intelligent outlier detection
channels: Pedestrian detection in thermal infrared images,’’ IEEE Trans. method with one class support Tucker machine and genetic algorithm
Image Process., vol. 26, no. 9, pp. 4229–4242, Sep. 2017. toward big sensor data in Internet of Things,’’ IEEE Trans. Ind. Electron.,
[122] S. Savvaki, G. Tsagkatakis, A. Panousopoulou, and P. Tsakalides, vol. 66, no. 6, pp. 4672–4683, Jun. 2019.
‘‘Matrix and tensor completion on a human activity recognition frame- [146] X. He, D. Cai, and P. Niyogi, ‘‘Tensor subspace analysis,’’ in Proc. Annu.
work,’’ IEEE J. Biomed. Health Inform., vol. 21, no. 6, pp. 1554–1561, Conf. Neural Inf. Process. Syst, 2006, pp. 499–506.
Nov. 2017. [147] X. Xu, N. Zhang, Y. Yan, and Q. Shen, ‘‘Application of support
[123] S. V. Dolgov and D. V. Savostyanov, ‘‘Alternating minimal energy meth- higher-order tensor machine in fault diagnosis of electric vehicle range-
ods for linear systems in higher dimensions,’’ SIAM J. Sci. Comput., extender,’’ in Proc. Chin. Autom. Congr. (CAC), Jinan, China, Oct. 2017,
vol. 36, no. 5, pp. A2248–A2271, 2014. pp. 6033–6037.
[148] X. Xu, Q. Wu, S. Wang, J. Liu, J. Sun, and A. Cichocki, ‘‘Whole brain YUWANG JI is currently pursuing the master’s
fMRI pattern analysis based on tensor neural network,’’ IEEE Access, degree with the National Engineering Laboratory
vol. 6, pp. 29297–29305, 2018. for Mobile Network Security, Wireless Technol-
[149] X. Zhang, ‘‘A nonconvex relaxation approach to low-rank tensor com- ogy Innovation Institute, Beijing University of
pletion,’’ IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 6, Posts and Telecommunications (BUPT). His cur-
pp. 1659–1671, Jun. 2019. rent research interests include the tensor applica-
[150] Y. Du, G. Han, Y. Quan, Z. Yu, H.-S. Wong, C. L. P. Chen, and tion in machine learning and time series analysis.
J. Zhang, ‘‘Exploiting global low-rank structure and local sparsity
nature for tensor completion,’’ IEEE Trans. Cybern., vol. 49, no. 11,
pp. 3898–3910, Nov. 2019.
[151] Y. Huang, W. Wang, L. Wang, and T. Tan, ‘‘Conditional high-order
Boltzmann machines for supervised relation learning,’’ IEEE Trans.
Image Process., vol. 26, no. 9, pp. 4297–4310, Sep. 2017.
[152] Y.-J. Kao, Y.-D. Hsieh, and P. Chen, ‘‘Uni10: An open-source library for
tensor network algorithms,’’ J. Phys., Conf. Ser., vol. 640, no. 1, 2015,
QIANG WANG received the Ph.D. degree in
Art. no. 012040.
[153] Y. Liu, ‘‘Low-rank tensor regression: Scalability and applications,’’ in
communication engineering from the Beijing Uni-
Proc. IEEE 7th Int. Workshop Comput. Adv. Multi-Sensor Adapt. Pro- versity of Posts and Telecommunications (BUPT),
cess. (CAMSAP), Curacao, Netherlands Antilles, Dec. 2017, pp. 1–5. Beijing, China, in 2008. Since 2008, he has been
[154] Y. Wang, H.-Y. Tung, A. Smola, and A. Anandkumar, ‘‘Fast and guar- with the School of Information and Communi-
anteed tensor decomposition via sketching,’’ in Proc. Adv. Neural Inf. cation Engineering, Beijing University of Posts
Process. Syst., 2015, pp. 991–999. and Telecommunications, where he is currently
[155] Y. Wang, W. Zhang, Z. Yu, Z. Gu, H. Liu, Z. Cai, C. Wang, and an Associate Professor. He participated in many
S. Gao, ‘‘Support vector machine based on low-rank tensor train decom- national projects such as NSFC, 863, and so on.
position for big data applications,’’ in Proc. 12th IEEE Conf. Ind. Elec- His research interests include information theory,
tron. Appl. (ICIEA), Siem Reap, Cambodia, Jun. 2017, pp. 850–853. machine learning, wireless communications, VLSI, and statistical inference.
[156] Y. W. Chen, K. Guo, and Y. Pan, ‘‘Robust supervised learning based on
tensor network method,’’ in Proc. 33rd Youth Acad. Annu. Conf. Chin.
Assoc. Automat. (YAC), Nanjing, China, May 2018, pp. 311–315.
[157] Y. Xiang, Q. Jiang, J. He, X. Jin, L. Wu, and S. Yao, ‘‘The advance of
support tensor machine,’’ in Proc. IEEE 16th Int. Conf. Softw. Eng. Res.,
Manage. Appl. (SERA), Kunming, China, Jun. 2018, pp. 121–128. XUAN LI is currently pursuing the master’s degree
[158] Y. Zhang and R. Barzilay, ‘‘Hierarchical low-rank tensors for multilin- with the National Engineering Laboratory for
gual transfer parsing,’’ in Proc. Conf. Empirical Methods Natural Lang. Mobile Network Security, Wireless Technology
Process., 2015, pp. 1857–1867. Innovation Institute, Beijing University of Posts
[159] Z. Chen, K. Batselier, J. A. K. Suykens, and N. Wong, ‘‘Parallelized and Telecommunications (BUPT). Her current
tensor train learning of polynomial classifiers,’’ 2016, arXiv:1612.06505. research interests include UAV-assisted networks
[Online]. Available: https://arxiv.org/abs/1612.06505 and reinforcement learning.
[160] Z. Chen, B. Yang, and B. Wang, ‘‘Hyperspectral target detection:
A preprocessing method based on tensor principal component analysis,’’
in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Valencia,
Spain, 2018, pp. 2753–2756.
[161] Z. Chen, K. Batselier, J. A. K. Suykens, and N. Wong, ‘‘Parallelized
tensor train learning of polynomial classifiers,’’ IEEE Trans. Neural Netw.
Learn. Syst., vol. 29, no. 10, pp. 4621–4632, Oct. 2018.
[162] Z.-C. Gu, M. Levin, B. Swingle, and X.-G. Wen, ‘‘Tensor-product rep-
JIE LIU is currently pursuing the master’s degree
resentations for string-net condensed states,’’ Phys. Rev. B, Condens.
with the National Engineering Laboratory for
Matter, vol. 79, no. 8, 2009, Art. no. 085118.
[163] Z. Fang, X. Yang, L. Han, and X. Liu, ‘‘A sequentially truncated higher Mobile Network Security, Wireless Technology
order singular value decomposition-based algorithm for tensor comple- Innovation Institute, Beijing University of Posts
tion,’’ IEEE Trans. Cybern., vol. 49, no. 5, pp. 1956–1967, May 2019. and Telecommunications (BUPT). Her current
[164] Z. Hao, L. He, B. Chen, and X. Yang, ‘‘A linear support higher-order research interests include time series prediction
tensor machine for classification,’’ IEEE Trans. Image Process., vol. 22, and reinforcement learning.
no. 7, pp. 2911–2920, Jul. 2013.
[165] Z. Zhang and S. Aeron, ‘‘Exact tensor completion using t-SVD,’’ IEEE
Trans. Signal Process., vol. 65, no. 6, pp. 1511–1526, Mar. 2015.