0% found this document useful (0 votes)
8 views

Tensors IEEE SPM March 2015

Uploaded by

Marouane Nazih
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views

Tensors IEEE SPM March 2015

Uploaded by

Marouane Nazih
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 19

[ Andrzej

 Cichocki, Danilo P. Mandic,


Anh Huy Phan, Cesar F. Caiafa,
Guoxu Zhou, Qibin Zhao, and
Lieven De Lathauwer ]

Tensor
Decompositions
for Signal Processing
Applications
[From two-way to multiway component analysis]

T
he widespread use of multisensor technology and the emergence of big data
sets have highlighted the limitations of standard flat-view matrix models and
the necessity to move toward more versatile data analysis tools. We show that
higher-order tensors (i.e., multiway arrays) enable such a fundamental para-
digm shift toward models that are essentially polynomial, the uniqueness of
which, unlike the matrix methods, is guaranteed under very mild and natural conditions.
Benefiting from the power of multilinear algebra as their mathematical backbone, data
analysis techniques using tensor decompositions are shown to have great flexibility in the
choice of constraints which match data properties and extract more general latent compo-
nents in the data than matrix-based methods.
A comprehensive introduction to tensor decompositions is provided from a signal process-
ing perspective, starting from the algebraic foundations, via basic canonical polyadic and Tucker
models, to advanced cause-effect and multiview data analysis schemes. We show that tensor
decompositions enable natural generalizations of some commonly used signal processing para-
image licensed by graphic stock digms, such as canonical correlation and subspace techniques, signal separation, linear regres-
sion, feature extraction, and classification. We also cover computational aspects and point out
Digital Object Identifier 10.1109/MSP.2013.2297439 how ideas from compressed sensing (CS) and scientific computing may be used for addressing
Date of publication: 12 February 2015 the otherwise unmanageable storage and manipulation issues associated with big data sets. The

1053-5888/15©2015IEEE IEEE SIGNAL PROCESSING MAGAZINE [145] March 2015


concepts are supported by illustrative real-world case studies that [12], [21]–[23] and tutorial papers [24]–[31] covering various
highlight the benefits of the tensor framework as efficient and aspects of multiway analysis.
promising tools, inter alia, for modern signal processing, data ana-
lysis, and machine-learning applications; moreover, these benefits FROM A MATRIX TO A TENSOR
also extend to vector/matrix data through tensorization. Approaches to two-way (matrix) component analysis are well estab-
lished and include principal component analysis (PCA), ICA, non-
HISTORICAL NOTES negative matrix factorization (NMF), and sparse component analysis
The roots of multiway analysis can be traced back to studies of (SCA) [12], [21], [32]. These techniques have become standard tools
homogeneous polynomials in the 19th century, with contributors for, e.g., blind source separation (BSS), feature extraction, or classifi-
including Gauss, Kronecker, Cayley, Weyl, and Hilbert. In the cation. On the other hand, large classes of data arising from modern
modern-day interpretation, these are fully symmetric tensors. heterogeneous sensor modalities have a multiway character and are,
Decompositions of nonsymmetric tensors have been studied since therefore, naturally represented by multiway arrays or tensors (see
the early 20th century [1], whereas the benefits of using more the section “Tensorization—Blessing of Dimensionality”).
than two matrices in factor analysis (FA) [2] have been apparent in Early multiway data analysis approaches reformatted the data
several communities since the 1960s. The Tucker decomposition tensor as a matrix and resorted to methods developed for classical
(TKD) for tensors was introduced in psychometrics [3], [4], while two-way analysis. However, such a flattened view of the world and
the canonical polyadic decomposition (CPD) was independently the rigid assumptions inherent in two-way analysis are not always a
rediscovered and put into an application context under the names good match for multiway data. It is only through higher-order ten-
of canonical decomposition (CANDECOMP) in psychometrics [5] sor decomposition that we have the opportunity to develop sophis-
and parallel factor model (PARAFAC) in linguistics [6]. Tensors ticated models capturing multiple interactions and couplings
were subsequently adopted in diverse branches of data analysis instead of standard pairwise interactions. In other words, we can
such as chemometrics, the food industry, and social sciences [7], only discover hidden components within multiway data if the ana-
[8]. When it comes to signal processing, the early 1990s saw a lysis tools account for the intrinsic multidimensional patterns pre-
considerable interest in higher-order statistics (HOS) [9], and it sent, motivating the development of multilinear techniques.
was soon realized that, for multivariate cases, HOS are effectively In this article, we emphasize that tensor decompositions are
higher-order tensors; indeed, algebraic approaches to independent not just matrix factorizations with additional subscripts, multi-
component analysis (ICA) using HOS [10]–[12] were inherently linear algebra is much more structurally rich than linear alge-
tensor based. Around 2000, it was realized that the TKD repre- bra. For example, even basic notions such as rank have a more
sents a multilinear singular value decomposition (MLSVD) [15]. subtle meaning, the uniqueness conditions of higher-order ten-
Generalizing the matrix singular value decomposition (SVD), the sor decompositions are more relaxed and accommodating than
workhorse of numerical linear algebra, the MLSVD spurred the those for matrices [33], [34], while matrices and tensors also
interest in tensors in applied mathematics and scientific comput- have completely different geometric properties [22]. This boils
ing in very high dimensions [16]–[18]. In parallel, CPD was suc- down to matrices representing linear transformations and quad-
cessfully adopted as a tool for sensor array processing and ratic forms, while tensors are connected with multilinear map-
deterministic signal separation in wireless communication [19], pings and multivariate polynomials [31].
[20]. Subsequently, tensors have been used in audio, image and
video processing, machine learning, and biomedical applications, NOTATIONS AND CONVENTIONS
to name but a few areas. The significant interest in tensors and A tensor can be thought of as a multi-index numerical array,
their quickly emerging applications is reflected in books [7], [8], whereby the order of a tensor is the number of its modes or

[TABLE 1] Basic notation.

A, A, a, a tensor, matrix, vector, scalar


A = [a 1, a 2, f, a R] matrix A with column vectors a r
a (: , i 2, i 3, f, i N) fiber of tensor A obtained by fixing all but one index
A (: , : , i 3, f, i N) matrix slice of tensor A obtained by fixing all but two indices
A (: , : , : , i 4, f, i N) tensor slice of A obtained by fixing some indices
A (I 1, I 2, f, I N) subtensor of A obtained by restricting indices to belong to subsets
I n 3 {1, 2, f, I n}
A (n) ! R I n # I 1 I 2 gI n - 1 I n + 1 gI N mode- n matricization of tensor A ! R I 1 # I 2 # g # I N whose entry at row i n and
column (i 1 - 1) I 2 gI n - 1 I n + 1 gI N + g + (i N - 1 - 1) I N + i N is equal to a i 1 i 2 fi N
vec^Ah ! R I N I N - 1 gI 1 vectorization of tensor A ! R I 1 # I 2 # g # I N with the entry at position
i 1 + / k = 2 [(i k - 1) I 1 I 2 gI k - 1] equal to a i 1 i 2 fi N
N

D = diag (m 1, m 2, f, m R) diagonal matrix with d rr = m r


D = diag N (m 1, m 2, f, m R) diagonal tensor of order N with d rrgr = m r
A T , A -1, A @ transpose, inverse, and Moore–Penrose pseudoinverse

IEEE SIGNAL PROCESSING MAGAZINE [146] March 2015


X(:, :, k)
J X(1) J (SVD/PCA)

~ Σ

...
I = U1 V1T (a)

K X(:, j, :) I X(2) I (NMF/SCA)

... K ~
= A2 B2T (b)
I
X(i, :, :) K X(3) K (ICA)
J
J ~
= A3 B3T = S (c)
...

Unfolding

[Fig1] MWCA for a third-order tensor, assuming that the components are (a) principal and orthogonal in the first mode,
(b) nonnegative and sparse in the second mode, and (c) statistically independent in the third mode.

dimensions; these may include space, time, frequency, trials, R

classes, and dictionaries. A real-valued tensor of order N is denoted


X = ADB T + E = / m r a r b Tr + E
r=1
by A ! R I1 # I2 # g # I N and its entries by a i1, i2, f, i N . Then, an N # 1 R

vector a is considered a tensor of order one, and an N # M matrix = / m r a r % b r + E, (1)


r=1
A a tensor of order two. Subtensors are parts of the original data
tensor, created when only a fixed subset of indices is used. Vector-
valued subtensors are called fibers, defined by fixing every index but where D = diag (m 1, m 2, f, m R) is a scaling (normalizing) matrix,
one, and matrix-valued subtensors are called slices, obtained by fix- the columns of B represent the unknown source signals (factors or
ing all but two indices (see Table 1). The manipulation of tensors latent variables depending on the tasks in hand), the columns of A
often requires their reformatting (reshaping); a particular case of represent the associated mixing vectors (or factor loadings), while
reshaping tensors to matrices is termed matrix unfolding or matri- E is noise due to an unmodeled data part or model error. In other
cization (see Figure 1). Note that a mode- n multiplication of a ten- words, model (1) assumes that the data matrix X comprises hidden
sor A with a matrix B amounts to the multiplication of all components b r ^ r = 1, 2, f, R h that are mixed together in an
mode- n vector fibers with B, and that, in linear algebra, the ten- unknown manner through coefficients A, or, equivalently, that data
sor (or outer) product appears in the expression for a rank-1 mat- contain factors that have an associated loading for every data chan-
rix: ab T = a % b. Basic tensor notations are summarized in Table 1, nel. Figure 3(a) depicts the model (1) as a dyadic decomposition,
various product rules used in this article are given in Table 2, while whereby the terms a r % b r = a r b Tr are rank-1 matrices.
Figure 2 shows two particular ways to construct a tensor. The well-known indeterminacies intrinsic to this model are:
1) arbitrary scaling of components and 2) permutation of the
INTERPRETABLE COMPONENTS rank-1 terms. Another indeterminacy is related to the physical
IN TWO-WAY DATA ANALYSIS meaning of the factors: if the model in (1) is unconstrained, it
The aim of BSS, FA, and latent variable analysis is to decompose admits infinitely many combinations of A and B. Standard
a data matrix X ! R I # J into the factor matrices A = [a 1, matrix factorizations in linear algebra, such as QR-factorization,
a 2, f, a R] ! R I # R and B = [b 1, b 2, f, b R] ! R J # R as eigenvalue decomposition (EVD), and SVD, are only special

[TABLE 2] Definition of products.


C = A # nB mode-n product of A ! R I 1 # I 2 # g # I N and B ! R J n # I n yields C ! R I 1 # g # I n - 1 # J n # I n + 1 # g # I N with entries
c i 1 gi n - 1 j n i n + 1 gi N = / i n = 1 a i 1 gi n - 1 i n i n + 1 gi N b j n i n and matrix representation C (n) = BA (n)
In

C = "A; B (1), B (2), f, B (N), full multilinear product, C = A # 1 B (1) # 2 B (2) g # N B (N)
C = A%B tensor or outer product of A ! R I 1 # I 2 # g # I N and B ! R J1 # J2 # g # J M yields C ! R I 1 # I 2 # g # I N # J1 # J 2 # g # J M with
entries c i 1 i 2 gi N j 1 j 2 gj M = a i 1 i 2 gi N b j 1 j 2 gj M
X = a (1) % a (2) % g % a (N) tensor or outer product of vectors a (n) ! R I n (n = 1, f, N) yields a rank-1 tensor X ! R I 1 # I 2 # g # I N
(1) (2) (N)
with entries x i 1 i 2f i N = a i 1 a i 2 f a i N
C = A7B Kronecker product of A ! R I 1 # I 2 and B ! R J 1 # J 2 yields C ! R I 1 J 1 # I 2 J 2 with entries
c (i 1 - 1) J 1 + j 1,(i 2 - 1) J 2 + j 2 = a i 1 i 2 b j 1 j 2
C = A9B Khatri–Rao product of A = [a 1, f, a R] ! R I # R and B = [b 1, f, b R] ! R J # R yields C ! R IJ # R with columns
cr = ar 7 br

IEEE SIGNAL PROCESSING MAGAZINE [147] March 2015


J x (0) x (1) x (2) gN
K O
I= 26 K x (1) x (2) x (3) gO
H=K O = a b % b, (2)
KK x (2) x (3) x (4) gOO
L h h h P
+ ··· +
where b = [1, z, z 2, f] T. Also, in sensor array processing,
tensor structures naturally emerge when combining snap-
(2 × 2 × 2 × 2 × 2 × 2) shots from identical subarrays [19].
(8 × 8)
(64 × 1) 2) Mathematical construction: Among many such examples,
(a)
the Nth-order moments (cumulants) of a vector-valued random
variable form an Nth-order tensor [9], while in second-order
z0
ICA, snapshots of data statistics (covariance matrices) are effect-
z0 + ∆z
ively slices of a third-order tensor [12], [37]. Also, a (channel#
z0 + 2∆z y
y0 + 2∆y
time) data matrix can be transformed into a (channel#time#
y0 + ∆y frequency) or (channel#time#scale) tensor via time-frequency
y0
z x or wavelet representations, a powerful procedure in multi-
x 0 ∆x

channel electroencephalogram (EEG) analysis in brain sci-


x0
x0

+
+

2∆

ence [21], [38].


x

(b) 3) Experiment design: Multifaceted data can be naturally


stacked into a tensor; for instance, in wireless communica-
[Fig2] Construction of tensors. (a) The tensorization of a vector tions the so-called signal diversity (temporal, spatial, spec-
or matrix into the so-called quantized format; in scientific
computing, this facilitates supercompression of large-scale tral, etc.) corresponds to the order of the tensor [20]. In the
vectors or matrices. (b) The tensor is formed through the same spirit, the standard eigenfaces can be generalized to
discretization of a trivariate function f (x, y, z) . tensor faces by combining images with different illumina-
tions, poses, and expressions [39], while the common modes
cases of (1), and owe their uniqueness to hard and restrictive in EEG recordings across subjects, trials, and conditions are
constraints such as triangularity and orthogonality. On the best analyzed when combined together into a tensor [28].
other hand, certain properties of the factors in (1) can be repre- 4) Natural tensor data: Some data sources are readily gen-
sented by appropriate constraints, making possible the unique erated as tensors [e.g., RGB color images, videos, three-
estimation or extraction of such factors. These constraints dimensional (3-D) light field displays] [40]. Also, in scientific
include statistical independence, sparsity, nonnegativity, expo- computing, we often need to evaluate a discretized multivariate
nential structure, uncorrelatedness, constant modulus, finite function; this is a natural tensor, as illustrated in Figure 2(b) for
alphabet, smoothness, and unimodality. Indeed, the first four a trivariate function f (x, y, z) [23], [29], [30].
properties form the basis of ICA [12]–[14], SCA [32], NMF [21], The high dimensionality of the tensor format is therefore
and harmonic retrieval [35]. associated with blessings, which include the possibilities to obtain
compact representations, the uniqueness of decompositions, the
TENSORIZATION—BLESSING OF DIMENSIONALITY flexibility in the choice of constraints, and the generality of com-
While one-way (vectors) and two-way (matrices) algebraic struc- ponents that can be identified.
tures were, respectively, introduced as natural representations
for segments of scalar measurements and measurements on a CANONICAL POLYADIC DECOMPOSITION
grid, tensors were initially used purely for the mathematical
benefits they provide in data analysis; for instance, it seemed DEFINITION
natural to stack together excitation–emission spectroscopy A polyadic decomposition (PD) represents an Nth-order tensor
matrices in chemometrics into a third-order tensor [7]. X ! R I1 # I2 # g # I N as a linear combination of rank-1 tensors in
The procedure of creating a data tensor from lower-dimen- the form
sional original data is referred to as tensorization, and we propose R

the following taxonomy for tensor generation: X= / m r b (r1) % b (r2) % g % b (rN) . (3)
r=1
1) Rearrangement of lower-dimensional data structures:
Large-scale vectors or matrices are readily tensorized to Equivalently, X is expressed as a multilinear product with a
higher-order tensors and can be compressed through tensor diagonal core
decompositions if they admit a low-rank tensor approxima-
X = D # 1 B (1) # 2 B (2) g # N B (N)
= "D; B (1), B (2), f, B (N), , (4)
tion; this principle facilitates big data analysis [23], [29], [30]
[see Figure 2(a)]. For instance, a one-way exponential signal
x (k) = az k can be rearranged into a rank-1 Hankel matrix or where D = diag N (m 1, m 2, f, m R) [cf. the matrix case in (1)].
a Hankel tensor [36] Figure 3 illustrates these two interpretations for a third-order

IEEE SIGNAL PROCESSING MAGAZINE [148] March 2015


λ1 λR

b1 bR D BT

= + ··· + = A
X
a1 aR ar br

(I × J ) (I × R ) (R × R ) (R × J )
(a)

c1 cR
C (K × R )
λ1 λR cr
∼ + ··· + =
= b1 bR BT
A
a1 aR br
ar
(I × J × k ) (I × R ) (R × R × R ) (R × J )
(b)

[Fig3] The analogy between (a) dyadic decompositions and (b) PDs; the Tucker format has a diagonal core. The uniqueness of these
decompositions is a prerequisite for BSS and latent variable analysis.

tensor. The tensor rank is defined as the smallest value of R for UNIQUENESS
which (3) holds exactly; the minimum rank PD is called canoni- Uniqueness conditions give theoretical bounds for exact tensor
cal PD (CPD) and is desired in signal separation. The term CPD decompositions. A classical uniqueness condition is due to Kruskal
may also be considered as an abbreviation of CANDECOMP/­ [33], which states that for third-order tensors, the CPD is unique up
PARAFAC decomposition, see the “Historical Notes” section. The to unavoidable scaling and permutation ambiguities, provided that
matrix/vector form of CPD can be obtained via the Khatri–Rao k B (1) + k B (2) + k B (3) $ 2R + 2, where the Kruskal rank k B of a matrix
products (see Table 2) as B is the maximum value ensuring that any subset of k B columns is
linearly independent. In sparse modeling, the term (k B + 1) is also
T known as the spark [32]. A generalization to Nth-order tensors is
X (n) = B (n) D ^B (N) 9 g 9 B (n + 1) 9 B (n - 1) 9 g 9 B (1) h , 
due to Sidiropoulos and Bro [45] and is given by
vec (X) = [B (N) 9 B (N - 1) 9 g 9 B (1)]d, (5)
N
/ kB (n) $ 2R + N - 1. (6)
where d = [m 1, m 2, f, m R] T . n=1

RANK More relaxed uniqueness conditions can be obtained when one


As mentioned earlier, the rank-related properties are very factor matrix has full-column rank [46]–[48]; for a thorough
different for matrices and tensors. For instance, the number of study of the third-order case, we refer to [34]. This all shows that,
complex-valued rank-1 terms needed to represent a higher-order compared to matrix decompositions, CPD is unique under more
tensor can be strictly smaller than the number of real-valued natural and relaxed conditions, which only require the compo-
rank-1 terms [22], while the determination of tensor rank is in gen- nents to be sufficiently different and their number not unreason-
eral NP-hard [41]. Fortunately, in signal processing applications, ably large. These conditions do not have a matrix counterpart and
rank estimation most often corresponds to determining the num- are at the heart of tensor-based signal separation.
ber of tensor components that can be retrieved with sufficient
accuracy, and often there are only a few data components present. COMPUTATION
A pragmatic first assessment of the number of components may be Certain conditions, including Kruskal’s, enable explicit computa-
through inspection of the multilinear singular value spectrum (see tion of the factor matrices in (3) using linear algebra [essentially,
the “Tucker Decomposition” section), which indicates the size of by solving sets of linear equations and computing (generalized)
the core tensor in the right-hand side of Figure 3(b). The existing EVD] [6], [47], [49], [50]. The presence of noise in data means
techniques for rank estimation include the core consistency diag- that CPD is rarely exact, and we need to fit a CPD model to the
nostic (CORCONDIA) algorithm, which checks whether the core data by minimizing a suitable cost function. This is typically
tensor is (approximately) diagonalizable [7], while a number of achieved by minimizing the Frobenius norm of the difference
techniques operate by balancing the approximation error versus between the given data tensor and its CP approximation, or, alter-
the number of degrees of freedom for a varying number of rank-1 natively, by least absolute error fitting when the noise is Lapla-
terms [42]–[44]. cian [51]. The theoretical Cramér–Rao lower bound and

IEEE SIGNAL PROCESSING MAGAZINE [149] March 2015


Cramér–Rao induced bound for the assessment of CPD perform- conditions, and even simplify computation [64]–[66]. Moreover, the
ance were derived in [52] and [53]. orthogonality and nonnegativity constraints ensure the existence of
Since the computation of CPD is intrinsically multilinear, we the minimum of the optimization criterion used [63], [64], [67].
can arrive at the solution through a sequence of linear subprob-
lems as in the alternating least squares (ALS) framework, APPLICATIONS
whereby the least squares (LS) cost function is optimized for The CPD has already been established as an advanced tool for sig-
one component matrix at a time, while keeping the other com- nal separation in vastly diverse branches of signal processing and
ponent matrices fixed [6]. As seen from (5), such a conditional data analysis, such as in audio and speech processing, biomedical
update scheme boils down to solving overdetermined sets of engineering, chemometrics, and machine learning [7], [24], [25],
­linear equations. [28]. Note that algebraic ICA algorithms are effectively based on
While the ALS is attractive for its simplicity and satisfactory the CPD of a tensor of the statistics of recordings; the statistical
performance for a few well-separated components and at suffi- independence of the sources is reflected in the diagonality of the
ciently high signal-to-noise ratio (SNR), it also inherits the core tensor in Figure 3, i.e., in vanishing cross-statistics [11], [12].
problems of alternating algorithms and is not guaranteed to The CPD is also heavily used in exploratory data analysis, where
converge to a stationary point. This can be rectified by only the rank-1 terms capture the essential properties of dynamically
updating the factor matrix for which the cost function has most complex signals [8]. Another example is in wireless communica-
decreased at a given step [54], but this results in an N-times tion, where the signals transmitted by different users correspond
increase in computational cost per iteration. The convergence to rank-1 terms in the case of line-of-sight propagation [19]. Also,
of ALS is not yet completely understood—it is quasilinear close in harmonic retrieval and direction of arrival type applications,
to the stationary point [55], while it becomes rather slow for ill- real or complex exponentials have a rank-1 structure, for which
conditioned cases; for more details, we refer to [56] and [57]. the use of CPD is natural [36], [65].
The conventional all-at-once algorithms for numerical optimi-
zation, such as nonlinear conjugate gradients, quasi-Newton, or Example 1
nonlinear least squares (NLS) [58], [59], have been shown to often Consider a sensor array consisting of K displaced but otherwise
outperform ALS for ill-conditioned cases and to be typically more identical subarrays of I sensors, with uI = KI sensors in total.
robust to overfactoring. However, these come at the cost of a much For R narrowband sources in the far field, the baseband equiva-
higher computational load per iteration. More sophisticated ver- lent model of the array output becomes X = AS T + E, where
u
sions use the rank-1 structure of the terms within CPD to perform A ! C I # R is the global array response, S ! C J # R contains J
efficient computation and storage of the Jacobian and (approxi- snapshots of the sources, and E is the noise. A single source
mate) Hessian; their complexity is on par with ALS while, for ill- (R = 1) can be obtained from the best rank-1 approximation of
conditioned cases, the performance is often superior [60], [61]. the matrix X; however, for R 2 1, the decomposition of X is
An important difference between matrices and tensors is that not unique, and, hence, the separation of sources is not possible
the existence of a best rank- R approximation of a tensor of rank without incorporating additional information. The constraints
greater than R is not guaranteed [22], [62] since the set of ten- on the sources that may yield a unique solution are, for instance,
sors whose rank is at most R is not closed. As a result, the cost constant modulus and statistical independence [12], [68].
u
functions for computing factor matrices may only have an infi- Consider a row-selection matrix J k ! C I # I that extracts the
mum (instead of a minimum) so that their minimization will rows of X corresponding to the kth subarray, k = 1, f, K. For
approach the boundary of that set without ever reaching the two identical subarrays, the generalized EVD of the matrices
boundary point. This will cause two or more rank-1 terms go to J 1 X and J 2 X corresponds to the well-known estimation of sig-
infinity upon convergence of an algorithm; however, numerically, nal parameters via rotational invariance techniques (ESPRIT)
the diverging terms will almost completely cancel one another [69]. For the case K 2 2, we shall consider J k X as slices of the
while the overall cost function will still decrease along the itera- tensor X ! C I # J # K (see the section “Tensorization—Blessing
tions [63]. These diverging terms indicate an inappropriate data of Dimensionality”). It can be shown that the signal part of
model: the mismatch between the CPD and the original data ten- X admits a CPD as in (3) and (4), with m 1 = g = m R = 1,
(3) (3)
sor may arise because of an underestimated number of compo- J k A = B (1) diag (b k1 , f, b kR), and B (2) = S [19], and the conse-
nents, not all tensor components having a rank-1 structure, or quent source separation under rather mild conditions—its
data being too noisy. uniqueness does not require constraints such as statistical inde-
pendence or constant modulus. Moreover, the decomposition is
CONSTRAINTS unique even in cases when the number of sources, R, exceeds the
As mentioned earlier, under quite mild conditions, the CPD is number of subarray sensors, I, or even the total number of sen-
unique by itself, without requiring additional constraints. However, sors, uI. Note that particular array geometries, such as linearly
to enhance the accuracy and robustness with respect to noise, prior and uniformly displaced subarrays, can be converted into a con-
knowledge of data properties (e.g., statistical independence, spars- straint on CPD, yielding a further relaxation of the uniqueness
ity) may be incorporated into the constraints on factors so as to conditions, reduced sensitivity to noise, and often faster
facilitate their physical interpretation, relax the uniqueness ­computation [65].

IEEE SIGNAL PROCESSING MAGAZINE [150] March 2015


TUCKER DECOMPOSITION
Figure 4 illustrates the principle of TKD, which treats a tensor (l3 × R3)
X ! R I1 # I2 # g # I N as a multilinear transformation of a (typically
C
dense but small) core tensor G ! R R 1 # R 2 # g # R N by the factor
(n) (n)
matrices B (n) = [b 1 , b 2 , f, b (Rnn)] ! R I n # R n, n = 1, 2, f, N [3],

= BT
[4], given by
A
R1 R2 RN (R1 × R2 × R3)
X= / /g/ g r1 r2 grN ^ b (r11 ) % b (r22 ) % g % b (rNN )h, (7) (I1 × I2 × I3) (I × R1) (R2 × l2)
r1 = 1 r2 = 1 rN = 1

or equivalently [Fig4] The Tucker decompostion of a third-order tensor. The


column spaces of A, B, and c represent the signal subspaces
X = G # 1 B (1) # 2 B (2) g # N B (N) for the three modes. The core tensor G is nondiagonal,
= "G; B (1), B (2), f, B (N), . (8)
accounting for the possibly complex interactions among tensor
components.
Via the Kronecker products (see Table 2), TKD can be expressed
in a matrix/vector form as realizing that any factor matrix in (8) can be postmultiplied by any
nonsingular (rotation) matrix; in turn, this multiplies the core
X (n) = B (n) G (n) (B (N) 7 g 7 B (n + 1) 7 B (n - 1) 7 g 7 B (1)) T tensor by its inverse, i.e.,
vec (X) = [B (N) 7 B (N - 1) 7 g 7 B (1)] vec (G). 
X = "G; B (1), B (2), f, B (N),
Although Tucker initially used the orthogonality and ordering = "H; B (1) R (1), B (2) R (2), f, B (N) R (N),,
H = "G; R (1) , R (2) , f, R (N) , , (9)
-1 -1 -1
constraints on the core tensor and factor matrices [3], [4], we
can also employ other meaningful constraints.
where the matrices R (n) are invertible.
MULTILINEAR RANK
For a core tensor of minimal size, R 1 is the column rank (the MULTILINEAR SVD
dimension of the subspace spanned by mode-1 fibers), R 2 is the Orthonormal bases in a constrained Tucker representation can
row rank (the dimension of the subspace spanned by mode-2 be obtained via the SVD of the mode- n matricized tensor
fibers), and so on. A remarkable difference from matrices is that X (n) = U n R n V Tn (i.e., B (n) = U n, n = 1, 2, f, N) . Because of the
the values of R 1, R 2, f, R N can be different for N $ 3. The orthonormality, the corresponding core tensor becomes
N-tuple (R 1, R 2, f, R N ) is consequently called the multilinear
rank of the tensor X. S = X # 1 U T1 # 2 U T2 g # N U TN . (10)

LINKS BETWEEN CPD AND tucker decompostion


TKD can be considered an expansion in rank-1 terms (polyadic but [TABLE 3] Different forms of CPD and Tucker
representations of a third-order tensor X ! R I # J # K .
not necessary canonical), as shown in (7), while (4) represents
CPD as a multilinear product of a core tensor and factor matrices CPD TKD
(but the core is not necessary minimal); Table 3 shows various Tensor representation, outer products

other connections. However, despite the obvious interchangeabil- X=


R
/ mr ar % br % cr X=
R1
/ / /
R2 R3
g r1 r2 r3 a r1 % b r2 % c r3
ity of notation, the CPD and TKD serve different purposes. In gen- r=1 r1 = 1 r2 = 1 r3 = 1

eral, the Tucker core cannot be diagonalized, while the number of Tensor representation, multilinear products

CPD terms may not be bounded by the multilinear rank. Conse- X = D # 1A # 2B # 3C X = G # 1A # 2B # 3C


quently, in signal processing and data analysis, CPD is typically Matrix representations

used for factorizing data into easy to interpret components (i.e., X (1) = A D (C 9 B) T X (1) = A G (1) (C 7 B) T
the rank-1 terms), while the goal of unconstrained TKD is most X (2) = B D (C 9 A) T
X (2) = B G (2) (C 7 A) T
often to compress data into a tensor of smaller size (i.e., the core X (3) = C D (B 9 A) T X (3) = C G (3) (B 7 A) T
tensor) or to find the subspaces spanned by the fibers (i.e., the col- Vector representation
umn spaces of the factor matrices). vec (X) = (C 9 B 9 A) d vec (X) = (C 7 B 7 A) vec (G)
Scalar representation
UNIQUENESS R R1 R2 R3

The unconstrained TKD is in general not unique, i.e., factor matri- x ijk = / m r a ir b jr c kr x ijk = / / / g r1 r2 r3 a ir1 b jr2 c kr3
r=1 r1 = 1 r2 = 1 r3 = 1
ces B (n) are rotation invariant. However, physically, the subspaces
Matrix slices X k = X (: , : , k)
defined by the factor matrices in TKD are unique, while the bases
X k = A diag (c k1, c k2, f, c kR) B T R3
in these subspaces may be chosen arbitrarily—their choice is Xk = A / c kr 3 G (: , : , r3)B T
r3 = 1
compensated for within the core tensor. This becomes clear upon

IEEE SIGNAL PROCESSING MAGAZINE [151] March 2015


OTHER APPLICATIONS
(K ) c1 (K ) cR We have shown that TKD may be considered a multilinear
extension of PCA [8]; it therefore generalizes signal subspace
~
= B1T BRT
A1
+ ··· + techniques, with applications including classification, feature
AR
extraction, and subspace-based harmonic retrieval [27], [39],
(I × J × K ) (I × L1) (L1 × J ) (I × LR) (LR × J ) [75], [76]. For instance, a low multilinear rank approximation
(a) achieved through TKD may yield a higher SNR than the SNR in
the original raw data tensor, making TKD a very natural tool for
(K × N1)
C1 CR compression and signal enhancement [7], [8], [26].
~
= A B1T + ··· + BRT
1 1 AR R BLOCK TERM DECOMPOSITIONS
(M1 × J ) We have already shown that CPD is unique under quite mild con-
(I × J × K ) (I × L1) (LR × MR × NR)
(b) ditions. A further advantage of tensors over matrices is that it is
even possible to relax the rank-1 constraint on the terms, thus
[Fig5] BTDs find data components that are structurally more opening completely new possibilities in, e.g., BSS. For clarity, we
complex than the rank-1 terms in CPD. (a) Decomposition into shall consider the third-order case, whereby, by replacing the
terms with multilinear rank (L r , L r , 1) . (b) Decomposition into rank-1 matrices b (r1) % b (r2) = b (r1) b (r2) T in (3) by low-rank matrices
terms with multilinear rank (L r , M r , N r ) .
A r B Tr , the tensor X can be represented as [Figure 5(a)]
R
Then, the singular values of X (n) are the Frobenius norms of X= / (A r B Tr ) % c r . (11)
r=1
the corresponding slices of the core tensor S: (R n) rn, rn =
S (: , : , f, rn, : , f, :) F , with slices in the same mode being Figure 5(b) shows that we can even use terms that are only
mutually orthogonal, i.e., their inner products are zero. The col- required to have a low multilinear rank (see the “Tucker Decom-
umns of U n may thus be seen as multilinear singular vectors, position” section) to give
while the norms of the slices of the core are multilinear singular R
values [15]. As in the matrix case, the multilinear singular values X= / G r # 1 A r # 2 B r # 3 C r . (12)
r=1
govern the multilinear rank, while the multilinear singular vectors
allow, for each mode separately, an interpretation as in PCA [8]. These so-called block term decompositions (BTDs) in (11) and
(12) admit the modeling of more complex signal components
LOW MULTILINEAR RANK APPROXIMATION than CPD and are unique under more restrictive but still fairly
Analogous to PCA, a large-scale data tensor X can be approxi- natural ­conditions [77]–[79].
mated by discarding the multilinear singular vectors and slices of
the core tensor that correspond to small multilinear singular val- Example 2
ues, i.e., through truncated matrix SVDs. Low multilinear rank To compare some standard and tensor approaches for the separa-
approximation is always well posed; however, the truncation is not tion of short duration correlated sources, BSS was performed on
necessarily optimal in the LS sense, although a good estimate can five linear mixtures of the sources s 1 (t) = sin (6rt) and
often be made as the approximation error corresponds to the s 2 (t) = exp (10t) sin (20rt), which were contaminated by white
degree of truncation. When it comes to finding the best approxi- Gaussian noise, to give the mixtures X = AS + E ! R 5 # 60, where
mation, the ALS-type algorithms exhibit similar advantages and S (t) = [s 1 (t), s 2 (t)] T and A ! R 5 # 2 was a random matrix whose
drawbacks to those used for CPD [8], [70]. Optimization-based columns (mixing vectors) satisfy a T1 a 2 = 0.1, a 1 2 = a 2 2 = 1.
algorithms exploiting second-order information have also been The 3-Hz sine wave did not complete a full period over the 60 sam-
proposed [71], [72]. ples so that the two sources had a correlation degree of
(| s T1 s 2 |) / ( s 1 2 s 2 2) = 0.35. The tensor approaches, CPD, TKD,
CONSTRAINTS AND TUCKER-BASED and BTD employed a third-order tensor X of size 24 # 37 # 5
MULTIWAY COMPONENT ANALYSIS generated from five Hankel matrices whose elements obey
Besides orthogonality, constraints that may help to find unique X (i, j, k) = X (k, i + j - 1) (see the section “Tensorization—
basis vectors in a Tucker representation include statistical inde- Blessing of Dimensionality”). The average squared angular error
pendence, sparsity, smoothness, and nonnegativity [21], [73], [74]. (SAE) was used as the performance measure. Figure 6 shows the
Components of a data tensor seldom have the same properties in simulation results, illustrating the following.
its modes, and for physically meaningful representation, different ■■ PCA failed since the mixing vectors were not orthogonal
constraints may be required in different modes so as to match the and the source signals were correlated, both violating the
properties of the data at hand. Figure 1 illustrates the concept of assumptions for PCA.
multiway component analysis (MWCA) and its flexibility in choos- ■■ The ICA [using the joint approximate diagonalization of
ing the modewise constraints; a Tucker representation of MWCA eigenmatrices (JADE) algorithm [10]] failed because the sig-
naturally accommodates such diversities in different modes. nals were not statistically independent, as assumed in ICA.

IEEE SIGNAL PROCESSING MAGAZINE [152] March 2015


0.1 0.1

0 0

s1
s1

−0.1
−0.1
−0.2
−0.2
−0.3
0.05 0.1 0.15 0.2 0.05 0.1 0.15 0.2
Time (s) Time (s)
s ŝPCA ŝICA ŝCPD s ŝCPD ŝTKD ŝBTD
(a) (b)

0.3 60

0.2
40

SAE (dB)
0.1
s2

0
20
−0.1

−0.2
0
0.05 0.1 0.15 0.2 0 10 20 30 40
Time (s) SNR (dB)
s ŝCPD ŝTKD ŝBTD PCA ICA CPD TKD BTD
(c) (d)

[Fig6] The blind separation of the mixture of a pure sine wave and an exponentially modulated sine wave using PCA, ICA, CPD, TKD,
and BTD. The sources s 1 and s 2 are correlated and of short duration; the symbols st 1 and st 2 denote the estimated sources. (a)–(c)
Sources s 1 (t) and s 2 (t) and their estimates using PCA, ICA, CPD, TKD, and BTD; (d) average squared angular errors (SAE) in estimation
of the sources.

■■ Low-rank tensor approximation via a rank-2 CPD was used of the problem converted into constraints. For example, a two-
to estimate A as the third factor matrix, which was then dimensional image X ! R I1 # I2 can be vectorized as a long vector
inverted to yield the sources. The accuracy of CPD was com- x = vec (X) ! R I (I = I 1 I 2) that admits sparse representation in a
promised as the components of tensor X cannot be repre- known dictionary B ! R I # I so that x = Bg, where the matrix B
sented by rank-1 terms. may be a wavelet or discrete cosine transform dictionary. Then,
■■ Low multilinear rank approximation via TKD for the mul- faithful recovery of the original signal x requires finding the spars-
tilinear rank (4, 4, 2) was able to retrieve the column space of est vector g such that
the mixing matrix but could not find the individual mixing
y = Wg, with g 0 # K, W = UB, (13)
vectors because of the nonuniqueness of TKD.
■■ BTD in multilinear rank-(2, 2, 1) terms matched the data where · 0 is the , 0 -norm (number of nonzero entries) and
structure [78]; it is remarkable that the sources were recov- K % I.
ered using as few as six samples in the noise-free case. Since the , 0 -norm minimization is not practical, alternative
solutions involve iterative refinements of the estimates of vector g
HIGHER-ORDER COMPRESSED SENSING (ho-cs) using greedy algorithms such as the orthogonal matching pur-
The aim of CS is to provide a faithful reconstruction of a signal of suit (OMP) algorithm, or the , 1-norm minimization algorithms
^ g 1 = / i = 1 g i j [83]. Low coherence of the composite dictionary
I
interest, even when the set of available measurements is (much)
smaller than the size of the original signal [80]–[83]. Formally, we matrix W is a prerequisite for a satisfactory recovery of g (and
have available M (compressive) data samples y ! R M , which are hence x) —we need to choose U and B so that the correlation
assumed to be linear transformations of the original signal x ! R I between the columns of W is minimum [83].
(M 1 I) . In other words, y = Ux, where the sensing matrix When extending the CS framework to tensor data, we face
U ! R M # I is usually random. Since the projections are of a lower two obstacles:
dimension than the original data, the reconstruction is an ill-posed ■■ loss of information, such as spatial and contextual relation-
inverse problem whose solution requires knowledge of the physics ships in data, when a tensor X ! R I1 # I2 # g # I N is vectorized.

IEEE SIGNAL PROCESSING MAGAZINE [153] March 2015


Sparse Vector Representation (Kronecker-CS) (M1 × M2 × M3) (l1 × l2 × l3) Φ(3) (M3 × I3)
32
Measurement Vector (CS) I3 =
Sparse M3 = 32

I1 = 1,024

Φ(2)T
W Vector
M1 = 585 =
g
y ~
= M2 = 585 I2 = 1,0 (I2 × M2)
W(3) ⊗ W(2) ⊗ W(1) 24
Φ(1)
(M1M2M3) (M1M2M3 × I1I2I3) (I1I2I3) (M1 × I1)
(a) Vector Representation (a)
(1,024 × 1,024 × 32) (256 × 256 × 32)
Block Sparse Tucker Representation
Measurement Tensor (CS) (M3 × I3)
Block Sparse W(3)
Core Tensor

~
= W(1) W(2)

(M1 × M2 × M3) (M1 × l1) (l1 × l2 × l3) (M2 × l2)


(b)
(b) Tensor Representation (1,024 × 1,024 × 32) (256 × 256 × 32)

[Fig7] CS with a Kronecker-structured dictionary. OMP can


perform faster if the sparse entries belong to a small subtensor,
up to permutation of the columns of W (1), W (2), and W (3) .

■■ data handling since the size of vectorized data and the


associated dictionary B ! R I # I easily becomes prohibitively
large (see the section “Large-Scale Data and the Curse of
Dimensionality”), especially for tensors of high order. (c)
Fortunately, tensor data are typically highly structured, a per- [Fig8] The multidimensional CS of a 3-D hyperspectral image
fect match for compressive sampling, so that the CS framework using Tucker representation with a small sparse core in wavelet
bases. (a) The Kronecker-CS of a 32-channel hyperspectral image.
relaxes data acquisition requirements, enables compact storage,
(b) The original hyperspectral image-RGB display. (c) The
and facilitates data completion (i.e., inpainting of missing samples reconstruction (SP = 33%, PSNR = 35.51 dB)-RGB display.
due to a faulty sensor or unreliable measurement).

KRONECKER-CS FOR FIXED DICTIONARIES Y , G # 1 W (1) # 2 W (2) g # N W (N), (14)


In many applications, the dictionary and the sensing matrix
admit a Kronecker structure (Kronecker-CS model), as illustrated with G 0 # K, for a given set of modewise dictionaries B (n) and
in Figure 7(a) [84]. In this way, the global composite dictionary sensing matrices U (n) (n = 1, 2, f, N) . Working with several
matrix becomes W = W (N) 7 W (N - 1) 7 g 7 W (1), where each small dictionary matrices, appearing in a Tucker representation,
term W (n) = U (n) B (n) has a reduced dimensionality since instead of a large global dictionary matrix, is an example of the
B (n) ! R I n # I n and U (n) ! R M n # I n . Denote M = M 1 M 2 gM N and use of tensor structure for efficient representation; see also the
I = I 1 I 2 gI N , then, since M n # I n, n = 1, 2, f, N, this reduces section “Large-Scale Data and the Curse of Dimensionality.”
storage requirements by a factor of (R n I n M n) / (MI) . The compu- A higher-order extension of the OMP algorithm, referred to as
tation of Wg is affordable since g is sparse; however, computing the Kronecker-OMP algorithm [85], requires K iterations to find
W T y is expensive but can be efficiently implemented through a the K nonzero entries of the core tensor G. Additional computa-
sequence of products involving much smaller matrices W (n) [85]. tional advantages can be gained if it can be assumed that the K
We refer to [84] for links between the coherence of factor matri- nonzero entries belong to a small subtensor of G, as shown in
ces W (n) and the coherence of the global composite dictionary Figure 7(b); such a structure is inherent to, e.g., hyperspectral
matrix W. imaging [85], [86] and 3-D astrophysical signals. More precisely, if
Figure 7 and Table 3 illustrate that the Kronecker-CS model the K = LN nonzero entries are located within a subtensor of size
is effectively a vectorized TKD with a sparse core. The tensor (L # L # g # L), where L % I n, then, by exploiting the block-
equivalent of the CS paradigm in (13) is therefore to find the tensor structure, the so-called N-way block OMP algorithm
sparsest core tensor G such that (N-BOMP) requires at most NL iterations, which is linear in N

IEEE SIGNAL PROCESSING MAGAZINE [154] March 2015


[85]. The Kronecker-CS model has been applied in magnetic res-
[TABLE 4] Storage cost of tensor models for an
onance imaging, hyperspectral imaging, and in the inpainting of N th-order tensor X ! R I # I # g # I for which the storage
multiway data [86], [84]. REQUIREMENT for raw data is O (I N) .

1) canonical polyadic decomposition O (NIR)


APPROACHES WITHOUT FIXED DICTIONARIES 2) Tucker O (NIR + R N)
In Kronecker-CS, the modewise dictionaries B (n) ! R I n # I n can be 3) tensor train O (NIR 2)
chosen so as best to represent the physical properties or prior 4) quantized tensor train O (NR 2 log 2 (I))
knowledge about the data. They can also be learned from a large
ensemble of data tensors, for instance, in an ALS-type fashion
[86]. Instead of the total number of sparse entries in the core ten- Nth-order (I # I # g # I) tensor, I N , scales exponentially with
sor, the size of the core (i.e., the multilinear rank) may be used as the tensor order N. For example, the number of values of a discre-
a measure for sparsity so as to obtain a low-complexity represen- tized function in Figure 2(b) quickly becomes unmanageable in
tation from compressively sampled data [87], [88]. Alternatively, a terms of both computations and storing as N increases. In addi-
CPD representation can be used instead of a Tucker representa- tion to their standard use (signal separation, enhancement, etc.),
tion. Indeed, early work in chemometrics involved excitation– tensor decompositions may be elegantly employed in this context
emission data for which part of the entries was unreliable because as efficient representation tools. The first question is, which type
of scattering; the CPD of the data tensor is then computed by of tensor decomposition is appropriate?
treating such entries as missing [7]. While CS variants of several
CPD algorithms exist [59], [89], the oracle properties of tensor- EFFICIENT DATA HANDLING
based models are still not as well understood as for their standard If all computations are performed on a CP representation and not
models; a notable exception is CPD with sparse factors [90]. on the raw data tensor itself, then, instead of the original I N raw
data entries, the number of parameters in a CP representation
Example 3 reduces to NIR, which scales linearly in N (see Table 4). This
Figure 8 shows an original 3-D (1,024#1,024#32) hyperspectral effectively bypasses the curse of dimensionality, while giving us the
image X, which contains scene reflectance measured at 32 differ- freedom to choose the rank, R, as a function of the desired accuracy
ent frequency channels, acquired by a low-noise Peltier-cooled dig- [16]; on the other hand, the CP approximation may involve numer-
ital camera in the wavelength range of 400–720 nm [91]. Within ical problems (see the section “Canonical Polyadic Decomposition”).
the Kronecker-CS setting, the tensor of compressive measure- Compression is also inherent to TKD as it reduces the size of a
ments Y was obtained by multiplying the frontal slices given data tensor from the original I N to (NIR + R N ), thus exhib-
by random Gaussian sensing matrices U (1) ! R M1 # 1024 and iting an approximate compression ratio of (I/R) N. We can then
U (2) ! R M2 # 1024 (M 1, M 2 1 1, 024) in the first and second mode, benefit from the well understood and reliable approximation by
respectively, while U (3) ! R 32 # 32 was the identity matrix [see means of matrix SVD; however, this is only useful for low N.
­Figure 8(a)]. We used Daubechies wavelet factor matrices
B (1) = B (2) ! R 1024 # 1024 and B (3) ! R 32 # 32, and employed the TENSOR NETWORKS
N-way block tensor N-BOMP to recover the small sparse core tensor A numerically reliable way to tackle curse of dimensionality is
and, subsequently, reconstruct the original 3-D image, as shown through a concept from scientific computing and quantum infor-
in Figure 8(b). For the sampling ratio SP=33% (M 1 = M 2 = 585) mation theory, termed tensor networks, which represents a tensor
this gave the peak SNR (PSNR) of 35.51 dB, while taking 71 min of a possibly very high order as a set of sparsely interconnected
for N iter = 841 iterations needed to detect the subtensor which matrices and core tensors of low order (typically, order 3). These
contains the most significant entries. For the same quality of low-dimensional cores are interconnected via tensor contractions
­reconstruction (PSNR = 35.51 dB), the more conventional to provide a highly compressed representation of a data tensor. In
­Kronecker-OMP algorithm found 0.1% of the wavelet coefficients addition, existing algorithms for the approximation of a given ten-
as significant, thus requiring N iter = K = 0.001 # (1, 024 # sor by a tensor network have good numerical properties, making it
1, 024 # 32) = 33, 555 iterations and days of computation time.

LARGE-SCALE DATA AND THE CURSE OF DIMENSIONALITY


The sheer size of tensor data easily exceeds the memory or satu- A (2) (3)
(1) B
rates the processing capability of standard computers; it is, there-
fore, natural to ask ourselves how tensor decompositions can be (l1 × R1) (R2 × l3 × R3) (R4 × l5)
(R1 × l2 × R2) (R3 × l4 × R4)
computed if the tensor dimensions in all or some modes are large
or, worse still, if the tensor order is high. The term curse of
[Fig9] The TT decomposition of a fifth-order tensor X ! R I 1 # I 2 # g # I 5,
dimensionality, in a general sense, was introduced by Bellman to consisting of two matrix carriages and three third-order tensor
refer to various computational bottlenecks when dealing with carriages. The five carriages are connected through tensor
high-dimensional settings. In the context of tensors, the curse of contractions, which can be expressed in a scalar form as x i 1, i 2, i 3, i 4, i 5 =
dimensionality refers to the fact that the number of elements of an / Rr11= 1 / Rr22= 1 f / Rr55= 1 a i1,r1 g (r11,)i2,r2 g (r22,)i3,r3 g (r33,)i4,r5 b r4,i5 .

IEEE SIGNAL PROCESSING MAGAZINE [155] March 2015


C(1)
(1) ∼
=
A(1) (1)
B(1)T

...
C
C(k)

(k)

=
(1)
BT
A(k) B(k )T
(k )

...
(K) C(K ) A
(K ) ∼
=
(k) A(K ) B(K )T
(K )

[Fig10] Efficient computation of CPD and TKD, whereby tensor decompositions are computed in parallel for sampled blocks. These are
then merged to obtain the global components A, B, and C, and a core tensor G.

possible to control the error and achieve any desired accuracy of COMPUTATION OF THE
approximation. For example, tensor networks allow for the DECOMPOSITION/REPRESENTATION
representation of a wide class of discretized multivariate functions Now that we have addressed the possibilities for efficient tensor rep-
even in cases where the number of function values is larger than resentation, the question that needs to be answered is how these
the number of atoms in the universe [23], [29], [30]. representations can be computed from the data in an efficient man-
Examples of tensor networks are the hierarchical TKD and ten- ner. The first approach is to process the data in smaller blocks
sor trains (TTs) (see Figure 9) [17], [18]. The TTs are also known as rather than in a batch manner [95]. In such a divide-and-conquer
matrix product states and have been used by physicists for more approach, different blocks may be processed in parallel, and their
than two decades (see [92] and [93] and references therein). The decompositions may be carefully recombined (see Figure 10) [95],
PARATREE algorithm was developed in signal processing and fol- [96]. In fact, we may even compute the decomposition through
lows a similar idea; it uses a polyadic representation of a data ten- recursive updating as new data arrive [97]. Such recursive tech-
sor (in a possibly nonminimal ­number of terms), whose niques may be used for efficient computation and for tracking
computation then requires only the matrix SVD [94]. decompositions in the case of nonstationary data.
For very large-scale data that exhibit a well-defined structure, The second approach would be to employ CS ideas (see the sec-
an even more radical approach to achieve a parsimonious tion “Higher-Order Compressed Sensing (HO-CS)”) to fit an alge-
representation may be through the concept of quantized or quan- braic model with a limited number of parameters to possibly large
tic tensor networks (QTNs) [29], [30]. For example, a huge vector data. In addition to enabling data completion (interpolation of
x ! R I with I = 2 L elements can be quantized and tensorized missing data), this also provides a significant reduction of the cost
into a (2 # 2 # g # 2) tensor X of order L, as illustrated in Fig- of data acquisition, manipulation, and storage, breaking the curse
ure 2(a). If x is an exponential signal, x (k) = az k, then X is a of dimensionality being an extreme case.
symmetric rank-1 tensor that can be represented by two parame- While algorithms for this purpose are available both for low-
ters: the scaling factor a and the generator z (cf. (2) in the sec- rank and low multilinear rank representation [59], [87], an even
tion “Tensorization—Blessing of Dimensionality”). Nonsymmetric more drastic approach would be to directly adopt sampled fibers
terms provide further opportunities, beyond the sum-of-exponen- as the bases in a tensor representation. In the TKD setting, we
tial representation by symmetric low-rank tensors. Huge matrices would choose the columns of the factor matrices B (n) as
and tensors may be dealt with in the same manner. For instance, mode-n fibers of the tensor, which requires us to address the fol-
an Nth-order tensor X ! R I1 # g # I N , with I n = q L n, can be quan- lowing two problems: 1) how to find fibers that allow us to accurately
tized in all modes simultaneously to yield a (q # q # g # q) represent the tensor and 2) how to compute the corresponding core
quantized tensor of higher order. In QTN, q is small, typically tensor at a low cost (i.e., with minimal access to the data). The mat-
q = 2, 3, 4, e.g., the binary encoding ^ q = 2 h reshapes an Nth rix counterpart of this problem (i.e., representation of a large
-order tensor with (2 L 1 # 2 L2 # g # 2 L N ) elements into a tensor matrix on the basis of a few columns and rows) is referred to as
of order (L 1 + L 2 + g + L N ) with the same number of elements. the pseudoskeleton approximation [98], where the optimal
The TT decomposition applied to quantized tensors is referred to representation corresponds to the columns and rows that inter-
as the quantized TT (QTT); variants for other tensor representa- sect in the submatrix of maximal volume (maximal absolute
tions have also been derived [29], [30]. In scientific computing, value of the determinant). Finding the optimal submatrix is
such formats provide the so-called supercompression—a logarith- computationally hard, but quasioptimal submatrices may be
mic reduction of storage requirements: O (I N ) " O (N log q (I)). found by heuristic so-called cross-approximation methods that

IEEE SIGNAL PROCESSING MAGAZINE [156] March 2015


only require a limited, partial exploration of the data matrix.
Tucker variants of this approach have been derived in [99]–[101] Entry of Maximum Absolute
Value Within a Fiber in the C(3) Two-Way CA:
and are illustrated in Figure 11, while a cross-approximation for
Residual Tensor PCA, ICA,
the TT format has been derived in [102]. Following a somewhat NMF, . . .
different idea, a tensor generalization of the CUR decomposition C(1)
of matrices samples fibers on the basis of statistics derived from
~
=
the data [103].
C(2)
[Fig11] The Tucker representation through fiber sampling and cross-approximation: the columns of factor matrices are sampled from the fibers of the original data tensor X. Within MWCA, the selected fibers may be further processed using BSS algorithms.

MULTIWAY REGRESSION—HIGHER-ORDER PARTIAL LS

MULTIVARIATE REGRESSION
Regression refers to the modeling of one or more dependent variables (responses), Y, by a set of independent data (predictors), X. In the simplest case of conditional mean square estimation (MSE), whereby ŷ = E(y | x), the response y is a linear combination of the elements of the vector of predictors x; for multivariate data, the multivariate linear regression (MLR) uses a matrix model, Y = XP + E, where P is the matrix of coefficients (loadings) and E is the residual matrix. The MLR solution gives P = (XᵀX)⁻¹XᵀY and involves inversion of the moment matrix XᵀX. A common technique to stabilize the inverse of the moment matrix XᵀX is the principal component regression (PCR), which employs a low-rank approximation of X.
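A minimal numpy sketch of the contrast between the MLR solution and its PCR-stabilized counterpart, on synthetic nearly collinear predictors (the data and the rank R = 5 below are our assumptions, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Nearly collinear predictors: 100 samples of 20 variables with intrinsic rank 5.
X = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 20))
X += 1e-6 * rng.standard_normal(X.shape)
Y = X @ rng.standard_normal((20, 3)) + 0.01 * rng.standard_normal((100, 3))

# MLR: P = (X^T X)^{-1} X^T Y -- the moment matrix is nearly singular here.
P_mlr = np.linalg.solve(X.T @ X, X.T @ Y)

# PCR: replace X by its rank-R truncated SVD before solving.
R = 5
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P_pcr = Vt[:R].T @ np.diag(1.0 / s[:R]) @ U[:, :R].T @ Y

for name, P in (("MLR", P_mlr), ("PCR", P_pcr)):
    fit = np.linalg.norm(Y - X @ P) / np.linalg.norm(Y)
    # MLR loadings blow up along the poorly determined directions; PCR keeps them bounded.
    print(f"{name}: relative fit error {fit:.3f}, ||P|| = {np.linalg.norm(P):.1f}")
```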
MODELING STRUCTURE IN DATA—THE PARTIAL LS
Note that in stabilizing multivariate regression, PCR uses only information in the X variables, with no feedback from the Y variables. The idea behind the partial LS (PLS) method is to account for structure in data by assuming that the underlying system is governed by a small number, R, of specifically constructed latent variables, called scores, that are shared between the X and Y variables; in estimating the number R, PLS compromises between fitting X and predicting Y. Figure 12 illustrates that the PLS procedure 1) uses eigenanalysis to perform contraction of the data matrix X to the principal eigenvector score matrix T = [t_1, …, t_R] of rank R and 2) ensures that the t_r components are maximally correlated with the u_r components in the approximation of the responses Y; this is achieved when the u_r's are scaled versions of the t_r's. The Y-variables are then regressed on the matrix U = [u_1, …, u_R]. Therefore, PLS is a multivariate model with inferential ability that aims to find a representation of X (or a part of X) that is relevant for predicting Y, using the model

X = T Pᵀ + E = Σ_{r=1}^{R} t_r p_rᵀ + E,   (15)

Y = U Qᵀ + F = Σ_{r=1}^{R} u_r q_rᵀ + F.   (16)

The score vectors t_r provide an LS fit of the X-data, while at the same time, the maximum correlation between the t and u scores ensures a good predictive model for the Y variables. The predicted responses Y_new are then obtained from new data X_new and the loadings P and Q.

In practice, the score vectors t_r are extracted sequentially, by a series of orthogonal projections followed by the deflation of X. Since the rank of Y is not necessarily decreased with each new t_r, we may continue deflating until the rank of the X-block is exhausted so as to balance between prediction accuracy and model order.

[Fig12] The basic PLS model performs joint sequential low-rank approximation of the matrix of predictors X and the matrix of responses Y so as to share (up to the scaling ambiguity) the latent components—columns of the score matrices T and U. The matrices P and Q are the loading matrices for predictors and responses, and E and F are the corresponding residual matrices.
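The sequential extract-and-deflate procedure behind (15) and (16) can be sketched in a few lines of numpy (a simplified NIPALS-style routine for illustration only; production implementations, e.g., those in the toolboxes discussed in the section "Software," differ in details such as centering, convergence checks, and the choice of deflation):

```python
import numpy as np

def pls_fit(X, Y, R):
    """Simplified NIPALS-style PLS sketch: extract R latent components by
    maximizing the X-Y covariance, deflating X after each component.
    (Mean-centering of X and Y is omitted for brevity.)"""
    X, Y = X.astype(float).copy(), Y.astype(float).copy()
    T, U, P, Q, W = [], [], [], [], []
    for _ in range(R):
        w = np.linalg.svd(X.T @ Y, full_matrices=False)[0][:, 0]  # weight vector
        t = X @ w                       # X-score
        q = Y.T @ t / (t @ t)           # Y-loadings from regressing Y on t
        u = Y @ q / (q @ q)             # Y-score (a scaled version of t)
        p = X.T @ t / (t @ t)           # X-loadings
        X = X - np.outer(t, p)          # deflate X before the next component
        for lst, v in zip((T, U, P, Q, W), (t, u, p, q, w)):
            lst.append(v)
    return [np.column_stack(M) for M in (T, U, P, Q, W)]

def pls_predict(X_new, P, Q, W):
    # Regression matrix in the original variables: B = W (P^T W)^{-1} Q^T.
    B = W @ np.linalg.solve(P.T @ W, Q.T)
    return X_new @ B

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
Y = X[:, :3] @ rng.standard_normal((3, 2)) + 0.1 * rng.standard_normal((200, 2))
T, U, P, Q, W = pls_fit(X, Y, R=3)
print(np.linalg.norm(Y - pls_predict(X, P, Q, W)) / np.linalg.norm(Y))
```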

The PLS concept can be generalized to tensors in the following ways:
1) Unfolding multiway data. For example, tensors X (I × J × K) and Y (I × M × N) can be flattened into long matrices X (I × JK) and Y (I × MN) so as to admit matrix-PLS (see Figure 12). However, such flattening prior to standard bilinear PLS obscures the structure in multiway data and compromises the interpretation of latent components.
2) Low-rank tensor approximation. The so-called N-PLS attempts to find score vectors having maximal covariance with response variables, under the constraints that tensors X and Y are decomposed as a sum of rank-1 tensors [104].
3) A BTD-type approximation. As in the higher-order PLS (HOPLS) model shown in Figure 13 [105], the use of block terms within HOPLS equips it with additional flexibility, together with a more physically meaningful analysis than unfolding-PLS and N-PLS.

The principle of HOPLS can be formalized as a set of sequential approximate decompositions of the independent tensor X ∈ ℝ^{I_1 × I_2 × ⋯ × I_N} and the dependent tensor Y ∈ ℝ^{J_1 × J_2 × ⋯ × J_M} (with I_1 = J_1) so as to ensure maximum similarity (correlation) between the scores t_r and u_r within the matrices T and U, based on

X ≈ Σ_{r=1}^{R} G_X^(r) ×_1 t_r ×_2 P_r^(1) ⋯ ×_N P_r^(N−1),   (17)

Y ≈ Σ_{r=1}^{R} G_Y^(r) ×_1 u_r ×_2 Q_r^(1) ⋯ ×_M Q_r^(M−1).   (18)

[Fig13] The principle of HOPLS for third-order tensors. The core tensors G_X and G_Y are block-diagonal. The BTD-type structure allows for the modeling of general components that are highly correlated in the first mode.

A number of data-analytic problems can be reformulated as either regression or similarity analysis [analysis of variance (ANOVA), autoregressive moving average modeling (ARMA), linear discriminant analysis (LDA), and canonical correlation analysis (CCA)], so that both the matrix and tensor PLS solutions can be generalized across exploratory data analysis.

Example 4
The predictive power of tensor-based PLS is illustrated on a real-world example of the prediction of arm movement trajectory from the electrocorticogram (ECoG). Figure 14(a) illustrates the experimental setup, whereby the 3-D arm movement of a monkey was captured by an optical motion capture system with reflective markers affixed to the left shoulder, elbow, wrist, and hand; for full details, see http://neurotycho.org. The predictors (32 ECoG channels) naturally build a fourth-order tensor X (time × channel_no × epoch_length × frequency), while the movement trajectories for the four markers (response) can be represented as a third-order tensor Y (time × 3D_marker_position × marker_no). The goal of the training stage is to identify the HOPLS parameters: G_X^(r), G_Y^(r), P_r^(n), Q_r^(n) (see Figure 13). In the test stage, the movement trajectories, Y*, for the new ECoG data, X*, are predicted through multilinear projections: 1) the new scores, t_r*, are found from the new data, X*, and the existing model parameters: G_X^(r), P_r^(1), P_r^(2), P_r^(3), and 2) the predicted trajectory is calculated as Y* ≈ Σ_{r=1}^{R} G_Y^(r) ×_1 t_r* ×_2 Q_r^(1) ×_3 Q_r^(2) ×_4 Q_r^(3). In the simulations, standard PLS was applied in the same way to the unfolded tensors.

Figure 14(c) shows that although the standard PLS was able to predict the movement corresponding to each marker individually, such a prediction is quite crude as the two-way PLS does not adequately account for mutual information among the four markers. The enhanced predictive performance of the BTD-based HOPLS [the red line in Figure 14(c)] is therefore attributed to its ability to model interactions between complex latent components of both predictors and responses.

LINKED MULTIWAY COMPONENT ANALYSIS AND TENSOR DATA FUSION
Data fusion concerns the joint analysis of an ensemble of data sets, such as multiple views of a particular phenomenon, where some parts of the scene may be visible in only one or a few data sets. Examples include the fusion of visual and thermal images in low-visibility conditions and the analysis of human electrophysiological signals in response to a certain stimulus but from different subjects and trials; these are naturally analyzed together by means of matrix/tensor factorizations. The coupled nature of the analysis of such multiple data sets ensures that we are able to account for the common factors across the data sets and, at the same time, to guarantee that the individual components are not shared (e.g., processes that are independent of excitations or stimuli/tasks).

The linked multiway component analysis (LMWCA) [106], shown in Figure 15, performs such a decomposition into shared and individual factors and is formulated as a set of approximate joint TKD of a set of data tensors X^(k) ∈ ℝ^{I_1 × I_2 × ⋯ × I_N}, (k = 1, 2, …, K),

X^(k) ≈ G^(k) ×_1 B^(1,k) ×_2 B^(2,k) ⋯ ×_N B^(N,k),   (19)

where each factor matrix B^(n,k) = [B_C^(n), B_I^(n,k)] ∈ ℝ^{I_n × R_n} has 1) components B_C^(n) ∈ ℝ^{I_n × C_n} (with 0 ≤ C_n ≤ R_n) that are common (i.e., maximally correlated) to all tensors and 2) components B_I^(n,k) ∈ ℝ^{I_n × (R_n − C_n)} that are tensor specific. The objective is to estimate the common components B_C^(n), the individual components B_I^(n,k), and, via the core tensors G^(k), their mutual interactions. As in MWCA (see the section “Tucker Decomposition”), constraints may be imposed to match data properties [73], [76]. This enables a more general and flexible framework than group ICA and independent vector analysis, which also perform linked analysis of multiple data sets but assume that 1) there exist only common components and 2) the corresponding latent variables are statistically independent [107], [108]. Both are quite stringent and limiting assumptions. As an alternative to TKD, coupled tensor decompositions may be of a polyadic or even block term type [89], [109].
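To make the partitioned structure in (19) concrete, the following numpy sketch builds K linked tensors that share common mode-n components while keeping tensor-specific ones (this is a forward model of the assumed structure, not an LMWCA estimation algorithm; the mode_n_product helper and all sizes are our own choices):

```python
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n product T x_n M: contract mode n of tensor T with the rows of M."""
    Tn = np.moveaxis(T, n, 0).reshape(T.shape[n], -1)          # mode-n unfolding
    out = (M @ Tn).reshape((M.shape[0],) + tuple(np.delete(T.shape, n)))
    return np.moveaxis(out, 0, n)

rng = np.random.default_rng(0)
K, shape, ranks, common = 3, (30, 20, 10), (5, 4, 3), (2, 2, 1)   # C_n common columns

# Common components B_C^(n), shared by all K tensors.
B_common = [rng.standard_normal((I, C)) for I, C in zip(shape, common)]

tensors = []
for k in range(K):
    # Individual components B_I^(n,k) and the core tensor G^(k) differ per data set.
    B = [np.hstack([Bc, rng.standard_normal((I, R - C))])
         for Bc, I, R, C in zip(B_common, shape, ranks, common)]
    G = rng.standard_normal(ranks)
    X = G
    for n, Bn in enumerate(B):
        X = mode_n_product(X, Bn, n)   # X^(k) = G^(k) x_1 B^(1,k) x_2 B^(2,k) x_3 B^(3,k)
    tensors.append(X)

print([X.shape for X in tensors])      # three linked (30, 20, 10) tensors
```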

[Fig14] The prediction of arm movement from brain electrical responses. (a) The experiment setup. (b) The construction of the
data and response tensors and training. (c) The new data tensor (bottom) and the predicted 3-D arm movement trajectories
(X, Y, Z coordinates) obtained by tensor-based HOPLS and standard matrix-based PLS (top).

Example 5
We employed LMWCA for classification based on common and distinct features of natural objects from the ETH-80 database (http://www.d2.mpi-inf.mpg.de/Datasets/ETH80), whereby the discrimination among objects was performed using only the common features. This data set consists of 3,280 images in eight categories, each containing ten objects with 41 views per object. For each category, the training data were organized in two distinct fourth-order (128 × 128 × 3 × I_4) tensors, where I_4 = 10 × 41 × 0.5p, where p denotes the fraction of training data. LMWCA was applied to these two tensors to find the common and individual features, with the number of common features set to 80% of I_4. In this way, eight sets of common features were obtained for each category. The test sample label was assigned to the category whose common features matched the new sample best (evaluated by canonical correlations) [110]. Figure 16 compares LMWCA with the standard K-nearest neighbors (K-NNs) and LDA classifiers (using 50 principal components as features), all averaged over 50 Monte Carlo runs. The enhanced classification results for LMWCA are attributed to the fact that the classification makes use of only the common components and is not hindered by components that are not shared across objects or views.
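The matching step ("evaluated by canonical correlations") can be illustrated by a generic subspace-similarity sketch in numpy; the random class_features bases below are stand-ins for the LMWCA common features, so this only shows the mechanics of the assignment rule, not the experiment of [110]:

```python
import numpy as np

def canonical_corrs(A, B):
    """Canonical correlations between the column spaces of A and B:
    singular values of Qa^T Qb, where Qa, Qb are orthonormal bases."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return np.linalg.svd(Qa.T @ Qb, compute_uv=False)

def classify(sample_features, class_features):
    """Assign the label of the class whose common-feature subspace has the
    largest mean canonical correlation with the test-sample features."""
    scores = [canonical_corrs(sample_features, F).mean() for F in class_features]
    return int(np.argmax(scores))

# Toy illustration with random stand-in "common feature" bases for eight categories.
rng = np.random.default_rng(0)
class_features = [rng.standard_normal((100, 8)) for _ in range(8)]
test = class_features[3] @ rng.standard_normal((8, 5))   # lies in the subspace of class 3
print(classify(test, class_features))                    # -> 3
```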

SOFTWARE
The currently available software resources for tensor decompositions include:
■ The tensor toolbox, a versatile framework for basic operations on sparse and dense tensors, including CPD and Tucker formats [111].
■ The TDALAB and TENSORBOX, which provide a user-friendly interface and advanced algorithms for CPD, nonnegative TKD, and MWCA [112], [113].
■ The Tensorlab toolbox builds upon the complex optimization framework and offers numerical algorithms for computing the CPD, BTD, and TKD; the toolbox includes a library of constraints (e.g., nonnegativity and orthogonality) and the possibility to combine and jointly factorize dense, sparse, and incomplete tensors [89].
■ The N-way toolbox, which includes (constrained) CPD, TKD, and PLS in the context of chemometrics applications [114]; many of these methods can handle constraints (e.g., nonnegativity and orthogonality) and missing elements.
■ The TT toolbox, the Hierarchical Tucker toolbox, and the Tensor Calculus library provide tensor tools for scientific computing [115]–[117].
■ Code developed for multiway analysis is also available from the Three-Mode Company [118].

[Fig15] Coupled TKD for LMWCA. The data tensors have both shared and individual components. Constraints such as orthogonality, statistical independence, sparsity, and nonnegativity may be imposed where appropriate.

[Fig16] The classification of color objects belonging to different categories. By using only common features, LMWCA achieves a high classification rate, even when the training set is small. (a) Classification based on LMWCA. (b) Performance comparison.

CONCLUSIONS AND FUTURE DIRECTIONS
We live in a world overwhelmed by data, from multiple pictures of Big Ben on various social Web links to terabytes of data in multiview medical imaging, while we may also need to repeat the scientific experiments many times to obtain the ground truth. Each snapshot gives us a somewhat incomplete view of the same object and involves different angles, illumination, lighting conditions, facial expressions, and noise.

We have shown that tensor decompositions are a perfect match for exploratory analysis of such multifaceted data sets and have illustrated their applications in multisensor and multimodal signal processing. Our emphasis has been to show that tensor decompositions and multilinear algebra open up completely new possibilities for component analysis, as compared with the flat view of standard two-way methods.

Unlike matrices, tensors are multiway arrays of data samples whose representations are typically overdetermined (fewer parameters in the decomposition than the number of data entries). This gives us an enormous flexibility in finding hidden components in data and the ability to enhance both robustness to noise and tolerance to missing data samples and faulty sensors. We have also discussed multilinear variants of several standard signal processing tools such as multilinear SVD, ICA, NMF, and PLS and have shown that tensor methods can operate in a deterministic way on signals of very short duration.

At present, the uniqueness conditions of standard tensor models are relatively well understood and efficient computation algorithms do exist. However, for future applications, several challenging problems remain to be addressed in more depth.
■ A whole new area emerges when several decompositions that operate on different data sets are coupled, as in multiview data where some details of interest are visible in, e.g., only one mode. Such techniques need theoretical support in terms of existence, uniqueness, and numerical properties.
■ As the complexity of advanced models increases, their computation requires efficient iterative algorithms, extending beyond the ALS class.

■ The estimation of the number of components in data and the assessment of their dimensionality would benefit from automation, especially in the presence of noise and outliers.
■ Both new theory and algorithms are needed to further extend the flexibility of tensor models, e.g., for the constraints to be combined in many ways and tailored to the particular signal properties in different modes.
■ Work on efficient techniques for saving and/or fast processing of ultra-large-scale tensors is urgent; these now routinely occupy terabytes, and will soon require petabytes of memory.
■ Tools for rigorous performance analysis and rule of thumb performance bounds need to be further developed across tensor decomposition models.
■ Our discussion has been limited to tensor models in which all entries take values independently of one another. Probabilistic versions of tensor decompositions incorporate prior knowledge about complex variable interaction, various data alphabets, or noise distributions, and so promise to model data more accurately and efficiently [119], [120].
■ The future computational, visualization, and interpretation tools will be important next steps in supporting the different communities working on large-scale and big data analysis problems.

It is fitting to conclude with a quote from the French novelist Marcel Proust: “The voyage of discovery is not in seeking new landscapes but in having new eyes.” We hope to have helped to bring to the eyes of the signal processing community the multidisciplinary developments in tensor decompositions and to have shared our enthusiasm about tensors as powerful tools to discover new landscapes.

AUTHORS
Andrzej Cichocki ([email protected]) received the Ph.D. and Dr.Sc. (habilitation) degrees, all in electrical engineering, from the Warsaw University of Technology, Poland. He is currently a senior team leader of the Laboratory for Advanced Brain Signal Processing at RIKEN Brain Science Institute, Japan, and a professor at the Systems Research Institute, Polish Academy of Science, Poland. He has authored more than 400 publications and four monographs in the areas of signal processing and computational neuroscience. He is an associate editor of IEEE Transactions on Signal Processing and Journal of Neuroscience Methods.

Danilo P. Mandic ([email protected]) is a professor of signal processing at Imperial College London, United Kingdom, and has been working in the area of nonlinear and multidimensional adaptive signal processing and time-frequency analysis. His publication record includes two research monographs, Recurrent Neural Networks for Prediction and Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models, an edited book, Signal Processing for Information Fusion, and more than 200 publications on signal and image processing. He has been a guest professor at KU Leuven, Belgium, and a frontier researcher at RIKEN Brain Science Institute, Tokyo, Japan.

Anh Huy Phan ([email protected]) received the Ph.D. degree from the Kita Kyushu Institute of Technology, Japan, in 2011. He worked as a deputy head of the Research and Development Department, Broadcast Research and Application Center, Vietnam Television, and is currently a research scientist at the Laboratory for Advanced Brain Signal Processing and a visiting research scientist at the Toyota Collaboration Center, RIKEN Brain Science Institute, Japan. He has served on the editorial board of International Journal of Computational Mathematics. His research interests include multilinear algebra, tensor computation, blind source separation, and brain–computer interfaces.

Cesar F. Caiafa ([email protected]) received the Ph.D. degree in engineering from the Faculty of Engineering, University of Buenos Aires, in 2007. He is currently an adjunct researcher with the Argentinean Radioastronomy Institute (IAR)—CONICET and an assistant professor with the Faculty of Engineering, the University of Buenos Aires. He is also a visiting scientist at the Laboratory for Advanced Brain Signal Processing, BSI—RIKEN, Japan.

Guoxu Zhou ([email protected]) received the Ph.D. degree in intelligent signal and information processing from the South China University of Technology, Guangzhou, in 2010. He is currently a research scientist at the Laboratory for Advanced Brain Signal Processing at RIKEN Brain Science Institute, Japan. His research interests include statistical signal processing, tensor analysis, intelligent information processing, and machine learning.

Qibin Zhao ([email protected]) received the Ph.D. degree from the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China, in 2009. He is currently a research scientist at the Laboratory for Advanced Brain Signal Processing in RIKEN Brain Science Institute, Japan, and a visiting research scientist in the BSI Toyota Collaboration Center, RIKEN-BSI. His research interests include multiway data analysis, brain–computer interface, and machine learning.

Lieven De Lathauwer ([email protected]) received the Ph.D. degree from the Faculty of Engineering, KU Leuven, Belgium, in 1997. From 2000 to 2007, he was a research associate with the Centre National de la Recherche Scientifique, France. He is currently a professor with KU Leuven. He is affiliated with the group Science, Engineering, and Technology of Kulak, the Stadius Center for Dynamical Systems, Signal Processing, and Data Analytics of the Electrical Engineering Department (ESAT), and iMinds Future Health Department. He is an associate editor of SIAM Journal on Matrix Analysis and Applications and was an associate editor of IEEE Transactions on Signal Processing. His research focuses on the development of tensor tools for engineering applications.

REFERENCES
[1] F. L. Hitchcock, “Multiple invariants and generalized rank of a p-way matrix or tensor,” J. Math. Phys., vol. 7, no. 1, pp. 39–79, 1927.
[2] R. Cattell, “Parallel proportional profiles and other principles for determining the choice of factors by rotation,” Psychometrika, vol. 9, pp. 267–283, 1944.
[3] L. R. Tucker, “The extension of factor analysis to three-dimensional matrices,” in Contributions to Mathematical Psychology, H. Gulliksen and N. Frederiksen, Eds. New York: Holt, Rinehart and Winston, 1964, pp. 110–127.
[4] L. R. Tucker, “Some mathematical notes on three-mode factor analysis,” Psychometrika, vol. 31, no. 3, pp. 279–311, Sept. 1966.
[5] J. Carroll and J.-J. Chang, “Analysis of individual differences in multidimensional scaling via an n-way generalization of ‘Eckart-Young’ decomposition,” Psychometrika, vol. 35, no. 3, pp. 283–319, Sept. 1970.

[6] R. A. Harshman, “Foundations of the PARAFAC procedure: Models and conditions for an explanatory multimodal factor analysis,” UCLA Working Pap. Phonet., vol. 16, pp. 1–84, 1970.
[7] A. Smilde, R. Bro, and P. Geladi, Multi-Way Analysis: Applications in the Chemical Sciences. Hoboken, NJ: Wiley, 2004.
[8] P. Kroonenberg, Applied Multiway Data Analysis. Hoboken, NJ: Wiley, 2008.
[9] C. Nikias and A. Petropulu, Higher-Order Spectra Analysis: A Nonlinear Signal Processing Framework. Englewood Cliffs, NJ: Prentice Hall, 1993.
[10] J.-F. Cardoso and A. Souloumiac, “Blind beamforming for non-Gaussian signals,” in IEE Proc. F (Radar and Signal Processing), vol. 140, no. 6, IET, 1993, pp. 362–370.
[11] P. Comon, “Independent component analysis: A new concept?” Signal Process., vol. 36, no. 3, pp. 287–314, 1994.
[12] P. Comon and C. Jutten, Eds., Handbook of Blind Source Separation: Independent Component Analysis and Applications. New York: Academic, 2010.
[13] A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing. Hoboken, NJ: Wiley, 2003.
[14] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley, 2001.
[15] L. De Lathauwer, B. De Moor, and J. Vandewalle, “A multilinear singular value decomposition,” SIAM J. Matrix Anal. Appl., vol. 21, no. 4, pp. 1253–1278, 2000.
[16] G. Beylkin and M. Mohlenkamp, “Algorithms for numerical analysis in high dimensions,” SIAM J. Sci. Comput., vol. 26, no. 6, pp. 2133–2159, 2005.
[17] J. Ballani, L. Grasedyck, and M. Kluge, “Black box approximation of tensors in hierarchical Tucker format,” Linear Algebr. Appl., vol. 433, no. 2, pp. 639–657, 2011.
[18] I. V. Oseledets, “Tensor-train decomposition,” SIAM J. Sci. Comput., vol. 33, no. 5, pp. 2295–2317, 2011.
[19] N. Sidiropoulos, R. Bro, and G. Giannakis, “Parallel factor analysis in sensor array processing,” IEEE Trans. Signal Processing, vol. 48, no. 8, pp. 2377–2388, 2000.
[20] N. Sidiropoulos, G. Giannakis, and R. Bro, “Blind PARAFAC receivers for DS-CDMA systems,” IEEE Trans. Signal Processing, vol. 48, no. 3, pp. 810–823, 2000.
[21] A. Cichocki, R. Zdunek, A.-H. Phan, and S. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation. Hoboken, NJ: Wiley, 2009.
[22] J. Landsberg, Tensors: Geometry and Applications. AMS, 2012.
[23] W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus (ser. Springer Series in Computational Mathematics). Heidelberg: Springer, 2012, vol. 42.
[24] E. Acar and B. Yener, “Unsupervised multiway data analysis: A literature survey,” IEEE Trans. Knowledge Data Eng., vol. 21, no. 1, pp. 6–20, 2009.
[25] T. Kolda and B. Bader, “Tensor decompositions and applications,” SIAM Rev., vol. 51, no. 3, pp. 455–500, Sept. 2009.
[26] P. Comon, X. Luciani, and A. L. F. de Almeida, “Tensor decompositions, alternating least squares and other tales,” J. Chemomet., vol. 23, nos. 7–8, pp. 393–405, 2009.
[27] H. Lu, K. Plataniotis, and A. Venetsanopoulos, “A survey of multilinear subspace learning for tensor data,” Pattern Recognit., vol. 44, no. 7, pp. 1540–1551, 2011.
[28] M. Mørup, “Applications of tensor (multiway array) factorizations and decompositions in data mining,” Wiley Interdisc. Rew.: Data Mining Knowled. Discov., vol. 1, no. 1, pp. 24–40, 2011.
[29] B. Khoromskij, “Tensors-structured numerical methods in scientific computing: Survey on recent advances,” Chemomet. Intell. Lab. Syst., vol. 110, no. 1, pp. 1–19, 2011.
[30] L. Grasedyck, D. Kressner, and C. Tobler, “A literature survey of low-rank tensor approximation techniques,” GAMM-Mitteilungen, vol. 36, no. 1, pp. 53–78, 2013.
[31] P. Comon, “Tensors: A brief introduction,” IEEE Signal Processing Mag., vol. 31, no. 3, pp. 44–53, May 2014.
[32] A. Bruckstein, D. Donoho, and M. Elad, “From sparse solutions of systems of equations to sparse modeling of signals and images,” SIAM Rev., vol. 51, no. 1, pp. 34–81, 2009.
[33] J. Kruskal, “Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics,” Linear Algebr. Appl., vol. 18, no. 2, pp. 95–138, 1977.
[34] I. Domanov and L. De Lathauwer, “On the uniqueness of the canonical polyadic decomposition of third-order tensors—Part I: Basic results and uniqueness of one factor matrix and Part II: Uniqueness of the overall decomposition,” SIAM J. Matrix Anal. Appl., vol. 34, no. 3, pp. 855–903, 2013.
[35] M. Elad, P. Milanfar, and G. H. Golub, “Shape from moments—An estimation theory perspective,” IEEE Trans. Signal Processing, vol. 52, no. 7, pp. 1814–1829, 2004.
[36] N. Sidiropoulos, “Generalizing Caratheodory’s uniqueness of harmonic parameterization to N dimensions,” IEEE Trans. Inform. Theory, vol. 47, no. 4, pp. 1687–1690, 2001.
[37] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and É. Moulines, “A blind source separation technique using second-order statistics,” IEEE Trans. Signal Processing, vol. 45, no. 2, pp. 434–444, 1997.
[38] F. Miwakeichi, E. Martínez-Montes, P. Valdés-Sosa, N. Nishiyama, H. Mizuhara, and Y. Yamaguchi, “Decomposing EEG data into space–time–frequency components using parallel factor analysis,” NeuroImage, vol. 22, no. 3, pp. 1035–1045, 2004.
[39] M. Vasilescu and D. Terzopoulos, “Multilinear analysis of image ensembles: Tensorfaces,” in Proc. European Conf. on Computer Vision (ECCV), Copenhagen, Denmark, May 2002, vol. 2350, pp. 447–460.
[40] M. Hirsch, D. Lanman, G. Wetzstein, and R. Raskar, “Tensor displays,” in Proc. Int. Conf. Computer Graphics and Interactive Techniques, SIGGRAPH 2012, Los Angeles, CA, USA, Aug. 5–9, 2012, Emerging Technologies Proc., 2012, pp. 24–42.
[41] J. Håstad, “Tensor rank is NP-complete,” J. Algorithms, vol. 11, no. 4, pp. 644–654, 1990.
[42] M. Timmerman and H. Kiers, “Three mode principal components analysis: Choosing the numbers of components and sensitivity to local optima,” Br. J. Math. Stat. Psychol., vol. 53, no. 1, pp. 1–16, 2000.
[43] E. Ceulemans and H. Kiers, “Selecting among three-mode principal component models of different types and complexities: A numerical convex-hull based method,” Br. J. Math. Stat. Psychol., vol. 59, no. 1, pp. 133–150, May 2006.
[44] M. Mørup and L. K. Hansen, “Automatic relevance determination for multiway models,” J. Chemomet., Special Issue: In Honor of Professor Richard A. Harshman, vol. 23, nos. 7–8, pp. 352–363, 2009.
[45] N. Sidiropoulos and R. Bro, “On the uniqueness of multilinear decomposition of N-way arrays,” J. Chemomet., vol. 14, no. 3, pp. 229–239, 2000.
[46] T. Jiang and N. D. Sidiropoulos, “Kruskal’s permutation lemma and the identification of CANDECOMP/PARAFAC and bilinear models,” IEEE Trans. Signal Processing, vol. 52, no. 9, pp. 2625–2636, 2004.
[47] L. De Lathauwer, “A link between the canonical decomposition in multilinear algebra and simultaneous matrix diagonalization,” SIAM J. Matrix Anal. Appl., vol. 28, no. 3, pp. 642–666, 2006.
[48] A. Stegeman, “On uniqueness conditions for Candecomp/Parafac and Indscal with full column rank in one mode,” Linear Algebr. Appl., vol. 431, nos. 1–2, pp. 211–227, 2009.
[49] E. Sanchez and B. Kowalski, “Tensorial resolution: A direct trilinear decomposition,” J. Chemomet., vol. 4, no. 1, pp. 29–45, 1990.
[50] I. Domanov and L. De Lathauwer, “Canonical polyadic decomposition of third-order tensors: Reduction to generalized eigenvalue decomposition,” SIAM J. Matrix Anal. Appl., vol. 35, no. 2, pp. 636–660, 2014.
[51] S. Vorobyov, Y. Rong, N. Sidiropoulos, and A. Gershman, “Robust iterative fitting of multilinear models,” IEEE Trans. Signal Processing, vol. 53, no. 8, pp. 2678–2689, 2005.
[52] X. Liu and N. Sidiropoulos, “Cramér-Rao lower bounds for low-rank decomposition of multidimensional arrays,” IEEE Trans. Signal Processing, vol. 49, no. 9, pp. 2074–2086, Sept. 2001.
[53] P. Tichavský, A.-H. Phan, and Z. Koldovský, “Cramér-Rao-induced bounds for CANDECOMP/PARAFAC tensor decomposition,” IEEE Trans. Signal Processing, vol. 61, no. 8, pp. 1986–1997, 2013.
[54] B. Chen, S. He, Z. Li, and S. Zhang, “Maximum block improvement and polynomial optimization,” SIAM J. Optim., vol. 22, no. 1, pp. 87–107, 2012.
[55] A. Uschmajew, “Local convergence of the alternating least squares algorithm for canonical tensor approximation,” SIAM J. Matrix Anal. Appl., vol. 33, no. 2, pp. 639–652, 2012.
[56] M. J. Mohlenkamp, “Musings on multilinear fitting,” Linear Algebr. Appl., vol. 438, no. 2, pp. 834–852, 2013.
[57] M. Razaviyayn, M. Hong, and Z.-Q. Luo, “A unified convergence analysis of block successive minimization methods for nonsmooth optimization,” SIAM J. Optim., vol. 23, no. 2, pp. 1126–1153, 2013.
[58] P. Paatero, “The multilinear engine: A table-driven least squares program for solving multilinear problems, including the n-way parallel factor analysis model,” J. Computat. Graph. Stat., vol. 8, no. 4, pp. 854–888, Dec. 1999.
[59] E. Acar, D. Dunlavy, T. Kolda, and M. Mørup, “Scalable tensor factorizations for incomplete data,” Chemomet. Intell. Lab. Syst., vol. 106, no. 1, pp. 41–56, 2011.
[60] A.-H. Phan, P. Tichavský, and A. Cichocki, “Low complexity damped Gauss-Newton algorithms for CANDECOMP/PARAFAC,” SIAM J. Matrix Anal. Appl. (SIMAX), vol. 34, no. 1, pp. 126–147, 2013.
[61] L. Sorber, M. Van Barel, and L. De Lathauwer, “Optimization-based algorithms for tensor decompositions: Canonical polyadic decomposition, decomposition in rank-(L_r, L_r, 1) terms and a new generalization,” SIAM J. Optim., vol. 23, no. 2, pp. 695–720, 2013.
[62] V. de Silva and L.-H. Lim, “Tensor rank and the ill-posedness of the best low-rank approximation problem,” SIAM J. Matrix Anal. Appl., vol. 30, pp. 1084–1127, Sept. 2008.
[63] W. Krijnen, T. Dijkstra, and A. Stegeman, “On the non-existence of optimal solutions and the occurrence of ‘degeneracy’ in the Candecomp/Parafac model,” Psychometrika, vol. 73, no. 3, pp. 431–439, 2008.

[64] M. Sørensen, L. De Lathauwer, P. Comon, S. Icart, and L. Deneire, “Canonical polyadic decomposition with orthogonality constraints,” SIAM J. Matrix Anal. Appl., vol. 33, no. 4, pp. 1190–1213, 2012.
[65] M. Sørensen and L. De Lathauwer, “Blind signal separation via tensor decomposition with Vandermonde factor: Canonical polyadic decomposition,” IEEE Trans. Signal Processing, vol. 61, no. 22, pp. 5507–5519, Nov. 2013.
[66] G. Zhou and A. Cichocki, “Canonical polyadic decomposition based on a single mode blind source separation,” IEEE Signal Processing Lett., vol. 19, no. 8, pp. 523–526, 2012.
[67] L.-H. Lim and P. Comon, “Nonnegative approximations of nonnegative tensors,” J. Chemomet., vol. 23, nos. 7–8, pp. 432–441, 2009.
[68] A. van der Veen and A. Paulraj, “An analytical constant modulus algorithm,” IEEE Trans. Signal Processing, vol. 44, no. 5, pp. 1136–1155, 1996.
[69] R. Roy and T. Kailath, “ESPRIT—Estimation of signal parameters via rotational invariance techniques,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, no. 7, pp. 984–995, 1989.
[70] L. De Lathauwer, B. De Moor, and J. Vandewalle, “On the best rank-1 and rank-(R_1, R_2, …, R_N) approximation of higher-order tensors,” SIAM J. Matrix Anal. Appl., vol. 21, no. 4, pp. 1324–1342, 2000.
[71] B. Savas and L.-H. Lim, “Quasi-Newton methods on Grassmannians and multilinear approximations of tensors,” SIAM J. Sci. Comput., vol. 32, no. 6, pp. 3352–3393, 2010.
[72] M. Ishteva, P.-A. Absil, S. Van Huffel, and L. De Lathauwer, “Best low multilinear rank approximation of higher-order tensors, based on the Riemannian trust-region scheme,” SIAM J. Matrix Anal. Appl., vol. 32, no. 1, pp. 115–135, 2011.
[73] G. Zhou and A. Cichocki, “Fast and unique Tucker decompositions via multiway blind source separation,” Bull. Polish Acad. Sci., vol. 60, no. 3, pp. 389–407, 2012.
[74] A. Cichocki, “Generalized component analysis and blind source separation methods for analyzing multichannel brain signals,” in Statistical and Process Models for Cognitive Neuroscience and Aging. Lawrence Erlbaum Associates, 2007, pp. 201–272.
[75] M. Haardt, F. Roemer, and G. D. Galdo, “Higher-order SVD based subspace estimation to improve the parameter estimation accuracy in multi-dimensional harmonic retrieval problems,” IEEE Trans. Signal Processing, vol. 56, no. 7, pp. 3198–3213, July 2008.
[76] A.-H. Phan and A. Cichocki, “Tensor decompositions for feature extraction and classification of high dimensional data sets,” Nonlinear Theory Appl., IEICE, vol. 1, no. 1, pp. 37–68, 2010.
[77] L. De Lathauwer, “Decompositions of a higher-order tensor in block terms—Part I and II,” SIAM J. Matrix Anal. Appl. (SIMAX), Special Issue on Tensor Decompositions and Applications, vol. 30, no. 3, pp. 1022–1066, 2008.
[78] L. De Lathauwer, “Blind separation of exponential polynomials and the decomposition of a tensor in rank-(L_r, L_r, 1) terms,” SIAM J. Matrix Anal. Appl., vol. 32, no. 4, pp. 1451–1474, 2011.
[79] L. De Lathauwer, “Block component analysis: A new concept for blind source separation,” in Proc. 10th Int. Conf. LVA/ICA, Tel Aviv, Israel, Mar. 12–15, 2012, pp. 1–8.
[80] E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489–509, 2006.
[81] E. J. Candes and T. Tao, “Near-optimal signal recovery from random projections: Universal encoding strategies?” IEEE Trans. Inform. Theory, vol. 52, no. 12, pp. 5406–5425, 2006.
[82] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[83] Y. Eldar and G. Kutyniok, Compressed Sensing: Theory and Applications, vol. 20. New York: Cambridge Univ. Press, 2012, p. 12.
[84] M. F. Duarte and R. G. Baraniuk, “Kronecker compressive sensing,” IEEE Trans. Image Processing, vol. 21, no. 2, pp. 494–504, 2012.
[85] C. Caiafa and A. Cichocki, “Computing sparse representations of multidimensional signals using Kronecker bases,” Neural Computat., vol. 25, no. 1, pp. 186–220, 2013.
[86] C. Caiafa and A. Cichocki, “Multidimensional compressed sensing and their applications,” WIREs Data Mining Knowled. Discov., vol. 3, no. 6, pp. 355–380, 2013.
[87] S. Gandy, B. Recht, and I. Yamada, “Tensor completion and low-n-rank tensor recovery via convex optimization,” Inverse Prob., vol. 27, no. 2, pp. 1–19, 2011.
[88] M. Signoretto, Q. T. Dinh, L. De Lathauwer, and J. A. K. Suykens, “Learning with tensors: A framework based on convex optimization and spectral regularization,” Mach. Learn., vol. 94, no. 3, pp. 303–351, Mar. 2014.
[89] L. Sorber, M. Van Barel, and L. De Lathauwer. (2014, Jan.). Tensorlab v2.0. [Online]. Available: www.tensorlab.net
[90] N. Sidiropoulos and A. Kyrillidis, “Multi-way compressed sensing for sparse low-rank tensors,” IEEE Signal Processing Lett., vol. 19, no. 11, pp. 757–760, 2012.
[91] D. Foster, K. Amano, S. Nascimento, and M. Foster, “Frequency of metamerism in natural scenes,” J. Opt. Soc. Amer. A, vol. 23, no. 10, pp. 2359–2372, 2006.
[92] A. Cichocki, “Era of big data processing: A new approach via tensor networks and tensor decompositions (invited talk),” in Proc. 2013 Int. Workshop on Smart Info-Media Systems in Asia, SISA-2013, Nagoya, Japan, Oct. 1, 2013, 30 pages. [Online]. Available: http://arxiv.org/pdf/1403.2048.pdf
[93] R. Orus, “A practical introduction to tensor networks: Matrix product states and projected entangled pair states,” J. Chem. Phys., 2013.
[94] J. Salmi, A. Richter, and V. Koivunen, “Sequential unfolding SVD for tensors with applications in array signal processing,” IEEE Trans. Signal Processing, vol. 57, no. 12, pp. 4719–4733, 2009.
[95] A.-H. Phan and A. Cichocki, “PARAFAC algorithms for large-scale problems,” Neurocomputing, vol. 74, no. 11, pp. 1970–1984, 2011.
[96] S. K. Suter, M. Makhynia, and R. Pajarola, “TAMRESH: Tensor approximation multiresolution hierarchy for interactive volume visualization,” Comput. Graph. Forum, vol. 32, no. 3, pp. 151–160, 2013.
[97] D. Nion and N. Sidiropoulos, “Adaptive algorithms to track the PARAFAC decomposition of a third-order tensor,” IEEE Trans. Signal Processing, vol. 57, no. 6, pp. 2299–2310, June 2009.
[98] S. A. Goreinov, N. L. Zamarashkin, and E. E. Tyrtyshnikov, “Pseudo-skeleton approximations by matrices of maximum volume,” Math. Notes, vol. 62, no. 4, pp. 515–519, 1997.
[99] C. Caiafa and A. Cichocki, “Generalizing the column-row matrix decomposition to multi-way arrays,” Linear Algebr. Appl., vol. 433, no. 3, pp. 557–573, 2010.
[100] S. A. Goreinov, “On cross approximation of multi-index array,” Doklady Math., vol. 420, no. 4, pp. 404–406, 2008.
[101] I. Oseledets, D. V. Savostyanov, and E. Tyrtyshnikov, “Tucker dimensionality reduction of three-dimensional arrays in linear time,” SIAM J. Matrix Anal. Appl., vol. 30, no. 3, pp. 939–956, 2008.
[102] I. Oseledets and E. Tyrtyshnikov, “TT-cross approximation for multidimensional arrays,” Linear Algebr. Appl., vol. 432, no. 1, pp. 70–88, 2010.
[103] M. W. Mahoney, M. Maggioni, and P. Drineas, “Tensor-CUR decompositions for tensor-based data,” SIAM J. Matrix Anal. Appl., vol. 30, no. 3, pp. 957–987, 2008.
[104] R. Bro, “Multiway calibration. Multilinear PLS,” J. Chemomet., vol. 10, no. 1, pp. 47–61, 1996.
[105] Q. Zhao, C. Caiafa, D. Mandic, Z. Chao, Y. Nagasaka, N. Fujii, L. Zhang, and A. Cichocki, “Higher-order partial least squares (HOPLS): A generalized multilinear regression method,” IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), vol. 35, no. 7, pp. 1660–1673, 2013.
[106] A. Cichocki, “Tensors decompositions: New concepts for brain data analysis?” J. Control, Measure., Syst. Integr. (SICE), vol. 47, no. 7, pp. 507–517, 2011.
[107] V. Calhoun, J. Liu, and T. Adali, “A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data,” Neuroimage, vol. 45, pp. 163–172, 2009.
[108] Y.-O. Li, T. Adali, W. Wang, and V. Calhoun, “Joint blind source separation by multiset canonical correlation analysis,” IEEE Trans. Signal Processing, vol. 57, no. 10, pp. 3918–3929, Oct. 2009.
[109] E. Acar, T. Kolda, and D. Dunlavy, “All-at-once optimization for coupled matrix and tensor factorizations,” in Proc. Mining and Learning with Graphs (MLG’11), San Diego, CA, Aug. 2011.
[110] G. Zhou, A. Cichocki, S. Xie, and D. Mandic. (2013). Beyond canonical correlation analysis: Common and individual features analysis. IEEE Trans. Pattern Anal. Mach. Intell. [Online]. Available: http://arxiv.org/abs/1212.3913
[111] B. Bader, T. G. Kolda et al. (2012, Feb.). MATLAB tensor toolbox version 2.5. [Online]. Available: http://www.sandia.gov/~tgkolda/TensorToolbox/
[112] G. Zhou and A. Cichocki. (2013). TDALAB: Tensor decomposition laboratory, LABSP, Wako-shi, Japan. [Online]. Available: http://bsp.brain.riken.jp/TDALAB/
[113] A.-H. Phan, P. Tichavský, and A. Cichocki. (2012). TENSORBOX: A MATLAB package for tensor decomposition, LABSP, RIKEN, Japan. [Online]. Available: http://www.bsp.brain.riken.jp/~phan/tensorbox.php
[114] C. Andersson and R. Bro. (2000). The N-way toolbox for MATLAB. Chemomet. Intell. Lab. Syst., 52(1), pp. 1–4, 2000. [Online]. Available: http://www.models.life.ku.dk/nwaytoolbox
[115] I. Oseledets. (2012). TT-toolbox 2.2. [Online]. Available: https://github.com/oseledets/TT-Toolbox
[116] D. Kressner and C. Tobler. (2012). htucker—A MATLAB toolbox for tensors in hierarchical Tucker format. MATHICSE, EPF Lausanne. [Online]. Available: http://anchp.epfl.ch/htucker
[117] M. Espig, M. Schuster, A. Killaitis, N. Waldren, P. Wähnert, S. Handschuh, and H. Auer. (2012). Tensor calculus library. [Online]. Available: http://gitorious.org/tensorcalculus
[118] P. Kroonenberg. The three-mode company: A company devoted to creating three-mode software and promoting three-mode data analysis. [Online]. Available: http://three-mode.leidenuniv.nl/
[119] Z. Xu, F. Yan, and A. Qi, “Infinite Tucker decomposition: Nonparametric Bayesian models for multiway data analysis,” in Proc. 29th Int. Conf. Machine Learning (ICML-12), ser. ICML’12. Omnipress, July 2012, pp. 1023–1030.
[120] K. Yilmaz and A. T. Cemgil, “Probabilistic latent tensor factorisation,” in Proc. Int. Conf. Latent Variable Analysis and Signal Separation, cPCI-S, 2010, vol. 6365, pp. 346–353.

[SP]
