Tensor Decompositions for Signal Processing Applications
[From two-way to multiway component analysis]
Digital Object Identifier 10.1109/MSP.2013.2297439
Date of publication: 12 February 2015

The widespread use of multisensor technology and the emergence of big data sets have highlighted the limitations of standard flat-view matrix models and the necessity to move toward more versatile data analysis tools. We show that higher-order tensors (i.e., multiway arrays) enable such a fundamental paradigm shift toward models that are essentially polynomial, the uniqueness of which, unlike the matrix methods, is guaranteed under very mild and natural conditions. Benefiting from the power of multilinear algebra as their mathematical backbone, data analysis techniques using tensor decompositions are shown to have great flexibility in the choice of constraints which match data properties and extract more general latent components in the data than matrix-based methods.

A comprehensive introduction to tensor decompositions is provided from a signal processing perspective, starting from the algebraic foundations, via basic canonical polyadic and Tucker models, to advanced cause-effect and multiview data analysis schemes. We show that tensor decompositions enable natural generalizations of some commonly used signal processing paradigms, such as canonical correlation and subspace techniques, signal separation, linear regression, feature extraction, and classification. We also cover computational aspects and point out how ideas from compressed sensing (CS) and scientific computing may be used for addressing the otherwise unmanageable storage and manipulation issues associated with big data sets.
[Fig1] MWCA for a third-order tensor, assuming that the components are (a) principal and orthogonal in the first mode,
(b) nonnegative and sparse in the second mode, and (c) statistically independent in the third mode.
C = "A; B (1), B (2), f, B (N), full multilinear product, C = A # 1 B (1) # 2 B (2) g # N B (N)
C = A%B tensor or outer product of A ! R I 1 # I 2 # g # I N and B ! R J1 # J2 # g # J M yields C ! R I 1 # I 2 # g # I N # J1 # J 2 # g # J M with
entries c i 1 i 2 gi N j 1 j 2 gj M = a i 1 i 2 gi N b j 1 j 2 gj M
X = a (1) % a (2) % g % a (N) tensor or outer product of vectors a (n) ! R I n (n = 1, f, N) yields a rank-1 tensor X ! R I 1 # I 2 # g # I N
(1) (2) (N)
with entries x i 1 i 2f i N = a i 1 a i 2 f a i N
C = A7B Kronecker product of A ! R I 1 # I 2 and B ! R J 1 # J 2 yields C ! R I 1 J 1 # I 2 J 2 with entries
c (i 1 - 1) J 1 + j 1,(i 2 - 1) J 2 + j 2 = a i 1 i 2 b j 1 j 2
C = A9B Khatri–Rao product of A = [a 1, f, a R] ! R I # R and B = [b 1, f, b R] ! R J # R yields C ! R IJ # R with columns
cr = ar 7 br
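As a quick numerical illustration of these definitions, the following NumPy sketch (with small, arbitrary dimensions chosen purely for demonstration) forms the outer, Kronecker, and Khatri–Rao products and checks that the r-th column of A ⊙ B is indeed a_r ⊗ b_r.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, R = 4, 3, 2
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))

# Outer product of two vectors gives a rank-1 matrix; a third vector would give
# a rank-1 third-order tensor, e.g., np.einsum('i,j,k->ijk', a, b, c).
rank1 = np.outer(A[:, 0], B[:, 0])                          # shape (I, J)

# Kronecker product: here of shape (I*J, R*R).
C_kron = np.kron(A, B)

# Khatri-Rao product: columnwise Kronecker product, shape (I*J, R).
C_kr = np.column_stack([np.kron(A[:, r], B[:, r]) for r in range(R)])

# Defining property: the r-th column of the Khatri-Rao product equals a_r (x) b_r.
assert np.allclose(C_kr[:, 1], np.kron(A[:, 1], B[:, 1]))
print(rank1.shape, C_kron.shape, C_kr.shape)
```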
the following taxonomy for tensor generation:

1) Rearrangement of lower-dimensional data structures: Large-scale vectors or matrices are readily tensorized to higher-order tensors and can be compressed through tensor decompositions if they admit a low-rank tensor approximation; this principle facilitates big data analysis [23], [29], [30] [see Figure 2(a)]. For instance, a one-way exponential signal x(k) = az^k can be rearranged into a rank-1 Hankel matrix or a Hankel tensor [36].

X = Σ_{r=1}^{R} λ_r b_r^(1) ∘ b_r^(2) ∘ ⋯ ∘ b_r^(N).   (3)

Equivalently, X is expressed as a multilinear product with a diagonal core

X = D ×_1 B^(1) ×_2 B^(2) ⋯ ×_N B^(N) = ⟦D; B^(1), B^(2), …, B^(N)⟧,   (4)

where D = diag_N(λ_1, λ_2, …, λ_R) [cf. the matrix case in (1)]. Figure 3 illustrates these two interpretations for a third-order tensor.

[Fig3] The analogy between (a) dyadic decompositions and (b) PDs; the Tucker format has a diagonal core. The uniqueness of these decompositions is a prerequisite for BSS and latent variable analysis.

The tensor rank is defined as the smallest value of R for which (3) holds exactly; the minimum-rank PD is called the canonical PD (CPD) and is desired in signal separation. The term CPD may also be considered as an abbreviation of the CANDECOMP/PARAFAC decomposition; see the "Historical Notes" section. The matrix/vector form of the CPD can be obtained via the Khatri–Rao products (see Table 2) as

X_(n) = B^(n) D (B^(N) ⊙ ⋯ ⊙ B^(n+1) ⊙ B^(n−1) ⊙ ⋯ ⊙ B^(1))^T,
vec(X) = [B^(N) ⊙ B^(N−1) ⊙ ⋯ ⊙ B^(1)] d,   (5)

where d = [λ_1, λ_2, …, λ_R]^T.

UNIQUENESS
Uniqueness conditions give theoretical bounds for exact tensor decompositions. A classical uniqueness condition is due to Kruskal [33], which states that for third-order tensors, the CPD is unique up to unavoidable scaling and permutation ambiguities, provided that k_{B^(1)} + k_{B^(2)} + k_{B^(3)} ≥ 2R + 2, where the Kruskal rank k_B of a matrix B is the maximum value ensuring that any subset of k_B columns is linearly independent. In sparse modeling, the term (k_B + 1) is also known as the spark [32]. A generalization to Nth-order tensors is due to Sidiropoulos and Bro [45] and is given by

Σ_{n=1}^{N} k_{B^(n)} ≥ 2R + N − 1.   (6)
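The Kruskal rank can be checked directly for small matrices. The following NumPy sketch (a brute-force illustration with randomly generated factor matrices, not a practical algorithm) computes k_B by exhaustive search and evaluates condition (6) for a third-order example.

```python
import itertools
import numpy as np

def kruskal_rank(B, tol=1e-10):
    """Largest k such that every subset of k columns of B is linearly independent."""
    n_cols = B.shape[1]
    for k in range(n_cols, 0, -1):
        if all(np.linalg.matrix_rank(B[:, list(idx)], tol=tol) == k
               for idx in itertools.combinations(range(n_cols), k)):
            return k
    return 0

rng = np.random.default_rng(1)
R = 3                                    # number of rank-1 terms in the CPD
B1 = rng.standard_normal((6, R))         # generic factor matrices have k-rank = min(I_n, R)
B2 = rng.standard_normal((5, R))
B3 = rng.standard_normal((4, R))

k_sum = kruskal_rank(B1) + kruskal_rank(B2) + kruskal_rank(B3)
print(k_sum, ">= 2R + 2 :", k_sum >= 2 * R + 2)   # condition (6) with N = 3
```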
In general, the Tucker core cannot be diagonalized, while the number of components may differ in each mode. The CPD is therefore used for factorizing data into easy-to-interpret components (i.e., the rank-1 terms), while the goal of unconstrained TKD is most often to compress data into a tensor of smaller size (i.e., the core tensor) or to find the subspaces spanned by the fibers (i.e., the column spaces of the factor matrices).

UNIQUENESS
The unconstrained TKD is in general not unique, i.e., the factor matrices B^(n) are rotation invariant. However, physically, the subspaces defined by the factor matrices in TKD are unique, while the bases in these subspaces may be chosen arbitrarily; their choice is compensated for within the core tensor. This becomes clear upon noting that any factor matrix may be post-multiplied by a nonsingular matrix, provided the core tensor is counter-transformed accordingly.

CPD and Tucker (TKD) representations of a third-order tensor X ∈ R^(I×J×K):
Tensor representation, multilinear products
  CPD:    X_(1) = A D (C ⊙ B)^T,  X_(2) = B D (C ⊙ A)^T,  X_(3) = C D (B ⊙ A)^T
  Tucker: X_(1) = A G_(1) (C ⊗ B)^T,  X_(2) = B G_(2) (C ⊗ A)^T,  X_(3) = C G_(3) (B ⊗ A)^T
Vector representation
  CPD:    vec(X) = (C ⊙ B ⊙ A) d
  Tucker: vec(X) = (C ⊗ B ⊗ A) vec(G)
Scalar representation
  CPD:    x_ijk = Σ_{r=1}^{R} λ_r a_ir b_jr c_kr
  Tucker: x_ijk = Σ_{r1=1}^{R1} Σ_{r2=1}^{R2} Σ_{r3=1}^{R3} g_{r1 r2 r3} a_{i r1} b_{j r2} c_{k r3}
Matrix slices X_k = X(:, :, k)
  CPD:    X_k = A diag(c_k1, c_k2, …, c_kR) B^T
  Tucker: X_k = A (Σ_{r3=1}^{R3} c_{k r3} G(:, :, r3)) B^T
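These matricized forms are easy to verify numerically. The sketch below (our own illustration in NumPy, with small arbitrary dimensions and one particular unfolding convention) builds a rank-R tensor from its CPD factors and checks the identity X_(1) = A D (C ⊙ B)^T.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K, R = 4, 5, 6, 3
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))
lam = rng.standard_normal(R)

# Scalar form of the CPD: x_ijk = sum_r lambda_r a_ir b_jr c_kr.
X = np.einsum('r,ir,jr,kr->ijk', lam, A, B, C)

def khatri_rao(U, V):
    """Columnwise Kronecker (Khatri-Rao) product."""
    return np.column_stack([np.kron(U[:, r], V[:, r]) for r in range(U.shape[1])])

# Mode-1 unfolding with the convention that the column index runs as (k-1)*J + j,
# so that X_(1) = A diag(lambda) (C kr B)^T.
X1 = X.transpose(0, 2, 1).reshape(I, K * J)
assert np.allclose(X1, A @ np.diag(lam) @ khatri_rao(C, B).T)
print("mode-1 unfolding identity verified")
```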
[Fig6] The blind separation of the mixture of a pure sine wave and an exponentially modulated sine wave using PCA, ICA, CPD, TKD, and BTD. The sources s_1 and s_2 are correlated and of short duration; the symbols ŝ_1 and ŝ_2 denote the estimated sources. (a)–(c) Sources s_1(t) and s_2(t) and their estimates using PCA, ICA, CPD, TKD, and BTD; (d) average squared angular errors (SAE) in estimation of the sources.
■ Low-rank tensor approximation via a rank-2 CPD was used to estimate A as the third factor matrix, which was then inverted to yield the sources. The accuracy of CPD was compromised as the components of tensor X cannot be represented by rank-1 terms.
■ Low multilinear rank approximation via TKD for the multilinear rank (4, 4, 2) was able to retrieve the column space of the mixing matrix but could not find the individual mixing vectors because of the nonuniqueness of TKD.
■ BTD in multilinear rank-(2, 2, 1) terms matched the data structure [78]; it is remarkable that the sources were recovered using as few as six samples in the noise-free case.

HIGHER-ORDER COMPRESSED SENSING (HO-CS)
The aim of CS is to provide a faithful reconstruction of a signal of interest, even when the set of available measurements is (much) smaller than the size of the original signal [80]–[83]. Formally, we have available M (compressive) data samples y ∈ R^M, which are assumed to be linear transformations of the original signal x ∈ R^I (M < I). In other words, y = Φx, where the sensing matrix Φ ∈ R^(M×I) is usually random. Since the projections are of a lower dimension than the original data, the reconstruction is an ill-posed inverse problem whose solution requires knowledge of the physics of the problem converted into constraints. For example, a two-dimensional image X ∈ R^(I1×I2) can be vectorized as a long vector x = vec(X) ∈ R^I (I = I1 I2) that admits a sparse representation in a known dictionary B ∈ R^(I×I), so that x = Bg, where the matrix B may be a wavelet or discrete cosine transform dictionary. Then, faithful recovery of the original signal x requires finding the sparsest vector g such that

y = Wg,  with ‖g‖_0 ≤ K,  W = ΦB,   (13)

where ‖·‖_0 is the ℓ0-norm (number of nonzero entries) and K ≪ I.

Since the ℓ0-norm minimization is not practical, alternative solutions involve iterative refinements of the estimates of vector g using greedy algorithms such as the orthogonal matching pursuit (OMP) algorithm, or ℓ1-norm minimization algorithms (‖g‖_1 = Σ_{i=1}^{I} |g_i|) [83]. Low coherence of the composite dictionary matrix W is a prerequisite for satisfactory recovery of g (and hence x); we need to choose Φ and B so that the correlation between the columns of W is minimum [83].

When extending the CS framework to tensor data, we face two obstacles:
■ loss of information, such as spatial and contextual relationships in data, when a tensor X ∈ R^(I1×I2×⋯×IN) is vectorized.
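For concreteness, the following NumPy sketch illustrates the recovery step in (13) with a greedy OMP solver; the Gaussian sensing matrix, the identity dictionary (so that W = Φ), and the problem sizes are arbitrary choices made only for this example.

```python
import numpy as np

def omp(W, y, K):
    """Greedy recovery of a K-sparse vector g from y = W g."""
    residual, support = y.copy(), []
    for _ in range(K):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(W.T @ residual)))
        support.append(j)
        # Least-squares fit on the current support, then update the residual.
        coef, *_ = np.linalg.lstsq(W[:, support], y, rcond=None)
        residual = y - W[:, support] @ coef
    g_hat = np.zeros(W.shape[1])
    g_hat[support] = coef
    return g_hat

rng = np.random.default_rng(3)
I, M, K = 256, 64, 5
g = np.zeros(I)
g[rng.choice(I, K, replace=False)] = rng.standard_normal(K)   # K-sparse ground truth
Phi = rng.standard_normal((M, I)) / np.sqrt(M)                # random sensing matrix
y = Phi @ g                                                   # compressive measurements
print("max recovery error:", np.max(np.abs(omp(Phi, y, K) - g)))
```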
[Fig10] Efficient computation of CPD and TKD, whereby tensor decompositions are computed in parallel for sampled blocks. These are
then merged to obtain the global components A, B, and C, and a core tensor G.
possible to control the error and achieve any desired accuracy of approximation. For example, tensor networks allow for the representation of a wide class of discretized multivariate functions even in cases where the number of function values is larger than the number of atoms in the universe [23], [29], [30].

Examples of tensor networks are the hierarchical TKD and tensor trains (TTs) (see Figure 9) [17], [18]. The TTs are also known as matrix product states and have been used by physicists for more than two decades (see [92] and [93] and references therein). The PARATREE algorithm was developed in signal processing and follows a similar idea; it uses a polyadic representation of a data tensor (in a possibly nonminimal number of terms), whose computation then requires only the matrix SVD [94].

For very large-scale data that exhibit a well-defined structure, an even more radical approach to achieve a parsimonious representation may be through the concept of quantized or quantic tensor networks (QTNs) [29], [30]. For example, a huge vector x ∈ R^I with I = 2^L elements can be quantized and tensorized into a (2 × 2 × ⋯ × 2) tensor X of order L, as illustrated in Figure 2(a). If x is an exponential signal, x(k) = az^k, then X is a symmetric rank-1 tensor that can be represented by two parameters: the scaling factor a and the generator z [cf. (2) in the section "Tensorization—Blessing of Dimensionality"]. Nonsymmetric terms provide further opportunities, beyond the sum-of-exponentials representation by symmetric low-rank tensors. Huge matrices and tensors may be dealt with in the same manner. For instance, an Nth-order tensor X ∈ R^(I1×⋯×IN), with I_n = q^(L_n), can be quantized in all modes simultaneously to yield a (q × q × ⋯ × q) quantized tensor of higher order. In QTN, q is small, typically q = 2, 3, 4; e.g., the binary encoding (q = 2) reshapes an Nth-order tensor with (2^(L1) × 2^(L2) × ⋯ × 2^(LN)) elements into a tensor of order (L1 + L2 + ⋯ + LN) with the same number of elements. The TT decomposition applied to quantized tensors is referred to as the quantized TT (QTT); variants for other tensor representations have also been derived [29], [30]. In scientific computing, such formats provide the so-called supercompression, a logarithmic reduction of storage requirements: O(I^N) → O(N log_q(I)).

COMPUTATION OF THE DECOMPOSITION/REPRESENTATION
Now that we have addressed the possibilities for efficient tensor representation, the question that needs to be answered is how these representations can be computed from the data in an efficient manner. The first approach is to process the data in smaller blocks rather than in a batch manner [95]. In such a divide-and-conquer approach, different blocks may be processed in parallel, and their decompositions may be carefully recombined (see Figure 10) [95], [96]. In fact, we may even compute the decomposition through recursive updating as new data arrive [97]. Such recursive techniques may be used for efficient computation and for tracking decompositions in the case of nonstationary data.

The second approach would be to employ CS ideas [see the section "Higher-Order Compressed Sensing (HO-CS)"] to fit an algebraic model with a limited number of parameters to possibly large data. In addition to enabling data completion (interpolation of missing data), this also provides a significant reduction of the cost of data acquisition, manipulation, and storage, with the breaking of the curse of dimensionality being an extreme case.

While algorithms for this purpose are available both for low-rank and low multilinear rank representation [59], [87], an even more drastic approach would be to directly adopt sampled fibers as the bases in a tensor representation. In the TKD setting, we would choose the columns of the factor matrices B^(n) as mode-n fibers of the tensor, which requires us to address the following two problems: 1) how to find fibers that allow us to accurately represent the tensor and 2) how to compute the corresponding core tensor at a low cost (i.e., with minimal access to the data). The matrix counterpart of this problem (i.e., representation of a large matrix on the basis of a few columns and rows) is referred to as the pseudoskeleton approximation [98], where the optimal representation corresponds to the columns and rows that intersect in the submatrix of maximal volume (maximal absolute value of the determinant). Finding the optimal submatrix is computationally hard, but quasioptimal submatrices may be found by heuristic so-called cross-approximation methods that require only a partial exploration of the data.
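The matrix pseudoskeleton idea is easily illustrated. In the NumPy sketch below (an idealized example with an exactly rank-r matrix and randomly sampled rows and columns, rather than a maximal-volume selection), the skeleton C W^+ R reproduces the matrix almost exactly.

```python
import numpy as np

rng = np.random.default_rng(4)
I, J, r = 200, 150, 5
A = rng.standard_normal((I, r)) @ rng.standard_normal((r, J))   # exactly rank-r matrix

cols = rng.choice(J, r, replace=False)      # sampled column indices
rows = rng.choice(I, r, replace=False)      # sampled row indices
C = A[:, cols]                              # I x r column skeleton
Rm = A[rows, :]                             # r x J row skeleton
W = A[np.ix_(rows, cols)]                   # r x r intersection submatrix

A_skel = C @ np.linalg.pinv(W) @ Rm         # pseudoskeleton (CUR-like) approximation
print("relative error:", np.linalg.norm(A - A_skel) / np.linalg.norm(A))
```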
[Fig13] The principle of HOPLS for third-order tensors. The core tensors G_X and G_Y are block-diagonal. The BTD-type structure allows for the modeling of general components that are highly correlated in the first mode.

X ≈ Σ_{r=1}^{R} G_X^(r) ×_1 t_r ×_2 P_r^(1) ⋯ ×_N P_r^(N−1),   (17)

Y ≈ Σ_{r=1}^{R} G_Y^(r) ×_1 u_r ×_2 Q_r^(1) ⋯ ×_M Q_r^(M−1).   (18)

A number of data-analytic problems can be reformulated as either regression or similarity analysis [analysis of variance (ANOVA), autoregressive moving average modeling (ARMA), linear discriminant analysis (LDA), and canonical correlation analysis (CCA)], so that both the matrix and tensor PLS solutions can be generalized across exploratory data analysis.

Example 4
The predictive power of tensor-based PLS is illustrated on a real-world example of the prediction of arm movement trajectory from the electrocorticogram (ECoG). Figure 14(a) illustrates the experimental setup, whereby the 3-D arm movement of a monkey was captured by an optical motion capture system with reflective markers affixed to the left shoulder, elbow, wrist, and hand; for full details, see http://neurotycho.org. The predictors (32 ECoG channels) naturally build a fourth-order tensor X (time × channel_no × epoch_length × frequency), while the movement trajectories for the four markers (response) can be represented as a third-order tensor Y (time × 3D_marker_position × marker_no).

Standard PLS was applied in the same way to the unfolded tensors. Figure 14(c) shows that although the standard PLS was able to predict the movement corresponding to each marker individually, such a prediction is quite crude, as the two-way PLS does not adequately account for mutual information among the four markers. The enhanced predictive performance of the BTD-based HOPLS [the red line in Figure 14(c)] is therefore attributed to its ability to model interactions between complex latent components of both predictors and responses.

LINKED MULTIWAY COMPONENT ANALYSIS AND TENSOR DATA FUSION
Data fusion concerns the joint analysis of an ensemble of data sets, such as multiple views of a particular phenomenon, where some parts of the scene may be visible in only one or a few data sets. Examples include the fusion of visual and thermal images in low-visibility conditions and the analysis of human electrophysiological signals in response to a certain stimulus but from different subjects and trials; these are naturally analyzed together by means of matrix/tensor factorizations. The coupled nature of the analysis of such multiple data sets ensures that we are able to account for the common factors across the data sets and, at the same time, to guarantee that the individual components are not shared (e.g., processes that are independent of excitations or stimuli/tasks).

The linked multiway component analysis (LMWCA) [106], shown in Figure 15, performs such a decomposition into shared and individual factors and is formulated as a set of approximate joint TKDs of a set of data tensors X^(k) ∈ R^(I1×I2×⋯×IN), (k = 1, 2, …, K):

X^(k) ≈ G^(k) ×_1 B^(1,k) ×_2 B^(2,k) ⋯ ×_N B^(N,k),   (19)

where each factor matrix B^(n,k) = [B_C^(n), B_I^(n,k)] ∈ R^(I_n×R_n) has 1) components B_C^(n) ∈ R^(I_n×C_n) (with 0 ≤ C_n ≤ R_n) that are common (i.e., maximally correlated) to all tensors and 2) components B_I^(n,k) ∈ R^(I_n×(R_n−C_n)) that are tensor specific. The objective is to estimate the common components B_C^(n), the individual components B_I^(n,k), and, via the core tensors G^(k), their mutual interactions. As in MWCA (see the section "Tucker Decomposition"), constraints may be imposed to match data properties [73], [76]. This enables a more general and flexible framework than group ICA and independent vector analysis, which also perform linked analysis of multiple data sets but assume that 1) there exist only common components and 2) the corresponding latent variables are statistically independent [107], [108]. Both are quite stringent and limiting assumptions. As an alternative to TKD, coupled tensor decompositions may be of a polyadic or even block term type [89], [109].
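To make the structure of (19) concrete, the following NumPy sketch generates K synthetic tensors in linked Tucker format, where the mode-1 factors share C_1 common columns; the dimensions and ranks are arbitrary choices for illustration, and no estimation algorithm is implied.

```python
import numpy as np

rng = np.random.default_rng(5)
K, I1, I2, I3 = 3, 10, 8, 6
R1, R2, R3, C1 = 4, 3, 3, 2              # mode-1 rank R1, of which C1 components are common

B_common = rng.standard_normal((I1, C1))  # B_C^(1): shared by all K tensors
tensors = []
for k in range(K):
    B1 = np.hstack([B_common, rng.standard_normal((I1, R1 - C1))])  # [B_C^(1), B_I^(1,k)]
    B2 = rng.standard_normal((I2, R2))
    B3 = rng.standard_normal((I3, R3))
    G = rng.standard_normal((R1, R2, R3))                            # core tensor G^(k)
    # Multilinear product G^(k) x_1 B1 x_2 B2 x_3 B3, as in (19).
    Xk = np.einsum('pqr,ip,jq,kr->ijk', G, B1, B2, B3)
    tensors.append(Xk)

# All K data tensors share the column space of B_common in mode 1; LMWCA-type
# algorithms aim to recover this shared part and the individual parts jointly.
print([Xk.shape for Xk in tensors])
```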
[Fig14] The prediction of arm movement from brain electrical responses. (a) The experiment setup. (b) The construction of the
data and response tensors and training. (c) The new data tensor (bottom) and the predicted 3-D arm movement trajectories
(X, Y, Z coordinates) obtained by tensor-based HOPLS and standard matrix-based PLS (top).
Example 5
We employed LMWCA for classification based on common and distinct features of natural objects from the ETH-80 database (http://www.d2.mpi-inf.mpg.de/Data sets/ETH80), whereby the discrimination among objects was performed using only the common features. This data set consists of 3,280 images in eight categories, each containing ten objects with 41 views per object. For each category, the training data were organized in two distinct fourth-order (128 × 128 × 3 × I_4) tensors, where I_4 = 10 × 41 × 0.5p and p denotes the fraction of training data. LMWCA was applied to these two tensors to find the common and individual features, with the number of common features set to 80% of I_4. In this way, eight sets of common features were obtained for each category. The test sample label was assigned to the category whose common features matched the new sample best (evaluated by canonical correlations) [110]; a schematic sketch of this matching step is given after the software list below. Figure 16 compares LMWCA with the standard K-nearest neighbors (K-NNs) and LDA classifiers (using 50 principal components as features), all averaged over 50 Monte Carlo runs. The enhanced classification results for LMWCA are attributed to the fact that the classification makes use of only the common components and is not hindered by components that are not shared across objects or views.

SOFTWARE
The currently available software resources for tensor decompositions include:
■ the tensor toolbox, a versatile framework for basic operations on sparse and dense tensors, including CPD and Tucker formats [111];
■ the TDALAB and TENSORBOX, which provide a user-friendly interface and advanced algorithms for CPD, nonnegative TKD, and MWCA [112], [113];
■ the Tensorlab toolbox, which builds upon the complex optimization framework and offers numerical algorithms for computing the CPD, BTD, and TKD; the toolbox includes a library of constraints (e.g., nonnegativity and orthogonality) and the possibility to combine and jointly factorize dense, sparse, and incomplete tensors [89].
[Fig15] Coupled TKD for LMWCA. The data tensors have both shared and individual components. Constraints such as orthogonality, statistical independence, sparsity, and nonnegativity may be imposed where appropriate.

[Fig16] The classification of color objects belonging to different categories. By using only common features, LMWCA achieves a high classification rate, even when the training set is small. (a) Classification based on LMWCA. (b) Performance comparison.

CONCLUSIONS AND FUTURE DIRECTIONS
We live in a world overwhelmed by data, from multiple pictures of Big Ben on various social Web links to terabytes of data in multiview medical imaging, while we may also need to repeat the scientific experiments many times to obtain the ground truth. Each snapshot gives us a somewhat incomplete view of the same object and involves different angles, illumination, lighting conditions, facial expressions, and noise.

We have shown that tensor decompositions are a perfect match for exploratory analysis of such multifaceted data sets and have illustrated their applications in multisensor and multimodal signal processing. Our emphasis has been to show that tensor decompositions and multilinear algebra open up completely new possibilities for component analysis, as compared with the flat view of standard two-way methods.

Unlike matrices, tensors are multiway arrays of data samples whose representations are typically overdetermined (fewer parameters in the decomposition than the number of data entries). This gives us an enormous flexibility in finding hidden components in data and the ability to enhance both robustness to noise and tolerance to missing data samples and faulty sensors. We have also discussed multilinear variants of several standard signal processing tools such as multilinear SVD, ICA, NMF, and PLS and have shown that tensor methods can operate in a deterministic way on signals of very short duration.

At present, the uniqueness conditions of standard tensor models are relatively well understood, and efficient computation algorithms do exist. However, for future applications, several challenging problems remain to be addressed in more depth.
■ A whole new area emerges when several decompositions that operate on different data sets are coupled, as in multiview data where some details of interest are visible in, e.g., only one mode. Such techniques need theoretical support in terms of existence, uniqueness, and numerical properties.
■ As the complexity of advanced models increases, their computation requires efficient iterative algorithms, extending beyond the ALS class.
[20] N. Sidiropoulos, G. Giannakis, and R. Bro, "Blind PARAFAC receivers for DS-CDMA systems," IEEE Trans. Signal Processing, vol. 48, no. 3, pp. 810–823, 2000.
[21] A. Cichocki, R. Zdunek, A.-H. Phan, and S. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation. Hoboken, NJ: Wiley, 2009.
[22] J. Landsberg, Tensors: Geometry and Applications. AMS, 2012.
[23] W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus (ser. Springer Series in Computational Mathematics). Heidelberg: Springer, 2012, vol. 42.
[24] E. Acar and B. Yener, "Unsupervised multiway data analysis: A literature survey," IEEE Trans. Knowledge Data Eng., vol. 21, no. 1, pp. 6–20, 2009.
[25] T. Kolda and B. Bader, "Tensor decompositions and applications," SIAM Rev., vol. 51, no. 3, pp. 455–500, Sept. 2009.
[26] P. Comon, X. Luciani, and A. L. F. de Almeida, "Tensor decompositions, alternating least squares and other tales," J. Chemomet., vol. 23, no. 7–8, pp. 393–405, 2009.
[27] H. Lu, K. Plataniotis, and A. Venetsanopoulos, "A survey of multilinear subspace learning for tensor data," Pattern Recognit., vol. 44, no. 7, pp. 1540–1551, 2011.
[28] M. Mørup, "Applications of tensor (multiway array) factorizations and decompositions in data mining," Wiley Interdisc. Rev.: Data Mining Knowled. Discov., vol. 1, no. 1, pp. 24–40, 2011.
[29] B. Khoromskij, "Tensors-structured numerical methods in scientific computing: Survey on recent advances," Chemomet. Intell. Lab. Syst., vol. 110, no. 1, pp. 1–19, 2011.
[30] L. Grasedyck, D. Kressner, and C. Tobler, "A literature survey of low-rank tensor approximation techniques," GAMM-Mitteilungen, vol. 36, no. 1, pp. 53–78, 2013.
[31] P. Comon, "Tensors: A brief introduction," IEEE Signal Processing Mag., vol. 31, no. 3, pp. 44–53, May 2014.
[32] A. Bruckstein, D. Donoho, and M. Elad, "From sparse solutions of systems of equations to sparse modeling of signals and images," SIAM Rev., vol. 51, no. 1, pp. 34–81, 2009.
[33] J. Kruskal, "Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics," Linear Algebr. Appl., vol. 18, no. 2, pp. 95–138, 1977.
[34] I. Domanov and L. De Lathauwer, "On the uniqueness of the canonical polyadic decomposition of third-order tensors—Part I: Basic results and uniqueness of one factor matrix and Part II: Uniqueness of the overall decomposition," SIAM J. Matrix Anal. Appl., vol. 34, no. 3, pp. 855–903, 2013.
[35] M. Elad, P. Milanfar, and G. H. Golub, "Shape from moments—An estimation theory perspective," IEEE Trans. Signal Processing, vol. 52, no. 7, pp. 1814–1829, 2004.
[36] N. Sidiropoulos, "Generalizing Caratheodory's uniqueness of harmonic parameterization to N dimensions," IEEE Trans. Inform. Theory, vol. 47, no. 4, pp. 1687–1690, 2001.
[48] A. Stegeman, "On uniqueness conditions for Candecomp/Parafac and Indscal with full column rank in one mode," Linear Algebr. Appl., vol. 431, no. 1–2, pp. 211–227, 2009.
[49] E. Sanchez and B. Kowalski, "Tensorial resolution: A direct trilinear decomposition," J. Chemomet., vol. 4, no. 1, pp. 29–45, 1990.
[50] I. Domanov and L. De Lathauwer, "Canonical polyadic decomposition of third-order tensors: Reduction to generalized eigenvalue decomposition," SIAM J. Matrix Anal. Appl., vol. 35, no. 2, pp. 636–660, 2014.
[51] S. Vorobyov, Y. Rong, N. Sidiropoulos, and A. Gershman, "Robust iterative fitting of multilinear models," IEEE Trans. Signal Processing, vol. 53, no. 8, pp. 2678–2689, 2005.
[52] X. Liu and N. Sidiropoulos, "Cramér-Rao lower bounds for low-rank decomposition of multidimensional arrays," IEEE Trans. Signal Processing, vol. 49, no. 9, pp. 2074–2086, Sept. 2001.
[53] P. Tichavský, A.-H. Phan, and Z. Koldovský, "Cramér-Rao-induced bounds for CANDECOMP/PARAFAC tensor decomposition," IEEE Trans. Signal Processing, vol. 61, no. 8, pp. 1986–1997, 2013.
[54] B. Chen, S. He, Z. Li, and S. Zhang, "Maximum block improvement and polynomial optimization," SIAM J. Optim., vol. 22, no. 1, pp. 87–107, 2012.
[55] A. Uschmajew, "Local convergence of the alternating least squares algorithm for canonical tensor approximation," SIAM J. Matrix Anal. Appl., vol. 33, no. 2, pp. 639–652, 2012.
[56] M. J. Mohlenkamp, "Musings on multilinear fitting," Linear Algebr. Appl., vol. 438, no. 2, pp. 834–852, 2013.
[57] M. Razaviyayn, M. Hong, and Z.-Q. Luo, "A unified convergence analysis of block successive minimization methods for nonsmooth optimization," SIAM J. Optim., vol. 23, no. 2, pp. 1126–1153, 2013.
[58] P. Paatero, "The multilinear engine: A table-driven least squares program for solving multilinear problems, including the n-way parallel factor analysis model," J. Computat. Graph. Stat., vol. 8, no. 4, pp. 854–888, Dec. 1999.
[59] E. Acar, D. Dunlavy, T. Kolda, and M. Mørup, "Scalable tensor factorizations for incomplete data," Chemomet. Intell. Lab. Syst., vol. 106, no. 1, pp. 41–56, 2011.
[60] A.-H. Phan, P. Tichavský, and A. Cichocki, "Low complexity damped Gauss-Newton algorithms for CANDECOMP/PARAFAC," SIAM J. Matrix Anal. Appl., vol. 34, no. 1, pp. 126–147, 2013.
[61] L. Sorber, M. Van Barel, and L. De Lathauwer, "Optimization-based algorithms for tensor decompositions: Canonical polyadic decomposition, decomposition in rank-(L_r, L_r, 1) terms, and a new generalization," SIAM J. Optim., vol. 23, no. 2, pp. 695–720, 2013.
[62] V. de Silva and L.-H. Lim, "Tensor rank and the ill-posedness of the best low-rank approximation problem," SIAM J. Matrix Anal. Appl., vol. 30, pp. 1084–1127, Sept. 2008.
[63] W. Krijnen, T. Dijkstra, and A. Stegeman, "On the non-existence of optimal solutions and the occurrence of 'degeneracy' in the Candecomp/Parafac model," Psychometrika, vol. 73, no. 3, pp. 431–439, 2008.