
MULTISCALE MODEL. SIMUL.                               © 2021 Society for Industrial and Applied Mathematics
Vol. 19, No. 3, pp. 1261--1284

EFFICIENT CONSTRUCTION OF TENSOR RING REPRESENTATIONS FROM SAMPLING*
YUEHAW KHOO†, JIANFENG LU‡, AND LEXING YING§

Abstract. In this paper we propose an efficient method to compress a high dimensional function
into a tensor ring format, based on alternating least squares (ALS). Since the function has size
exponential in d, where d is the number of dimensions, we propose an efficient sampling scheme
to obtain O(d) important samples in order to learn the tensor ring. Furthermore, we devise an
initialization method for ALS that allows fast convergence in practice. Numerical examples show
that to approximate a function with similar accuracy, the tensor ring format provided by the proposed
method has fewer parameters than the tensor-train format and also better respects the structure of
the original function.

Key words. tensor decompositions, tensor train, randomized algorithm, function approximation

AMS subject classifications. 65D15, 33F05, 15A69

DOI. 10.1137/17M1154382

1. Introduction. Consider a function $f : [n]^d \to \mathbb{R}$, which can be treated as a tensor of size $n^d$ (where $[n] := \{1, \dots, n\}$). In order to store and perform algebraic manipulations on the exponentially sized tensor, the tensor $f$ typically has to be decomposed into one of various low-complexity formats. Most current applications involve the CP [8] or Tucker decompositions [8, 17]. However, the CP decomposition of a general tensor is nonunique, whereas the components of a Tucker decomposition have size exponential in $d$. The tensor train (TT) [14], better known as the matrix product state (MPS) proposed earlier in the physics literature (see, e.g., [1, 19, 15]), emerges as an alternative that breaks the curse of dimensionality while avoiding the ill-posedness issue in tensor decomposition. For this format, function compression and evaluation can be done in $O(d)$ complexity. The situation is, however, unclear when generalizing a TT to a tensor network. Therefore, in this paper, we consider the compression of a black-box function $f$ into a tensor ring (TR), i.e., we seek 3-tensors $H^1, \dots, H^d$ such that for $x := (x_1, \dots, x_d) \in [n]^d$,

(1)    $f(x_1, \dots, x_d) \approx \mathrm{Tr}\bigl( H^1(:, x_1, :) H^2(:, x_2, :) \cdots H^d(:, x_d, :) \bigr)$.

Here $H^k \in \mathbb{R}^{r_{k-1} \times n \times r_k}$ with $r_k \leq r$, and we often refer to $(r_1, \dots, r_d)$ as the TR rank. This tensor format is a generalization of the TT format, for which $H^1 \in \mathbb{R}^{1 \times n \times r_1}$ and $H^d \in \mathbb{R}^{r_{d-1} \times n \times 1}$. The difference between TR and TT is illustrated in Figure 1 using the tensor network diagrams introduced in section 1.1.
* Received by the editors November 2, 2017; accepted for publication (in revised form) December 22, 2020; published electronically August 5, 2021.
  https://doi.org/10.1137/17M1154382
  Funding: The first and third authors were supported in part by the National Science Foundation under award DMS-1521830 and the U.S. Department of Energy's Advanced Scientific Computing Research program under award DE-FC02-13ER26134/DE-SC0009409. The second author was supported in part by the National Science Foundation under award DMS-1454939.
† Department of Statistics, The University of Chicago, Chicago, IL 60637 USA ([email protected]).
‡ Departments of Physics, Chemistry, and Mathematics, Duke University, Durham, NC 27708 USA ([email protected]).
§ Department of Mathematics, Stanford University, Stanford, CA 94305-2125 USA ([email protected]).

Fig. 1. Comparison between a TR and a TT.

Due to the exponential number of entries, we typically do not have access to the entire tensor $f$. Therefore, the TR format has to be found based on ``interpolation'' from $f(\Omega)$, where $\Omega$ is a subset of $[n]^d$. For simplicity, in the rest of the note, we assume $r_1 = r_2 = \cdots = r_d = r$.
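To make the TR format (1) concrete, the following minimal NumPy sketch (ours, for illustration; the function name and shapes are not from the paper) evaluates a TR at a single multi-index:

```python
import numpy as np

def tr_eval(H, x):
    """Evaluate the tensor ring (1) at the multi-index x.

    H : list of d cores, H[k] of shape (r, n, r)
    x : sequence of d indices in {0, ..., n-1}
    Returns Tr(H^1[x_1] H^2[x_2] ... H^d[x_d]).
    """
    M = H[0][:, x[0], :]            # r x r slice of the first core
    for k in range(1, len(H)):
        M = M @ H[k][:, x[k], :]    # absorb the k-th slice
    return np.trace(M)

# Example with random cores: d = 6, n = 4, r = 3.
rng = np.random.default_rng(0)
H = [rng.standard_normal((3, 4, 3)) for _ in range(6)]
print(tr_eval(H, (0, 1, 2, 3, 0, 1)))
```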
1.1. Notations. We first summarize the notations used in this note and introduce tensor network diagrams for ease of presentation. Depending on the context, $f$ is often referred to as a $d$-tensor of size $n^d$ (instead of a function). For a $p$-tensor $T$ and two disjoint subsets $\alpha, \beta \subset [p]$ with $\alpha \cup \beta = [p]$, we use

(2)    $T_{\alpha;\beta}$

to denote the reshaping of $T$ into a matrix, where the dimensions corresponding to the sets $\alpha$ and $\beta$ give the rows and columns, respectively. Often we need to sample the values of $f$ on a subset of the $[n]^d$ grid points. Let $\alpha$ and $\beta$ be two groups of dimensions with $\alpha \cup \beta = [d]$ and $\alpha \cap \beta = \emptyset$, and let $\Omega_1$ and $\Omega_2$ be subsampled grid points along the dimensions in $\alpha$ and $\beta$, respectively. We use

(3)    $f(\Omega_1; \Omega_2) := f_{\alpha;\beta}(\Omega_1 \times \Omega_2)$

to indicate the operation of reshaping $f$ into a matrix, followed by subsampling the rows and columns according to $\Omega_1$ and $\Omega_2$. For any vector $x \in [n]^d$ and any integer $i$, we let

(4)    $x_i := x_{[(i-1) \bmod d] + 1}$.

For a $p$-tensor $T$, we define its Frobenius norm as

(5)    $\|T\|_F := \Bigl( \sum_{i_1, \dots, i_p} T(i_1, \dots, i_p)^2 \Bigr)^{1/2}$.

The notation $\mathrm{vec}(A)$ denotes the vectorization of a matrix $A$, formed by stacking the columns of $A$ into a vector. For two sets $\alpha, \beta$, we also use the notation

(6)    $\alpha \setminus \beta := \{ i \in \alpha \mid i \in \beta^c \}$

to denote the set difference between $\alpha$ and $\beta$.
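As an illustration of the matricization notation (2), a transpose followed by a reshape realizes $T_{\alpha;\beta}$; a small sketch of ours, with hypothetical shapes:

```python
import numpy as np

def reshape_tensor(T, alpha, beta):
    """Matricize a p-tensor T as T_{alpha;beta} (eq. (2)).

    alpha, beta : disjoint lists of 0-based dimension indices covering
    all dimensions of T; alpha gives the rows, beta the columns.
    """
    perm = list(alpha) + list(beta)
    rows = int(np.prod([T.shape[i] for i in alpha]))
    return T.transpose(perm).reshape(rows, -1)

# Example: a 4-tensor reshaped with alpha = {1, 2}, beta = {0, 3}.
T = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
M = reshape_tensor(T, [1, 2], [0, 3])   # shape (12, 10)
```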


In this note, for the convenience of presentation, we use tensor network diagrams to represent tensors and contractions between them. A tensor is represented as a node, where the number of legs of the node indicates the dimensionality of the tensor. For example, Figure 2(a) shows a 3-tensor $A$ and a 4-tensor $B$.

Fig. 2. (a) Tensor diagram for a 3-tensor A and a 4-tensor B. (b) Contraction between tensors A and B.

When joining edges between two tensors (for example, in Figure 2(b) we join the third leg of $A$ and the first leg of $B$), we mean (with the implicit assumption that the dimensions represented by these legs have the same size)

(7)    $\sum_k A_{i_1 i_2 k} B_{k j_2 j_3 j_4}$.

See the review article [12] for a more complete introduction to tensor network diagrams.
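The contraction (7) is precisely what a single einsum call computes; a brief sketch (ours):

```python
import numpy as np

A = np.random.rand(4, 5, 6)        # 3-tensor A
B = np.random.rand(6, 7, 8, 9)     # 4-tensor B; its first leg matches A's third

# Join A's third leg with B's first leg, as in Figure 2(b) / eq. (7).
C = np.einsum('abk,kcde->abcde', A, B)   # the result is a 5-tensor
```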
1.2. Previous approaches. In this section, we survey previous approaches for compressing a black-box function into a TT or TR. In [13], successive CUR (skeleton) decompositions [6] are applied to find a decomposition of the tensor $f$ in TT format. In [4], a similar scheme is applied to find a TR decomposition of the tensor. A crucial step in [4] is to ``disentangle'' one of the 3-tensors $H^k$, say $H^1$, from the TR. First, $f$ is treated as a matrix in which the first dimension of $f$ gives the rows and the second through $d$th dimensions give the columns, i.e., $f$ is reshaped to $f_{1;[d]\setminus 1}$. Then a CUR decomposition is applied such that

(8)    $f_{1;[d]\setminus 1} = CUR$,

and the matrix $C \in \mathbb{R}^{n \times r^2}$ in the decomposition is regarded as $H^1_{2;3,1}$ (the $R$ part of the CUR decomposition is never formed due to its exponential size). As noted by the authors in [4], a shortcoming of the method lies in the reshaping of $C$ into $H^1$. As in any factorization of a low-rank matrix, there is an inherent ambiguity in the CUR decomposition in that $CUR = CA\,A^{-1}UR$ for any invertible matrix $A$. Such ambiguity in determining $H^1$ may lead to a large TR rank in the subsequent determination of $H^2, H^3, \dots, H^d$. More recently, [22] proposed various alternating least squares (ALS)-based techniques to determine the TR decomposition of a tensor $f$. However, they only consider the situation where the entries of $f$ are fully observed, which limits the applicability of their algorithms to the case of rather small $d$. Moreover, depending on the initialization, ALS can suffer from slow convergence. In [18], ALS is used to determine the TR in a more general setting where only partial observations of the function $f$ are given. In this paper, we further assume the freedom to observe any $O(d)$ entries of the tensor $f$. As we shall see, leveraging such freedom, the complexity of the iterations can be reduced significantly compared to the ALS procedure in [18].
1.3. Our contributions. In this paper, assuming $f$ admits a rank-$r$ TR decomposition, we propose an ALS-based two-phase method to reconstruct the TR when only a few entries of $f$ can be sampled. Here we summarize our contributions.
1. The optimization problem of finding the TR decomposition is nonconvex and hence in general requires a good initialization. We devise a method for initializing $H^1, \dots, H^d$ that helps resolve the aforementioned ambiguity issues via a certain probabilistic assumption on the function $f$.
2. When updating each 3-tensor in the TR, it is infeasible to use all the entries of $f$. We devise a hierarchical strategy to choose the samples of $f$ efficiently via interpolative decomposition. Furthermore, the samples are chosen in a way that makes the per-iteration complexity of the ALS linear in $d$.
While we focus in this note on the problem of constructing the TR format, the proposed strategies can be applied to tensor networks in higher spatial configurations (such as PEPS; see, e.g., [12]), which will be considered in future works.
The paper is organized as follows. In section 2 we detail the proposed algorithm. In section 3, we provide intuition and theoretical guarantees to motivate the proposed initialization procedure, based on a certain probabilistic assumption on $f$. In section 4, we demonstrate the effectiveness of our methods through numerical examples. Finally, we conclude the paper in section 5.
2. Proposed method. In order to find a TR decomposition (1), our overall strategy is to solve the minimization problem

(9)    $\min_{H^1, \dots, H^d} \sum_{x \in [n]^d} \bigl( \mathrm{Tr}(H^1[x_1] \cdots H^d[x_d]) - f(x_1, \dots, x_d) \bigr)^2$,

where

$H^k[x_k] := H^k(:, x_k, :) \in \mathbb{R}^{r \times r}$

denotes the $x_k$th slice of the 3-tensor $H^k$ along the second dimension. It is computationally infeasible just to set up problem (9), as we would need to evaluate $f$ a total of $n^d$ times. Therefore, analogously to the matrix or CP-tensor completion problem [3, 21], a ``TR completion'' problem [18],

(10)    $\min_{H^1, \dots, H^d} \sum_{x \in \Omega} \bigl( \mathrm{Tr}(H^1[x_1] \cdots H^d[x_d]) - f(x_1, \dots, x_d) \bigr)^2$,

where $\Omega$ is a subset of $[n]^d$, should be solved instead. Since there are a total of $dnr^2$ parameters in the tensors $H^1, \dots, H^d$, there is hope that by observing a small number of entries of $f$ (at least $O(dnr^2)$), we can obtain the rank-$r$ TR.
A standard approach for solving a minimization problem of the type (10) is ALS. At every iteration of ALS, a particular $H^k$ is treated as the variable while the $H^l$, $l \neq k$, are kept fixed. Then $H^k$ is optimized w.r.t. the least-squares cost in (10). More precisely, to determine $H^k$, we solve

(11)    $\min_{H^k} \sum_{x \in \Omega} \bigl( \mathrm{Tr}(H^k[x_k] C^{x \setminus x_k}) - f(x) \bigr)^2$,

where each coefficient matrix is

(12)    $C^{x \setminus x_k} := H^{k+1}[x_{k+1}] \cdots H^d[x_d] H^1[x_1] \cdots H^{k-1}[x_{k-1}]$,  $x \in \Omega$.

By an abuse of notation, we use $x \setminus x_k$ to denote the exclusion of $x_k$ from the $d$-tuple $x$. As mentioned previously, $|\Omega|$ should be at least $O(dnr^2)$ in order to determine the TR decomposition. This creates a large computational cost in each iteration of

the ALS, as it takes $|\Omega|(d-1)$ matrix multiplications (which has $O(d^2)$ scaling, as $|\Omega|$ has size $O(d)$) just to construct $C^{x \setminus x_k}$ for all $x \in \Omega$. When $d$ is large, such quadratic scaling in $d$ for setting up the least-squares problem in each iteration of the ALS is undesirable.
The following simple but crucial observation allows us to gain a further speedup. Although $O(dnr^2)$ observations of $f$ are required to determine all the components $H^1, \dots, H^d$, when it comes to determining each individual $H^k$ via solving the linear system (11), only $O(nr^2)$ equations are required for the well-posedness of the linear system. This motivates us to use different $\Omega_k$'s, each of size $O(nr^2)$ (with $|\Omega_1| + \cdots + |\Omega_d| \sim O(dnr^2)$), to determine the different $H^k$'s in the ALS steps, instead of using a fixed set $\Omega$ of size $O(dnr^2)$ for all $H^k$'s. If $\Omega_k$ is constructed by densely sampling the dimensions near $k$ (where a neighborhood is defined according to the ring geometry) while sparsely sampling the dimensions far away from $k$, computational savings can be achieved. The specific construction of $\Omega_k$ is made precise in section 2.1. We further remark that if

(13)    $\mathrm{Tr}(H^k[x_k] C^{x \setminus x_k}) \approx f(x)$

holds with small error for every $x \in [n]^d$, then using any $\Omega_k \subset [n]^d$ in place of $\Omega$ in (11) should give similar solutions, as long as (11) is well-posed. Therefore, we solve

(14)    $\min_{H^k} \sum_{x \in \Omega_k} \bigl( \mathrm{Tr}(H^k[x_k] C^{x \setminus x_k}) - f(x) \bigr)^2$

instead of (11) in each step of the ALS, where the index sets $\Omega_k$ depend on $k$. We note that in practice, a regularization term $\lambda \sigma_k \|H^k(x_k)\|_F^2$ is added to the cost in (14) to reduce the numerical instability resulting from a potentially high condition number of the least-squares problem (14). In all of our experiments, $\lambda$ is set to $10^{-9}$ and $\sigma_k$ is the top singular value of the Hessian of the least-squares problem (14). From our experience, the quality of the TR is rather insensitive to the choice of $\lambda$, which indicates that the problem of determining the $H^k$'s is rather well-posed.
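To fix ideas, here is a minimal sketch of one ALS update solving the regularized version of (14); the helper name and the dense loop over $\Omega_k$ are our own simplifications, not the paper's optimized implementation:

```python
import numpy as np

def als_update(H, k, Omega_k, fvals, lam=1e-9):
    """Refit the k-th core H[k] by the regularized least squares (14).

    H       : list of d cores, each of shape (r, n, r)
    Omega_k : (m, d) integer array of sampled multi-indices
    fvals   : (m,) array, f evaluated on the rows of Omega_k
    """
    d = len(H)
    r, n, _ = H[k].shape
    # Coefficient matrices C^{x \ x_k} of (12): product of all slices but the k-th.
    A = np.zeros((len(Omega_k), r * r))
    for s, x in enumerate(Omega_k):
        C = np.eye(r)
        for j in list(range(k + 1, d)) + list(range(k)):
            C = C @ H[j][:, x[j], :]
        A[s] = C.T.ravel()          # Tr(H^k[x_k] C) = <vec(H^k[x_k]), vec(C^T)>
    # Each slice x_k = m has its own independent normal equations;
    # every m occurs since Omega_k fully samples dimension k, see (15).
    for m in range(n):
        mask = Omega_k[:, k] == m
        Am, bm = A[mask], fvals[mask]
        G = Am.T @ Am                       # Hessian of the least-squares cost
        sigma = np.linalg.norm(G, 2)        # its top singular value
        h = np.linalg.solve(G + lam * sigma * np.eye(r * r), Am.T @ bm)
        H[k][:, m, :] = h.reshape(r, r)
```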
At this point, it is clear that there are two issues to be addressed. The first issue concerns the choice of $\Omega_k$, $k \in [d]$. The second issue is that the nonconvex nature of the TR completion problem (10) may cause difficulty in the convergence of ALS. We address the first issue using a hierarchical sampling strategy. As for the second issue, by making certain probabilistic assumptions on $f$, we are able to obtain a cheap and intuitive initialization that allows fast convergence. Before moving on, we summarize the full algorithm in Algorithm 1. The steps of Algorithm 1 are further detailed in sections 2.1, 2.2, and 2.3.

Algorithm 1 Alternating least squares.
Require:
  Function $f : [n]^d \to \mathbb{R}$.
Ensure:
  TR $H^1, \dots, H^d \in \mathbb{R}^{r \times n \times r}$.
1: Identify the index sets $\Omega_k$ and compute $f(\Omega_k)$ for each $k \in [d]$ (section 2.1).
2: Initialize $H^1, \dots, H^d$ (section 2.2).
3: Run ALS by solving (14) for each $k \in [d]$ (section 2.3).


2.1. Constructing $\Omega_k$. In this section, we detail the construction of $\Omega_k$ for each $k \in [d]$. We first construct an index set $\Omega_k^{\mathrm{envi}} \subset [n]^{d-3}$ of fixed size $s$. The elements of $\Omega_k^{\mathrm{envi}}$ correspond to different choices of indices for the $[d] \setminus \{k-1, k, k+1\}$th dimensions of the function $f$. Then, for each element of $\Omega_k^{\mathrm{envi}}$, we sample all possible indices from the $(k-1)$th, $k$th, and $(k+1)$th dimensions of $f$ to construct $\Omega_k$, i.e., we let

(15)    $\Omega_k = [n]^3 \times \Omega_k^{\mathrm{envi}}$.

We let $|\Omega_k^{\mathrm{envi}}| = s$ for all $k$, where $s$ is a constant that does not depend on the dimension $d$. In this case, when determining $C^{x \setminus x_k}$, $x \in \Omega_k$, in (14), only $O(|\Omega_k^{\mathrm{envi}}| d)$ multiplications of $r \times r$ matrices are needed, giving a complexity that is linear in $d$ for setting up the least-squares problem. We want to emphasize that although naively it seems that $O(n^3)$ samples are needed to construct $\Omega_k$ in (15), the $n^3$ samples corresponding to each sample in $\Omega_k^{\mathrm{envi}}$ can be obtained by applying an interpolative decomposition [5] to the $n \times n \times n$ tensor with $O(n)$ observations.
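A small sketch (ours) of the product structure (15), enumerating all $n^3$ neighbor indices for each environment sample:

```python
import numpy as np
from itertools import product

def build_Omega_k(Omega_envi, k, n, d):
    """Assemble Omega_k = [n]^3 x Omega_k^envi as in (15).

    Omega_envi : (s, d-3) array of indices for the dims [d] \ {k-1, k, k+1}
    Returns an (s * n^3, d) array of full multi-indices (0-based, ring-wrapped).
    """
    nbr = [(k - 1) % d, k, (k + 1) % d]           # fully sampled dimensions
    rest = [j for j in range(d) if j not in nbr]  # environment dimensions
    out = np.empty((len(Omega_envi) * n**3, d), dtype=int)
    row = 0
    for z in Omega_envi:
        for trip in product(range(n), repeat=3):
            out[row, nbr] = trip
            out[row, rest] = z
            row += 1
    return out
```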
It remains to construct the $\Omega_k^{\mathrm{envi}}$'s. There are two criteria we use for constructing $\Omega_k^{\mathrm{envi}}$, $k \in [d]$. First, we want the range of $f_{k;[d]\setminus k}(\Omega_k)$ to be the same as the range of $f_{k;[d]\setminus k}$. This is a necessary condition for the least-squares problem (14) to have a small residual. In this case, the following observation holds.

Observation 1. If

(16)    $\sqrt{ \sum_{x \in \Omega_k} \bigl( \mathrm{Tr}(H^k[x_k] C^{x \setminus x_k}) - f(x) \bigr)^2 } \leq \epsilon$,

then

(17)    $\bigl\| H^k_{2;3,1} - f_{k;[d]\setminus k}(\Omega_k) \, [\mathrm{vec}(C^{x \setminus x_k})]^\dagger_{x \in \Omega_k} \bigr\|_F \leq \frac{\epsilon}{\sigma_{\min}([\mathrm{vec}(C^{x \setminus x_k})]_{x \in \Omega_k})}$,

where $\dagger$ denotes the pseudoinverse and $\sigma_{\min}$ denotes the smallest singular value.

Therefore, $\mathrm{Range}(H^k_{2;3,1})$ is similar to $\mathrm{Range}(f_{k;[d]\setminus k}(\Omega_k))$. On the other hand, an optimal $H^k$ should satisfy

(18)    $H^k_{2;3,1} [\mathrm{vec}(C^{x \setminus x_k})]_{x \in [n]^d} = f_{k;[d]\setminus k}$

for all the entries of $f$; thus

(19)    $\mathrm{Range}(f_{k;[d]\setminus k}(\Omega_k)) \approx \mathrm{Range}(f_{k;[d]\setminus k})$.

Here we emphasize that it is possible to reshape $f(\Omega_k)$ into a matrix $f_{k;[d]\setminus k}(\Omega_k)$ as in (17) due to the product structure of $\Omega_k$ in (15), where the indices along dimension $k$ are fully sampled. The second criterion is that we require the cost in (14) to approximate the cost in (9).
To meet the first criterion, we propose a hierarchical strategy to determine $\Omega_k^{\mathrm{envi}}$ such that $f_{k;[d]\setminus k}(\Omega_k)$ has large singular values. Assuming $d = 3 \cdot 2^L$ for some natural number $L$, we summarize this strategy in Algorithm 2 (the upward pass) and Algorithm 3 (the downward pass). The dimensions are divided into groups of size $3 \cdot 2^{L-l}$ on each level $l$ for $l = 1, \dots, L$. We emphasize that level $l = 1$ corresponds to the coarsest partitioning of the dimensions of the tensor $f$. The purpose of the upward pass is to hierarchically find skeletons $\Theta_k^{\mathrm{in},l}$ which represent the $k$th group of indices, while the downward pass hierarchically constructs representative environment skeletons $\Theta_k^{\mathrm{envi},l}$. At each level, the skeletons are found using a rank-revealing QR (RRQR) factorization [9].
After a full upward-downward pass, in which the RRQR is called $O(d \log d)$ times, the $\Theta_k^{\mathrm{envi},L}$ with $k \in [2^L]$ are obtained. Then another upward pass can be initiated. Instead of sampling new $\Theta_k^{\mathrm{envi},l}$'s, the stored $\Theta_k^{\mathrm{envi},l}$'s from the downward pass are used. Multiple upward-downward passes can be performed to further improve these skeletons. Finally, we let

(20)    $\Omega_{3k-1}^{\mathrm{envi}} := \Theta_k^{\mathrm{envi}}$,  $k \in [2^L]$.

Observe that we have only obtained $\Omega_k^{\mathrm{envi}}$ for $k = 2, 5, \dots, d-1$. Therefore, we need to apply the upward-downward pass to different groupings of the dimensions of the tensor $f$ in step 1 of the upward pass. More precisely, we group the dimensions as $(2,3,4), (5,6,7), \dots, (d-1,d,1)$ and $(d,1,2), (3,4,5), \dots, (d-3,d-2,d-1)$ when initializing the upward pass to determine $\Omega_k^{\mathrm{envi}}$ with $k = 3, 6, \dots, d$ and $k = 1, 4, \dots, d-2$, respectively.
Finally, to meet the second criterion, namely, that the cost in (14) should approximate the cost in (9), we add to each $\Omega_k^{\mathrm{envi}}$ extra samples $x \in [n]^{d-3}$ obtained by sampling the $x_i$'s uniformly and independently from $[n]$. We typically add an extra $5s$ samples to each $\Omega_k^{\mathrm{envi}}$. This completes the construction of the $\Omega_k^{\mathrm{envi}}$'s and their corresponding $\Omega_k$'s in Algorithm 1.
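The RRQR column selection used in Algorithms 2 and 3 below can be mimicked with a pivoted QR factorization (a practical surrogate for the RRQR of [9]; sketch and function name ours):

```python
import numpy as np
from scipy.linalg import qr

def select_skeletons(F, s):
    """Return the indices of s columns of F whose span best
    resembles Range(F), via QR with column pivoting."""
    _, _, piv = qr(F, mode='economic', pivoting=True)
    return piv[:s]

# Usage: for a matrix F = f(Theta^{envi,l}_k; Theta~^{in,l}_k) of shape
# (s, |Theta~^{in,l}_k|), the returned column indices identify the
# skeleton multi-indices forming Theta^{in,l}_k.
```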

Algorithm 2 Upward pass.
Require:
  Function $f : [n]^d \to \mathbb{R}$, number of skeletons $s$.
Ensure:
  Skeleton sets $\Theta_k^{\mathrm{in},l}$.
1: Decimate the number of dimensions by clustering every three dimensions. More precisely, for each $k \in [2^L]$, let

$\tilde{\Theta}_k^{\mathrm{in},L} := \{ (x_{3k-2}, x_{3k-1}, x_{3k}) \mid x_{3k-2}, x_{3k-1}, x_{3k} \in [n] \}$.

There are $2^L$ index sets after this step. For each $k \in [2^L]$, construct the set of environment skeletons

(21)    $\Theta_k^{\mathrm{envi},l} \subset [n]^{d-3}$

with $s$ elements, either by selecting multi-indices from $[n]^{d-3}$ randomly or by using the output of Algorithm 3 (when an iteration of upward and downward passes is employed). [Tensor diagram illustrating this step omitted.]

for $l = L$ to $l = 1$

2: Find the skeletons within each index set $\tilde{\Theta}_k^{\mathrm{in},l}$, $k \in [2^l]$, where the elements of each $\tilde{\Theta}_k^{\mathrm{in},l}$ are multi-indices of length $3 \cdot 2^{L-l}$. Apply an RRQR factorization to the matrix

(22)    $f(\Theta_k^{\mathrm{envi},l}; \tilde{\Theta}_k^{\mathrm{in},l}) \in \mathbb{R}^{s \times |\tilde{\Theta}_k^{\mathrm{in},l}|}$

to select the $s$ columns that best resemble the range of $f(\Theta_k^{\mathrm{envi},l}; \tilde{\Theta}_k^{\mathrm{in},l})$. The multi-indices of these $s$ columns form the set $\Theta_k^{\mathrm{in},l}$. Store $\Theta_k^{\mathrm{in},l}$ for each $k \in [2^l]$. [Diagram omitted; in the original figure, thick lines denote index sets of size larger than $s$.]

3: If $l > 1$, for each $k \in [2^{l-1}]$, construct

(23)    $\tilde{\Theta}_k^{\mathrm{in},l-1} := \Theta_{2k-1}^{\mathrm{in},l} \times \Theta_{2k}^{\mathrm{in},l}$.

Then, sample $s$ elements randomly from

(24)    $\prod_{j \in [2^l] \setminus \{2k-1, 2k\}} \Theta_j^{\mathrm{in},l}$

to form $\Theta_k^{\mathrm{envi},l-1}$, or use the output of Algorithm 3 (when an iteration of upward and downward passes is employed). [Diagram omitted; again, thick lines denote index sets of size larger than $s$.]

end for


Algorithm 3 Downward pass.
Require:
  Function $f : [n]^d \to \mathbb{R}$, the $\Theta_k^{\mathrm{in},l}$'s from the upward pass, number of skeletons $s$.
Ensure:
  Skeletons $\Theta_k^{\mathrm{envi},l}$.
1: Let $\Theta_1^{\mathrm{envi},1} = \Theta_2^{\mathrm{in},1}$, $\Theta_2^{\mathrm{envi},1} = \Theta_1^{\mathrm{in},1}$. [Diagram omitted.]

for $l = 2$ to $l = L$
2: For each $k \in [2^l]$, obtain $\Theta_k^{\mathrm{envi},l}$ by applying an RRQR factorization to

(25)    $f(\Theta_k^{\mathrm{in},l}; \Theta_{k+1}^{\mathrm{in},l} \times \Theta_{(k+1)/2}^{\mathrm{envi},l-1})$

or

(26)    $f(\Theta_k^{\mathrm{in},l}; \Theta_{k-1}^{\mathrm{in},l} \times \Theta_{k/2}^{\mathrm{envi},l-1})$

for odd or even $k$, respectively, to obtain $s$ important columns. The multi-indices corresponding to these $s$ columns are used to update $\Theta_k^{\mathrm{envi},l}$. [Diagram omitted; the original figure illustrates the selection of the environment skeletons for odd $k$.]
end for

2.2. Initialization. Due to the nonlinearity of the optimization problem (10), it is possible for ALS to get stuck at local minima or saddle points. A good initialization is crucial for the success of ALS. One possibility is to use the ``opening'' procedure in [4] to obtain the $d$ 3-tensors. As mentioned previously, this may suffer from an ambiguity issue, leading us to consider a different approach. The proposed initialization procedure consists of two steps. First we obtain the $H^k$'s up to gauges $G^k$ between them (Algorithm 4). Then we solve $d$ least-squares problems to fix the gauges between the $H^k$'s (Algorithm 5). More precisely, after Algorithm 4, we want to use $T^{k,C}$ as $H^k$. However, as in any factorization, SVD can only determine the factorization of $T^{k,C}$ up to gauge transformations, as shown in Figure 3. Therefore, between $T^{k,C}$ and $T^{k+1,C}$, an appropriate gauge $G^k$ has to be inserted (Figure 3).
After gauge fixing, the initialization step in Algorithm 1 is complete. Before moving on, we demonstrate the superiority of this initialization over random initialization. In Figure 4 we plot the error between the TR and the full function versus

Fig. 3. A gauge $G^k$ needs to be inserted between $T^{k,C}$ and $T^{k+1,C}$.


Fig. 4. Convergence of the ALS using both random and the proposed initializations, for the numerical example given in section 4.3 with $n = 3$, $d = 12$. The error measure is defined in (40).

the number of iterations of ALS, for both the proposed initialization and random initialization. By random initialization, we mean that the $H^k$'s are initialized by sampling their entries independently from the normal distribution. ALS is then performed on the example detailed in section 4.3 with $n = 3$, $d = 12$. We set the TR rank to $r = 3$. As we can see, after one iteration of ALS, we already obtain an error of $10^{-4}$ using our proposed method, whereas with random initialization, the convergence of ALS is slower and the solution has lower accuracy.
2.3. Alternating least squares. After constructing $\Omega_k$ and initializing $H^k$, $k \in [d]$, we run ALS by solving problem (14) at each iteration. This completes Algorithm 1.
When running ALS, we sometimes want to increase the TR rank to obtain a higher-accuracy approximation to the function $f$. In this case, we simply add a row and a column of random entries to each $H^k$, i.e.,

(27)    $H^k(:, i, :) \leftarrow \begin{bmatrix} H^k(:, i, :) & \epsilon_1^{i,k} \\ \epsilon_2^{i,k} & 1 \end{bmatrix}$,  $i = 1, \dots, n$,  $k = 1, \dots, d$,

where each entry of $\epsilon_1^{i,k} \in \mathbb{R}^{r \times 1}$ and $\epsilon_2^{i,k} \in \mathbb{R}^{1 \times r}$ is sampled from a Gaussian distribution, and we continue the ALS procedure with the new $H^k$'s until the error stops decreasing. The variance of each Gaussian random variable is typically set to $10^{-8}$.
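A minimal sketch (ours) of the rank-increase step (27):

```python
import numpy as np

def grow_rank(H, var=1e-8):
    """Pad each (r, n, r) core to (r+1, n, r+1) as in (27)."""
    rng = np.random.default_rng()
    out = []
    for Hk in H:
        r, n, _ = Hk.shape
        Hnew = np.empty((r + 1, n, r + 1))
        for i in range(n):
            Hnew[:r, i, :r] = Hk[:, i, :]
            Hnew[:r, i, r] = rng.normal(0.0, np.sqrt(var), r)  # eps_1 column
            Hnew[r, i, :r] = rng.normal(0.0, np.sqrt(var), r)  # eps_2 row
            Hnew[r, i, r] = 1.0                                # corner entry
        out.append(Hnew)
    return out
```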
3. Motivation of the initialization procedure. In this section, we motivate the initialization procedure in Algorithm 4. The main idea is that by fixing a random index set, a portion of the ring can be singled out and extracted. To this end, we place the following assumption on the TR $f$.


Algorithm 4
Require:
  Function $f : [n]^d \to \mathbb{R}$.
Ensure:
  $T^{k,L} \in \mathbb{R}^{n \times r}$, $T^{k,C} \in \mathbb{R}^{r \times n \times r}$, $T^{k,R} \in \mathbb{R}^{r \times n}$, $k \in [d]$.
for $k = 1$ to $k = d$
1: Pick an arbitrary $z \in [n]^{d-3}$ and let

(28)    $\Omega_k^{\mathrm{ini}} := \{ x \in [n]^d \mid x_{[d] \setminus \{k-1,k,k+1\}} = z,\ x_{k-1}, x_k, x_{k+1} \in [n] \}$.

Define

(29)    $T^k := f(\Omega_k^{\mathrm{ini}}) \in \mathbb{R}^{n \times n \times n}$,

where the first, second, and third dimensions of $T^k$ correspond to the $(k-1)$th, $k$th, and $(k+1)$th dimensions of $f$. Note that we pick only a single $z$ in $\Omega_k^{\mathrm{envi}}$, which is the key to being able to use an SVD procedure in the next step and avoid ambiguity in the initialization. The justification of this procedure can be found in Appendix A.
2: Now we want to factorize the 3-tensor $T^k$ into a TT with three nodes using the SVD. First treat $T^k$ as a matrix with the first leg as rows and the second and third legs as columns. Apply a rank-$r$ approximation to $T^k$ using the SVD:

(30)    $T^k_{1;2,3} \approx U_L \Sigma_L V_L^T$.

Let $C^k \in \mathbb{R}^{r \times n \times n}$ be reshaped from $\Sigma_L V_L^T \in \mathbb{R}^{r \times n^2}$.
3: Treat $C^k$ as a matrix with the first and second legs as rows and the third leg as columns. Apply the SVD to obtain a rank-$r$ approximation:

(31)    $C^k_{1,2;3} \approx U_R \Sigma_R V_R^T$.

Let $\tilde{T}^{k,C} \in \mathbb{R}^{r \times n \times r}$ be reshaped from $U_R \Sigma_R \in \mathbb{R}^{rn \times r}$.
4: Let $T^{k,L} := U_L \Sigma_L^{1/2}$ and $T^{k,R} := \Sigma_R^{1/2} V_R^T$, and let $T^{k,C} := \Sigma_L^{-1/2} \tilde{T}^{k,C} \Sigma_R^{-1/2}$ (contracting $\Sigma_L^{-1/2}$ and $\Sigma_R^{-1/2}$ into the first and third legs of $\tilde{T}^{k,C}$). The 3-tensor $T^k$ is thus approximated by a TT with three tensors $T^{k,L} \in \mathbb{R}^{n \times r}$, $T^{k,C} \in \mathbb{R}^{r \times n \times r}$, $T^{k,R} \in \mathbb{R}^{r \times n}$.
end for
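Steps 2-4 of Algorithm 4 amount to a truncated-SVD TT factorization of the $n \times n \times n$ sample tensor; a minimal sketch of ours, assuming $r \leq n$:

```python
import numpy as np

def three_node_tt(T, r):
    """Steps 2-4 of Algorithm 4: factor an (n, n, n) tensor T into a
    three-node TT, T ~= T_L * T_C * T_R, using truncated SVDs."""
    n = T.shape[0]
    # Step 2: rank-r SVD of T_{1;2,3}.
    U, S, Vt = np.linalg.svd(T.reshape(n, n * n), full_matrices=False)
    UL, SL = U[:, :r], S[:r]
    C = (SL[:, None] * Vt[:r]).reshape(r, n, n)          # C^k = Sigma_L V_L^T
    # Step 3: rank-r SVD of C_{1,2;3}.
    U, S, Vt = np.linalg.svd(C.reshape(r * n, n), full_matrices=False)
    UR, SR, VRt = U[:, :r], S[:r], Vt[:r]
    Ttil = (UR * SR).reshape(r, n, r)                    # U_R Sigma_R
    # Step 4: split the singular values symmetrically across the nodes.
    TL = UL * np.sqrt(SL)                                # n x r
    TC = np.einsum('a,anb,b->anb', 1.0 / np.sqrt(SL), Ttil, 1.0 / np.sqrt(SR))
    TR = np.sqrt(SR)[:, None] * VRt                      # r x n
    return TL, TC, TR
```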

Assumption 1. Let the TR $f$ be partitioned into four disjoint regions (Figure 5): regions $a$, $b$, $c_1$, and $c_2$, where $a, b, c_1, c_2 \subset [d]$ contain $L_a$, $L_b$, $L_{c_1}$, $L_{c_2}$ dimensions, respectively, with $L_a + L_b + L_{c_1} + L_{c_2} = d$. If $L_a, L_b \geq L_{\mathrm{buffer}}$, then for any $z \in [n]^{L_a + L_b}$ the TR $f$ satisfies

(35)    $f(x_{c_1}, x_{a \cup b}, x_{c_2})|_{x_{a \cup b}=z} \propto g(x_{c_1}, x_{a \cup b})|_{x_{a \cup b}=z} \, h(x_{a \cup b}, x_{c_2})|_{x_{a \cup b}=z}$

for some functions $g$ and $h$, where ``$\propto$'' denotes proportionality up to a constant.

Algorithm 5
Require:
  Function $f : [n]^d \to \mathbb{R}$; $T^{k,L}, T^{k,C}, T^{k,R}$ for $k \in [d]$ from Algorithm 4.
Ensure:
  Initialization $H^k$, $k \in [d]$.
for $k = 1$ to $k = d$
1: Pick an arbitrary $z \in [n]^{d-4}$ and let

(32)    $\Omega_k^{\mathrm{gauge}} := \{ x \in [n]^d \mid x_{[d] \setminus \{k-1,k,k+1,k+2\}} = z,\ x_{k-1}, x_k, x_{k+1}, x_{k+2} \in [n] \}$,

and sample

(33)    $S^k = f(\Omega_k^{\mathrm{gauge}}) \in \mathbb{R}^{n \times n \times n \times n}$.

2: Solve the least-squares problem

(34)    $G^k = \mathrm{argmin}_G \, \| L^k_{1,2;3} \, G \, R^k_{1;2,3} - S^k_{1,2;3,4} \|_F^2$,

where $L^k$ is the 3-tensor obtained by contracting $T^{k,L}$ with $T^{k,C}$, and $R^k$ is the 3-tensor obtained by contracting $T^{k+1,C}$ with $T^{k+1,R}$ (see the tensor diagrams in the original figure).
3: Obtain $H^k$ by contracting the gauge $G^k$ into the third leg of $T^{k,C}$.
end for

We note that Assumption 1 holds if $f$ is a nonnegative function and admits a Markovian structure. Such functions can arise from a Gibbs distribution with an energy defined by short-range interactions [20], for example, the Ising model.
Next we make a certain nondegeneracy assumption on the TR $f$.

Assumption 2. Any segment $H$ of the TR $f$ (for example, $H^a, H^b, H^{c_1}, H^{c_2}$ shown in Figure 6) satisfies

(36)    $\mathrm{rank}(H_{L+1,L+2;[L]}) = r^2$

if $L \geq L_0$ for some natural number $L_0$. In particular, if $L \geq L_0$, we assume the condition number of $H_{[L];L+1,L+2}$ is at most $\kappa$ for some $\kappa = 1 + \delta_\kappa$, where $\delta_\kappa \geq 0$ is a small parameter.


Fig. 5. The TR $f$ partitioned into regions $a$, $b$, $c_1$, $c_2$.

Fig. 6. A segment of the TR, denoted $H$, with $L + 2$ dimensions. The $1, \dots, L$th dimensions have size $n$, corresponding to outgoing legs of the TR, and the $(L+1)$th and $(L+2)$th dimensions are the latent dimensions, of size $r$.

Since $H_{L+1,L+2;[L]} \in \mathbb{R}^{r^2 \times n^L}$, it is natural to expect that when $n^L \geq r^2$, $H_{L+1,L+2;[L]}$ generically has rank $r^2$ [15].
We now state a proposition that leads to the intuition behind the design of the initialization procedure in Algorithm 4.

Proposition 1. Let

(37)    $s_1 = e_{i_1} \otimes e_{i_2} \otimes \cdots \otimes e_{i_{L_a}}$,  $s_2 = e_{j_1} \otimes e_{j_2} \otimes \cdots \otimes e_{j_{L_b}}$

be any two sampling vectors, where $\{e_k\}_{k=1}^n$ is the canonical basis of $\mathbb{R}^n$. If $L_a, L_b, L_{c_1}, L_{c_2} \geq \max(L_0, L_{\mathrm{buffer}})$, the two matrices $B^1, B^2 \in \mathbb{R}^{r \times r}$ defined in Figure 7 are rank-1.

Proof. Due to Assumption 2, the matrices $H^{c_1}_{L_{c_1}+1,L_{c_1}+2;[L_{c_1}]} \in \mathbb{R}^{r^2 \times n^{L_{c_1}}}$ and $H^{c_2}_{L_{c_2}+1,L_{c_2}+2;[L_{c_2}]} \in \mathbb{R}^{r^2 \times n^{L_{c_2}}}$ defined in Figure 7 have rank $r^2$. Along with the implication of Assumption 1 that

(38)    $\mathrm{rank}\Bigl( \bigl(H^{c_1}_{L_{c_1}+1,L_{c_1}+2;[L_{c_1}]}\bigr)^T \bigl(B^1 \otimes B^2\bigr) H^{c_2}_{L_{c_2}+1,L_{c_2}+2;[L_{c_2}]} \Bigr) = 1$,

we get

(39)    $\mathrm{rank}(B^1 \otimes B^2) = 1$.

Since $\mathrm{rank}(B^1)\,\mathrm{rank}(B^2) = \mathrm{rank}(B^1 \otimes B^2) = 1$, it follows that $B^1$ and $B^2$ both have rank 1.


Fig. 7. Definition of the matrices $B^1$, $B^2$ in Proposition 1.

Fig. 8. Applying a sampling vector $s_2$ in the canonical basis to region $b$ gives the TT, with vectors $p^b, (q^b)^T$ attached.

The conclusion of Proposition 1 implies that to obtain the segment of the TR in region $a$, one simply needs to apply a sampling vector $s_2$ in the canonical basis to region $b$ to obtain the configuration in Figure 8, where the vectors $p^b, q^b \in \mathbb{R}^r$. Our goal is to extract the nodes in region $a$ as the $H^k$'s. It is intuitively clear that one can apply the TT-SVD technique in [13] to extract them. Such a technique is indeed used in the proposed initialization procedure, where we assume $L_{\mathrm{buffer}} = 1$, $L_0 = 1$, $L_a = 1$, $L_b = d - 3$. For completeness, in Proposition 2 in the appendix, we formalize the fact that one can use TT-SVD to learn each individual 3-tensor in the TR $f$ up to some gauges. We further provide in Proposition 2 a perturbation analysis for the case where the Markovian-type assumption holds only approximately.

4. Numerical results. In this section, we present numerical results for the proposed method for TR decomposition. We calculate the error between the obtained TR decomposition and the function $f$ as

(40)    $E = \sqrt{ \dfrac{ \sum_{x \in \Omega} \bigl( \mathrm{Tr}(H^1[x_1] \cdots H^d[x_d]) - f(x_1, \dots, x_d) \bigr)^2 }{ \sum_{x \in \Omega} f(x_1, \dots, x_d)^2 } }$.

Whenever it is feasible, we let $\Omega = [n]^d$. If the dimensionality of $f$ is too large for this, we subsample $\Omega$ from $[n]^d$ at random: for every $x \in \Omega$, each $x_i$ is drawn from $[n]$ uniformly at random. For the proposed algorithm, we also measure the error on the entries sampled for learning

the TR as

(41)    $E_{\mathrm{skeleton}} = \sqrt{ \dfrac{ \sum_{x \in \cup_k \Omega_k} \bigl( \mathrm{Tr}(H^1[x_1] \cdots H^d[x_d]) - f(x_1, \dots, x_d) \bigr)^2 }{ \sum_{x \in \cup_k \Omega_k} f(x_1, \dots, x_d)^2 } }$.

In the experiments, we compare our method, denoted ITR-ALS (``I'' stands for ``initialized''), with the TR-ALS method proposed in [18]. In [18], the cost in (9) is minimized using ALS, where (11) is solved for each $k$ in an alternating fashion. Although [18] proposed an SVD-based initialization approach similar to the recursive SVD algorithm for TT [13], this method has complexity exponential in $d$. Therefore the comparison with such an initialization is omitted, and we use a random initialization for TR-ALS. As we shall see, ITR-ALS is generally an order of magnitude faster than TR-ALS, due to the special structure of the samples. For each experiment we run both TR-ALS and ITR-ALS five times and report the median accuracy. For TR-ALS, we often have to use fewer samples so that the running time is not excessively long (recall that TR-ALS has $O(d^2)$ complexity per iteration). To compare with the algorithm in [4], we simply cite the results in [4], since the software is not publicly available. We also compare with the density matrix renormalization group (DMRG)-cross algorithm [16] (which gives a TT). As a method based on interpolative decomposition, DMRG-cross is able to obtain a high-quality approximation if we allow a representation with large TT rank. Since we obtain the TR via ALS optimization, the accuracy may not be comparable to DMRG-cross. What we want to emphasize here is that if the given situation only requires moderate accuracy, our method can give a more economical representation than the TT obtained from DMRG-cross. To convey this message, we set the accuracy of DMRG-cross so that it matches the accuracy of our proposed ITR-ALS.
4.1. Example 1: A toy example. We first compress the function

(42)    $f(x_1, \dots, x_d) = \dfrac{1}{\sqrt{1 + x_1^2 + \cdots + x_d^2}}$,  $x_k \in [0, 1]$,

considered in [4], into a TR. The results are presented in Table 1. In this example, we let $s = 4$ (recall that $s$ is the size of $\Omega_k^{\mathrm{envi}}$) in ITR-ALS. The number of samples we can afford to use for TR-ALS is smaller than for ITR-ALS due to the excessively long running time, since each iteration of TR-ALS has a complexity scaling of $O(d^2)$. In this example, although ITR-ALS sometimes has lower accuracy than TR-ALS, the running time of ITR-ALS is significantly shorter. In particular, for the case $d = 12$, TR-ALS fails to converge using the same number of samples as ITR-ALS. Both ITR-ALS and TR-ALS give a TR with tensor components of smaller sizes than the TT. The error $E$ reported for the case $d = 12$ is obtained by sampling $10^5$ entries of the tensor $f$.
4.2. Example 2: Ising spin glass. In this example, we demonstrate the advantage of ITR-ALS in compressing a high-dimensional function arising from many-body physics, the traditional field where TT or MPS is extensively used [1, 19]. We consider compressing the free energy of an Ising spin glass with a ring geometry:

(43)    $f(J_1, \dots, J_d) = -\dfrac{1}{\beta} \log \mathrm{Tr}\Bigl[ \prod_{i=1}^d \begin{bmatrix} e^{\beta J_i} & e^{-\beta J_i} \\ e^{-\beta J_i} & e^{\beta J_i} \end{bmatrix} \Bigr]$.

We let $\beta = 10$ and $J_i \in \{-2.5, -1.5, 1, 2\}$, $i \in [d]$. This corresponds to an Ising model at a temperature of about 0.1 K. The results are presented in Table 2. We let the number of environment samples be $s = 5$.

Table 1
Results for Example 1. $n$ corresponds to the number of uniform grid points on $[0, 1]$ for each $x_k$. The tuple $(r_1, \dots, r_d)$ indicates the rank of the learned TR or TT. $E_{\mathrm{skeleton}}$ is computed on the samples used for learning the TR.

Setting        | Format  | Rank (r_1, ..., r_d)       | E_skeleton | E       | Observations/n^d | Run time (s)
---------------|---------|----------------------------|------------|---------|------------------|-------------
d = 6, n = 10  | ITR-ALS | (3,3,3,3,3,3)              | 2.3e-03    | 6.3e-04 | 1.8e-01          | 4.7
               | TR-ALS  | (3,3,3,3,3,3)              | 4.3e-05    | 4.5e-05 | 2.8e-02          | 1360
               | TT      | (5,5,5,5,5,1)              | -          | 1.2e-04 | -                | 2.4
               | TR [4]  | (3,3,3,3,3,3)              | -          | 2.3e-04 | -                | -
d = 6, n = 20  | ITR-ALS | (3,3,3,3,3,3)              | 5.1e-04    | 9.4e-05 | 2.1e-02          | 24
               | TR-ALS  | (3,3,3,3,3,3)              | 5.0e-05    | 5.4e-05 | 8.2e-04          | 2757
               | TT      | (5,5,6,5,5,1)              | -          | 6.8e-05 | -                | 7.1
               | TR [4]  | (3,3,5,6,6,6)              | -          | 1.8e-03 | -                | -
d = 12, n = 5  | ITR-ALS | (3,3,3,3,3,3,3,3,3,3,3,3)  | 7.1e-04    | 5.9e-04 | 1.7e-04          | 28
               | TR-ALS  | (3,3,3,3,3,3,3,3,3,3,3,3)  | 0.97       | 0.97    | 1.7e-04          | 3132
               | TT      | (5,6,6,6,6,6,6,6,5,5,5,1)  | -          | 2.2e-05 | -                | 2.9

Table 2
Results for Example 2. Learning the free energy of an Ising spin glass.

Setting        | Format  | Rank (r_1, ..., r_d)                               | E_skeleton | E       | Observations/n^d | Run time (s)
---------------|---------|----------------------------------------------------|------------|---------|------------------|-------------
d = 12, n = 4  | ITR-ALS | (4,4,4,4,4,4,4,4,4,4,4,4)                          | 3.9e-03    | 3.8e-03 | 1.6e-02          | 7
               | TR-ALS  | (4,4,4,4,4,4,4,4,4,4,4,4)                          | 4.4e-02    | 5.2e-02 | 1.6e-02          | 994
               | TT      | (6,7,7,7,7,7,7,7,7,6,4,1)                          | -          | 4.2e-03 | -                | 2.8
d = 24, n = 4  | ITR-ALS | (3,3,...,3) [24 entries]                           | 4.8e-03    | 2.7e-03 | 1.6e-10          | 19
               | TR-ALS  | -                                                  | -          | -       | 1.6e-10          | -
               | TT      | (6,8,8,8,6,6,6,6,6,6,7,6,5,6,6,6,6,7,7,6,6,6,4,1)  | -          | 3.7e-03 | -                | 9.3

When computing the error $E$ for the case $d = 24$, due to the size of $f$, we simply subsample $10^5$ entries of $f$, where the $J_i$'s are sampled independently and uniformly from $\{-2.5, -1.5, 1, 2\}$. For $d = 12$, the solution obtained by ITR-ALS is superior due to the initialization procedure. We see that in both the $d = 12$ and $d = 24$ cases, the running time of TR-ALS is much longer compared to ITR-ALS.
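For reference, the target (43) itself can be evaluated by multiplying $2 \times 2$ transfer matrices; a sketch of ours (with rescaling to avoid overflow for large $\beta d$), not the paper's code:

```python
import numpy as np

def ising_free_energy(J, beta=10.0):
    """Evaluate f(J_1, ..., J_d) of (43) via a transfer-matrix product."""
    M = np.eye(2)
    log_scale = 0.0
    for Ji in J:
        T = np.array([[np.exp(beta * Ji), np.exp(-beta * Ji)],
                      [np.exp(-beta * Ji), np.exp(beta * Ji)]])
        M = M @ T
        s = np.abs(M).max()      # rescale and track the log of the scale
        M /= s
        log_scale += np.log(s)
    return -(log_scale + np.log(np.trace(M))) / beta

# Example: d = 12 couplings drawn from the set used in the paper.
rng = np.random.default_rng(0)
J = rng.choice([-2.5, -1.5, 1.0, 2.0], size=12)
print(ising_free_energy(J))
```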
4.3. Example 3: Parametric elliptic partial differential equation (PDE). In this section, we demonstrate the performance of our method in solving a parametric PDE. We are interested in solving an elliptic equation with random coefficients,

(44)    $\dfrac{\partial}{\partial x} \Bigl( a(x) \Bigl( \dfrac{\partial}{\partial x} u(x) + 1 \Bigr) \Bigr) = 0$,  $x \in [0, 1]$,

subject to a periodic boundary condition, where $a(\cdot)$ is a random field. In particular, we want to parameterize the effective conductance function

(45)    $A_{\mathrm{eff}}(a(\cdot)) := \int_{[0,1]} a(x) \Bigl( \dfrac{\partial}{\partial x} u(x) + 1 \Bigr)^2 dx$

Table 3
Results for Example 3. Solving a parametric elliptic PDE.

Setting        | Format  | Rank (r_1, ..., r_d)                       | E_skeleton | E       | Observations/n^d | Run time (s)
---------------|---------|--------------------------------------------|------------|---------|------------------|-------------
d = 12, n = 3  | ITR-ALS | (3,3,3,3,3,3,3,3,3,3,3,3)                  | 1.1e-05    | 1.1e-05 | 1.4e-02          | 22
               | TR-ALS  | (3,3,3,3,3,3,3,3,3,3,3,3)                  | 5.7e-06    | 6.8e-06 | 1.4e-02          | 1414
               | TT      | (5,5,5,5,5,5,5,5,5,3,3,1)                  | -          | 2.5e-05 | -                | 0.76
d = 24, n = 3  | ITR-ALS | (3,3,...,3) [24 entries]                   | 2.6e-05    | 2.8e-05 | 5.5e-06          | 47
               | TR-ALS  | -                                          | -          | -       | 5.5e-06          | -
               | TT      | (5,5,...,5,3,3,1) [21 fives, then 3,3,1]   | -          | 1.7e-05 | -                | 1.5

as a TR. By discretizing the domain into $d$ segments and assuming $a(x) = \sum_{i=1}^d a_i \chi_i(x)$, where each $a_i \in \{1, 2, 3\}$ and the $\chi_i$'s are step functions on uniform intervals of $[0, 1]$, we determine $A_{\mathrm{eff}}(a_1, \dots, a_d)$ as a TR. In this case, the effective coefficient has an analytic solution,

(46)    $A_{\mathrm{eff}}(a_1, \dots, a_d) = \Bigl( \dfrac{1}{d} \sum_{i=1}^d \dfrac{1}{a_i} \Bigr)^{-1}$,

and we use this formula to generate samples to learn the TR. For this example, we pick $s = 4$. The results are reported in Table 3. When computing $E$ for $d = 24$, again $10^5$ entries of $f$ are subsampled, where the $a_i$'s are sampled independently and uniformly from $\{1, 2, 3\}$. We note that although in this situation there is an analytic formula for the function we want to learn as a TR, we foresee further use of our method in solving parametric PDEs with periodic boundary conditions where there is no analytic formula for the physical quantity of interest (for example, the cases considered in [10]).
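Generating the training samples for this example then reduces to evaluating the harmonic mean (46); a two-line sketch of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.choice([1.0, 2.0, 3.0], size=24)   # one random coefficient vector
Aeff = 1.0 / np.mean(1.0 / a)              # eq. (46): harmonic mean of the a_i
```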
5. Conclusion. In this paper, we propose a method for learning a TR representation based on ALS. Since the problem of determining a TR is a nonconvex optimization problem, we propose an initialization strategy that helps the convergence of ALS. Furthermore, since using the entire tensor $f$ in the ALS is infeasible, we propose an efficient hierarchical sampling method to identify the important samples. Our method provides a more economical representation of the tensor $f$ than the TT format. As for future work, we plan to investigate the performance of the algorithms for quantum systems. One difficulty is that Assumption 1 (section 3), on which the proposed initialization procedure relies, does not in general hold for quantum systems with short-range interactions. Instead, a natural assumption for a quantum state exhibiting a TR-format representation is the exponential decay of correlations [7, 2]. The design of efficient algorithms to determine the TR representation under such an assumption is left for future work. Another natural direction is to extend the proposed method to tensor networks in higher spatial dimensions, which we shall also explore in the future.
Appendix A. Stability of initialization. In this section, we analyze the stability of the proposed initialization procedure, relaxing Assumption 1 to approximate Markovianity.

Assumption 3. Let

(47)    $\Omega_z := \{ (x_{c_1}, x_{a \cup b}, x_{c_2}) \mid x_{c_1} \in [n]^{L_{c_1}},\ x_{c_2} \in [n]^{L_{c_2}},\ x_{a \cup b} = z \}$

for some given $z \in [n]^{L_a + L_b}$. For any $z \in [n]^{L_a + L_b}$, we assume

(48)    $\dfrac{ \| f(\Omega_z)_{c_1; a \cup b \cup c_2} \|_2^2 }{ \| f(\Omega_z)_{c_1; a \cup b \cup c_2} \|_F^2 } \geq \alpha$

for some $0 < \alpha \leq 1$ if $L_a, L_b \geq L_{\mathrm{buffer}}$.


This assumption is a relaxation of Assumption 1. Indeed, if (48) holds with $\alpha = 1$, it implies that $f(\Omega_z)_{c_1; a \cup b \cup c_2}$ has rank 1. Under Assumption 3, we want to show that using Algorithm 4, one can extract the $H^k$'s approximately. The final result is stated in Proposition 2, obtained via the next few lemmas. In particular, we show that when the condition number $\kappa$ of the TR components (defined in Lemma 1) satisfies $\kappa = 1$, the approximation error goes to 0 as $\alpha \to 1$. In the first lemma, we show that $B^1, B^2$ defined in Figure 7 are approximately rank-1.

Lemma 1. Let $H^{c_1}, H^{c_2}, B^1, B^2$ be defined according to Figures 5 and 7, where the sampling vectors $s_1, s_2$ are defined in Proposition 1. If $L_{c_1}, L_{c_2}, L_a, L_b \geq \max(L_0, L_{\mathrm{buffer}})$, then

(49)    $\dfrac{\|B^1\|_2^2}{\|B^1\|_F^2},\ \dfrac{\|B^2\|_2^2}{\|B^2\|_F^2} \geq \dfrac{\alpha}{\kappa^4}$.

Proof. By Assumption 3,

(50)    $\alpha \leq \dfrac{ \bigl\| (H^{c_1}_{L_{c_1}+1,L_{c_1}+2;[L_{c_1}]})^T (B^1 \otimes B^2) H^{c_2}_{L_{c_2}+1,L_{c_2}+2;[L_{c_2}]} \bigr\|_2^2 }{ \bigl\| (H^{c_1}_{L_{c_1}+1,L_{c_1}+2;[L_{c_1}]})^T (B^1 \otimes B^2) H^{c_2}_{L_{c_2}+1,L_{c_2}+2;[L_{c_2}]} \bigr\|_F^2 } \leq \kappa_{c_1}^2 \kappa_{c_2}^2 \dfrac{\|B^1 \otimes B^2\|_2^2}{\|B^1 \otimes B^2\|_F^2} = \kappa_{c_1}^2 \kappa_{c_2}^2 \dfrac{\|B^1\|_2^2 \, \|B^2\|_2^2}{\|B^1\|_F^2 \, \|B^2\|_F^2}$,

where $\kappa_{c_1}, \kappa_{c_2} \leq \kappa$ are the condition numbers of $H^{c_1}_{L_{c_1}+1,L_{c_1}+2;[L_{c_1}]}$ and $H^{c_2}_{L_{c_2}+1,L_{c_2}+2;[L_{c_2}]}$, respectively.

Let $p^b (q^b)^T$ be the best rank-1 approximation to $B^2$. Before stating the next corollary, we define $H^{[d]\setminus b}$ and $\tilde{H}^{[d]\setminus a}$ in Figure 9.
Corollary 1. Under the assumptions of Lemma 1, for any sampling vector $s_2$ defined in Proposition 1,

(51)    $\dfrac{ \| H^{[d]\setminus b}_{[d-L_b];d-L_b+1,d-L_b+2} \, \mathrm{vec}(p^b (q^b)^T) - f_{[d]\setminus b;b} \, s_2 \|_2^2 }{ \| f_{[d]\setminus b;b} \, s_2 \|_F^2 } \leq \kappa^2 \Bigl( 1 - \dfrac{\alpha}{\kappa^4} \Bigr)$.

Proof. Lemma 1 implies

(52)    $\dfrac{ \| H^b_{L_b+1,L_b+2;[L_b]} s_2 - \mathrm{vec}(p^b (q^b)^T) \|_2^2 }{ \| H^b_{L_b+1,L_b+2;[L_b]} s_2 \|_2^2 } = \dfrac{ \|B^2 - p^b (q^b)^T\|_F^2 }{ \|B^2\|_F^2 } = \dfrac{ \|B^2\|_F^2 - \|p^b (q^b)^T\|_F^2 }{ \|B^2\|_F^2 } \leq 1 - \dfrac{\alpha}{\kappa^4}$.
Fig. 9. Definition of $H^{[d]\setminus b}$ and $\tilde{H}^{[d]\setminus a}$.

Then
(53)    $\dfrac{ \| H^{[d]\setminus b}_{[d-L_b];d-L_b+1,d-L_b+2} \, \mathrm{vec}(p^b (q^b)^T) - f_{[d]\setminus b;b} \, s_2 \|_2^2 }{ \| f_{[d]\setminus b;b} \, s_2 \|_2^2 } \leq \dfrac{ \| H^{[d]\setminus b}_{[d-L_b];d-L_b+1,d-L_b+2} \|_2^2 \, \| H^b_{L_b+1,L_b+2;[L_b]} s_2 - \mathrm{vec}(p^b (q^b)^T) \|_2^2 }{ \| H^{[d]\setminus b}_{[d-L_b];d-L_b+1,d-L_b+2} H^b_{L_b+1,L_b+2;[L_b]} s_2 \|_2^2 } \leq \kappa_{[d]\setminus b}^2 \, \dfrac{ \| H^b_{L_b+1,L_b+2;[L_b]} s_2 - \mathrm{vec}(p^b (q^b)^T) \|_2^2 }{ \| H^b_{L_b+1,L_b+2;[L_b]} s_2 \|_2^2 }$,

where $\kappa_{[d]\setminus b}$ is the condition number of $H^{[d]\setminus b}_{[d-L_b];d-L_b+1,d-L_b+2}$. Recall that $H^b$ is defined in Figure 5.
This corollary states that the situation in Figure 8 holds approximately. More precisely, let $T, \hat{T} \in \mathbb{R}^{n^{d-L_b}}$ be defined as

(54)    $T := H^{[d]\setminus b}_{[d-L_b];d-L_b+1,d-L_b+2} \, \mathrm{vec}(p^b (q^b)^T)$,  $\hat{T} := f_{[d]\setminus b;b} \, s_2$,

respectively, as demonstrated in Figure 10(a), where $p^b, q^b$ appear in Corollary 1. Corollary 1 implies

(55)    $T = \hat{T} + E$,  $\dfrac{\|E\|_F^2}{\|\hat{T}\|_F^2} \leq \kappa^2 \Bigl( 1 - \dfrac{\alpha}{\kappa^4} \Bigr)$.

In the following, we want to show that we can approximately extract the $H^k$'s in region $a$. For this, we need to take the right-inverses of $\tilde{H}^{c_1}_{L_{c_1}+1;[L_{c_1}]}$ and $\tilde{H}^{c_2}_{L_{c_2}+1;[L_{c_2}]}$, defined in Figure 10(b). This requires a singular value lower bound, provided by the next lemma.

Lemma 2. Let $\sigma_k : \mathbb{R}^{m_1 \times m_2} \to \mathbb{R}$ be the function that extracts the $k$th singular value of an $m_1 \times m_2$ matrix. Then

(56)    $\dfrac{ \sigma_r(\tilde{H}^{c_1}_{L_{c_1}+1;[L_{c_1}]})^2 \, \sigma_r(\tilde{H}^{c_2}_{L_{c_2}+1;[L_{c_2}]})^2 }{ \| \tilde{H}^{[d]\setminus a}_{d-L_a+1,d-L_a+2;[d-L_a]} \|_2^2 } \geq \dfrac{1}{\kappa^6} - \dfrac{2\sqrt{r}}{\kappa^2} \sqrt{1 - \dfrac{\alpha}{\kappa^4}}$,

Fig. 10. (a) Definition of $T$ and $\hat{T}$. The dimensions in regions $a$, $c_1$, $c_2$ are grouped into $\mathcal{I}_a$, $\mathcal{I}_{c_1}$, $\mathcal{I}_{c_2}$, respectively, for the tensors $T$ and $\hat{T}$. (b) The individual components of $T$.

assuming

(57)    $\dfrac{1}{\kappa^4} - 2\sqrt{r} \sqrt{1 - \dfrac{\alpha}{\kappa^4}} \geq 0$.

Proof. First,

(58)    $\dfrac{ \sigma_{r^2}(T_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}})^2 }{ \| \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}} \|_2^2 } \leq \dfrac{ \| H^a_{[L_a];L_a+1,L_a+2} \|_2^2 \, \sigma_{r^2}(\tilde{H}^{c_1}_{L_{c_1}+1;[L_{c_1}]} \otimes \tilde{H}^{c_2}_{L_{c_2}+1;[L_{c_2}]})^2 }{ \| \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}} \|_2^2 } = \dfrac{ \| H^a_{[L_a];L_a+1,L_a+2} \|_2^2 \, \sigma_r(\tilde{H}^{c_1}_{L_{c_1}+1;[L_{c_1}]})^2 \, \sigma_r(\tilde{H}^{c_2}_{L_{c_2}+1;[L_{c_2}]})^2 }{ \| H^a_{[L_a];L_a+1,L_a+2} \tilde{H}^{[d]\setminus a}_{d-L_a+1,d-L_a+2;[d-L_a]} \|_2^2 } \leq \dfrac{ \| H^a_{[L_a];L_a+1,L_a+2} \|_2^2 \, \sigma_r(\tilde{H}^{c_1}_{L_{c_1}+1;[L_{c_1}]})^2 \, \sigma_r(\tilde{H}^{c_2}_{L_{c_2}+1;[L_{c_2}]})^2 }{ \sigma_{r^2}(H^a_{[L_a];L_a+1,L_a+2})^2 \, \| \tilde{H}^{[d]\setminus a}_{d-L_a+1,d-L_a+2;[d-L_a]} \|_2^2 } \leq \kappa^2 \, \dfrac{ \sigma_r(\tilde{H}^{c_1}_{L_{c_1}+1;[L_{c_1}]})^2 \, \sigma_r(\tilde{H}^{c_2}_{L_{c_2}+1;[L_{c_2}]})^2 }{ \| \tilde{H}^{[d]\setminus a}_{d-L_a+1,d-L_a+2;[d-L_a]} \|_2^2 }$.

The equality follows from

$\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}} = H^a_{[L_a];L_a+1,L_a+2} \tilde{H}^{[d]\setminus a}_{d-L_a+1,d-L_a+2;[d-L_a]}$,

which follows from (54) and the definition of $\tilde{H}^{[d]\setminus a}$ in Figure 9.
Observe that

(59)    $\dfrac{ \sigma_{r^2}(T_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}})^2 }{ \| \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}} \|_2^2 } \geq \dfrac{ \sigma_{r^2}(\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}})^2 - 2\|E\|_F \, \sigma_{r^2}(\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}}) + \|E\|_F^2 }{ \| \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}} \|_2^2 } \geq \dfrac{ \sigma_{r^2}(\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}})^2 }{ \| \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}} \|_2^2 } - \dfrac{ 2\|E\|_F \, \sigma_{r^2}(\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}}) }{ \| \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}} \|_2^2 } \geq \dfrac{ \sigma_{r^2}(H^a_{[L_a];L_a+1,L_a+2})^2 \, \sigma_{r^2}(\tilde{H}^{[d]\setminus a}_{d-L_a+1,d-L_a+2;[d-L_a]})^2 }{ \| H^a_{[L_a];L_a+1,L_a+2} \|_2^2 \, \| \tilde{H}^{[d]\setminus a}_{d-L_a+1,d-L_a+2;[d-L_a]} \|_2^2 } - \dfrac{ 2\|E\|_F \, \sigma_{r^2}(\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}}) }{ \| \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}} \|_2^2 } \geq \dfrac{1}{\kappa^4} - \dfrac{ 2\sqrt{r} \, \sigma_{r^2}(\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}}) \, \|E\|_F }{ \| \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}} \|_2 \, \| \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}} \|_F } \geq \dfrac{1}{\kappa^4} - 2\sqrt{r} \sqrt{1 - \dfrac{\alpha}{\kappa^4}}$;
and we have established the claim. The first inequality, regarding the perturbation of singular values, follows from Mirsky's theorem [11],

(60)    $\bigl| \sigma_{r^2}(T_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}}) - \sigma_{r^2}(\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}}) \bigr| \leq \|E\|_2 \leq \|E\|_F$,

together with the assumption $\|E\|_F \leq \sigma_{r^2}(\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}})$. Such an assumption holds when we demand that the lower bound in (59) be nonnegative, i.e.,

(61)    $\dfrac{ \sigma_{r^2}(\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}})^2 }{ \| \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}} \|_2^2 } - \dfrac{ 2\|E\|_F \, \sigma_{r^2}(\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}}) }{ \| \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}} \|_2^2 } \geq \dfrac{1}{\kappa^4} - 2\sqrt{r}\sqrt{1 - \dfrac{\alpha}{\kappa^4}} \geq 0$.

The last inequality follows from Corollary 1.


In the next lemma, we prove that applying Algorithm 4 to $\hat{T}$, where $\hat{T}$ is treated as a 3-tensor formed by grouping the dimensions in each of the sets $\mathcal{I}_a$, $\mathcal{I}_{c_1}$, $\mathcal{I}_{c_2}$, gives a close approximation to $\hat{T}$.

Lemma 3. Let

(62)    $\Pi_1 := \{ Y \mid Y = XX^T,\ X \in \mathbb{R}^{n^{L_{c_1}} \times r},\ X^T X = I \}$,
        $\Pi_2 := \{ Y \mid Y = XX^T,\ X \in \mathbb{R}^{n^{L_{c_2}} \times r},\ X^T X = I \}$,

where $I$ is the identity matrix. Let $P_1^* \in \Pi_1$ be the best rank-$r$ projection for $\hat{T}_{\mathcal{I}_{c_2}\mathcal{I}_a;\mathcal{I}_{c_1}}$, i.e., such that $\hat{T}_{\mathcal{I}_{c_2}\mathcal{I}_a;\mathcal{I}_{c_1}} P_1^* \approx \hat{T}_{\mathcal{I}_{c_2}\mathcal{I}_a;\mathcal{I}_{c_1}}$ in the Frobenius norm, and let

$P_2^* = \mathrm{argmin}_{P_2 \in \Pi_2} \| (\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}} (I \otimes P_2) - \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}) (P_1^* \otimes I) \|_F^2$.

Then

(63)    $\| \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}} (I \otimes P_2^*)(P_1^* \otimes I) - \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}} \|_F^2 \leq 2\|E\|_F^2$.

Proof. To simplify the notations, let T\~\scrI a ;\scrI c1 \scrI c2 := T\^\scrI a ;\scrI c1 \scrI c2 (I \otimes P2 ). Then

\[
\min_{P_2\in\Pi_2} \|\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}(I\otimes P_2)(P_1^*\otimes I) - \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}\|_F^2
\]
\[
= \min_{P_2\in\Pi_2} \|(\tilde{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}} - \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}} + \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}})(P_1^*\otimes I) - \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}\|_F^2
\]
\[
= \min_{P_2\in\Pi_2} \|\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}(I - P_1^*\otimes I)\|_F^2 + \|(\tilde{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}} - \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}})(P_1^*\otimes I)\|_F^2
\]
\[
\leq \min_{P_2\in\Pi_2} \|\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}(I - P_1^*\otimes I)\|_F^2 + \|\tilde{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}} - \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}\|_F^2
\]
\[
(64)\qquad = \min_{P_2\in\Pi_2} \|\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}(I - P_1^*\otimes I)\|_F^2 + \|\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}(I - I\otimes P_2)\|_F^2.
\]

The inequality comes from the fact that \(P_1^* \otimes I\) is a projection matrix. Next,
\[
(65)\qquad \|\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}(I - P_1^*\otimes I)\|_F^2 + \min_{P_2\in\Pi_2}\|\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}(I - I\otimes P_2)\|_F^2
\]
\[
= \min_{P_1\in\Pi_1}\|\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}(I - P_1\otimes I)\|_F^2 + \min_{P_2\in\Pi_2}\|\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}(I - I\otimes P_2)\|_F^2
\leq \|E\|_F^2 + \|E\|_F^2 \leq 2\|E\|_F^2,
\]

and we can conclude the lemma. The equality comes from the definition of \(P_1^*\), whereas the inequality is due to the facts that \(P_1, P_2\) are rank-\(r\) projectors and that there exists \(T\) with \(\hat{T} = T - E\), where \(\operatorname{rank}(T_{\mathcal{I}_{c_1}\mathcal{I}_a;\mathcal{I}_{c_2}}), \operatorname{rank}(T_{\mathcal{I}_{c_1};\mathcal{I}_a\mathcal{I}_{c_2}}) \leq r\).
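The two-projection construction in Lemma 3 is easy to exercise numerically. The sketch below is our own illustration, not code from the paper: we build a CP rank-\(r\) tensor (so that the unfoldings appearing in the lemma have rank at most \(r\)), perturb it, form the two rank-\(r\) projectors from SVDs of the corresponding unfoldings, and check the bound (63). The projector used for \(\Pi_2\) here is a feasible element obtained from a plain SVD, which, by the argument in the proof, already satisfies the bound.

```python
# Illustration of Lemma 3: two successive rank-r projections applied to the
# unfolding T_hat_{I_a; I_c1 I_c2} reproduce T_hat up to squared error 2*||E||_F^2.
# The test tensor, sizes, rank, and noise level are illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
na, n1, n2, r = 20, 15, 15, 3

# CP rank-r tensor: every unfolding of T has rank at most r.
U, V, W = (rng.standard_normal((m, r)) for m in (na, n1, n2))
T = np.einsum('aj,bj,cj->abc', U, V, W)
E = 1e-2 * rng.standard_normal(T.shape)
T_hat = T - E                              # text's convention: T_hat = T - E

def rank_r_projector(M, r):
    """Projector X X^T onto the top-r right singular subspace of M."""
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    X = Vt[:r].T
    return X @ X.T

# P1: rows (c2, a), columns c1 -- the unfolding used to define P1* above.
P1 = rank_r_projector(T_hat.transpose(2, 0, 1).reshape(n2 * na, n1), r)
# P2: rows (a, c1), columns c2 -- a feasible element of Pi_2.
P2 = rank_r_projector(T_hat.reshape(na * n1, n2), r)

M = T_hat.reshape(na, n1 * n2)             # T_hat_{I_a; I_c1 I_c2}
approx = M @ np.kron(np.eye(n1), P2) @ np.kron(P1, np.eye(n2))
err2 = np.linalg.norm(approx - M) ** 2
print(err2, "<=", 2 * np.linalg.norm(E) ** 2)
assert err2 <= 2 * np.linalg.norm(E) ** 2
```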
We are ready to state the final proposition.
Proposition 2. Let
\[
(66)\qquad \bar{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}} := \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}(I \otimes P_2^*)(P_1^* \otimes I),
\]
where \(P_1^*, P_2^*\) are defined in Lemma 3. Then


\[
(67)\qquad \frac{\|H^a_{[L_a];L_a+1,L_a+2} - \bar{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}(\tilde{H}^{c_1}_{L_{c_1}+1;[L_{c_1}]} \otimes \tilde{H}^{c_2}_{L_{c_2}+1;[L_{c_2}]})^{\dagger}\|_F^2}{\|H^a_{[L_a];L_a+1,L_a+2}\|_F^2} \leq \frac{(1+\sqrt{2})^2\,\kappa^4\bigl(1-\frac{\alpha}{\kappa^4}\bigr)}{\frac{1}{\kappa^4} - 2\sqrt{r}\sqrt{1-\frac{\alpha}{\kappa^4}}},
\]

where ``\(\dagger\)'' denotes the pseudoinverse of a matrix, provided the upper bound is positive. When \(\kappa = 1 + \delta_\kappa\) and \(\alpha = 1 - \delta_\alpha\), where \(\delta_\kappa, \delta_\alpha \geq 0\) are small parameters, we have
\[
(68)\qquad \frac{\|H^a_{[L_a];L_a+1,L_a+2} - \bar{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}(\tilde{H}^{c_1}_{L_{c_1}+1;[L_{c_1}]} \otimes \tilde{H}^{c_2}_{L_{c_2}+1;[L_{c_2}]})^{\dagger}\|_F^2}{\|H^a_{[L_a];L_a+1,L_a+2}\|_F^2} \leq O(\delta_\alpha + 4\delta_\kappa).
\]

Proof. From Lemma 3 and (55), we get

\[
\|\bar{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}} - T_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}\|_F
= \|\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}(I \otimes P_2^*)(P_1^* \otimes I) - T_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}\|_F
\]
\[
\leq \|\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}(I \otimes P_2^*)(P_1^* \otimes I) - \hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}\|_F + \|\hat{T}_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}} - T_{\mathcal{I}_a;\mathcal{I}_{c_1}\mathcal{I}_{c_2}}\|_F
\]
\[
(69)\qquad \leq (1+\sqrt{2})\|E\|_F.
\]

Recalling that
\[
(70)\qquad H^a_{[L_a];L_a+1,L_a+2} = T_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}}(\tilde{H}^{c_1}_{L_{c_1}+1;[L_{c_1}]} \otimes \tilde{H}^{c_2}_{L_{c_2}+1;[L_{c_2}]})^{\dagger},
\]

where the existence of a full-rank pseudoinverse is guaranteed by the singular value lower bound in Lemma 2, we have
\[
\frac{\|H^a_{[L_a];L_a+1,L_a+2} - \bar{T}_{\mathcal{I}_a;\mathcal{I}_{c_1},\mathcal{I}_{c_2}}(\tilde{H}^{c_1}_{L_{c_1}+1;[L_{c_1}]} \otimes \tilde{H}^{c_2}_{L_{c_2}+1;[L_{c_2}]})^{\dagger}\|_F^2}{\|H^a_{[L_a];L_a+1,L_a+2}\|_F^2}
\]
\[
\leq \frac{(1+\sqrt{2})^2\,\|E\|_F^2\,\|(\tilde{H}^{c_1}_{L_{c_1}+1;[L_{c_1}]} \otimes \tilde{H}^{c_2}_{L_{c_2}+1;[L_{c_2}]})^{\dagger}\|_2^2}{\|H^a_{[L_a];L_a+1,L_a+2}\|_F^2}
\]
\[
\leq \frac{(1+\sqrt{2})^2\,\|E\|_F^2}{\sigma_r(\tilde{H}^{c_1}_{L_{c_1}+1;[L_{c_1}]})^2\,\sigma_r(\tilde{H}^{c_2}_{L_{c_2}+1;[L_{c_2}]})^2\,\|H^a_{[L_a];L_a+1,L_a+2}\|_F^2}
\]
\[
= \frac{(1+\sqrt{2})^2\,\|\hat{T}\|_F^2\,\|E\|_F^2}{\sigma_r(\tilde{H}^{c_1}_{L_{c_1}+1;[L_{c_1}]})^2\,\sigma_r(\tilde{H}^{c_2}_{L_{c_2}+1;[L_{c_2}]})^2\,\|H^a_{[L_a];L_a+1,L_a+2}\|_F^2\,\|\hat{T}\|_F^2}
\]
\[
\leq \frac{(1+\sqrt{2})^2\,\|\tilde{H}^{[d]\setminus a}_{d-L_a+1,d-L_a+2;[d-L_a]}\|_2^2\,\|E\|_F^2}{\sigma_r(\tilde{H}^{c_1}_{L_{c_1}+1;[L_{c_1}]})^2\,\sigma_r(\tilde{H}^{c_2}_{L_{c_2}+1;[L_{c_2}]})^2\,\|\hat{T}\|_F^2}
\]
\[
(71)\qquad \leq \frac{(1+\sqrt{2})^2}{\frac{1}{\kappa^6} - \frac{2\sqrt{r}}{\kappa^2}\sqrt{1-\frac{\alpha}{\kappa^4}}}\;\kappa^2\Bigl(1-\frac{\alpha}{\kappa^4}\Bigr).
\]

The first inequality follows from (69) and (70), and the last inequality follows from
Corollary 1 and Lemma 2.
When \(L_a = L_{c_1} = L_{c_2} = 1\), applying Algorithm 4 to \(\hat{T}\) results in \(\bar{T}\) (represented by the tensors \(T^{a,L}\), \(T^{a,C}\), and \(T^{a,R}\)). Therefore, this proposition essentially implies that \(T^{a,C}\) approximates \(H^a\) up to a gauge transformation.
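As a concrete illustration of the core-recovery step (70), the following sketch (our own, with an explicit unfolding convention that may differ from the paper's exact indexing) builds an exact three-group factorization and recovers the middle core by applying the pseudoinverse of the Kronecker product of the neighboring unfoldings.

```python
# Illustration of (70): when T_{I_a; I_c1, I_c2} = H_a (H_c1 kron H_c2)^T exactly,
# the middle core is recovered, up to numerical error, via a pseudoinverse.
# Shapes and the unfolding convention are illustrative choices.
import numpy as np

rng = np.random.default_rng(2)
na, n1, n2, r = 10, 8, 8, 3

H_a = rng.standard_normal((na, r * r))     # middle core, unfolded
H_c1 = rng.standard_normal((n1, r))        # neighboring-core unfoldings
H_c2 = rng.standard_normal((n2, r))
B = np.kron(H_c1, H_c2)                    # (n1*n2) x r^2, full column rank a.s.

T = H_a @ B.T                              # exact three-group factorization
H_a_rec = T @ np.linalg.pinv(B.T)          # apply the pseudoinverse as in (70)

print(np.linalg.norm(H_a_rec - H_a) / np.linalg.norm(H_a))  # ~1e-14
```

If the neighboring unfoldings were replaced by gauge-transformed versions \(H_{c_1}G_1\) and \(H_{c_2}G_2\) with invertible \(G_1, G_2\), the recovered core would absorb the inverse gauge factor, which is the sense in which \(T^{a,C}\) matches \(H^a\) only up to a gauge transformation.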

REFERENCES

[1] I. Affleck, T. Kennedy, E. H. Lieb, and H. Tasaki, Valence bond ground states in isotropic
quantum antiferromagnets, Comm. Math. Phys., 115 (1988), pp. 477--528.
[2] F. G. S. L. Brandao and M. Horodecki, Exponential decay of correlations implies area law,
Comm. Math. Phys., 333 (2015), pp. 761--798.
[3] E. J. Candès and B. Recht, Exact matrix completion via convex optimization, Found. Comput. Math., 9 (2009), pp. 717--772.
[4] M. Espig, K. K. Naraparaju, and J. Schneider, A note on tensor chain approximation,
Comput. Vis. Sci., 15 (2012), pp. 331--344.
[5] S. Friedland, V. Mehrmann, A. Miedlar, and M. Nkengla, Fast low rank approximations
of matrices and tensors, Electron. J. Linear Algebra, 22 (2011), 67.
[6] F. R. Gantmacher and J. L. Brenner, Applications of the Theory of Matrices, Dover, Mineola, NY, 2005.
[7] M. B. Hastings and T. Koma, Spectral gap and exponential decay of correlations, Comm.
Math. Phys., 265 (2006), pp. 781--804.
[8] F. L. Hitchcock, The expression of a tensor or a polyadic as a sum of products, J. Math. Phys., 6 (1927), pp. 164--189.
[9] Y. P. Hong and C.-T. Pan, Rank-revealing QR factorizations and the singular value decom-
position, Math. Comp., 58 (1992), pp. 213--232.
[10] Y. Khoo, J. Lu, and L. Ying, Solving parametric PDE problems with artificial neural net-
works, European J. Appl. Math., 32 (2021), pp. 421--435.
[11] L. Mirsky, Symmetric gauge functions and unitarily invariant norms, Quart. J. Math., 11
(1960), pp. 50--59.
[12] R. Orús, A practical introduction to tensor networks: Matrix product states and projected entangled pair states, Ann. Phys., 349 (2014), pp. 117--158.
[13] I. Oseledets and E. Tyrtyshnikov, TT-cross approximation for multidimensional arrays,
Linear Algebra Appl., 432 (2010), pp. 70--88.
[14] I. V. Oseledets, Tensor-train decomposition, SIAM J. Sci. Comput., 33 (2011), pp. 2295--2317.
[15] D. Perez-Garcia, F. Verstraete, M. M. Wolf, and J. I. Cirac, Matrix product state
representations, Quantum Inf. Comput., 7 (2007), pp. 401--430.
[16] D. Savostyanov and I. Oseledets, Fast adaptive interpolation of multi-dimensional arrays
in tensor train format, in 2011 7th International Workshop on Multidimensional (nD)
Systems, IEEE, Piscataway, NJ, 2011, pp. 1--8.
[17] L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika, 31
(1966), pp. 279--311.
[18] W. Wang, V. Aggarwal, and S. Aeron, Efficient low rank tensor ring completion, in Proceedings of the IEEE International Conference on Computer Vision, IEEE Computer Society, Los Alamitos, CA, 2017, pp. 5697--5705.

[19] S. R. White, Density matrix formulation for quantum renormalization groups, Phys. Rev.
Lett., 69 (1992), pp. 2863--2866.
[20] M. M. Wolf, F. Verstraete, M. B. Hastings, and J. I. Cirac, Area laws in quantum systems: Mutual information and correlations, Phys. Rev. Lett., 100 (2008), 070502.
[21] M. Yuan and C.-H. Zhang, On tensor completion via nuclear norm minimization, Found.
Comput. Math., 16 (2016), pp. 1031--1068.
[22] Q. Zhao, G. Zhou, S. Xie, L. Zhang, and A. Cichocki, Tensor Ring Decomposition, preprint,
arXiv:1606.05535, 2016.
