Venera Khoromskaia - Boris Khoromskij - Tensor Numerical Methods in Quantum Chemistry-De Gruyter (2018)
Tensor Numerical Methods in Quantum Chemistry
Mathematics Subject Classification 2010: 65F30, 65F50, 65N35, 65F10
Authors
Dr. Venera Khoromskaia
Max-Planck Institute for
Mathematics in the Sciences
Inselstr. 22-26
04103 Leipzig
Germany
[email protected]
ISBN 978-3-11-037015-7
e-ISBN (PDF) 978-3-11-036583-2
e-ISBN (EPUB) 978-3-11-039137-4
www.degruyter.com
Contents
1 Introduction
Bibliography
Index
1 Introduction
All truths are easy to understand once they are discovered;
the point is to discover them.
Galileo Galilei
https://doi.org/10.1515/9783110365832-001
the number of summands. This idea has been known since 1927, when it was proposed by Frank L. Hitchcock [131] in the form of so-called "canonical tensors". Thus, the canonical tensor format allows us to avoid the curse of dimensionality. It can be seen as a discrete analogue of the representation of a multivariate function by a sum of separable functions. The main drawback of the canonical tensor format is the absence of stable algorithms for computing this representation from a full-size tensor.
The Tucker tensor decomposition was invented in 1966 by Ledyard R. Tucker and was used in principal component analysis problems in psychometrics, chemometrics, and signal processing for quantifying the correlations in experimental data. Usually these data contain a rather moderate number of dimensions, the data sizes in every dimension are not large, and the accuracy issues are not significant. The main advantage of the Tucker tensor format is the existence of stable algorithms for the tensor decomposition based on the higher-order singular value decomposition (HOSVD) introduced by Lieven De Lathauwer et al. in [61, 60]. However, this Tucker algorithm from multilinear algebra requires storage for a full-format tensor, nᵈ, and exhibits a complexity of the order of O(nᵈ⁺¹) for the HOSVD. The rather low compression rate of the Tucker tensor decomposition in problems of principal component analysis could hardly promote this method for accurate calculations in scientific computing.
The fascinating story of the grid-based tensor numerical methods in scientific computing started in 2006, when it was proven that the error of the Tucker tensor approximation applied to several classes of function-related tensors decays exponentially fast in the Tucker rank [161]. That is, instead of a three-dimensional (3D) tensor having n³ entries in full format, one obtains its Tucker tensor approximation given by only O(3n + log³ n) numbers, thus gaining an enormous compression rate. The related analytical results on the rank bounds for canonical tensors based on the sinc approximation method had been proven earlier by Ivan Gavrilyuk, Wolfgang Hackbusch, and Boris Khoromskij in [94, 91, 111].
In numerical tests for several classical multivariate functions discretized on n × n × n 3D Cartesian grids, it was shown that the Tucker decomposition provides an easily computable low-rank separable representation in a problem-adapted basis [173]. Such a beneficial separable representation enables efficient numerical treatment of the integral transforms and other computationally expensive operations with multivariate functions. However, the HOSVD in the Tucker decomposition requires full-format tensors, which is often not affordable in numerical modeling in physics and quantum chemistry. Thus, the HOSVD does not break the curse of dimensionality and has, indeed, a limited significance in computational practice.
In this regard, an essential advancement was brought forth by the so-called reduced higher-order singular value decomposition (RHOSVD), introduced by Boris Khoromskij and Venera Khoromskaia as part of the canonical-to-Tucker (C2T) transform [173, 174]. The latter works efficiently in cases where the standard Tucker decomposition is infeasible. It was demonstrated that for the Tucker decomposition of function-related tensors given in the canonical form, for example, resulting from analytic approximation and certain algebraic transforms, there is no need to build a full-size tensor. It is enough to find the orthogonal Tucker basis by using only the directional matrices of the canonical tensor, consisting of skeleton vectors in every single dimension. The C2T decomposition proved to be an efficient tool for reducing the redundant rank parameter in large canonical tensors. Since the RHOSVD does not require the full-size tensor, it promoted the further development of tensor methods to higher dimensions as well, because it applies to canonical tensors, which are free from the curse of dimensionality.
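The key point is that the Tucker basis can be computed from SVDs of the small directional matrices alone, never assembling the full array. The following NumPy sketch (our own illustration with ad hoc names, not the authors' implementation) conveys the idea for a 3D canonical tensor:

```python
import numpy as np

def rhosvd_c2t(weights, factors, r):
    """Sketch of the canonical-to-Tucker (C2T) transform via RHOSVD.

    weights: (R,) canonical weights; factors: list of (n_ell, R)
    directional matrices. Only SVDs of the small directional matrices
    are needed -- the full n^d tensor is never assembled.
    """
    bases = []
    for U in factors:
        # left singular vectors of the (scaled) directional matrix
        Q, _, _ = np.linalg.svd(U * weights, full_matrices=False)
        bases.append(Q[:, :r])                 # orthogonal Tucker basis
    # project the canonical skeletons onto the Tucker bases (3D case)
    P = [B.T @ U for B, U in zip(bases, factors)]   # each of size (r, R)
    core = np.einsum('r,ir,jr,kr->ijk', weights, *P)
    return core, bases
```

The cost is O(nR²) per mode, in contrast to the O(nᵈ⁺¹) of the full HOSVD.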
Furthermore, the orthogonal Tucker vectors, being an adaptive basis of the Tucker tensor representation, exhibit smooth oscillating shapes, which can be viewed as "fingerprints" of a given multivariate function. This property facilitated the multigrid Tucker decomposition proposed in [174, 146], which enables fast 3D tensor calculus in electronic structure calculations using incredibly large grids. Further beneficial properties of the multigrid approach for tensor numerical methods have not yet been comprehensively investigated. Since the rank-structured tensor decompositions basically operate on Cartesian grids, the methodology developed for finite difference methods, including the Richardson extrapolation techniques yielding O(h³) accuracy in the mesh size h, can be applied.
The traditional methods for the numerical solution of the Hartree–Fock equation have been developed in computational quantum chemistry. They are based on the analytical computation of the arising two-electron integrals,¹ convolution-type integrals in ℝ³, in the problem-adapted, naturally separable Gaussian-type basis sets [3, 277, 128], by using erf-functions. This rigorous approach resulted in a number of efficient program packages, which required years of development by large scientific groups and which are nowadays widely used in the scientific community; see, for example, [299, 292, 88] and other packages listed in Wikipedia. Other models in quantum chemistry, like density functional theory [251, 107, 268], usually apply a combination of rigorously constructed pseudopotentials, grid-based wavefunctions, and experimentally justified coefficients. In general, for the solution of multidimensional problems in physics and chemistry, it is often best to approximate the multivariate functions by sums of separable functions. However, the initial separable representation of functions may be deteriorated by the integral transforms and other operations, leading to cumbersome computational schemes.
In such a way, the success of the analytical integration methods for ab initio electronic structure calculations stems from the large amount of precomputed information based on physical insight, including the construction of problem-adapted atomic orbital basis sets and the elaborate nonlinear optimization for the calculation of the density-fitting basis. The known limitations of this approach appear due to a strong dependence of the numerical efficiency on the size and quality of the chosen Gaussian basis sets. These restrictions might be essential in calculations for larger molecules and heavier atoms. Nowadays, it is common practice to reduce these difficulties by switching partially or completely to grid-based calculations. The conventional numerical methods, however, quickly encounter tractability limitations even for small molecules and moderate grid sizes. The real-space multiresolution approaches suggest reducing the grid size by local mesh refinements [122, 305], which may encounter problems with the computation of three-dimensional convolution integrals for functions with multiple singularities.
The grid-based tensor-structured numerical methods were first developed for solving challenging problems in electronic structure calculations. The main ingredients include the low-rank grid representation of multivariate functions and operators, and the tensor calculation of multidimensional integral transforms, introduced by the authors in 2007–2010 [166, 187, 145, 146, 147, 168]. An important issue was the possibility to compare the results of tensor-based computations with the outputs of benchmark quantum chemical packages, which use analytical methods for calculating the three-dimensional convolution integrals [300]. It was shown that the tensor calculation of multidimensional convolution operators reduces to a sequence of one-dimensional convolutions and one-dimensional Hadamard and scalar products [145, 146]. Such a reduction to one-dimensional operations enables computations on exceptionally fine tensor grids. The initial multilevel tensor-structured solver for the Hartree–Fock equation was based on the calculation of the Coulomb and exchange integral operators "on the fly", using a sequence of refined uniform grids, thus avoiding the precomputation and storage of the two-electron integrals tensor [146, 187]. The disadvantage of this version is its rather substantial time consumption. This solver is discussed in Chapter 8.
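The reduction to one-dimensional convolutions rests on the fact that a 3D convolution of two separable (rank-1) terms factorizes into three 1D convolutions. A NumPy toy sketch of our own (with hypothetical function names; the book's actual algorithms are those of [145, 146]):

```python
import numpy as np

def conv3_canonical(f_terms, g_terms):
    """3D convolution of two canonical tensors: for every pair of
    rank-1 terms, convolve the three skeleton vectors separately."""
    out = []
    for (f1, f2, f3) in f_terms:
        for (g1, g2, g3) in g_terms:
            out.append((np.convolve(f1, g1),
                        np.convolve(f2, g2),
                        np.convolve(f3, g3)))
    return out  # canonical terms of the convolution

def to_full(terms):
    """Assemble a canonical tensor into full format (for checking only)."""
    return sum(np.einsum('i,j,k->ijk', a, b, c) for a, b, c in terms)
```

For kernels and densities of ranks R₁ and R₂ on an n × n × n grid, this costs O(R₁R₂ n log n) with FFT-based 1D convolutions, instead of O(n³ log n) for the full 3D transform.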
Further progress in tensor methods for electronic structure calculations was promoted by a fast algorithm for the grid-based computation of the two-electron integrals (TEI) [157, 150] in O(N_b³) storage in the number of basis functions N_b. The fourth-order TEI tensor is calculated in the form of a low-rank Cholesky factorization by using an algebraic black-box-type "1D density fitting" scheme, which applies to the products of discretized basis functions. Using the low-rank tensor representation of the Newton convolving kernel and that of the products of basis functions, all represented on n × n × n Cartesian grids, the 3D integral transforms are calculated in O(n log n) complexity. The corresponding algorithms are described in Chapter 10.
The elaborated tensor-based Hartree–Fock solver [147], described in Chapter 11, employs the factorized representation of the two-electron integrals and the tensor calculation of the core Hamiltonian, including the three-dimensional Laplace and nuclear potential operators [156]. In the course of the self-consistent iteration for solving the Hartree–Fock eigenvalue problem, owing to the factorized representation of the TEI, the update of the Coulomb and exchange parts in the Fock matrix reduces to cheap algebraic operations. Owing to the grid representation of basis functions, the basis sets are not restricted
2 The paper on QTT approximation was first published in September 2009 as the Preprint 55/2009 of
the Max-Planck Institute for Mathematics in the Sciences in Leipzig.
3 The method works also for other types of multivariate radial basis functions p(‖x‖).
A vector u ∈ ℝⁿ with entries u₁, …, uₙ is understood as a column vector,

u = (u₁, u₂, …, uₙ)ᵀ ∈ ℝⁿ.

To show that it is a column vector, one can write it explicitly, u ∈ ℝ^{n×1}. The transpose of a column vector, uᵀ, is a row vector, uᵀ ∈ ℝ^{1×n}.
Products of column and row vectors give different results depending on the order of multiplication. Multiplying a row vector by a column vector, we obtain the scalar product of the two vectors, and the result is a number. That is, the scalar (or inner) product of two vectors uᵀ ∈ ℝ^{1×n} and v ∈ ℝ^{n×1} is the real number given by

uᵀv = u₁v₁ + u₂v₂ + ⋯ + uₙvₙ.
https://doi.org/10.1515/9783110365832-002
2 Rank-structured formats for multidimensional tensors
The tensor (outer) product of two vectors u ∈ ℝᵐ and v ∈ ℝⁿ yields the rank-1 matrix with entries uᵢvⱼ,

A = u ⊗ v = uvᵀ = [uᵢvⱼ]_{i=1,…,m; j=1,…,n}
  = [ u₁v₁  u₁v₂  ⋯  u₁vₙ
      u₂v₁  u₂v₂  ⋯  u₂vₙ
       ⋮     ⋮    ⋱   ⋮
      uₘv₁  uₘv₂  ⋯  uₘvₙ ] ∈ ℝ^{m×n},   (2.1)
In general, a matrix A = [aᵢⱼ] ∈ ℝ^{m×n}, where the number aᵢⱼ is called an entry of the matrix, is an element of the linear vector space ℝ^{m×n} equipped with the Euclidean scalar product

⟨A, B⟩ = ∑_{i=1}^{m} ∑_{j=1}^{n} aᵢⱼbᵢⱼ,   (2.2)

which induces the Frobenius norm

‖A‖ = ( ∑_{i=1}^{m} ∑_{j=1}^{n} aᵢⱼ² )^{1/2}.   (2.3)
Computation of the Frobenius norm of a general matrix needs O(nm) operations. But for a rank-1 matrix, A = u ⊗ v = uvᵀ, the norm factorizes,

‖A‖ = ( ∑_{i=1}^{m} ∑_{j=1}^{n} (uᵢvⱼ)² )^{1/2} = ( ∑_{i=1}^{m} uᵢ² ∑_{j=1}^{n} vⱼ² )^{1/2} = ‖u‖ ⋅ ‖v‖,

and can thus be computed in only O(n + m) operations.
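The factorization of the norm is easy to verify numerically; a short NumPy check of the identity above:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0])
A = np.outer(u, v)                      # rank-1 matrix u v^T of size 3 x 2
norm_full = np.sqrt((A ** 2).sum())     # Frobenius norm from all m*n entries
norm_fast = np.linalg.norm(u) * np.linalg.norm(v)   # only O(m + n) work
```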
When multiplying a row vector by a matrix, the entries of the resulting row vector are computed as scalar products of the vector with the column vectors of the matrix (their sizes should coincide),

uᵀA = [uᵀa₁, uᵀa₂, …, uᵀaₙ],   u ∈ ℝᵐ, A ∈ ℝ^{m×n},   (2.4)

where aⱼ ∈ ℝᵐ denotes the jth column of A.
Matrix–matrix multiplication can be explained from the point of view of the matrix–vector multiplication (2.4). For two matrices A ∈ ℝ^{m×n} and B ∈ ℝ^{n×p}, their product is a matrix

C = AB, C ∈ ℝ^{m×p},  with entries cᵢⱼ = ∑_{k=1}^{n} aᵢₖbₖⱼ.

As we can see, each entry of the resulting matrix C is obtained as the scalar product of two n-vectors. The complexity of multiplying two square matrices is O(n³). If one of the matrices is given as an R-term sum of tensor products of vectors, then the complexity of the multiplication is O(Rn²).
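The O(Rn²) cost is realized by never forming the factored matrix explicitly: with A = UVᵀ, one multiplies B by the two thin factors in turn. A NumPy sketch (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n, R = 200, 4
U = rng.standard_normal((n, R))     # A = U V^T, an R-term sum of u_r v_r^T
V = rng.standard_normal((n, R))
B = rng.standard_normal((n, n))

C_naive = (U @ V.T) @ B             # forms A first: O(n^3)
C_fast = U @ (V.T @ B)              # two thin products: O(R n^2)
```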
Figure 2.1: A matrix A in the basis of column vectors of the matrix U yields the matrix AU .
The Kronecker product is an operation in linear algebra that maps a pair of matrices to a larger matrix. The Kronecker product of matrices A ∈ ℝ^{m×n} and B ∈ ℝ^{p×q} is defined as the block matrix

A ⊗ B = [aᵢⱼB] ∈ ℝ^{mp×nq}.
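NumPy's `kron` realizes exactly this block structure; for two 2 × 2 matrices the result is a 4 × 4 block matrix:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
K = np.kron(A, B)   # the 4 x 4 block matrix [a_ij * B]
```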
A factorized low-rank representation of matrices reduces the cost of linear algebra operations considerably. There are a number of methods for decomposing a matrix into a sum of tensor products of vectors. In the following, we briefly discuss the singular value decomposition (SVD), the QR factorization, and the Cholesky decomposition. There is a large number of routines on various platforms that can be applied to calculate these decompositions. For convenience, we refer to the corresponding commands in Matlab.
(1) We start with the eigenvalue decomposition (EVD), which diagonalizes a matrix, that is, finds a basis in which a symmetric matrix becomes diagonal. The eigenvalue decomposition of a symmetric matrix requires the full set of eigenvectors and eigenvalues of the algebraic problem

Au = λu.

The Matlab command

[V,D] = eig(A)

produces a diagonal matrix D of eigenvalues and a matrix V whose columns are the corresponding eigenvectors, so that AV = VD.
(2) The singular value decomposition is described by the following theorem.

Theorem 2.1. Let A ∈ ℝ^{m×n}, with m ≤ n for definiteness. Then there exist orthogonal matrices U ∈ ℝ^{m×m} and V ∈ ℝ^{n×n}, and a diagonal matrix Σ ∈ ℝ^{m×n} with non-negative diagonal entries σ₁ ≥ σ₂ ≥ ⋯ ≥ σₘ ≥ 0, such that

A = UΣVᵀ.   (2.7)

Here, the matrices U and V include the full set of left and right singular vectors, respectively.
The best rank-r approximation of A is obtained by truncating the SVD to the r dominating singular values,

A_r = ∑_{i=1}^{r} σᵢuᵢvᵢᵀ,

where uᵢ, vᵢ are the respective left and right singular vectors of A. The approximation error in the Frobenius norm is bounded by the discarded singular values:

‖A_r − A‖_F ≤ ( ∑_{i=r+1}^{n} σᵢ² )^{1/2}.   (2.8)

The Matlab command

[U,S,V] = svd(A)

produces a diagonal matrix S of singular values and the orthogonal matrices U and V whose columns are the corresponding singular vectors, so that A = USVᵀ.
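A quick NumPy check of the truncated SVD and of the error bound (2.8) (for the best rank-r approximation the bound is attained with equality):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 40))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

r = 10
Ar = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]   # rank-r truncation
err = np.linalg.norm(A - Ar)                 # Frobenius norm of the error
bound = np.sqrt((s[r:] ** 2).sum())          # discarded singular values
```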
(3) The LU decomposition represents a matrix as a product of lower and upper triangular matrices. This decomposition is commonly used in the solution of linear systems of equations. For the LU decomposition

A = LU,

the Matlab command

[L,U] = lu(A)

returns an upper triangular matrix U and a (permuted) lower triangular matrix L so that A = LU.
(4) The orthogonal–triangular decomposition is called the QR factorization, that is,

A = QR.

For A ∈ ℝ^{m×n}, the Matlab command

[Q,R] = qr(A)

produces an m-by-n upper triangular matrix R and an m-by-m unitary matrix Q so that A = QR, QᵀQ = I.
(5) The Cholesky decomposition of a symmetric non-negative definite matrix A,

A = RᵀR,

produces an upper triangular matrix R satisfying the equation RᵀR = A. The chol function in MATLAB,

R = chol(A)

computes this factor R (A must be positive definite).
In Example 1 below, we present a simple MATLAB script for testing the decay of the singular values of several matrices. First, a two-dimensional Slater function, e^{−α‖x‖}, is discretized in a square box [−b/2, b/2]² using n × n 2D Cartesian grids with n = 65, 257, and 513, and the SVD is computed for the resulting matrices. Figure 2.2 (left) shows the exponentially fast decay of the singular values for all three matrices, nearly independently of the matrix size.
Figure 2.2: Decay of singular values for a matrix generated by a Slater function (left) and for a matrix
containing random valued entries (right).
Next, we compose matrices of the same sizes, but using a generator of random numbers in the interval [0, 1]. The singular values of these matrices are shown in Figure 2.2 (right). In contrast to the function-related matrices, they do not decay fast.
%____Example 1____________________
clear; b=10; alp=1;
figure(1);
[Fun,sigmas,x,y] = Gener_Slat(65,b,alp); semilogy(sigmas);
hold on; grid on;
[~,sigmas,~,~]= Gener_Slat(257,b,alp); semilogy(sigmas,'r');
[~,sigmas,~,~]= Gener_Slat(513,b,alp); semilogy(sigmas,'black');
grid on; axis tight; set(gca,'fontsize',16);
hold off;
figure(2); mesh(x,y,Fun);
figure(3);
A1 = rand(65,65); [~,S1,~]= svd(A1); semilogy(diag(S1));
hold on; grid on;
A = rand(257,257); [~,S1,~]= svd(A); semilogy(diag(S1),'r');
A = rand(513,513); [~,S1,~]= svd(A); semilogy(diag(S1),'black');
grid on; axis tight; set(gca,'fontsize',16);
hold off;
%______________________
function [Fun1,sigmas,x,y]=Gener_Slat(n1,b,alpha1)
h1=b/(n1-1); x=-b/2:h1:b/2; y=-b/2:h1:b/2;
Fun1=zeros(n1,n1);
for i=1:n1
Fun1(i,:)= exp(-alpha1*sqrt(x(1,i)^2 +y(1,:).^2));
end
[~,S1,~]=svd(Fun1); sigmas=diag(S1);
end
%____________end of Example 1____________________________
Note that the slope of the Slater function in Example 1 is controlled by the parameter "alp". One can generate a Slater function with a sharper or smoother shape by changing this parameter and observe nearly the same behavior of the singular values.
Example 2 demonstrates the error of approximating the discretized Slater function (given by a matrix A) by the sum of tensor products of the singular vectors corresponding to the first m = 18 singular values,

A_m = σ₁u₁v₁ᵀ + ⋯ + σₘuₘvₘᵀ = ∑_{i=1}^{m} σᵢuᵢvᵢᵀ.
Figure 2.3: A matrix representing a discretized two-dimensional Slater function (left) and the error of
its rank-18 factorized representation (right).
When running this program, figure (3) works as an "animation", in which one can distinctly observe the error of the approximation diminishing within the loop, as summands with smaller singular values are added to the approximation. Figure 2.3 (left) shows the original discretized function with a cusp at zero, and Figure 2.3 (right) shows the final approximation error for rank r = m = 18.
%____Example 2_________________________________________
b=10; n=412; h1=b/n;
x=-b/2:h1:b/2; [~,n1]=size(x);
y=-b/2:h1:b/2;
A1=zeros(n1,n1); alpha1=1;
for i=1:n1
A1(i,:)= exp(-alpha1*sqrt(x(1,i)^2 +y(1,:).^2));
end
figure(1); mesh(x,y,A1);
[U1,S1,V1]=svd(A1); sigmas=diag(S1);
figure(5); semilogy(sigmas);
% accumulate the rank-m approximation and animate the error in figure (3)
Am=zeros(n1,n1);
for m=1:18
Am = Am + sigmas(m)*U1(:,m)*V1(:,m)';
figure(3); mesh(x,y,A1-Am); pause(0.2);
end
%____________end of Example 2____________________________
Let us consider a rank-R matrix M = ABᵀ ∈ ℝ^{n×n}, with the factor matrices A ∈ ℝ^{n×R} and B ∈ ℝ^{n×R}, where R ≤ n. We are interested in the best rank-r approximation of M, with r < R. It can be computed using the following algorithm, which avoids the singular value decomposition of the target matrix M with possibly large n.
This algorithm includes the following steps:
(1) Perform the QR decomposition of the side matrices,

A = Q_A R_A,  B = Q_B R_B,

with the matrices Q_A, Q_B ∈ ℝ^{n×R} having orthonormal columns and the upper triangular matrices R_A, R_B ∈ ℝ^{R×R}.
(2) Compute the SVD of the small core matrix R_A R_Bᵀ ∈ ℝ^{R×R},

R_A R_Bᵀ = UΣVᵀ.

(3) Retain the r dominating singular values and form the rank-r factorization

M ≈ (Q_A U_r) Σ_r (Q_B V_r)ᵀ,

where U_r, V_r consist of the first r columns of U and V, and Σ_r = diag{σ₁, …, σ_r}.
The approximation error is bounded by (∑_{i=r+1}^{R} σᵢ²)^{1/2}. The complexity of the above algorithm scales linearly in n, O(nR²) + O(R³). In the case R ≪ n, this dramatically reduces the O(n³) cost of the truncated SVD applied to the full-format n × n matrix M.
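The steps above translate almost verbatim into NumPy (the function name is our own):

```python
import numpy as np

def reduce_rank(A, B, r):
    """Best rank-r approximation of M = A B^T with A, B of size n x R,
    in O(nR^2 + R^3) operations, without ever forming M."""
    QA, RA = np.linalg.qr(A)               # (1) A = QA RA
    QB, RB = np.linalg.qr(B)               #     B = QB RB
    U, s, Vt = np.linalg.svd(RA @ RB.T)    # (2) SVD of the R x R core
    # (3) retain the r dominating terms; M_r = F1 @ F2.T
    return (QA @ U[:, :r]) * s[:r], QB @ Vt[:r, :].T
```

The singular values of the small core matrix coincide with those of M, so the truncation error can be read off directly from s.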
Low-rank approximation of matrices by using only partial information can be computed by the heuristic adaptive cross approximation (ACA) methods developed in [286, 99, 287, 288, 15, 289, 14, 223]; see also the literature therein. Dynamical low-rank approximation of matrices has been analyzed in [191].
in principal component analysis and in image and signal processing, are presented in [54, 270, 1, 193]. Nowadays, there is extensive research on tensor decomposition methods in computer science towards big data analysis; see, for example, [2, 55]. Notice that tensor decompositions have been used in computer science mostly for the quantitative analysis of correlations in multidimensional data arrays obtained from experiments, without special requirements on the accuracy of the decompositions. Usually these data arrays have been considered for a small number of dimensions (modes) and moderate mode sizes.
A mathematical justification and analysis of the Tucker tensor decomposition algorithm was presented in 2000 in the seminal works of L. De Lathauwer, B. De Moor, and J. Vandewalle on the higher-order singular value decomposition [61] and on the best rank-(r₁, …, r_d) orthogonal Tucker approximation of higher-order tensors [60]. The higher-order singular value decomposition (HOSVD) provides a generalization of the matrix singular value decomposition [98]. The main limitation of the Tucker algorithm from computer science [61, 193, 78] is the requirement of storage for the full-size tensor, nᵈ, as well as the complexity of the HOSVD, O(nᵈ⁺¹), which includes the singular value decomposition of the directional unfolding matrices. This makes the HOSVD and the corresponding Tucker decomposition algorithm practically infeasible for problems in electronic structure calculations and for solving multidimensional PDEs.
However, multilinear algebra with the Tucker tensor decomposition via the HOSVD was one of the starting points for tensor numerical methods. In what follows, we recall the tensor formats and main algorithms [60, 9] from multilinear algebra, where the techniques are developed in view of an arbitrary content of the multidimensional arrays.
In the forthcoming chapters, we shall see that the content of a tensor matters, and that for function-related multidimensional arrays even the standard multilinear algebra algorithms provide amazing results. One can further enhance the schemes by taking into account the predictions from approximation theory [111, 161] on the exponentially fast convergence in the tensor rank of the Tucker/CP decompositions applied to the grid-based representation of multidimensional functions and operators.
Let us start with the multilinear algebra approach to rank-structured tensor approximation, taking into account a general tensor content.
A tensor of order d is an element of the tensor-product space

𝕍_n = ⨂_{ℓ=1}^{d} ℝ^{n_ℓ},

equipped with the entrywise addition

(A + B)_{i₁…i_d} = a_{i₁…i_d} + b_{i₁…i_d}.
The linear vector space 𝕍_n of tensors is equipped with the Euclidean scalar product ⟨⋅, ⋅⟩ : 𝕍_n × 𝕍_n → ℝ defined as

⟨A, B⟩ := ∑_{i₁,…,i_d} a_{i₁…i_d} b_{i₁…i_d},

which induces the Frobenius norm ‖A‖_F := √⟨A, A⟩. The number of entries of a tensor in the full format,

N = ∏_{ℓ=1}^{d} n_ℓ,  that is, for n_ℓ = n, N = nᵈ,

grows exponentially in the dimension d.
This phenomenon is often called the "curse of dimensionality". As a result, any multilinear operation with tensors given in the full format (2.9), for example, the computation of a scalar product, has an exponential complexity scaling, O(nᵈ).
Some multilinear algebraic operations with tensors of order d (d ≥ 3) can be reduced to standard linear algebra by unfolding of a tensor into a matrix. Unfolding of a tensor A ∈ ℝ^{I₁×⋯×I_d} along the ℓ-mode¹ arranges the ℓ-mode columns of the tensor to be the columns of the resulting unfolding matrix. Figure 2.4 shows the unfolding of a 3D tensor. The unfolding of a tensor is a matrix whose columns are the respective fibers² along the ℓ-mode, ℓ = 1, …, d.
1 Note that in multilinear algebra the notion “mode” is often used for designating the particular di-
mension. ℓ-mode means the dimension number ℓ. Also, tensors of order d are called d-dimensional
tensors.
2 Fibers along mode ℓ are generalization of notions of rows and columns for matrices.
2.2 Introduction to multilinear algebra
The ℓ-mode unfolding matrix A₍ℓ₎ ∈ ℝ^{n_ℓ×(N/n_ℓ)} is the matrix whose columns are the respective fibers [193] of A along the ℓth mode, such that the tensor entry a_{i₁i₂…i_d} is mapped into the matrix element a_{i_ℓ j}, where the long index is given by

j = 1 + ∑_{k=1, k≠ℓ}^{d} (i_k − 1)J_k,  with J_k = ∏_{m=1, m≠ℓ}^{k−1} n_m.
For example, for a 5 × 7 × 10 tensor, the size of the unfolding matrix A₍₁₎ is 5 × 70, whereas the size of the unfolding matrix A₍₃₎ is 10 × 35.
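A NumPy sketch with the same tensor size illustrates these shapes (NumPy's C ordering permutes the columns relative to Matlab's convention, but the sizes are the same):

```python
import numpy as np

A = np.arange(5 * 7 * 10).reshape(5, 7, 10)   # a 5 x 7 x 10 tensor
A1 = A.reshape(5, -1)                         # mode-1 unfolding: 5 x 70
A3 = np.moveaxis(A, 2, 0).reshape(10, -1)     # mode-3 unfolding: 10 x 35
```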
Another important tensor operation is the so-called contracted product of two tensors. This operation is similar to matrix–matrix multiplication, with the difference that for matrices the factors must be positioned properly for multiplication over the compatible size, whereas in the case of tensors one explicitly specifies the mode along which the contraction is performed.
Definition 2.2 ([59]). Contracted product: Given a tensor A ∈ ℝ^{I₁×⋯×I_d} and a matrix M ∈ ℝ^{J_ℓ×I_ℓ}, we define the respective mode-ℓ tensor–matrix product by

B = A ×_ℓ M ∈ ℝ^{I₁×⋯×I_{ℓ−1}×J_ℓ×I_{ℓ+1}×⋯×I_d},

where³

b_{i₁…i_{ℓ−1} j_ℓ i_{ℓ+1}…i_d} = ∑_{i_ℓ=1}^{n_ℓ} a_{i₁…i_{ℓ−1} i_ℓ i_{ℓ+1}…i_d} m_{j_ℓ i_ℓ},  j_ℓ ∈ J_ℓ.
Examples of contractions of a tensor with a matrix are shown in the subroutine for the Tucker decomposition algorithm presented in Section 3. The tensor–matrix contracted product can be applied successively along several modes, and it can be shown to be commutative:

(A ×_ℓ M) ×_m P = (A ×_m P) ×_ℓ M = A ×_ℓ M ×_m P,  ℓ ≠ m.

We note the convenience of the notation ×_ℓ, since it explicitly indicates the mode subjected to contraction.
Figure 2.5 illustrates a sequence of contracted products of a tensor A ∈ ℝ^{n₁×n₂×n₃} with matrices M₃ ∈ ℝ^{r₃×n₃}, M₂ ∈ ℝ^{r₂×n₂}, and M₁ ∈ ℝ^{r₁×n₁} as follows:
– contraction of the tensor A in mode ℓ = 3 with the matrix M₃ ∈ ℝ^{r₃×n₃} yields a tensor A₃ of size n₁ × n₂ × r₃;
– contraction of the tensor A₃ in mode ℓ = 2 with the matrix M₂ ∈ ℝ^{r₂×n₂} yields a tensor A₂ of size n₁ × r₂ × r₃;
– contraction in mode ℓ = 1 with the matrix M₁ ∈ ℝ^{r₁×n₁} yields the tensor A₁ ∈ ℝ^{r₁×r₂×r₃}.
As a result of all contractions, the original tensor A is represented in the basis given by the matrices M₁, M₂, and M₃.
3 Here the sign “×ℓ ” denotes contraction over the mode number ℓ.
Figure 2.5: A sequence of contracted products in all three modes of a tensor A with the correspond-
ing matrices M3 , M2 , and M1 .
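The contraction sequence of Figure 2.5 can be reproduced with `numpy.tensordot`; the helper below is our own wrapper realizing A ×ℓ M, and it also lets one check the commutativity property:

```python
import numpy as np

def mode_mult(T, M, ell):
    """Mode-ell contracted product T x_ell M for M of size (r, n_ell)."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, ell)), 0, ell)

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 5, 6))
M1 = rng.standard_normal((2, 4))
M2 = rng.standard_normal((3, 5))
M3 = rng.standard_normal((2, 6))

A3 = mode_mult(A, M3, 2)     # size 4 x 5 x 2
A2 = mode_mult(A3, M2, 1)    # size 4 x 3 x 2
A1 = mode_mult(A2, M1, 0)    # size 2 x 3 x 2
```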
As we mentioned in the previous section, the number of entries in a full format tensor
grows exponentially in dimension d.
To get rid of the exponential scaling in the dimension, we are interested in rank-structured representations of tensors. The simplest rank-structured tensor is constructed as the tensor product of vectors u^{(ℓ)} = {u^{(ℓ)}_{i_ℓ}}_{i_ℓ=1}^{n_ℓ} ∈ ℝ^{n_ℓ}, which forms the canonical rank-1 tensor

U = u^{(1)} ⊗ u^{(2)} ⊗ ⋯ ⊗ u^{(d)} ∈ 𝕍_n,  u_{i₁…i_d} = u^{(1)}_{i₁} ⋯ u^{(d)}_{i_d}.

For two rank-1 tensors U and V, the scalar product factorizes into univariate scalar products,

⟨U, V⟩ := ∏_{ℓ=1}^{d} ⟨u^{(ℓ)}, v^{(ℓ)}⟩,
which can be calculated in O(dn) operations. Recall that for d = 2, the tensor product of two vectors, u ∈ ℝ^I and v ∈ ℝ^J, represents a rank-1 matrix (see also equation (2.1) in Section 2.1),

u ⊗ v = uvᵀ ∈ ℝ^{I×J}.
Definition 2.3. The canonical tensor format: Given a rank parameter R ∈ ℕ, we denote by 𝒞_R ⊂ 𝕍_n the set of tensors that can be represented in the canonical format

U = ∑_{ν=1}^{R} ξ_ν u^{(1)}_ν ⊗ ⋯ ⊗ u^{(d)}_ν,  ξ_ν ∈ ℝ,   (2.13)

with normalized vectors u^{(ℓ)}_ν ∈ ℝ^{n_ℓ}. The minimal number R in the representation (2.13) is called the canonical rank of the tensor.
The storage for a tensor in the canonical format is dRn ≪ nᵈ. Figure 2.6 visualizes a canonical tensor in 3D.
Note that an analogue of the canonical tensor is the representation of a multivariate function f(x₁, x₂, …, x_d) by a sum of R separable functions:

f(x₁, x₂, …, x_d) = ∑_{k=1}^{R} f_{1,k}(x₁) f_{2,k}(x₂) ⋯ f_{d,k}(x_d).
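For example, e^{−(x₁+x₂+x₃)} = e^{−x₁}e^{−x₂}e^{−x₃} is separable with R = 1, so on an n × n × n grid it is fully described by 3n numbers instead of n³. A small illustrative NumPy sketch of our own (the full tensor is assembled only to verify the claim):

```python
import numpy as np

n = 64
x = np.linspace(0.0, 1.0, n)
f = np.exp(-x)        # one skeleton vector per dimension: 3n numbers in total
# the full n^3 tensor is formed here only for the comparison
F = np.einsum('i,j,k->ijk', f, f, f)
```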
Introducing the side matrices U^{(ℓ)} = [u^{(ℓ)}_1 ⋯ u^{(ℓ)}_R] and the diagonal tensor ξ := diag{ξ₁, …, ξ_R}, such that ξ_{ν₁,…,ν_d} = 0 except when ν₁ = ⋯ = ν_d, with ξ_{ν,…,ν} = ξ_ν (ν = 1, …, R), we obtain the equivalent contracted product representation of the rank-R canonical tensor,

U = ξ ×₁ U^{(1)} ×₂ U^{(2)} ⋯ ×_d U^{(d)}.
The canonical tensor representation is helpful for multilinear tensor operations. In Section 2.2.4 it is shown that bilinear tensor operations with tensors in the rank-R canonical format have linear complexity,

O(R ∑_{ℓ=1}^{d} n_ℓ),  or  O(dRn) if n_ℓ = n,

with respect to both the univariate grid size n of the tensor and the dimension parameter d. The disadvantage of this representation is the lack of fast and stable algorithms for the best approximation of arbitrary tensors in the fixed-rank canonical format.
The other commonly used tensor format, introduced by Tucker [284], is the rank-(r₁, …, r_d) Tucker tensor format. It is based on a representation in a subspace

𝕋_r := ⨂_{ℓ=1}^{d} 𝕋_ℓ  of 𝕍_n,  for certain subspaces 𝕋_ℓ ⊂ 𝕍_ℓ.
Definition 2.4. The Tucker tensor format: For a given rank parameter r = (r₁, …, r_d), we denote by 𝒯_r the subset of tensors in 𝕍_n represented in the Tucker format

A = ∑_{ν₁=1}^{r₁} ⋯ ∑_{ν_d=1}^{r_d} β_{ν₁,…,ν_d} v^{(1)}_{ν₁} ⊗ ⋯ ⊗ v^{(d)}_{ν_d} ∈ 𝕍_n,   (2.15)

with some vectors v^{(ℓ)}_{ν_ℓ} ∈ 𝕍_ℓ = ℝ^{I_ℓ} (1 ≤ ν_ℓ ≤ r_ℓ), which form an orthonormal basis of the r_ℓ-dimensional subspaces 𝕋_ℓ = span{v^{(ℓ)}_ν}_{ν=1}^{r_ℓ} (ℓ = 1, …, d).
The coefficient tensor β = [β_{ν₁,…,ν_d}] ∈ ℝ^{r₁×⋯×r_d} in (2.15) is called the core tensor. We call the parameter r = min_ℓ{r_ℓ} the minimal Tucker rank. Figure 2.7 visualizes the Tucker tensor decomposition of a tensor A ∈ ℝ^{n₁×n₂×n₃}.
Note that for problems in signal processing or principal component analysis, some
of the mode sizes of the core tensor, i. e., the Tucker rank rℓ , may be close to the original
tensor size nℓ in the corresponding mode.
Introducing the (orthogonal) side matrices V^{(ℓ)} = [v^{(ℓ)}_1 ⋯ v^{(ℓ)}_{r_ℓ}] such that V^{(ℓ)ᵀ}V^{(ℓ)} = I_{r_ℓ×r_ℓ}, we then use the tensor-by-matrix contracted product notation to represent the Tucker decomposition of A_{(r)} ∈ 𝒯_r in the compact form

A_{(r)} = β ×₁ V^{(1)} ×₂ V^{(2)} ⋯ ×_d V^{(d)}.   (2.17)
Remark 2.5. Notice that the representation (2.17) is not unique, since the tensor A_{(r)} is invariant under directional rotations. In fact, for any set of orthogonal r_ℓ × r_ℓ matrices Y_ℓ (ℓ = 1, …, d), we have the equivalent representation

A_{(r)} = β̂ ×₁ V̂^{(1)} ×₂ V̂^{(2)} ⋯ ×_d V̂^{(d)},

with

β̂ = β ×₁ Y₁ ×₂ Y₂ ⋯ ×_d Y_d,  V̂^{(ℓ)} = V^{(ℓ)}Y_ℓᵀ,  ℓ = 1, …, d.
Remark 2.6. If the subspaces 𝕋_ℓ = span{v^{(ℓ)}_ν}_{ν=1}^{r_ℓ} ⊂ 𝕍_ℓ are fixed, then the approximation A_{(r)} ∈ 𝒯_r of a given tensor A ∈ 𝕍_n is reduced to the orthogonal projection of A onto the particular linear space 𝕋_r = ⨂_{ℓ=1}^{d} 𝕋_ℓ ⊂ 𝒯_{r,n}, that is,

A_{(r)} = ∑_{ν₁,…,ν_d=1}^{r} ⟨v^{(1)}_{ν₁} ⊗ ⋯ ⊗ v^{(d)}_{ν_d}, A⟩ v^{(1)}_{ν₁} ⊗ ⋯ ⊗ v^{(d)}_{ν_d}
        = (A ×₁ V^{(1)ᵀ} ×₂ ⋯ ×_d V^{(d)ᵀ}) ×₁ V^{(1)} ×₂ ⋯ ×_d V^{(d)}.
This property plays an important role in the computation of the best orthogonal Tucker
approximation, where the “optimal” subspaces 𝕋ℓ are recalculated within a nonlinear
iteration process.
Given a target tensor A₀ ∈ 𝒮₀, the best rank-r Tucker approximation is the minimization problem

A_{(r)} = argmin ‖A₀ − A‖   (2.18)

over all tensors A ∈ 𝒮 = {𝒯_{r,n}}. Here, 𝒮₀ might be the set of Tucker or CP tensors with a rank parameter substantially larger than r.
As the basic nonlinear approximation scheme, we consider the best orthogonal rank-(r₁, …, r_d) Tucker approximation for the full-format input, corresponding to the choice 𝒮₀ = 𝒯_{r,n}. Tensors A ∈ 𝒯_r are parameterized as in (2.17), with the orthogonality constraints V^{(ℓ)} ∈ 𝒱_{n,r}, where

𝒱_{n,r} := {Y ∈ ℝ^{n×r} : YᵀY = I_{r×r} ∈ ℝ^{r×r}}.   (2.19)
The key point for the efficient solution of the minimization problem (2.18) over the tensor manifold 𝒮 = 𝒯_{r,n} is its equivalent reformulation as the dual maximization problem [60],

[Z^{(1)}, …, Z^{(d)}] = argmax ∑_{ν₁,…,ν_d=1}^{r} ⟨v^{(1)}_{ν₁} ⊗ ⋯ ⊗ v^{(d)}_{ν_d}, A⟩².   (2.20)
Lemma 2.7 ([60]). For a given A₀ ∈ ℝ^{I₁×⋯×I_d}, the minimization problem (2.18) on 𝒯_r is equivalent to the dual maximization problem

g(V^{(1)}, …, V^{(d)}) := ‖A₀ ×₁ V^{(1)ᵀ} ×₂ ⋯ ×_d V^{(d)ᵀ}‖² → max   (2.21)

over a set of matrices V^{(ℓ)} ∈ ℝ^{n_ℓ×r_ℓ} from the Grassmann manifold, i. e., V^{(ℓ)} ∈ 𝒢_ℓ (ℓ = 1, …, d). For given maximizing matrices Z^{(m)} (m = 1, …, d), the core tensor β minimizing (2.18) is represented by

β = A₀ ×₁ Z^{(1)ᵀ} ×₂ ⋯ ×_d Z^{(d)ᵀ} ∈ ℝ^{r₁×⋯×r_d}.   (2.22)
The best (nonlinear) Tucker approximation via the dual maximization problem (2.20) is usually computed by the ALS iteration, combined with the so-called higher-order SVD (HOSVD), introduced by De Lathauwer et al. in [61] and [60], respectively. We recall the theorem from [61].
Theorem 2.8 (dth-order SVD, HOSVD, [61]). Every real (complex) n1 ×n2 ×⋅ ⋅ ⋅×nd -tensor A can be written as the product

A = 𝒮 ×1 V^{(1)} ×2 V^{(2)} ⋅ ⋅ ⋅ ×d V^{(d)} , (2.24)

in which
(1) V^{(ℓ)} = [V_1^{(ℓ)} V_2^{(ℓ)} ⋅ ⋅ ⋅ V_{nℓ}^{(ℓ)} ] is a unitary nℓ × nℓ -matrix;
(2) 𝒮 is a complex n1 × n2 × ⋅ ⋅ ⋅ × nd -tensor of which the subtensors 𝒮iℓ =α , obtained by fixing the ℓth index to α, have the following properties:
(i) all-orthogonality: two subtensors 𝒮iℓ =α and 𝒮iℓ =β are orthogonal for all possible values of ℓ, α, and β subject to α ≠ β, that is, ⟨𝒮iℓ =α , 𝒮iℓ =β ⟩ = 0 when α ≠ β;
(ii) ordering: ‖𝒮iℓ =1 ‖ ≥ ‖𝒮iℓ =2 ‖ ≥ ⋅ ⋅ ⋅ ≥ ‖𝒮iℓ =nℓ ‖ ≥ 0 for all positive values of ℓ.
The Frobenius norms ‖𝒮iℓ =i ‖, symbolized by σ_i^{(ℓ)} , are the ℓ-mode singular values of A, and the vector V_i^{(ℓ)} is the ith ℓ-mode left singular vector of the unfolding matrix A_{(ℓ)} .
Another theorem from [61] proves the error bound for the truncated HOSVD. It states that for the HOSVD of A, as given in Theorem 2.8, with the ℓ-mode ranks rank(A_{(ℓ)} ) = Rℓ (ℓ = 1, . . . , d), the tensor Ã obtained by discarding the smallest ℓ-mode singular values σ_{rℓ +1}^{(ℓ)} , σ_{rℓ +2}^{(ℓ)} , . . . , σ_{Rℓ}^{(ℓ)} for given values of rℓ (ℓ = 1, . . . , d) (i. e., setting the corresponding parts of 𝒮 equal to zero) provides the following approximation error:

‖A − Ã‖^2 ≤ ∑_{i1 =r1 +1}^{R1} (σ_{i1}^{(1)} )^2 + ∑_{i2 =r2 +1}^{R2} (σ_{i2}^{(2)} )^2 + ⋅ ⋅ ⋅ + ∑_{id =rd +1}^{Rd} (σ_{id}^{(d)} )^2 .
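Both the construction of the truncated HOSVD and the error bound above can be checked numerically. The following Python/NumPy sketch (the function name and layout are ours, for illustration) returns the truncated tensor together with the sum of discarded squared singular values, so the inequality is directly testable:

```python
import numpy as np

def hosvd_truncate(A, ranks):
    """Truncated HOSVD: keep the r_l leading left singular vectors of
    every mode-l unfolding, project, and expand back.  Also returns the
    theoretical bound: the sum over modes of the discarded sigma_i^2."""
    d = A.ndim
    V, bound = [], 0.0
    for l in range(d):
        Al = np.moveaxis(A, l, 0).reshape(A.shape[l], -1)  # mode-l unfolding
        U, s, _ = np.linalg.svd(Al, full_matrices=False)
        V.append(U[:, :ranks[l]])
        bound += np.sum(s[ranks[l]:] ** 2)
    T = A
    for l in range(d):   # core = A x_1 V1^T ... x_d Vd^T
        T = np.moveaxis(np.tensordot(V[l].T, T, axes=([1], [l])), 0, l)
    for l in range(d):   # expand back to full size
        T = np.moveaxis(np.tensordot(V[l], T, axes=([1], [l])), 0, l)
    return T, bound
```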
We refer to the original papers [60, 61] for a detailed discussion of the above theory, which was an important step toward applying tensor decompositions in scientific computing.
Figure 2.8 illustrates the statements of the above theorems by an example of a cubic third-order tensor A. It shows the core tensor 𝒮 and the matrices V^{(1)} , V^{(2)} , and V^{(3)} from (2.24). The size of the core tensor 𝒮 is the same as the size of the original tensor A, except that it is now represented in the orthogonal basis given by the matrices V^{(1)} , V^{(2)} , and V^{(3)} . The core tensor of the truncated HOSVD is colored yellow.
The orthogonality of the subtensors 𝒮iℓ =α and 𝒮iℓ =β follows from the fact that these matrices originate from reshaping of the orthogonal vectors in the matrix (W^{(ℓ)} )^T of the SVD of the respective matrix unfolding of A for modes ℓ = 1, 2, 3,

A_{(ℓ)} = V^{(ℓ)} Σ^{(ℓ)} (W^{(ℓ)} )^T .
30 | 2 Rank-structured formats for multidimensional tensors
Note that the matrices V^{(ℓ)} , ℓ = 1, 2, 3, obtained as a result of the singular value decomposition of the corresponding matrix unfolding of A, initially have the full mode sizes nℓ of the original tensor. Based on the truncated HOSVD, their size reduction can be performed by taking into account the decay of the singular values in Σ^{(ℓ)} and then discarding the smallest singular values subject to some threshold ε > 0. This corresponds to the choice of the first r^{(ℓ)} columns in V^{(ℓ)} , as shown in Figure 2.8. The sizes r^{(ℓ)} may be different, depending on the chosen threshold and the structure of the initial tensor A.
Next, we recall the Tucker decomposition algorithm for full format tensors, introduced by De Lathauwer et al. in [60]. It is based on the HOSVD initial guess and the alternating least squares (ALS) iteration.
B = A ×1 V_k^{(1)T} ×2 ⋅ ⋅ ⋅ ×_{q−1} V_k^{(q−1)T} ×_{q+1} V_{k−1}^{(q+1)T} ⋅ ⋅ ⋅ ×d V_{k−1}^{(d)T} . (2.26)
(3) Set V^{(ℓ)} = V_{kmax}^{(ℓ)} , and compute the core β as the representation coefficients of the orthogonal projection of A onto 𝕋n = ⨂dℓ=1 𝕋ℓ , with 𝕋ℓ = span{v_ν^{(ℓ)} : ν = 1, . . . , rℓ } (see Remark 2.6),

β = A ×1 V^{(1)T} ×2 ⋅ ⋅ ⋅ ×d V^{(d)T} ∈ 𝔹r .
The computational costs are the following: (1) the HOSVD cost is W = O(dn^{d+1} ); (2) each iteration of the ALS procedure has the cost O(dr^{d−1} n min{r^{d−1} , n} + dn^d r), which represents the expense of the SVDs and the computation of the matrix unfoldings B_{(q)} . The last step, i. e., the computation of the core tensor, has the cost O(r^d n).
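The three steps above can be condensed into a few lines; the fragment below is an illustrative Python/NumPy sketch of the scheme for d = 3, not the authors' MATLAB implementation (which is reproduced in Section 3.1):

```python
import numpy as np

def tucker_als(A, r, kmax=3):
    """Rank-(r,r,r) orthogonal Tucker approximation of a 3D tensor:
    HOSVD initial guess followed by kmax ALS sweeps."""
    def unfold(T, q):
        return np.moveaxis(T, q, 0).reshape(T.shape[q], -1)
    # (1) HOSVD initial guess: leading left singular vectors per mode
    V = [np.linalg.svd(unfold(A, q), full_matrices=False)[0][:, :r]
         for q in range(3)]
    # (2) ALS sweeps: contract all modes but q ("single-hole" tensor),
    #     then refresh V[q] from the small unfolding B_(q)
    for _ in range(kmax):
        for q in range(3):
            B = A
            for m in range(3):
                if m != q:
                    B = np.moveaxis(
                        np.tensordot(V[m].T, B, axes=([1], [m])), 0, m)
            V[q] = np.linalg.svd(unfold(B, q),
                                 full_matrices=False)[0][:, :r]
    # (3) core = representation coefficients of the orthogonal projection
    beta = A
    for m in range(3):
        beta = np.moveaxis(np.tensordot(V[m].T, beta, axes=([1], [m])), 0, m)
    return beta, V
```

For a tensor of exact multilinear rank (r, r, r), already the HOSVD initial guess reproduces it exactly, and the ALS sweeps leave it unchanged.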
Let us comment on the Tucker decomposition algorithm from [61] for a third-order tensor A ∈ ℝn1 ×n2 ×n3 by using Figures 2.9, 2.10, and 2.11. Set the Tucker ranks as r1 , r2 , and r3 , respectively.
Figure 2.9: The initial guess for the Tucker decomposition is computed by HOSVD via SVD of the
ℓ-mode unfolding matrices, ℓ = 1, 2, 3.
(1) At the first step, the truncated SVD is computed for the three unfolding matrices A(1) ∈ ℝ^{n1×n2 n3} , A(2) ∈ ℝ^{n2×n3 n1} , and A(3) ∈ ℝ^{n3×n1 n2} , as shown in Figure 2.9. Every SVD needs O(n^4 ) computer operations if we set nℓ = n. Thus, it is the most storage- and time-consuming part of the algorithm.
(2) At the ALS iteration step of the scheme, the construction of the “single-hole” tensors, given by (2.26), allows one to essentially reduce the cost of computing the best mappings for the Tucker modes. The construction of a single-hole tensor for ℓ = 3 by contractions with the matrices V^{(ℓ)} for all ℓ but one is shown in Figure 2.10. As illustrated in Figure 2.11, the truncated SVD is performed for a tensor unfolding of much smaller size, since the tensor is already partially mapped into the Tucker projection subspaces 𝕋ℓ , except for the single mode ℓ = 1 from the original tensor space 𝕍n , for which the mapping matrix is being updated. The ALS procedure is repeated kmax times for every mode ℓ = 1, . . . , d, of the tensor.
(3) At the last step of the algorithm, the core tensor is computed using the contraction of the original tensor with the updated side matrices V_{rℓ}^{(ℓ)} , ℓ = 1, . . . , d.
With fixed kmax , the overall complexity of the algorithm for d = 3, nℓ = n, and rℓ = r, ℓ = 1, 2, 3, is estimated by

W = O(n^4 ) + O(kmax (n^3 r + n^2 r^2 )) + O(kmax nr^4 ) + O(r^3 n),

where the different summands denote the cost of the initial HOSVD of A, the computation of the unfolding matrices B_{(q)} , the related SVDs, and the computation of the core tensor, respectively.
Notice that the Tucker model applied to a general fully populated tensor of size n^d requires O(dn^{d+1} ) arithmetical operations, due to the presence of the complexity-dominating HOSVD. Hence, in computational practice this algorithm applies only to small d and moderate n.
We conclude that the ALS Tucker tensor decomposition algorithm poses a severe restriction on the size of available tensors. For example, on conventional laptop computers it is restricted to 3D tensors of size less than 200^3 , which is not satisfactory for real space calculations in quantum chemistry. This restriction will be avoided for function-related tensors when using the multigrid Tucker tensor decomposition discussed in Section 3.1.
We have observed that the canonical and Tucker tensor formats provide representations by using sums of tensor products of vectors. Hence, the standard operations with tensors are reduced to one-dimensional operations in the corresponding dimensions, exactly in the same way as it is done for rank-structured matrices (see Section 2.1). The main point here is the rank of the tensor, that is, the number of tensor product summands. However, the separation rank parameter is hard to control for tensors containing unstructured or experimental data. Due to the addition/multiplication of ranks in every rank-structured operation, after several steps we may have a “curse of ranks” instead of the curse of dimensions. However, it will be shown in Chapter 3 that for function-related tensors things become different, due to their intrinsically low ε-ranks. Moreover, for tensors approximating functions and operators, it is possible to provide means for reducing their ranks after a sequence of tensor operations.
For the sake of clarity (and without loss of generality), in this section we assume
that r = rℓ , n = nℓ (ℓ = 1, . . . , d). If there is no confusion, the index n can be skipped. We
denote by W the complexity of various tensor operations (say, W⟨⋅,⋅⟩ ) or the related stor-
age requirements (say, Wst(β) ). We estimate the storage demands Wst and complexity of
the following standard tensor-product operations: the scalar product, the Hadamard
(component-wise) product, and the convolution transform. We consider the multilin-
ear operations in 𝒯 r,n and 𝒞 R,n tensor classes.
The Tucker model requires

W_{st}(𝒯) = drn + r^d

storage to represent a tensor. The storage for the rank-R canonical tensor scales linearly in d,

W_{st}(𝒞) = dRn.

Setting R = αr with α ≥ 1, we can specify the range of parameters where the Tucker model is less storage consuming compared with the canonical one.
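Taking the usual counts drn + r^d for the rank-r Tucker format and dRn for the rank-R canonical format (stated here explicitly as our assumption), the comparison reduces to a few lines:

```python
def tucker_storage(d, n, r):
    # d*r*n numbers for the side matrices plus r^d for the core
    return d * r * n + r ** d

def canonical_storage(d, n, R):
    # d*R*n numbers for the side matrices only: linear in d
    return d * R * n

# Example: d = 3, n = 1000, r = 10, R = 2r; the Tucker core r^d = 1000
# is negligible against drn, so here Tucker needs about half the storage.
wT = tucker_storage(3, 1000, 10)     # 31000
wC = canonical_storage(3, 1000, 20)  # 60000
```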
Given two Tucker tensors

A1 = ∑_{ν1 =1}^{r1} ⋅ ⋅ ⋅ ∑_{νd =1}^{rd} β_{ν1 ,...,νd} u_{ν1}^{(1)} ⊗ ⋅ ⋅ ⋅ ⊗ u_{νd}^{(d)} ∈ 𝕍n ,
A2 = ∑_{μ1 =1}^{r1} ⋅ ⋅ ⋅ ∑_{μd =1}^{rd} ζ_{μ1 ,...,μd} v_{μ1}^{(1)} ⊗ ⋅ ⋅ ⋅ ⊗ v_{μd}^{(d)} ∈ 𝕍n ,  (2.29)

their scalar product is computed by

⟨A1 , A2 ⟩ := ∑_{k=1}^{r1} ∑_{m=1}^{r2} β_{k1 ...kd} ζ_{m1 ...md} ∏_{ℓ=1}^{d} ⟨u_{kℓ}^{(ℓ)} , v_{mℓ}^{(ℓ)} ⟩,  (2.30)

with the multi-indices k = (k1 , . . . , kd ) and m = (m1 , . . . , md ).
In fact, applying the definition of the scalar product in (2.10) to the rank-1 tensors (with R = r = 1), we have

⟨u^{(1)} ⊗ ⋅ ⋅ ⋅ ⊗ u^{(d)} , v^{(1)} ⊗ ⋅ ⋅ ⋅ ⊗ v^{(d)} ⟩ = ∏_{ℓ=1}^{d} ⟨u^{(ℓ)} , v^{(ℓ)} ⟩.

Then, the above representation follows by combining all rank-1 terms in the left-hand side in (2.30).
We further simplify and suppose that r = r1 = r2 = (r, . . . , r). The calculation in
(2.30) then includes dr 2 scalar products of vectors of size n plus r 2d multiplications,
leading to the overall complexity
W⟨⋅,⋅⟩ = O(dnr 2 + r 2d ),
whereas for calculation of the respective tensor norm, the second term reduces to
O(r d ).
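A possible implementation first forms the d Gram matrices of the mode frames (cost O(dnr^2)) and then contracts them with the two cores. The sketch below (illustrative Python/NumPy) folds the Gram matrices into one core mode by mode, which costs O(dr^{d+1}) and thus even stays below the r^{2d} multiplications of the naive double sum:

```python
import numpy as np

def tucker_inner(beta, U, zeta, V):
    """Scalar product <A1, A2> of two Tucker tensors given by cores
    beta, zeta and lists U, V of d factor matrices of shape (n, r),
    without ever forming the full n^d arrays."""
    d = beta.ndim
    grams = [U[l].T @ V[l] for l in range(d)]  # d Gram matrices, O(d n r^2)
    t = zeta
    for l in range(d):                         # t = zeta x_l G_l, all modes
        t = np.moveaxis(np.tensordot(grams[l], t, axes=([1], [l])), 0, l)
    return float(np.tensordot(beta, t, axes=d))  # full contraction with beta
```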
Note that in the case of mixed Tucker-canonical decomposition (see Definition
3.15), the scalar product can be computed in O(R2 + dr 2 n + dR2 r) operations (cf. [161],
Lemma 2.8).
For given tensors A, B ∈ ℝℐ , the Hadamard product A ⊙ B ∈ ℝℐ of two tensors of
the same size ℐ is defined by the componentwise product,
(A ⊙ B)i = ai ⋅ bi , i ∈ ℐ.
Again, applying definition (2.10) to the rank-1 tensors (with β = ζ = 1), we obtain

(u^{(1)} ⊗ ⋅ ⋅ ⋅ ⊗ u^{(d)} ) ⊙ (v^{(1)} ⊗ ⋅ ⋅ ⋅ ⊗ v^{(d)} ) = (u^{(1)} ⊙ v^{(1)} ) ⊗ ⋅ ⋅ ⋅ ⊗ (u^{(d)} ⊙ v^{(d)} ).

Then, (2.32) follows by summation over all rank-1 terms in A1 ⊙ A2 . Relation (2.32) leads to the storage requirement

W_{st(⊙)} = O(dr^2 n + r^{2d} ),

which includes the memory size for the d mode matrices of size n × r^2 and for the new Tucker core of size (r^2 )^d .
Summation of two tensors is performed by concatenation of the side matrices,
their orthogonalization and recomputation of the Tucker core.
Consider two canonical tensors

A1 = ∑_{k=1}^{R1} c_k u_k^{(1)} ⊗ ⋅ ⋅ ⋅ ⊗ u_k^{(d)} ,  A2 = ∑_{m=1}^{R2} b_m v_m^{(1)} ⊗ ⋅ ⋅ ⋅ ⊗ v_m^{(d)} ,  (2.34)

with normalized vectors u_k^{(ℓ)} , v_m^{(ℓ)} ∈ ℝ^{nℓ} . For simplicity of discussion, we assume that nℓ = n, ℓ = 1, . . . , d. We have
(1) A sum of two canonical tensors, given by (2.34), can be written as

A1 + A2 = ∑_{k=1}^{R1} c_k u_k^{(1)} ⊗ ⋅ ⋅ ⋅ ⊗ u_k^{(d)} + ∑_{m=1}^{R2} b_m v_m^{(1)} ⊗ ⋅ ⋅ ⋅ ⊗ v_m^{(d)} ,  (2.35)

resulting in a canonical tensor with rank at most RS = R1 + R2 . This operation has no cost since it is simply a concatenation of the side matrices.
(2) For given canonical tensors A1 , A2 , the scalar product (2.10) is computed by (see (2.31))

⟨A1 , A2 ⟩ := ∑_{k=1}^{R1} ∑_{m=1}^{R2} c_k b_m ∏_{ℓ=1}^{d} ⟨u_k^{(ℓ)} , v_m^{(ℓ)} ⟩.  (2.36)
(3) For A1 , A2 given by (2.34), we tensorize the Hadamard product by (see (2.33))

A1 ⊙ A2 := ∑_{k=1}^{R1} ∑_{m=1}^{R2} c_k b_m (u_k^{(1)} ⊙ v_m^{(1)} ) ⊗ ⋅ ⋅ ⋅ ⊗ (u_k^{(d)} ⊙ v_m^{(d)} ).  (2.37)
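The three canonical operations fit in a few lines of code. The storage layout below (a coefficient vector plus a list of d side matrices whose kth columns hold u_k^{(ℓ)}) is an assumption for illustration (Python/NumPy):

```python
import numpy as np

def cp_sum(c1, U1, c2, U2):
    # (2.35): rank-(R1+R2) sum by mere concatenation of the side matrices
    return np.concatenate([c1, c2]), [np.hstack([a, b]) for a, b in zip(U1, U2)]

def cp_inner(c1, U1, c2, U2):
    # (2.36): Hadamard product of the d Gram matrices, cost O(d n R1 R2)
    G = np.outer(c1, c2)
    for a, b in zip(U1, U2):
        G *= a.T @ b
    return float(G.sum())

def cp_hadamard(c1, U1, c2, U2):
    # (2.37): rank-(R1*R2) tensor of pairwise Hadamard products of columns
    c = np.outer(c1, c2).ravel()
    U = [np.einsum('ik,il->ikl', a, b).reshape(a.shape[0], -1)
         for a, b in zip(U1, U2)]
    return c, U
```

Note how the ranks add under summation and multiply under the Hadamard product, which is exactly the "curse of ranks" effect discussed above.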
https://doi.org/10.1515/9783110365832-003
38 | 3 Rank-structured grid-based representations of functions in ℝd
In this section, we demonstrate that, for a wide class of function-related tensors, the Tucker decomposition provides a separable approximation with exponentially fast decay of the error with respect to the rank parameter.
The particular properties of the Tucker decomposition for function-related tensors led to the invention of the multigrid Tucker decomposition method, which allows one to reduce the numerical complexity dramatically. Moreover, for function-related tensors, the novel canonical-to-Tucker (C2T) transform and the reduced higher-order singular value decomposition (RHOSVD) were developed in [174], which made a tremendous impact on the evolution of tensor numerical methods. The C2T transform provides a stable algorithm for reducing the large canonical tensor rank arising in the course of bilinear matrix–tensor and tensor–tensor operations.
The C2T algorithm has pushed forward the grid-based numerical methods for calculating the 3D integral convolution operators for functions with multiple singularities [174], providing an accuracy level comparable with the analytical evaluation of the same integrals. In turn, the RHOSVD applies to a tensor in the canonical form (say, resulting from certain algebraic transforms or analytic approximations), and it does not need the building of a full tensor for the Tucker decomposition. Indeed, it is enough to find the orthogonal Tucker basis only for the directional matrices of the canonical tensor, which consist of the skeleton vectors in every single dimension [174]. Presumably, the invention of the RHOSVD and the C2T algorithm anticipated the development of the tensor formats avoiding the “curse of dimensionality”.
We conclude that coupling the multilinear algebra of tensors with nonlinear approximation theory resulted in the tensor-structured numerical methods for multidimensional PDEs. The prior results on the theory of tensor-product approximation of multivariate functions and operators [94, 91, 111, 161] were significant prerequisites for understanding and developing the tensor numerical methods. First, we sketch some of the basic results in approximation theory.
In the following, we choose the set 𝒮 of rank-structured (formatted) tensors within the above defined tensor classes and call the elements in 𝒮 the 𝒮 -tensors.
To perform computations in the low-parametric tensor formats (say, in the course of a rank-truncated iteration), we need to perform a nonlinear “projection” of the current iterand onto 𝒮 . This action is fulfilled by using the tensor truncation operator T𝒮 : 𝕍n,d → 𝒮 , defined for A0 ∈ 𝕍n,d by

T𝒮 : A0 ↦ argmin_{T∈𝒮} ‖A0 − T‖_{𝕍n} .  (3.2)

The procedure of replacing A0 by its approximation in the tensor class 𝒮 is called the tensor truncation to 𝒮 and is denoted by T𝒮 A0 .
There are analytic and algebraic methods for the approximate solution of problem (3.2) applicable to different classes of rank-structured tensors 𝒮 . The target tensor may arise, in particular, as the grid-based representation of regular enough functions, say, solutions of PDEs or some classical Green’s kernels. The storage and numerical complexity for the elements in 𝒮 are strictly determined by the rank parameters involved in the parametrization within the given tensor format. In view of the relation r̂ = O(|log ε|) between the Tucker rank and the corresponding approximation error ‖A0 − T‖𝕍n , which holds for a wide class of function-related tensors [161], one may expect, in PDE-related applications, O(log n) asymptotics of the ranks with respect to the univariate mode size n of the n⊗d tensors living on the n × ⋅ ⋅ ⋅ × n tensor grid. Our experience in numerical simulations in electronic structure calculations confirms this hypothesis.
Such optimistic effective rank bounds justify the benefits of tensor numerical methods in large-scale scientific computing, indicating that these methods are not just heuristic tools but rigorously justified techniques.
In what follows, we discuss the low-rank approximation of a special class of higher-order tensors, also called function-related tensors (FRTs), obtained by sampling a multivariate function over an n × ⋅ ⋅ ⋅ × n tensor grid in ℝd . These data directly arise from:
(a) A separable approximation of multi-variate functions;
(b) Nyström/collocation/Galerkin discretization of integral operators with the Green’s
kernels;
(c) The tensor-product approximation of some analytic matrix-valued functions.
ℳℓ := {mℓ : mℓ = (iℓ , jℓ ), iℓ , jℓ ∈ In } (ℓ = 1, . . . , d)
Definition 3.1 (FRT by collocation). Let p = 2. Given the function g : ℝ^{pd} → ℝ and the tensor-product basis set (3.3), we introduce the coupled variable ζ_{iℓ}^{(ℓ)} := (x_{iℓ}^{(ℓ)} , yℓ ), including the collocation point x_{iℓ}^{(ℓ)} and yℓ ∈ Π, the pair mℓ := (iℓ , jℓ ) ∈ ℳℓ , and define the collocation-type dth-order FRT by A ≡ A(g) := [a_{m1 ...md} ] ∈ ℝ^{ℳ1 ×⋅⋅⋅×ℳd} with the tensor entries
The key observation is that there is a natural duality between the separable approximation of the multivariate generating function g and the tensor-product decomposition of the related multidimensional array A(g). As a result, the canonical decompositions of A(g) can be derived by using a corresponding separable expansion of the generating function g (see [111, 116] for more details).
Lemma 3.2 ([113]). Suppose that a multivariate function g : Ω ⊂ ℝ^{pd} → ℝ can be accurately approximated by a separable expansion

g_R (ζ ) := ∑_{k=1}^{R} μ_k Φ_k^{(1)} (ζ^{(1)} ) ⋅ ⋅ ⋅ Φ_k^{(d)} (ζ^{(d)} ) ≈ g(ζ ),  ζ = (ζ^{(1)} , . . . , ζ^{(d)} ) ∈ ℝ^{pd} ,  (3.5)

and define the canonical side matrices

V_k^{(ℓ)} = {∫ Φ_k^{(ℓ)} (ζ_i^{(ℓ)} ) ψ_ℓ^{j} (yℓ ) dyℓ }_{(i,j)∈ℳℓ} ∈ ℝ^{ℐℓ ×𝒥ℓ} ,  ℓ = 1, . . . , d,  k = 1, . . . , R.  (3.6)
3.1 Super-compression of function-related tensors | 41
Then the FRT A^{(R)} approximates A(g) with the error estimated by

‖A(g) − A^{(R)} (g_R )‖_∞ ≤ C ‖g − g_R ‖_{L∞ (Ω)} .
D_δ := {z ∈ ℂ : |ℑm z| < δ}.

Given f ∈ H^1 (D_δ ), the step size of the quadrature h > 0, and M ∈ ℕ0 , the corresponding (2M + 1)-point sinc quadrature approximating the integral ∫_ℝ f (ξ ) dξ reads

T_M (f , h) := h ∑_{k=−M}^{M} f (kh) ≈ ∫_ℝ f (ξ ) dξ .  (3.7)
|∫_ℝ f (ξ ) dξ − T_M (f , h)| ≤ C N(f , D_δ ) e^{−2πδaM/ log(2πaM/b)} .
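As an illustration of the quadrature (3.7), the sketch below applies it to the Gaussian e^{−ξ²}, whose integral over ℝ is √π; the step-size choice h = 1/√M used here is one standard option for this example, assumed for illustration:

```python
import math

def sinc_quadrature(f, h, M):
    """(2M+1)-point sinc quadrature T_M(f, h) = h * sum_{k=-M..M} f(kh)."""
    return h * sum(f(k * h) for k in range(-M, M + 1))

f = lambda x: math.exp(-x * x)           # integral over R is sqrt(pi)
errors = [abs(sinc_quadrature(f, 1.0 / math.sqrt(M), M) - math.sqrt(math.pi))
          for M in (4, 8, 16)]           # decays exponentially in M
```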
for all spatial dimensions ℓ = 1, . . . , d, where h > 0 is the mesh parameter of the spatial grid. For ease of exposition, we simplify further and set ρ ≡ ρ(ζ ) = ∑dℓ=1 ρ0 (ζ^{(ℓ)} ), i. e., ρℓ = ρ0 (xℓ , yℓ ) (ℓ = 1, . . . , d) with ρ0 : [a, b]^2 → ℝ+ . For i ∈ In , let {x̄i } be the set of cell-centered collocation points on the univariate grid of step size h in [a, b]. For each
i, j ∈ In , we introduce a parameter-dependent integral over ℝ^2 . It is assumed that:
(b) the function

f (z) := φ′ (z) 𝒢 (φ(z)) ∏_{ℓ=1}^{d} Ψ_{iℓ jℓ} (φ(z))  (3.13)
belongs to the Hardy space H 1 (Dδ ) with N(f , Dδ ) < ∞ uniformly in (i, j);
(c) the function f (t), t ∈ ℝ, in (3.13) has either exponential (c1) or hyper-exponential
(c2) decay as t → ±∞ (see Proposition 3.3).
Under the assumptions (a)–(c), we have that, for each M ∈ ℕ, the FRT A(g), defined on [a, b]^d , allows an exponentially convergent symmetric canonical approximation A^{(R)} ∈ 𝒞_R with V_k^{(ℓ)} as in (3.6), where the expansion (3.5) is obtained by substitution of f from (3.13) into the sinc quadrature (3.7), such that we have

‖A(g) − A^{(R)} ‖_∞ ≤ C e^{−αM^ν}  with R = 2M + 1,  (3.14)

where ν = 1/2 and α = √(2πδb) in case (c1), and ν = 1 and α = 2πδb/ log(2πaM/b) in case (c2).
Theorem 3.4 proves the existence of the canonical decomposition of the FRT A(g) with the Kronecker rank r = 𝒪(|log ε| log 1/h) (in case (c2)) or r = 𝒪(log^2 ε) (in case (c1)), which provides an approximation of order 𝒪(ε). In our applications, we usually have 1/h = 𝒪(n), where n is the number of grid points in one spatial direction. Theorem 5.12 applies to translation-invariant or spherically symmetric (radial) functions, in particular, to the classical Newton, Yukawa, Helmholtz, and Slater-type kernels

1/‖x − y‖,  e^{−λ‖x−y‖} /‖x − y‖,  cos(λ‖x − y‖)/‖x − y‖,  e^{−λ‖x−y‖} ,

where x, y ∈ ℝ^3 , λ > 0; see [111] for the case of the Newton kernel. We refer to [163, 164], where the sinc-based CP approximations to the Yukawa and Helmholtz kernels have been analyzed. In particular, the low-rank Tucker approximations to the Slater and Yukawa kernels have been proven in [161] and in [164].
A0 ≡ A0 (g) := [a_{i1 ...id} ] ∈ ℝ^{I1 ×⋅⋅⋅×Id}  with  a_{i1 ...id} := g(x_{i1}^{(1)} , . . . , x_{id}^{(d)} ),  (3.15)

which are the nodes of equally spaced subintervals with the mesh size hℓ = 2bℓ /(nℓ − 1); see Figure 3.1. When using an odd discretization parameter, the function is sampled in the nodes of the grid.
For functions in ℝ^3 , we generate a tensor A ∈ ℝ^{n1 ×n2 ×n3} with entries a_{ijk} = g(x_i^{(1)} , x_j^{(2)} , x_k^{(3)} ). We test the rank dependence of the Tucker approximation to the function-related tensors A. Based on the examples of some classical Green’s kernels, one can figure out whether it is possible to rely on the Tucker tensor approximation to obtain algebraically their low-rank separable tensor representations. We consider the Slater-type, Newton, and Helmholtz kernels in ℝ^3 , which have the typical singularity at the origin. The initial tensor A0 is approximated by a rank-r = (r, . . . , r) Tucker representation A(r) , where the rank parameter increases from r = 1, 2, . . . to some predefined value rmax . Then the orthogonal Tucker vectors and the core tensor of size r × r × r are used to reconstruct the full size tensor corresponding to A(r) , in order to estimate the error of the tensor decomposition, ‖A0 − A(r) ‖, for the given rank. For every Tucker rank r in the respective range, we compute the relative error in the Frobenius norm as in (2.10),
E_{FN} = ‖A0 − A(r) ‖ / ‖A0 ‖,  (3.18)

and

E_{FE} = (‖A0 ‖ − ‖A(r) ‖) / ‖A0 ‖.  (3.19)
Notice that, due to the projection property of the Tucker decomposition, we have ‖A(r) ‖ ≤ ‖A0 ‖.
(1) Slater function. The Slater-type functions play a significant role in electronic structure calculations. For example, the Slater function given by

g(x) = e^{−α‖x‖} ,  x ∈ ℝ^3 ,

represents the electron “orbital” (α = 1) and the electron density function (α = 2) corresponding to the hydrogen atom. Here and in the following, ‖x‖ = √(∑dℓ=1 xℓ^2 ) denotes the Euclidean norm of x ∈ ℝd .
We compute the rank-(r, r, r) Tucker approximation to the function-related tensor
defined in the nodes of the n1 × n2 × n3 3D Cartesian grid with n1 = 65, n2 = 67, and
n3 = 69 in the interval 2b = 10. The slice of the discretized Slater function at the
middle of the z-axis is shown in Figure 3.2, top-left. Figure 3.2, top-right, shows the fast
exponential convergence of the approximation errors EFN , (3.18), and EFE , (3.19), with
respect to the Tucker rank. Thus, the Slater function can be efficiently approximated
by low-rank Tucker tensors. In fact, Tucker rank r = 10 provides a maximum absolute
error of the approximation of order 10−5 , and r = 18 provides approximation with
accuracy ∼10−10 . Note that the error of the Tucker tensor approximation only slightly
depends on the discretization parameter n. The corresponding numerical tests will
be demonstrated further in the section on multigrid tensor decomposition, since the
standard Tucker algorithm is practically restricted to univariate grid size of the order
of nℓ ≈ 200.
Figure 3.2, bottom-left, shows the example of the orthogonal vectors of the dom-
inating subspaces of the Tucker tensor decomposition. Note that the vectors corre-
sponding to the largest entries in the Tucker core exhibit essentially smooth shapes.
Figure 3.2, bottom-right, presents the entries of the Tucker core tensor β ∈ ℝ7×7×7 by
displaying its first four matrix slices Mβ,νr ∈ ℝ7×7×1 , νr = 1, . . . , 4. Numbers inside the
figure indicate the maximum values of the core entries at a given slice Mβ,νr ∈ ℝ7×7×1
of β. Figure 3.2 shows that the “energy” of the decomposed function is concentrated in
several upper slices of the core tensor, and the entries of the core are also decreasing
fast from slice to slice.
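The exponential error decay reported above is easy to reproduce on a small grid. The sketch below samples e^{−‖x‖} and uses the truncated HOSVD as a surrogate for the best orthogonal Tucker approximation (an assumption: its error exceeds the best one by at most a factor √3); the grid parameters are chosen small for speed:

```python
import numpy as np

# Sample the Slater function e^{-||x||} on a small n x n x n grid in [-5, 5]^3
n, b = 33, 5.0
x = np.linspace(-b, b, n)
X, Y, Z = np.meshgrid(x, x, x, indexing='ij')
A = np.exp(-np.sqrt(X**2 + Y**2 + Z**2))

def tucker_err(A, r):
    """Relative Frobenius error of the rank-(r,r,r) truncated HOSVD."""
    V = []
    for l in range(3):
        Al = np.moveaxis(A, l, 0).reshape(A.shape[l], -1)
        V.append(np.linalg.svd(Al, full_matrices=False)[0][:, :r])
    T = A
    for l in range(3):   # project onto the dominating subspaces
        T = np.moveaxis(np.tensordot(V[l].T, T, axes=([1], [l])), 0, l)
    for l in range(3):   # expand back to full size
        T = np.moveaxis(np.tensordot(V[l], T, axes=([1], [l])), 0, l)
    return np.linalg.norm(A - T) / np.linalg.norm(A)

errs = [tucker_err(A, r) for r in (2, 4, 8)]   # rapid decay with the rank
```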
(2) Newton kernel. The best rank-r Tucker decomposition algorithm with r = (r, . . . , r) is applied for approximating the Newton kernel [173]

g(x) = 1/‖x‖,  x ∈ ℝ^3 ,
Figure 3.2: Top: discretized Slater function (left) and the error of its Tucker tensor approximation versus the Tucker rank (right). Bottom: orthogonal vectors of the Tucker decomposition (left) and entries of the Tucker core (right).
in the cube [−b, b]3 with b = 5, on the cell-centered uniform grid with discretization
parameter n = 64.
We consider the sampling points xi(ℓ) = h/2 + (i − 1)h, ℓ = 1, 2, 3, for three space
variables x(ℓ) . Figure 3.3, top-left, shows the potential at the plane close to zero point
(at z = h/2), and the top-right figure displays the absolute error of its approximation
with the Tucker rank r = 18, demonstrating an accuracy of about 10^{−10} . Figure 3.3, bottom-left, shows stable exponential convergence of the errors (3.18) and (3.19) with respect to the Tucker rank. In particular, it follows that an accuracy of the order of 10^{−5} is achieved with the Tucker rank r = 10, and for 10^{−3} one can choose the rank r = 7. Figure 3.3, bottom-right, shows the orthogonal vectors v_k^{(1)} , k = 1, . . . , 6, for the mode ℓ = 1 (the x^{(1)} -axis).
g1 (x) = sin(κ‖x‖)/‖x‖  with x = (x1 , x2 , x3 )^T ∈ ℝ^3 ,
Figure 3.3: Top: the plane of the 3D Newton potential (left) and the error of its Tucker tensor approximation with the rank r = 18 (right). Bottom: decay of the Tucker approximation error versus the Tucker rank (left) and the orthogonal Tucker vectors v_k^{(1)} , k = 1, . . . , 6 (right).
Figure 3.4: Top: the plane of the 3D Helmholtz potential sin ‖x‖/‖x‖ over a cross-section (left) and the absolute error for its Tucker approximation with the rank r = 6 (right). Bottom: decay of the Tucker approximation error in the Frobenius norm with respect to the Tucker rank (left) and the orthogonal Tucker vectors v_k^{(1)} , k = 1, . . . , 6, for the Helmholtz potential sin ‖x‖/‖x‖ (right).
and the target A0 is estimated by the square of the relative Frobenius norm of A(r) −A0 ,
which was confirmed by the numerics above.
Lemma 3.5 (Quadratic convergence in norms). Let A(r) ∈ ℝI1 ×⋅⋅⋅×Id solve the minimization problem (2.18) over A ∈ 𝒯 r . Then we have the “quadratic” relative error bound

0 ≤ (‖A0 ‖ − ‖A(r) ‖)/‖A0 ‖ ≤ (‖A0 − A(r) ‖/‖A0 ‖)^2 .
Figure 3.5: Top: the slice of the 3D Helmholtz potential sin ‖x‖/‖3x‖ over a cross-section (left) and the absolute error for its Tucker approximation with the rank r = 10 (right). Bottom: decay of the Tucker approximation error in the Frobenius norm with respect to the Tucker rank (left) and the orthogonal Tucker vectors v_k^{(1)} , k = 1, . . . , 6, corresponding to sin ‖x‖/‖3x‖ (right).
Figure 3.6: Decay of the Tucker approximation error in the Frobenius norm with respect to the Tucker rank (left) and the orthogonal Tucker vectors v_k^{(1)} , k = 1, . . . , 6, for the Helmholtz potential cos ‖x‖/‖x‖ (right).
We recommend copying and pasting first the main program from Example 3, and then adding all subroutines at the end of this file.
The function Tucker_full_3D_ini in subroutine 1 computes the Tucker decomposition of a 3D tensor A3 for the given Tucker ranks; the initial guess by HOSVD is computed only in the first call (ir == 1), since it would otherwise be repeated for every call of the function, and it is then stored in the auxiliary structure Ini. The function Tnorm computes the Frobenius norm of a given tensor A. The number of ALS iterations here is chosen as kmax = 3.
%_______________subroutine 1____________________________________________
function [U1,U2,U3,LAM3F,Ini] = Tucker_full_3D_ini(A3,NR,kmax,ir,Ini)
[n1,n2,n3]=size(A3);
R1=NR(1); R2=NR(2); R3=NR(3); %nd=3;
[~,nstep]=size(Ini.U1);
if ir == 1
%___ Phase I - Initial Guess
D= permute(A3,[1,3,2]); B1= reshape(D,n1,n2*n3);
[Us, ~, ~]= svd(double(B1),0); U1=Us(:,1:R1);
Ini.U1=Us(:,1:nstep);
Ini.B1=B1;
D= permute(A3,[2,1,3]); B2= reshape(D,n2,n1*n3);
[Us, ~, ~]= svd(double(B2),0); U2=Us(:,1:R2);
Ini.U2=Us(:,1:nstep);
Ini.B2=B2;
D= permute(A3,[3,2,1]); B3= reshape(D,n3,n1*n2);
[Us, ~, ~]= svd(double(B3),0); U3=Us(:,1:R3);
Ini.U3=Us(:,1:nstep);
Ini.B3=B3;
end
if ir ~= 1
U1=Ini.U1(:,1:R1); U2=Ini.U2(:,1:R2); U3=Ini.U3(:,1:R3);
B1=Ini.B1; B2=Ini.B2; B3=Ini.B3;
end
%_______ Phase II - ALS Iteration
for k1=1:kmax
Y1= B1*kron(U2,U3); C1=reshape(Y1,n1,R2*R3);
[W, ~, ~] = svd(double(C1), 0);
U1= W(:,1:R1);
Y2= B2*kron(U3,U1); C2=reshape(Y2,n2,R1*R3);
[W, ~, ~] = svd(double(C2), 0);
U2= W(:,1:R2);
Y3= B3*kron(U1,U2); C3=reshape(Y3,n3,R2*R1);
[W, ~, ~] = svd(double(C3), 0);
U3= W(:,1:R3);
end
%_______ Phase III - Core tensor (the printed listing breaks off at the
% page turn; the following closing lines are reconstructed by the pattern
% of the contractions above)
Y1= B1*kron(U2,U3);
LAM= reshape(U1'*Y1,R1,R3,R2);
LAM3F= permute(LAM,[1,3,2]);
end
%_________subroutine 2 ______________________________________
function f = Tnorm(A)
NS = size(A); nd =length(NS); nsa =1;
for i = 1:nd
nsa = nsa*NS(i);
end
B = reshape(A,1,nsa); f = norm(B);
end
%_________________________________________________________
%_________subroutine 3___________________________________
function A3F=Tuck_2_F(LAM3F,U1,U2,U3)
[R1,R2,R3]=size(LAM3F);
[n1,~]=size(U1); [n2,~]=size(U2); [n3,~]=size(U3);
LAM31=reshape(LAM3F,R1,R2*R3);
CNT1=LAM31'*U1';
CNT2=reshape(CNT1,R2,R3*n1);
CNT3=CNT2'*U2';
CNT4=reshape(CNT3,R3,n1*n2);
CNT5=CNT4'*U3';
A3F=reshape(CNT5,n1,n2,n3);
end
%_________________________________________________________
In the main program, given in Example 3, first the 3D tensor related to a Slater function is generated in the rectangular computational box with the grid sizes n1 = 65, n2 = 67, and n3 = 69 along the x-, y-, and z-axes, respectively.
The Tucker tensor decomposition is performed with equal ranks for all three variables x, y, and z, in a loop starting from a rank equal to one up to a maximum rank given by the parameter “max_Tr”. The number of ALS iterations is given by the parameter “kmax”.
The main program prints out the same data as shown in Figure 3.5: the error of the Tucker decomposition with respect to the Tucker rank in figure (1), the generated tensor in figure (2), and examples of the Tucker vectors for one of the modes in figure (3).
figure(3)
plot(x,U1(:,1),'Linewidth',2);
hold on;
for j=2:max_Tr-2
plot(x,U1(:,j),'Linewidth',2);
end
set(gca,'fontsize',16);
str3='Tucker vectors';
str2=[str3,' al= ',num2str(al) ', rank = ',num2str(max_Tr)];
title(str2,'fontsize',16); axis tight; grid on; hold off;
%______________________________________________________________
When changing the grid parameter “n”, please note that, due to the restrictions for the HOSVD, the size of the tensor should satisfy n1 n2 n3 ≤ 128^3 .
The following conclusions are drawn from the above numerics [146].
Remark 3.6. The Tucker approximation error for the considered class of function-
related tensors decays exponentially with respect to the Tucker rank.
Remark 3.7. The shape of the orthogonal vectors in the unitary matrices of the Tucker decomposition for this class of function-related tensors is almost independent of n.
Remark 3.8. The entries of the core tensor of the Tucker decomposition for the con-
sidered function-related tensors decay fast vs. index kℓ = 1, . . . , r, ℓ = 1, 2, 3.
for full format target tensors of size n⊗d . This bound restricts application of this stan-
dard Tucker scheme from multilinear algebra to small dimensions d and moderate
grid sizes n. Thus, we have the computational work for the Tucker decomposition of
the full format tensors in 3D,
W_{F2T} = O(n^4 ),  (3.22)
which practically restricts the maximum size of the input tensors to n^3 ≈ 200^3 for conventional computers. Our goal is to reach linear-in-volume complexity O(n^3 ) by avoiding the HOSVD transform, thus allowing the maximum size of the input tensors corresponding to the available computer storage.
The multigrid Tucker tensor decomposition, which provides a way to avoid the storage limitations of the standard Tucker algorithm, was introduced by V. Khoromskaia and B. Khoromskij in 2008 [174, 146]. The idea of the multilevel Tucker approximation originates from investigating the numerical examples of the orthogonal Tucker decomposition for function-related tensors, in particular, the regularity of the orthogonal Tucker vectors and the weak dependence of their shapes on the univariate grid parameter n.
The nonlinear multigrid Tucker tensor approximation problem of minimizing the functional

f (Am ) := ‖A0,m − Am ‖^2 → min,  Am ∈ 𝒮0 ⊂ 𝕍nm ,  (3.23)

is solved successively on a sequence of refined grids, m = 0, 1, . . . , M.
For a fixed grid parameter n, let us introduce the equidistant tensor grid

ωd,n := ω1 × ω2 × ⋅ ⋅ ⋅ × ωd ,  (3.24)

and, on every grid level m, the target tensor with the entries a_{nm ,i} = f (x_{im} ), im ∈ ℐm .
The algorithm for the multigrid Tucker tensor approximation for full size tensors is
described as follows.
(3b) Starting with the initial guess V^{(ℓ)} (ℓ = 1, . . . , d), perform kmax steps of the ALS iteration as in Step (2) of the basic Tucker algorithm (see Section 2.2.3).
(4) Compute the core β by the orthogonal projection of A onto 𝕋n = ⨂dℓ=1 𝕋ℓ with 𝕋ℓ = span{v_ν^{(ℓ)} : ν = 1, . . . , rℓ } (see Remark 2.6),

β = A ×1 V^{(1)T} ×2 ⋅ ⋅ ⋅ ×d V^{(d)T} ∈ 𝔹r .
Figure 3.7, left, shows the numerical example of the multigrid Tucker approximation to fully populated tensors given by the 3D Slater function e^{−‖x‖} (x ∈ [−b, b]^3 , b = 5.0), sampled over large n × n × n uniform grids with n = 128, 256, and 512. The corresponding computation times in MATLAB (sec) of the multigrid Tucker decomposition algorithm are shown in Figure 3.7, right.
56 | 3 Rank-structured grid-based representations of functions in ℝd
Figure 3.7: Convergence of the multigrid Tucker approximation with respect to the Tucker rank r (left)
and times for the multigrid algorithm (right).
Figure 3.8: Tucker vectors for the Slater potential on the grid with n = 129 (left) and n = 513 (right).
Figure 3.8 shows the shape of the Tucker vectors for the values of the discretization
parameter n = 129 and n = 513.
For testing the programs for the multigrid Tucker tensor decomposition, first generate 3D tensors by the program in Example 4, with n = 32, 64, 128, 256 (if storage allows, also 512). Before starting the program in Example 5, add the subroutines MG and Interpolation, as well as subroutines 2 and 3 from the previous example. Then one can start the program, choosing the parameter MG (MG = 3, 4, or 5 corresponds to the largest grid sizes 128, 256, or 512, respectively).
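The multigrid loop of the MATLAB program below can be imitated in a few lines of NumPy (an illustrative sketch with assumed sizes, not the book's code): the Tucker vectors are computed by HOSVD only on the coarse grid and then interpolated to the fine grid, exploiting the weak dependence of their shapes on n.

```python
import numpy as np

def slater_grid(n, b=5.0):
    x = np.linspace(-b, b, n)
    X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
    return x, np.exp(-np.sqrt(X**2 + Y**2 + Z**2))

r = 8
xc, Ac = slater_grid(33)     # coarse level, n0 = 33
xf, Af = slater_grid(65)     # fine level, n1 = 2*n0 - 1

# HOSVD side matrices on the coarse grid only
Vc = []
for ell in range(3):
    U, _, _ = np.linalg.svd(np.moveaxis(Ac, ell, 0).reshape(33, -1),
                            full_matrices=False)
    Vc.append(U[:, :r])

# interpolate each Tucker vector to the fine grid and re-orthogonalize;
# the expensive HOSVD of the fine-grid tensor is never performed
Vf = []
for ell in range(3):
    Vi = np.column_stack([np.interp(xf, xc, Vc[ell][:, k]) for k in range(r)])
    Q, _ = np.linalg.qr(Vi)
    Vf.append(Q)

beta = np.einsum("ijk,ia,jb,kc->abc", Af, Vf[0], Vf[1], Vf[2])
A_r = np.einsum("abc,ia,jb,kc->ijk", beta, Vf[0], Vf[1], Vf[2])
err = np.linalg.norm(A_r - Af) / np.linalg.norm(Af)
```

A few ALS sweeps (Step (3b)) would tighten err further; even without them, the interpolated coarse-grid basis already captures the fine-grid tensor well.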
The complexity of the multigrid Tucker approximation by the ALS algorithm applied to full format tensors is given in the following lemma.

Lemma 3.9 ([174]). Suppose that r^2 ≤ n_m for large m. Then the numerical cost of the multigrid Tucker algorithm is estimated by W_{MG-Tuck} = O(n_M^3 r).
Proof. In Step (2), the HOSVD on the coarsest grid level requires O(n_0^4) operations (which for large n = n_M is negligible compared to the other costs in the algorithm). Next, for fixed n = n_m, the assumption r^2 ≤ n implies that at every step of the ALS iteration the cost of the consequent contractions to compute the n × r^2 unfolding matrix B^{(q)} is estimated by O(n^3 r + n^2 r^2), whereas the SVD of B^{(q)} requires O(nr^4) operations. Summing up over the levels completes the proof, taking into account that the Tucker core is computed in O(n_M^3 r) operations.
% main loop over the Tucker ranks; the parameters nstep, MG, ng, b, kmax
% are set in the preamble of the example (not shown here)
for nr=1:nstep
NR=[nr nr nr]; disp(nr);
for im=1:MG
n1=ng*2^(im-1)+1; disp(n1);
b2=b/2; Hunif = b/(n1-1); xcol1=-b2:Hunif:b2;
ycol1=-b2-Hunif:Hunif:b2+Hunif;
zcol1=-b2-2*Hunif:Hunif:b2+2*Hunif;
filename1 = ['A3_' int2str(n1) '_N_Slat.mat'];
load(filename1);
[n1,n2,n3]=size(A3);
if im==1; UC1=zeros(n1,nr); UC2=zeros(n2,nr); UC3=zeros(n3,nr);
save('INTER_COMPS_MG.mat','UC1','UC2','UC3');
else
load INTER_COMPS_MG.mat;
end
Kopt = 0; if im >1; Kopt = 1; end %MG
[U1,U2,U3,LAM3F] = TensR_3sub_OPT_MG(A3,NR,kmax,Kopt,UC1,UC2,UC3);
if im < MG
n11=2*n1-1;
Hunif11 = b/(n11-1); xcol11=-b2:Hunif11:b2;
ycol11=-b2-Hunif11:Hunif11:b2+Hunif11;
zcol11=-b2-2*Hunif11:Hunif11:b2+2*Hunif11;
[UC1,UC2,UC3] = Make_Inter_Vect_xyz(xcol1,xcol11,ycol1,ycol11,...
zcol1,zcol11,n11,U1,U2,U3);
end
save INTER_COMPS_MG.mat UC1 UC2 UC3;
A3F=Tuck_2_F(LAM3F,U1,U2,U3);
err=Tnorm(A3F - A3)/Tnorm(A3);
enr=(Tnorm(A3F) -Tnorm(A3))/Tnorm(A3);
T_error(nr,im)=abs(err);
T_energy(nr,im)=abs(enr);
fprintf(1, '\n iter = %d , err_Fro = %5.4e \n', nr, err);
end
end
figure(20);
for i=1:MG
semilogy(T_error(2:nstep,i),'Linewidth',2,'Marker','square');
hold on;
semilogy(T_energy(2:nstep,i),':','Linewidth',2,'Marker','square');
set(gca,'fontsize',16);
xlabel('Tucker rank','fontsize',16);
ylabel('error','fontsize',16);
grid on; axis tight;
end
%___________________end of main program___________________________________
%_______________subroutine_MG__________________________________
function [U1,U2,U3,LAM3F] = TensR_3sub_OPT_MG(A3,NR,kmax,...
Kopt,UC1,UC2,UC3)
[n1,n2,n3]=size(A3);
% ... (remaining body of TensR_3sub_OPT_MG: ALS iterations as in Section 2.2.3)
%______subroutine Interpolation______
function [U10,U20,U30] = Make_Inter_Vect_xyz(xcol,ixcol,ycol,iycol,...
zcol,izcol,n11,UT1,UT2,UT3)
n12=n11+2; n13=n11+4;
[~,R1]=size(UT1); [~,R2]=size(UT2); [~,R3]=size(UT3);
U10=zeros(n11,R1); U20=zeros(n12,R2); U30=zeros(n13,R3);
for i=1:R1
U10(:,i) = interp1(xcol,UT1(:,i),ixcol,'spline');
end
for i=1:R2
U20(:,i) = interp1(ycol,UT2(:,i),iycol,'spline');
end
for i=1:R3
U30(:,i) = interp1(zcol,UT3(:,i),izcol,'spline');
end
end
%--------------------end of example------------------
Figure 3.9 (top-left) recalls a single Slater function. The corresponding convergence of the multigrid Tucker approximation error in the Frobenius norm for the grids 65^3, 129^3, and 257^3 is shown in Figure 3.9 (top-right). Figure 3.9 (bottom-left) shows the cross-section of a multi-centered Slater potential on an 8 × 8 × 8 lattice, and the corresponding Tucker tensor approximation error for the same grids is shown in Figure 3.9 (bottom-right).
Inspection of these periodic structures shows that the convergence rate of the rank-(r, r, r) Tucker approximation practically does not depend on the size of the lattice-type structure, and the accuracies are nearly the same: for example, for the Tucker rank r = 10, the accuracy is of the order of 10^{−5} for all versions of the single/multi-centered Slater function. These properties were first demonstrated on numerical examples of multi-centered Slater functions in [146] for L × L × L lattices with L = 10 and L = 16. These features can be valuable in the grid-based modeling of periodic (or nearly periodic) structures in density functional theory. It indicates that the Tucker decomposition can be helpful in constructing a small number of problem-adapted basis functions for large lattice-type clusters of atoms.
Figure 3.10 shows the Tucker vectors of the multi-centered Slater function for L = 10 on the grid of size 129^3. The following remark, see V. Khoromskaia [146], became a prerequisite for the development of powerful methods for the summation of long-range potentials on large finite 3D lattices [148, 149, 153].
3.2 Multigrid Tucker tensor decomposition | 61
Figure 3.9: Comparison of the decay of the Tucker tensor decomposition error vs. r for a single Slater
function and for Slater functions positioned at 3D lattice nodes.
Remark 3.10. For a fixed approximation error, the Tucker rank of lattice-type struc-
tures practically does not depend on the number of cells included in the computa-
tional box.
Figure 3.11: Convergence of the approximation error for the multi-centered unperturbed (left panel) and randomly perturbed Slater potential (middle and right panels).
The Tucker tensor decomposition can be used for measuring the level of noise in a tensor resulting from finite element calculations [173]. In what follows, we show the behavior of the approximation error under random perturbation of the function-related tensor. Figure 3.11 demonstrates such an example for the Slater potential, where the random perturbation equals 1, 0.1, and 0.01 percent of the maximum amplitude. It can be seen that the exponential convergence in the Tucker rank is observed only down to the level of the random perturbation; a further increase of the Tucker rank does not improve the approximation. In some cases it is convenient to use the Tucker decomposition to estimate the accuracy of finite element calculations [36].
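This noise-detection effect is easy to reproduce. The following NumPy sketch (illustrative parameters, not from the book) perturbs a Slater tensor by 0.1 percent of its maximum amplitude and shows the Tucker error stagnating at the noise level:

```python
import numpy as np

rng = np.random.default_rng(0)
n, noise_level = 50, 1e-3
x = np.linspace(-5.0, 5.0, n)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
A = np.exp(-np.sqrt(X**2 + Y**2 + Z**2))
A_noisy = A + noise_level * A.max() * rng.standard_normal(A.shape)

def tucker_error(T, r):
    """Relative error of the rank-(r,r,r) HOSVD approximation of T."""
    V = []
    for ell in range(3):
        U, _, _ = np.linalg.svd(np.moveaxis(T, ell, 0).reshape(n, -1),
                                full_matrices=False)
        V.append(U[:, :r])
    beta = np.einsum("ijk,ia,jb,kc->abc", T, V[0], V[1], V[2])
    Tr = np.einsum("abc,ia,jb,kc->ijk", beta, V[0], V[1], V[2])
    return np.linalg.norm(Tr - T) / np.linalg.norm(T)

errs = [tucker_error(A_noisy, r) for r in (2, 6, 10, 14)]
```

The sequence errs decays with r only until it reaches the relative noise floor and then stagnates, whereas for the unperturbed tensor the error keeps decreasing exponentially.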
need to build a full tensor. Instead, the orthogonal basis is computed using only the directional (side) matrices of the canonical tensor, which consist of the skeleton vectors in every single dimension. The RHOSVD can be considered as a generalization of the reduced SVD for rank-R matrices (see Section 2.1.5) to higher-order canonical rank-R tensors. In fact, the RHOSVD can be viewed as an SVD in many dimensions that can be performed without tensor–matrix unfolding, and it is free of the so-called "curse of dimensionality".
Figure 3.12: Representation of the 3D canonical rank-R tensor as contractions (3.27) with side matrices.
Instead of the Tucker decomposition of full size tensors, which requires the HOSVD via SVD of full unfolding matrices of size n × n^{d−1}, it is sufficient to perform the reduced HOSVD based on the SVD of the small side matrices U^{(ℓ)} of size n × R, ℓ = 1, …, d. Figure 3.13 shows the SVD step in the RHOSVD for the dimension ℓ = 1. The RHOSVD transform is defined as follows [174].
Figure 3.13: RHOSVD: truncated SVD of the side matrix U (1) in the C2T transform.
whereas V_0^{(ℓ)} ∈ ℝ^{n×r_ℓ} and W_0^{(ℓ)} ∈ ℝ^{R×r_ℓ} are the respective submatrices of V^{(ℓ)} and W^{(ℓ)} in the SVD of U^{(ℓ)} in (3.28). Then the RHOSVD approximation of A is given by

A^0_{(r)} = ξ ×_1 [V_0^{(1)} D_{1,0} W_0^{(1)T}] ×_2 [V_0^{(2)} D_{2,0} W_0^{(2)T}] ⋯ ×_d [V_0^{(d)} D_{d,0} W_0^{(d)T}].  (3.29)
Notice that A^0_{(r)} in (3.29) is obtained by the projection of the tensor A onto the matrices of left singular vectors V_0^{(ℓ)}. Using projections of the initial CP tensor A onto the orthogonal matrices V_0^{(ℓ)}, it is possible to construct the single-hole tensor for every mode of A. For example, if d = 3, the tensor given in (3.27) converts into a contraction of two orthogonal matrices and the single-hole tensor, which is actually a tensor train (TT) representation; see Figure 3.14.
The C2T decomposition with RHOSVD was originally developed for reducing the ranks of the canonical tensor representation of the electron density. Now it is used in many other applications, for example, in the summation of many-particle potentials and in the low-rank representation of radial basis functions. In fact, the RHOSVD can be used as a first step toward multiplicative tensor formats like TT and HT when the original tensor is given in the canonical tensor format.
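As an illustration (a hypothetical NumPy sketch with assumed Gaussian skeleton data, not the book's implementation), the RHOSVD of a rank-R canonical tensor can be computed from the n × R side matrices alone; the full n^3 array is assembled below only to verify the accuracy.

```python
import numpy as np

n, R, r = 64, 20, 10
x = np.linspace(-5.0, 5.0, n)
alphas = np.logspace(-1, 1, R)                # assumed Gaussian exponents
xi = np.ones(R)                               # canonical weights
G = np.exp(-np.outer(x**2, alphas))           # n x R side matrix, same per mode
U = [G, G, G]

# RHOSVD: truncated SVD of each small n x R side matrix (no unfolding of A)
V0, D0, W0 = [], [], []
for M in U:
    V, s, Wt = np.linalg.svd(M, full_matrices=False)
    V0.append(V[:, :r]); D0.append(s[:r]); W0.append(Wt[:r].T)

# Tucker core contracted directly from the R x r projected factors
C = [W0[ell] * D0[ell] for ell in range(3)]   # columns scaled by singular values
beta = np.einsum("v,va,vb,vc->abc", xi, C[0], C[1], C[2])

# accuracy check against the explicitly assembled canonical tensor
A = np.einsum("v,iv,jv,kv->ijk", xi, U[0], U[1], U[2])
A0 = np.einsum("abc,ia,jb,kc->ijk", beta, V0[0], V0[1], V0[2])
err = np.linalg.norm(A0 - A) / np.linalg.norm(A)
```

Because the side matrices of such function-related tensors have rapidly decaying singular values, a modest Tucker rank r already yields a small err.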
In what follows, we recall Theorem 2.5 from [174], describing the canonical-to-Tucker approximation algorithm and proving the error estimate.
[V̂^{(1)}, …, V̂^{(d)}] = argmax_{Y^{(ℓ)} ∈ 𝒢_ℓ} ‖ ∑_{ν=1}^{R} ξ_ν (Y^{(1)T} u_ν^{(1)}) ⊗ ⋯ ⊗ (Y^{(d)T} u_ν^{(d)}) ‖^2_{𝔹_r},  (3.31)

where Y^{(ℓ)T} u_ν^{(ℓ)} ∈ ℝ^{r_ℓ}.
and we have the solvability of (3.31), assuming that the above relation is valid. The maximizer in (3.31) is given by the orthogonal matrices V^{(ℓ)} = [v_1^{(ℓ)} ⋯ v_{r_ℓ}^{(ℓ)}] ∈ ℝ^{n×r_ℓ}, which can be computed similarly to the Tucker decomposition for full size tensors, where the truncated HOSVD at Step (1) is now substituted by the RHOSVD; see (3.29).
(c) The minimizer in (3.30) is then calculated by the orthogonal projection

A_{(r)} = ∑_{k=1}^{r} μ_k v_{k_1}^{(1)} ⊗ ⋯ ⊗ v_{k_d}^{(d)},   μ_k = ⟨v_{k_1}^{(1)} ⊗ ⋯ ⊗ v_{k_d}^{(d)}, A⟩,

with the multi-index k = (k_1, …, k_d), so that the core tensor μ = [μ_k] can be represented in the rank-R canonical format

μ = ∑_{ν=1}^{R} ξ_ν (V^{(1)T} u_ν^{(1)}) ⊗ ⋯ ⊗ (V^{(d)T} u_ν^{(d)}) ∈ 𝒞_{R,r}.  (3.32)
(d) Let σ_{ℓ,1} ≥ σ_{ℓ,2} ≥ ⋯ ≥ σ_{ℓ,min(n,R)} be the singular values of the ℓ-mode side matrix U^{(ℓ)} ∈ ℝ^{n×R} (ℓ = 1, …, d). Then the RHOSVD approximation A^0_{(r)}, as in (3.29), exhibits the error estimate

‖A − A^0_{(r)}‖ ≤ ‖ξ‖ ∑_{ℓ=1}^{d} ( ∑_{k=r_ℓ+1}^{min(n,R)} σ_{ℓ,k}^2 )^{1/2},  where ‖ξ‖ = ( ∑_{ν=1}^{R} ξ_ν^2 )^{1/2}.  (3.33)
The complexity of the C2T transform for the 3D canonical tensor is estimated by W_{C→T} = O(nR^2).
We notice that the error estimate (3.33) in Theorem 3.12 actually provides control of the RHOSVD approximation error via the computable ℓ-mode error bounds since, by construction, we have

‖U^{(ℓ)} − V_0^{(ℓ)} D_{ℓ,0} W_0^{(ℓ)T}‖_F^2 = ∑_{k=r_ℓ+1}^{min(n,R)} σ_{ℓ,k}^2,  ℓ = 1, …, d.

This result is similar to the well-known error estimate for the HOSVD approximation; see [61].
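The computable bound can be checked numerically. In this hedged sketch (random data, not from the book), the canonical vectors are normalized so that the weights ξ_ν carry the scaling, and the RHOSVD error is compared against the right-hand side of (3.33):

```python
import numpy as np

rng = np.random.default_rng(2)
n, R, r, d = 30, 15, 5, 3
xi = rng.uniform(0.5, 1.5, R)

U = []
for _ in range(d):
    M = rng.standard_normal((n, 6)) @ rng.standard_normal((6, R))  # nearly rank 6
    M += 1e-3 * rng.standard_normal((n, R))
    U.append(M / np.linalg.norm(M, axis=0))       # normalized skeleton vectors

A = np.einsum("v,iv,jv,kv->ijk", xi, U[0], U[1], U[2])

U0, tails = [], []
for M in U:
    V, s, Wt = np.linalg.svd(M, full_matrices=False)
    U0.append((V[:, :r] * s[:r]) @ Wt[:r])        # rank-r truncation of U^(l)
    tails.append(np.sqrt(np.sum(s[r:] ** 2)))     # computable l-mode error

A0 = np.einsum("v,iv,jv,kv->ijk", xi, U0[0], U0[1], U0[2])
lhs = np.linalg.norm(A - A0)
rhs = np.linalg.norm(xi) * sum(tails)             # right-hand side of (3.33)
```

The inequality lhs ≤ rhs always holds; with the nearly rank-6 side matrices above, the truncation at r = 5 leaves a small but nonzero tail.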
In the following, we specify the details of the C2T computational scheme for the case d = 3. To define the RHOSVD-type rank-r Tucker approximation to the tensor in (2.13), we set n_ℓ = n and suppose for definiteness that n ≤ R. Now the SVD of the side matrix U^{(ℓ)} is given by
U^{(ℓ)} = V^{(ℓ)} D_ℓ W^{(ℓ)T} = ∑_{k=1}^{n} σ_{ℓ,k} v_k^{(ℓ)} w_k^{(ℓ)T},   v_k^{(ℓ)} ∈ ℝ^n, w_k^{(ℓ)} ∈ ℝ^R, ℓ = 1, 2, 3.  (3.34)

Given the rank parameter r = (r_1, r_2, r_3) with r_1, r_2, r_3 < n, we recall the truncated SVD of the side matrix,

U^{(ℓ)} → U_0^{(ℓ)} = ∑_{k=1}^{r_ℓ} σ_{ℓ,k} v_k^{(ℓ)} w_k^{(ℓ)T} = V_0^{(ℓ)} D_{ℓ,0} W_0^{(ℓ)T},  ℓ = 1, 2, 3,

where D_{ℓ,0} = diag{σ_{ℓ,1}, σ_{ℓ,2}, …, σ_{ℓ,r_ℓ}}, and the matrices V_0^{(ℓ)} ∈ ℝ^{n×r_ℓ}, W_0^{(ℓ)} ∈ ℝ^{R×r_ℓ} represent the orthogonal factors being the respective submatrices in the SVD factors of U^{(ℓ)}.
Based on Theorem 3.12, the corresponding algorithm C2T for the rank-R input data can be designed. The algorithm Canonical-to-Tucker (for a 3D tensor) includes the following steps [174]:

Input data: side matrices U^{(ℓ)} = [u_1^{(ℓ)} … u_R^{(ℓ)}] ∈ ℝ^{n_ℓ×R}, ℓ = 1, 2, 3, composed of the vectors u_k^{(ℓ)} ∈ ℝ^{n_ℓ}, k = 1, …, R, see (2.13); the maximal Tucker-rank parameter r; the maximal number of ALS iterations m_max.
(I) Compute the SVD of the side matrices, U^{(ℓ)} = V^{(ℓ)} D_ℓ W^{(ℓ)T}, ℓ = 1, 2, 3. Discard the singular vectors in V^{(ℓ)} and the respective singular values up to the given rank threshold, yielding the small orthogonal matrices V_0^{(ℓ)} ∈ ℝ^{n_ℓ×r_ℓ}, W_0^{(ℓ)} ∈ ℝ^{R×r_ℓ}, and the diagonal matrices D_{ℓ,0} ∈ ℝ^{r_ℓ×r_ℓ}, ℓ = 1, 2, 3.
(II) Project the side matrices U^{(ℓ)} onto the orthogonal basis set defined by V_0^{(ℓ)}, yielding Ũ^{(ℓ)} = V_0^{(ℓ)T} U^{(ℓ)} ∈ ℝ^{r_ℓ×R}.
Figure 3.15 shows that this is exactly the same construction as for the so-called single-hole tensor B^{(q)} appearing at the ALS step in the Tucker decomposition algorithm for full size tensors.² Here u_k^{(1)} ∈ ℝ^{n_1} lives in the physical space for mode ℓ = 1, whereas ũ_k^{(2)} ∈ ℝ^{r_2} and ũ_k^{(3)} ∈ ℝ^{r_3}, the column vectors of Ũ^{(2)} and Ũ^{(3)}, respectively, live in the index sets of the V^{(ℓ)}-projections.

2 But now we are not restricted by the storage for the full size tensor.
– Reshape the tensor B̃_1^{(1)} ∈ ℝ^{n_1×r_2×r_3} into a matrix M_{A_1} ∈ ℝ^{n_1×(r_2 r_3)}, representing the span of the optimized subset of mode-1 columns of the partially projected tensor B̃_1^{(1)}. Compute the SVD of the matrix M_{A_1} and extract its r_1 dominating left singular vectors.
– Implement the single step of the ALS iteration for mode ℓ = 2 and ℓ = 3.
– End of the complete ALS iteration sweep.
– Repeat the complete ALS iteration sweep at most m_max times to obtain the optimized Tucker orthogonal side matrices Ṽ^{(1)}, Ṽ^{(2)}, Ṽ^{(3)} and the final projected image B̃_3.
(IV) Project the final iterated tensor B̃_3 in (3.36) using the resultant basis set in Ṽ^{(3)} to obtain the core tensor β ∈ ℝ^{r_1×r_2×r_3}.
Output data: the Tucker core tensor β and the Tucker orthogonal side matrices Ṽ^{(ℓ)}, ℓ = 1, 2, 3.
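The mode-1 part of the ALS sweep can be sketched as follows (an illustrative NumPy fragment with random canonical input, not the book's code): modes 2 and 3 are projected onto V_0^{(2)}, V_0^{(3)}, the single-hole tensor is assembled directly from the factors, reshaped into the n_1 × (r_2 r_3) matrix M_{A_1}, and its dominating left singular vectors give the updated mode-1 frame.

```python
import numpy as np

rng = np.random.default_rng(3)
n, R, r = 40, 12, 4
xi = rng.uniform(0.5, 1.5, R)
U = [rng.standard_normal((n, R)) for _ in range(3)]

# initial orthogonal bases from the SVD of the side matrices (RHOSVD step)
V0 = []
for M in U:
    V, _, _ = np.linalg.svd(M, full_matrices=False)
    V0.append(V[:, :r])

Ut2 = V0[1].T @ U[1]                    # r x R projected side matrices
Ut3 = V0[2].T @ U[2]

# single-hole tensor for mode 1, built without forming the full n^3 array
B1 = np.einsum("v,iv,av,bv->iab", xi, U[0], Ut2, Ut3)
MA = B1.reshape(n, r * r)               # n1 x (r2 r3) unfolding
Vnew, _, _ = np.linalg.svd(MA, full_matrices=False)
V1_new = Vnew[:, :r]                    # updated orthogonal mode-1 frame
```

Cycling this step over the three modes realizes one complete ALS sweep, at a cost governed by the small n × r^2 matrices rather than by n^3 arrays.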
In such a way, it is possible to obtain the Tucker decomposition of a canonical tensor with large mode size and rather large ranks, as may be the case for electrostatic potentials of biomolecules or electron densities in electronic structure calculations. The Canonical-to-Tucker algorithm can be easily modified to use an ε-truncation stopping criterion. Notice that the maximal canonical rank³ of the core tensor β does not exceed min_ℓ (r_1 r_2 r_3)/r_ℓ; see [161].
Our numerical study indicates that in the case of tensors obtained by the grid-based representation of functions describing physical quantities in electronic structure calculations, the ALS step in the C2T transform is usually not required, that is, the RHOSVD approximation is sufficient.

3 Further optimization of the canonical rank in the small-size core tensor β can be implemented by applying the ALS iterative scheme in the canonical format; see, e.g., [193].
with exponential scaling in d. In the absence of Step (2) (i.e., if the RHOSVD provides a satisfactory approximation), the algorithm does not contain iteration loops, and for any d ≥ 2 it is a finite SVD-based scheme that is free of the curse of dimensionality.
Numerical tests show that Algorithm C2T(𝒞_{R,n} → 𝒯𝒞_{R,r}) is efficient for moderate R and n; in particular, it works well in electronic structure calculations on 3D Cartesian grids for moderate grid sizes n ≲ 10^3 and for R ≤ 10^3. However, in real life applications the computations may require one-dimensional grid sizes in the range n_ℓ ≲ 3·10^4 (ℓ = 1, 2, 3) with canonical ranks R ≤ 10^4. Therefore, to get rid of the polynomial scaling in R, n, r in 3D applications, one can apply the best Tucker approximation methods based on the multigrid acceleration of the nonlinear ALS iteration, as described in the following section.
Notice that a projected Galerkin discretization method can be applied as well. For further constructions, we also need an "accurate" 1D interpolation operator ℐ_{m−1→m} from the coarse to the fine grid, acting in each spatial direction. For example, this might be interpolation by piecewise linear or cubic splines.
The idea of the multigrid accelerated best orthogonal Tucker approximation, see [174], can be described as follows (for 𝒞_{R,n} initial data):
(1) General multigrid concept. Solve a sequence of nonlinear approximation problems for A = A_n as in (2.18) with n = n_m := n_0 2^m, m = 0, 1, …, M, corresponding to a sequence of (d-adic) refined spatial grids ω_{d,n_m}. The sequence of approximation problems is treated successively in one run from coarse to fine grids (reminiscent of the cascadic version of the MG method).
(2) Coarse initial approximation to the side matrices U^{(q)}. The initial approximation of U^{(q)} on the finer grid ω_{d,n_m} is obtained by linear interpolation from the coarser grid ω_{d,n_{m−1}}, up to the interpolation accuracy O(n_m^{−α}), α > 0.
(3) Most important fibers. We employ the idea of "most important fibers" (MIFs) of the q-mode unfolding matrices B^{(q)} ∈ ℝ^{n×r̄_q}, whose positions are extracted from the coarser grids. To identify the location of the MIFs, the so-called maximum energy principle is applied as follows:
(3a) On the coarse grid, we calculate a projection of the q-mode unfolding matrix B^{(q)} onto the true q-mode orthogonal subspace Im U^{(q)} = span{u_1^{(q)}, …, u_{r_q}^{(q)}}.
(3b) Now the maximum energy principle specifies the location of the MIFs by finding pr columns in β^{(q)} with maximal Euclidean norms (supposing that pr ≪ r̄_q); see Figures 3.16 and 3.17.⁴ The positions of the MIFs are numbered by the index set ℐ_{q,p} with #ℐ_{q,p} = pr, being a subset of the larger index set ℐ_{r̄_q}.
The practical significance of the use of the MIFs is justified by the observation that the positions of the MIFs⁵ remain almost independent of the grid parameter n = n_m.
(4) Restricted ALS iteration. The proposed choice of the MIFs allows us to accelerate the ALS iteration for solving the problem of the best rank-r approximation to the large unfolding matrix B^{(q)} ∈ ℝ^{n×r̄_q} with the dominating second dimension r̄_q = r^{d−1} (always the case for large d). This approach allows us to reduce the ALS iteration to the computation of the r-dimensional dominating subspace of small n × pr submatrices B^{(q,p)} of B^{(q)} (q = 1, …, d), where p = O(1) is some fixed small parameter.
4 This strategy allows a "blind search" sampling of a fixed portion of q-mode fibers in the Tucker core that accumulate the maximum part of the ℓ₂-energy. The union of the selected fibers from every space dimension (specified by the index sets ℐ_{q,p}, q = 1, …, d) accumulates the most important information about the structure of the rank-R tensor in the dual space ℝ^{r_1×⋯×r_d}. This knowledge reduces the amount of computational work on fine grids (SVD of a matrix of size n × pr instead of n × r̄_q).
5 It resembles the multidimensional "adaptive cross approximation" (see, e.g., [231] and [87] related to the 3D case), but now acting on a fixed subset of fibers defined by the MIFs.
3.3 Reduced higher order SVD and canonical-to-Tucker transform | 71
Figure 3.16: Illustration for d = 3. Finding MIFs in the “preliminary” core β(q) for q = 1 for the rank-R
initial data on the coarse grid n = n0 = (n1 , n2 , n3 ). B(q) is presented in a tensor form for explanatory
reasons.
Figure 3.17: MIFs: selected projections of the fibers of the “preliminary” cores for computing U (1)
(left), U (2) (middle), and U (3) (right). The example is taken from the multigrid rank compression in the
computation of the Hartree potential for the water molecule with the choice r = 14, p = 4.
Figure 3.18: Linear scaling in R and in n (left). Plot of SVD for the mode-1 matrix unfolding B(1,p) ,
p = 4 (right).
If σ_min(B^{(q,p)}) ≤ ε, then the index set ℐ_{q,p} is admissible. If for m = m_0 the approximation criterion above is not satisfied, then choose p := p + 1 and repeat the steps for m = 0, …, m_0.
(3c) Determine the orthogonal matrix U^{(q)} ∈ ℝ^{n×r} via computing the r-dimensional dominating subspace of the "restricted" matrix unfolding B^{(q,p)}.
(4) For the levels m = m_0 + 1, …, M, perform the MGA Tucker approximation by the ALS iteration as in Steps (3a) and (3c), but now with the fixed positions of the MIFs specified by the index set ℐ_{q,p}(n_{m_0}), i.e., by discarding all fibers in B^{(q)} corresponding to the "less important" index set ℐ_{r̄_q} \ ℐ_{q,p}.
(5) Compute the rank-R core tensor β ∈ 𝒞_{R,r}, as in Step (3) of the basic algorithm C2T(𝒞_{R,n} → 𝒯𝒞_{R,r}).
Theorem 3.14 ([174]). Algorithm MG-C2T requires

O(dRrn_M + dp^2 r^2 n_M)

operations per ALS loop, plus the extra cost W_{n_0} = O(dRn_0^2) of the coarse mesh solver C2T(𝒞_{R,n_0} → 𝒯𝒞_{R,r}). It requires O(drn_M + drR) storage to represent the result.
Proof. Step (3a) requires O(drn_M) operations and memory. Notice that for large M we have pr ≤ n_M. Hence, the complexity of Step (3c) is bounded by O(dRrn_M + prn_M + p^2 r^2 n_M) per iteration loop, and the same holds for Step (3b). The rank-R representation of β ∈ 𝒞_{R,r} requires O(drRn_M) operations and O(drR) storage. Summing up these costs over the levels m = 0, …, M proves the result.
Theorem 3.14 shows that Algorithm MG-C2T realizes a fast rank reduction method that scales linearly in d, n_M, R, and r. Moreover, the complexity and error of the multigrid Tucker approximation can be effectively controlled by tuning the governing parameters p, m_0, and n_0.
Figure 3.18 (left) demonstrates the linear complexity scaling of the multigrid Tucker approximation in the input rank R and in the grid size n (electron density for the CH₄ molecule). Figure 3.18 (right) shows the exponentially fast decaying singular values of the mode-1 matrix unfolding B^{(1,p)} with the choice p = 4, which demonstrates the reliability of the maximum energy principle in the error control. A similar fast decay of the respective singular values is typical in most of our numerical examples in electronic structure calculations considered so far.
3.4 Mixed Tucker-canonical transform

Definition 3.15 (The mixed two-level Tucker-canonical format). Given the rank parameters r, R, we denote by 𝒯𝒞_{R,r} the subclass of tensors in 𝒯_{r,n} with the core β represented in the canonical format, β ∈ 𝒞_{R,r} ⊂ 𝔹_r. An explicit representation of A ∈ 𝒯𝒞_{R,r} is given by

A = ( ∑_{ν=1}^{R} ξ_ν u_ν^{(1)} ⊗ ⋯ ⊗ u_ν^{(d)} ) ×_1 V^{(1)} ×_2 V^{(2)} ⋯ ×_d V^{(d)},  (3.38)

with some u_ν^{(ℓ)} ∈ ℝ^{r_ℓ}. Clearly, we have the embedding 𝒯𝒞_{R,r} ⊂ 𝒞_{R,n} with the corresponding (non-orthogonal) side matrices U^{(ℓ)} = [V^{(ℓ)} u_1^{(ℓ)} ⋯ V^{(ℓ)} u_R^{(ℓ)}] and scaling coefficients ξ_ν (ν = 1, …, R).
The two-level approximation scheme then reads

𝒞_{R,n} →^{(I)} 𝒯𝒞_{R,r} →^{(II)} 𝒯𝒞_{R′,r} ⊂ 𝒞_{R′,n}.  (3.39)
Lemma 3.16. Assume that there exists the best rank-R approximation A_{(R)} ∈ 𝒞_{R,n} of A; then there is the best rank-R approximation β_{(R)} ∈ 𝒞_{R,r} of β such that

min_{Z ∈ 𝒞_{R,n}} ‖A − Z‖ = min_{μ ∈ 𝒞_{R,r}} ‖β − μ‖.  (3.40)
Proof. We present a more detailed proof compared with the sketch in Lemma 2.5, [173]. Notice that the canonical vectors y_k^{(ℓ)} of any test element (see (2.13)) in the left-hand side of (3.40),

Z = ∑_{k=1}^{R} λ_k y_k^{(1)} ⊗ ⋯ ⊗ y_k^{(d)} ∈ 𝒞_{R,n},  (3.42)

can be chosen in the form

y_k^{(ℓ)} = ∑_{m=1}^{r_ℓ} μ_{k,m}^{(ℓ)} v_m^{(ℓ)},  k = 1, …, R, ℓ = 1, …, d.  (3.43)

Indeed, assuming

y_k^{(ℓ)} = ∑_{m=1}^{r_ℓ} μ_{k,m}^{(ℓ)} v_m^{(ℓ)} + E_k^{(ℓ)}   with E_k^{(ℓ)} ⊥ span{v_1^{(ℓ)}, …, v_{r_ℓ}^{(ℓ)}},
we conclude that E_k^{(ℓ)} does not affect the cost function in (3.40) because of the orthogonality of V^{(ℓ)}. Hence, setting E_k^{(ℓ)} = 0 and substituting (3.43) into (3.42), we arrive at the desired Tucker decomposition of Z. This implies

‖A − Z‖^2 = ‖(β_z − β) ×_1 V^{(1)} ×_2 ⋯ ×_d V^{(d)}‖^2 = ‖β − β_z‖^2 ≥ min_{μ ∈ 𝒞_{R,r}} ‖β − μ‖^2.
where u_k^{(ℓ)} = {μ_{k,m_ℓ}^{(ℓ)}}_{m_ℓ=1}^{r_ℓ} ∈ ℝ^{r_ℓ} are calculated by using the representation (3.43). Now, changing the order of summation, we have

A_{(R)} = ∑_{k=1}^{R} λ_k y_k^{(1)} ⊗ ⋯ ⊗ y_k^{(d)}
       = ∑_{k=1}^{R} λ_k ( ∑_{m_1=1}^{r_1} μ_{k,m_1}^{(1)} v_{m_1}^{(1)} ) ⊗ ⋯ ⊗ ( ∑_{m_d=1}^{r_d} μ_{k,m_d}^{(d)} v_{m_d}^{(d)} )
       = ∑_{m_1=1}^{r_1} ⋯ ∑_{m_d=1}^{r_d} { ∑_{k=1}^{R} λ_k ∏_{ℓ=1}^{d} μ_{k,m_ℓ}^{(ℓ)} } v_{m_1}^{(1)} ⊗ ⋯ ⊗ v_{m_d}^{(d)}.
‖A − A_{(R)}‖ = ‖β − β_{(R)}‖,

since the ℓ-mode multiplication with the orthogonal side matrices V^{(ℓ)} does not change the cost function. Using the already proven relation (3.40), this indicates that β_{(R)} is the minimizer in the right-hand side of (3.40).
Lemma 3.16 means that the corresponding low-rank Tucker-canonical approximation of A ∈ 𝒯_{r,n} can be reduced to the canonical approximation of the small size core tensor.
Lemma 3.16 suggests a two-level dimensionality reduction approach that leads to a sparser data structure compared with the standard Tucker model. Though A_{(R)} ∈ 𝒞_{R,n} can be represented in the mixed Tucker-canonical format, its efficient storage depends on the further multilinear operations. In fact, if the resultant tensor is further used in scalar, Hadamard, or convolution products with canonical tensors, it is better to store A_{(R)} in the canonical format with the storage complexity dRn.
The numerics illustrating the performance of the multigrid canonical-to-Tucker algorithm will be presented in Section 8.1, describing the calculation of the Hartree potential and the Coulomb matrix in the Hartree–Fock equation.
Remark 3.17. The canonical rank of a tensor A ∈ 𝕍_n has the upper bound

R ≤ min_{1≤ℓ≤d} ∏_{k≠ℓ} n_k.  (3.45)

Proof. First, consider the case d = 3. Let n_1 = max_{1≤ℓ≤3} n_ℓ for definiteness. We can represent a tensor A as

A = ∑_{k=1}^{n_3} B_k ⊗ Z_k,   B_k ∈ ℝ^{n_1×n_2}, Z_k ∈ ℝ^{n_3},

where B_k are the slices of A and Z_k are the canonical unit vectors. Since

rank(B_k ⊗ Z_k) = rank(B_k) ≤ n_2,

we obtain

rank(A) ≤ ∑_{k=1}^{n_3} rank(B_k) ≤ n_2 n_3 = min_{1≤ℓ≤3} ∏_{k≠ℓ} n_k.
The next remark shows that the maximal canonical rank of the Tucker core of a 3rd-order tensor can be easily reduced to a value ≤ r^2 by an SVD-based procedure. Though not practically attractive for arbitrary high-order tensors, the simple algorithm described in Remark 3.18 proves to be useful for the treatment of small size 3rd-order Tucker core tensors in the rank reduction algorithms described in the previous sections.
Remark 3.18. There is a simple procedure based on the SVD to reduce the canonical rank of the core tensor β within the accuracy ε > 0. Let d = 3 for the sake of clearness. Truncating, in each slice B_m of β, the singular values below the threshold ε/r^{3/2} (keeping p_m terms per slice), we obtain

‖β − β_{(R)}‖ ≤ ∑_{m=1}^{r} ‖B_m − B_m^{p_m}‖ = ∑_{m=1}^{r} ( ∑_{k_m=p_m+1}^{r} (σ_{k_m}^{(m)})^2 )^{1/2} ≤ ∑_{m=1}^{r} √r (ε/r^{3/2}) = ε.
Representation (3.47) is a sum of rank-p_m terms, so that the total rank is bounded by R ≤ p_1 + ⋯ + p_r ≤ r^2. This approach can be easily extended to arbitrary d ≥ 3 with the bound R ≤ r^{d−1}.
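The slice-SVD construction of Remark 3.18 is short enough to state in code. The following NumPy sketch (illustrative; exact truncation with p_m = r) converts an r × r × r core into a canonical sum of at most r^2 rank-one terms:

```python
import numpy as np

rng = np.random.default_rng(5)
r = 5
beta = rng.standard_normal((r, r, r))

can1, can2, can3 = [], [], []
for m in range(r):                       # loop over the slices B_m = beta[:,:,m]
    Um, sm, Vmt = np.linalg.svd(beta[:, :, m])
    z = np.zeros(r); z[m] = 1.0          # unit vector z_m selecting slice m
    for k in range(r):                   # keep p_m = r terms here (exact);
        can1.append(sm[k] * Um[:, k])    # truncate at sigma < eps/r^{3/2}
        can2.append(Vmt[k])              # for a prescribed accuracy eps
        can3.append(z)

R = len(can1)                            # total canonical rank <= r^2
beta_rec = np.einsum("vi,vj,vk->ijk",
                     np.array(can1), np.array(can2), np.array(can3))
```

With the exact (untruncated) slice SVDs, beta_rec reproduces beta to machine precision with R = r^2 terms.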
Figure 3.21 illustrates the canonical decomposition of the core tensor by using the SVD of the slices B_m of the core tensor β, yielding the matrices U_m = {u_{k_m}}_{k_m=1}^{p_m}, V_m = {v_{k_m}}_{k_m=1}^{p_m} and a diagonal matrix of small size p_m × p_m containing the truncated singular values. It also shows the vector z_m = [0, …, 0, 1, 0, …, 0], containing all entries equal to 0 except 1 at the m-th position.
Figure 3.21: Tucker-to-canonical decomposition for a small core tensor, see Remark 3.18.
4 Multiplicative tensor formats in ℝd
4.1 Tensor train format: linear scaling in d
The product-type representation of dth-order tensors, which is called the matrix product states (MPS) decomposition in the physics literature, was introduced and successfully applied in DMRG quantum computations [302, 294, 293] and, independently, in quantum molecular dynamics as the multilayer (ML) MCTDH methods [297, 221, 211]. Representations by MPS-type formats in multidimensional problems reduce the storage complexity to O(dr^2 N), where r is the maximal rank parameter.
In recent years, various versions of the MPS-type tensor format were discussed and further investigated in the mathematical literature, including the hierarchical dimension splitting [161], the tensor train (TT) [229, 226], the tensor chain and combined Tucker-TT [167], and the QTT-Tucker [66] formats, as well as the hierarchical Tucker (HT) representation [110], which belongs to the class of ML-MCTDH methods [297] or, more generally, tensor network states models. The MPS-type tensor approximation was proved by extensive numerics to be efficient in high-dimensional electronic/molecular structure calculations, in molecular dynamics, and in quantum information theory (see the survey papers [293, 138, 169, 264]).
Note that although the multiplicative TT and HT parametrizations formally apply
to any full format tensor in higher dimensions, they become computationally feasible
only when using the RHOSVD-like procedures applied either to the canonical format
input or to tensors already given in the TT form. The HOSVD in MPS-type formats was
discussed in [294, 100, 226].
The TT format, which is the particular case of the MPS-type factorization in the case of open boundary conditions, can be defined as follows: for a given rank parameter r = (r_0, …, r_d) and the respective index sets J_ℓ = {1, …, r_ℓ} (ℓ = 0, 1, …, d) with the constraint J_0 = J_d = {1} (i.e., r_0 = r_d = 1), the rank-r TT format contains all elements A = [a(i_1, …, i_d)] ∈ ℝ^{n_1×⋯×n_d} that can be represented as the contracted product of 3-tensors over the d-fold product index set 𝒥 := ×_{ℓ=1}^{d} J_ℓ,

A = A^{(1)} ⋈ A^{(2)} ⋈ ⋯ ⋈ A^{(d)},

where a_{α_{ℓ−1},α_ℓ}^{(ℓ)} ∈ ℝ^{n_ℓ} (ℓ = 1, …, d), and A^{(ℓ)} = [a_{α_{ℓ−1},α_ℓ}^{(ℓ)}] ∈ ℝ^{n_ℓ×r_{ℓ−1}×r_ℓ} is the vector-valued r_{ℓ−1} × r_ℓ matrix (3-tensor). Here, and in the following (see Definition 4.3), the rank product operation "⋈" is defined as a regular matrix product of the two core vector-valued matrices, their fibers (blocks) being multiplied by means of the tensor product [142]. A particular entry of A is represented by

a(i_1, …, i_d) = ∑_{α_1=1}^{r_1} ⋯ ∑_{α_{d−1}=1}^{r_{d−1}} a_{α_1}^{(1)}(i_1) a_{α_1,α_2}^{(2)}(i_2) ⋯ a_{α_{d−1}}^{(d)}(i_d) ≡ A^{(1)}(i_1) A^{(2)}(i_2) ⋯ A^{(d)}(i_d),
https://doi.org/10.1515/9783110365832-004
so that the latter is written in the matrix product form (explaining the notion MPS),
where A(ℓ) (iℓ ) is an rℓ−1 × rℓ matrix.
Example 4.1. Figure 4.1 illustrates the TT representation of a 5th-order tensor; each particular entry a(i_1, i_2, …, i_5) is presented as a product of five matrices (and vectors) corresponding to the indices i_ℓ of the 3-tensors, i_ℓ ∈ {1, …, n_ℓ}, ℓ = 1, 2, …, 5.
In the case J_0 = J_d ≠ {1}, we arrive at the more general form of MPS, the so-called tensor chain (TC) format [167]. In some cases, a TC tensor can be represented as a sum of not more than r_∗ TT tensors (r_∗ = min r_ℓ), which can be converted to a TT tensor based on multilinear algebra operations like sum-and-compress. The storage cost for both the TC and TT formats is bounded by O(dr^2 N), r = max r_ℓ.
Clearly, one and the same tensor might have different ranks in different formats (and, hence, a different number of representation parameters). The next example considers the Tucker and TT representations of a function-related canonical tensor F := T(f), obtained by sampling the function f(x) = x_1 + ⋯ + x_d, x ∈ [0, 1]^d, on the Cartesian grid of size N^{⊗d} specified by the N-vectors X_ℓ = {ih}_{i=1}^{N} (h = 1/N, ℓ = 1, …, d) and the all-ones vector 1 ∈ ℝ^N. The canonical rank of this tensor can be proven to be exactly d [201]. Its rank-(2, …, 2) Tucker representation reads

F = ∑_{k_1,…,k_d=1}^{2} b_k V_{k_1}^{(1)} ⊗ ⋯ ⊗ V_{k_d}^{(d)},   V_1^{(ℓ)} = 1, V_2^{(ℓ)} = X_ℓ,  [b_k] ∈ ⨂_{ℓ=1}^{d} ℝ^2,

while its TT representation with ranks r_ℓ = 2 is given by

F = [X_1  1] ⋈ [ 1  0 ; X_2  1 ] ⋈ ⋯ ⋈ [ 1  0 ; X_{d−1}  1 ] ⋈ [ 1 ; X_d ],

where semicolons separate the block rows.
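The TT factorization above can be verified entrywise: multiplying out the 1×2, 2×2, …, 2×1 blocks reproduces x_1 + ⋯ + x_d exactly. A minimal NumPy sketch (illustrative sizes):

```python
import numpy as np

d, N = 5, 10
h = 1.0 / N
Xs = [h * np.arange(1, N + 1) for _ in range(d)]   # X_l = {ih}, i = 1..N

def tt_entry(i):
    """Entry F(i_1,...,i_d) from the rank-2 TT cores of f = x_1 + ... + x_d."""
    v = np.array([[Xs[0][i[0]], 1.0]])                          # [X_1(i_1)  1]
    for ell in range(1, d - 1):
        v = v @ np.array([[1.0, 0.0],
                          [Xs[ell][i[ell]], 1.0]])              # [[1,0],[X_l,1]]
    return (v @ np.array([[1.0], [Xs[d - 1][i[d - 1]]]]))[0, 0]  # [1; X_d]

idx = (0, 3, 5, 2, 9)
exact = sum(Xs[ell][idx[ell]] for ell in range(d))
```

The induction behind the factorization is visible in the loop: the running 1×2 row stays of the form [partial sum, 1] after every core multiplication.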
The rank-structured tensor formats like the canonical, Tucker, and MPS/TT-type decompositions induce the important concept of canonical, Tucker, or matrix product operators (CO/TO/MPO) acting between two tensor-product Hilbert spaces, each of dimension d,

𝒜 : 𝕏 = ⨂_{ℓ=1}^{d} X^{(ℓ)} → 𝕐 = ⨂_{ℓ=1}^{d} Y^{(ℓ)}.
For example, a canonical rank-R operator takes the form

𝒜 = ∑_{α=1}^{R} ⨂_{ℓ=1}^{d} 𝒜_α^{(ℓ)},   𝒜_α^{(ℓ)} : X^{(ℓ)} → Y^{(ℓ)},

so that its action on a canonical rank-R_X tensor X is given by

𝒜X = ∑_{α=1}^{R} ∑_{β=1}^{R_X} ⨂_{ℓ=1}^{d} 𝒜_α^{(ℓ)} x_β^{(ℓ)} ∈ 𝕐.
Similarly, a matrix product operator A is defined entrywise by

A(i_1, j_1; …; i_d, j_d) = ∑_{α_1,…,α_{d−1}} A_{α_1}^{(1)}(i_1, j_1) A_{α_1,α_2}^{(2)}(i_2, j_2) ⋯ A_{α_{d−2},α_{d−1}}^{(d−1)}(i_{d−1}, j_{d−1}) A_{α_{d−1}}^{(d)}(i_d, j_d),  (4.1)
where, in the brackets, we use the standard matrix–vector multiplication. The TT rank of Y is bounded by r_Y ≤ r ⊙ r_X, where ⊙ denotes the standard Hadamard (entrywise) product of two vectors.
To describe the index-free operator representation of the TT matrix–vector prod-
uct, we introduce the tensor operation denoted by ⋈∗ that can be viewed as dual to ⋈;
it is defined as the tensor (Kronecker) product of the two corresponding core matrices,
their blocks being multiplied by means of a regular matrix product operation. Now,
with the substitution Y(ℓ) = 𝒜(ℓ) ⋈∗ X(ℓ) , the matrix–vector product in TT format takes
the operator form,
AX = (𝒜(1) ⋈∗ X(1) ) ⋈ ⋅ ⋅ ⋅ ⋈ (𝒜(d) ⋈∗ X(d) ).
Δ_d = A ⊗ I_N ⊗ ⋯ ⊗ I_N + I_N ⊗ A ⊗ I_N ⊗ ⋯ ⊗ I_N + ⋯ + I_N ⊗ I_N ⊗ ⋯ ⊗ A ∈ ℝ^{N^⊗d × N^⊗d}, (4.2)
where the rank product operation "⋈" in the matrix case is defined as above [142]. A
similar statement holds for the Tucker rank, rank_Tuck(Δ_d) = 2.
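The rank-2 TT structure of the Kronecker-sum Laplacian can be verified directly (a sketch; the TT ranks are computed by successive SVDs after regrouping the row/column indices pairwise):

```python
import numpy as np

def tt_ranks(T, tol=1e-10):
    """TT ranks of a full tensor via successive truncated SVDs."""
    dims, ranks, r = T.shape, [], 1
    mat = T.reshape(dims[0], -1)
    for n in dims[:-1]:
        mat = mat.reshape(r * n, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = int(np.sum(s > tol * s[0]))
        ranks.append(r)
        mat = s[:r, None] * vt[:r, :]
    return ranks

N = 4
A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # 1D stencil, not prop. to I
I = np.eye(N)
Delta = (np.kron(np.kron(A, I), I) + np.kron(np.kron(I, A), I)
         + np.kron(np.kron(I, I), A))                   # canonical rank d = 3

# regroup row/column indices pairwise: modes (i1 j1), (i2 j2), (i3 j3)
T = (Delta.reshape(N, N, N, N, N, N)
          .transpose(0, 3, 1, 4, 2, 5)
          .reshape(N * N, N * N, N * N))
assert tt_ranks(T) == [2, 2]                            # rank_TT(Delta_d) = 2
```

The sum of d Kronecker terms thus compresses to TT rank 2, independent of d.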
Applications of tensor methods to multidimensional PDEs are reported in [65, 67,
68], [212, 214, 213, 21, 257], and in [182, 188]. The basic mathematical models in quantum
molecular dynamics have been previously described in [210, 211]. Greedy algorithms
for high-dimensional non-symmetric linear problems have been considered in [48].
Basic multilinear algebra operations and the solution of linear systems in TT and HT
formats have been addressed in [10, 228, 11, 21, 195]. The corresponding theoretical analysis
can be found in [57, 196, 250, 249, 8] and [136, 137, 250]. Some applications of the HT
tensor format have been discussed in [261, 262, 242].
Recently the TT and QTT tensor formats were applied in electronic structure cal-
culations for small molecules [240, 239].
where for fixed i, we have Y(j) := x(i), and j_ν = j_ν(i) is defined via q-coding, j_ν − 1 = C_{ν−1},
such that the coefficients C_{ν−1} are found from the q-adic representation of i − 1,
i − 1 = C_0 + C_1 q¹ + ⋯ + C_{L−1} q^{L−1} ≡ ∑_{ν=1}^{L} (j_ν − 1) q^{ν−1}.
providing log-volume scaling in the size of the initial tensor, O(N^d). The optimal choice of
the base q is shown to be q = 2 or q = 3 [167]. However, numerical realizations are
usually implemented using binary coding, i. e., for q = 2. Figure 4.2 illustrates the
QTT tensor approximation in the cases L = 3 and L = 10.
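The q-adic coding above is ordinary base-q digit extraction shifted to {1, …, q}; a minimal sketch:

```python
def q_coding(i, q, L):
    """Digits (j_1,...,j_L), j_nu in {1,...,q}, of i - 1 in base q:
    i - 1 = sum_nu (j_nu - 1) * q**(nu - 1)."""
    m, j = i - 1, []
    for _ in range(L):
        j.append(m % q + 1)   # least significant digit first
        m //= q
    return tuple(j)

# reconstruct the linear index from the multi-index (q = 2, L = 3)
i = 6
j = q_coding(i, 2, 3)                      # (2, 1, 2): 5 = 1 + 0*2 + 1*4
assert 1 + sum((jn - 1) * 2 ** nu for nu, jn in enumerate(j)) == i
```

Reshaping an N-vector with N = q^L by this coding produces exactly the q^⊗L quantized image used by the QTT format.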
A principal question arises: is there a rigorous theoretical substantiation of the QTT
approximation scheme that establishes it as a new powerful approximation tool applicable
to a broad class of data, or is it merely a heuristic algebraic procedure that happens
to be efficient in certain numerical examples?
The answer is positive: the power of the QTT approximation method is due to the
perfect rank-r decomposition discovered in [165, 167] for a wide-ranging class of
function-related tensors obtained by sampling a continuous function over a uniform
(or properly refined) grid. In particular, we have
– r = 1 for complex exponentials;
Figure 4.2: Visualizing the QTT tensor approximation in cases L = 3 and L = 10.
The above rank bounds remain valid independently of the vector size N, and they are
applicable to the general case q = 2, 3, ….
Approximation of 2d × 2d Laplacian-type matrices using TT tensor decomposition
was introduced in [225].
Notice that the name quantics (or quantized) tensor approximation (with the shorthand
QTT), originally introduced in 2009 [165], is reminiscent of the entity "quantum
of information", mimicking the minimal possible mode size (q = 2 or q = 3) of
the quantized image. Later on, in some publications the QTT approximation method
was renamed "vector tensorization" [101, 110].
ℱ_{q,L} : z → Z = ⨂_{p=1}^{L} [1 z^{q^{p−1}} ⋯ z^{(q−1)q^{p−1}}]^T ∈ ℚ_{q,L}. (4.3)
The number of representation parameters specifying the QTT image is reduced dramatically
from N to qL = q log_q N.
The trigonometric N-vector t = ℑm(z) := {t_n = sin(ω(n − 1))}_{n=1}^{N}, ω ∈ ℝ, can be
reshaped by the successive q-adic folding
ℱ_{q,L} : t → T ∈ ℚ_{q,L}
to the q^⊗L-tensor T that has both the canonical ℂ-rank and the TT-rank equal exactly
to 2. The number of representation parameters does not exceed 4qL.
Example 4.5. In the case q = 2, the single sin-vector has the explicit rank-2 QTT representation
in {0, 1}^⊗L (see [69, 227]) with k_p = 2^{p−L} i_p − 1, i_p ∈ {0, 1},
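The rank-2 claim is easy to confirm numerically: fold a sampled sine vector binarily and compute the TT ranks of the resulting tensor (a sketch; ω is an arbitrary nondegenerate frequency):

```python
import numpy as np

def tt_ranks(T, tol=1e-10):
    """TT ranks of a full tensor via successive truncated SVDs."""
    dims, ranks, r = T.shape, [], 1
    mat = T.reshape(dims[0], -1)
    for n in dims[:-1]:
        mat = mat.reshape(r * n, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = int(np.sum(s > tol * s[0]))
        ranks.append(r)
        mat = s[:r, None] * vt[:r, :]
    return ranks

L, omega = 10, 0.7
t = np.sin(omega * np.arange(2 ** L))   # t_n = sin(omega (n-1)), n = 1..N
T = t.reshape([2] * L)                  # binary (q = 2) folding
assert tt_ranks(T) == [2] * (L - 1)     # every unfolding has rank 2
```

The rank 2 comes from the addition theorem sin(a + b) = sin a cos b + cos a sin b: every unfolding of T splits the argument into two additive parts, so each unfolding matrix is a sum of two outer products.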
Example 4.6 below illustrates the uniform bound on the QTT rank for nontrivial highly
oscillating functions. Here and in the following, threshold errors like ϵ_QTT correspond
to the Euclidean norm.
Example 4.6. Highly oscillating and singular functions on [0, A], ω = 100, ϵ_QTT = 10^{−6},
where the function f_3(x), x ∈ [0, 10], k = 1, …, p, p = 16, a_k = 0.3 + 0.05(k − 1), is
recognized on three different scales.
Notice that in the following, in all numerical results, we use the average QTT rank
r defined as
r := √( (1/(d − 1)) ∑_{k=1}^{d−1} r_k r_{k+1} ). (4.4)
The average QTT ranks over all directional ranks for the corresponding functional vec-
tors are given in Table 4.1. The maximum rank over all the fibers is nearly the same as
the average one.
Further examples concerning the low-rank QTT tensor approximation will be pre-
sented in sections related to computation of the two-electron integrals and the sum-
mation of electrostatic potentials over large lattice structured system of particles.
Notice that 1D and 2D numerical quadratures based on interpolation by Chebyshev
polynomials have been developed in [120]. Taking into account that a Chebyshev
polynomial sampled on the Chebyshev grid has an exact rank-2 QTT representation [167]
allows us to perform efficient numerical integration via Chebyshev interpolation
by using the QTT approximation.
In application to multidimensional PDEs, the tensor representation of operators
in quantized spaces is also important. Several results on the QTT approximation of dis-
cretized multidimensional operators (matrices) were presented in [179, 177, 176, 178,
155] and in [142, 66, 67].
Superfast FFT, wavelet, and circulant convolution-type data transforms of logarithmic
complexity have been introduced in [69, 143, 175].
Various applications of the QTT format to the solution of PDEs were reported in
[68, 188, 67, 65, 144, 180, 181].
5 Multidimensional tensor-product convolution
The important prerequisites for the grid-based calculation of the convolution integrals
in ℝd arising in computational quantum chemistry are the multidimensional tensor-
product convolution techniques and the efficient canonical tensor representation of
the Green’s kernels by using the Laplace transform and sinc-quadrature methods.
The tensor-product approximation of multidimensional convolution transform
discretized via collocation-projection scheme on the uniform or composite refined
grids was introduced in 2007 (see [173, 166]). In what follows, we present some of the
results in [166], where the examples of convolving kernels are given by the classical
Newton, Slater (exponential), and Yukawa potentials, 1/‖x‖, e−λ‖x‖ , and e−λ‖x‖ /‖x‖ with
x ∈ ℝd . For piecewise constant elements on the uniform grid of size nd , the quadratic
convergence rate O(h2 ) in the mesh parameter h = 1/n is proved in [166], where it
was also shown that the Richardson extrapolation method on a sequence of grids
improves the order of approximation up to O(h3 ). The fast algorithm of complexity
O(dR1 R2 n log n) is described for tensor-product convolution on the uniform/compos-
ite grids of size nd , where R1 , R2 are the tensor ranks of convolving functions. We also
discuss the tensor-product convolution scheme in the two-level Tucker-canonical
format and the consequent rank reduction strategy. The numerical illustrations
confirming the approximation theory for convolution schemes of order O(h²)
and O(h³) can be found in [166]. The linear-logarithmic complexity scaling in n of the 1D
discrete convolution on large composite grids, and of the convolution method on n × n × n
grids in the range n ≤ 16 384, was also demonstrated there.
where c(d) = −2^{4−d}/Γ(d/2 − 1). This example will be considered in more detail.
There are three commonly used discretization methods for the integral operators:
the so-called Nyström, collocation, and Galerkin-type schemes. Below, we consider
the case of uniform grids, referring to [166] for the complete theory, including the case of
composite grids.
Introduce the equidistant tensor-product lattice ω_d := ω_1 × ⋯ × ω_d of mesh size h = 2A/n
by setting ω_ℓ := {−A + (k − 1)h : k = 1, …, n + 1}, where, for the sake of convenience, n =
2p, p ∈ ℕ, and define the tensor-product index set ℐ := {1, …, n}^d. Hence Ω = ⋃_{i∈ℐ} Ω_i
becomes the union of closed boxes Ω_i = ⨂_{ℓ=1}^{d} Ω_{i_ℓ} specified by segments
where, for the ease of presentation, the evaluation points xj , and the collocation points
yi , i, j ∈ ℐ are assumed to be located on the same cell-centered tensor-product grid
corresponding to ωd . The Nyström-type scheme applies to the continuous functions
f , g, which leads to certain limitations in the case of singular kernels g.
The collocation-projection discretization can be applied to a much more general
class of integral operators than the Nyström methods, including Green’s kernels with
the diagonal singularity, say to the Newton potential g(x) = 1/‖x‖. We consider the
case of tensor-product piecewise constant basis functions {ϕi } associated with ωd , so
that ϕi = χΩi is the characteristic function of Ωi ,
ϕ_i(x) = ∏_{ℓ=1}^{d} ϕ_{i_ℓ}(x_ℓ), where ϕ_{i_ℓ} = χ_{Ω_{i_ℓ}}. (5.3)
g_i = ∫_{ℝ^d} ϕ_i(y) g(−y) dy, i ∈ ℐ, (5.5)
define the dth-order tensors F = {fi }, G = {gi } ∈ ℝℐ , and introduce the d-dimensional
discrete convolution
where the sum is taken over all i ∈ ℐ , which leads to legal subscripts for gj−i+1 ,
j − i + 1 ∈ ℐ . Specifically, for jℓ = 1, . . . , 2n − 1,
The discrete convolution can be gainfully applied to fast calculation of {wm }m∈ℳ
in the collocation scheme (5.4) as shown in the following statement.
Then we find that elements {wm } coincide with {zj }|j=j0 +m , m ∈ ℳ, j0 = n/2. The general
case d ≥ 1 can be justified by applying the above argument to each spatial variable.
with the choice fi = ⟨f , ϕi ⟩L2 . The Galerkin scheme is known as the most convenient
for theoretical error analysis. However, compared with the collocation method, it has
higher implementation cost because of the presence of double integration. Hence,
classical discretization methods mentioned above may differ from each other by con-
struction of the tensor-product decompositions. To keep a reasonable compromise be-
tween the numerical complexity of the scheme and its generality, in the following we
focus on the collocation method by simple low-order finite elements.
Recall that in the case of piecewise constant basis functions the error bound O(h2 )
for the collocation scheme is proved in [166], whereas the Richardson extrapolation
method on a sequence of grids proved to provide the improved approximation error
O(h3 ). Such an extrapolation, when available, allows a substantial reduction of the
approximation error without extra cost. It is worth noting that the Richardson extrap-
olation can also be applied to some functionals of the convolution product, say to
eigenvalues of the operator that includes the discrete convolution.
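The h² → h³ Richardson step admits a one-line sketch: given second-order values on grids h and h/2, the combination (4w_{h/2} − w_h)/3 cancels the leading h² error term (illustrated below on a synthetic second-order quantity, not on actual convolution data):

```python
def richardson(w_h, w_h2):
    """Combine two O(h^2) approximations on grids h and h/2 so the leading
    h^2 error term cancels, leaving O(h^3) accuracy."""
    return (4.0 * w_h2 - w_h) / 3.0

def approx(h):
    # synthetic second-order approximation with an h^3 term in its expansion
    return 1.0 + 0.3 * h ** 2 - 0.2 * h ** 3

h = 0.1
err = abs(richardson(approx(h), approx(h / 2)) - 1.0)
assert err < 1e-4                         # extrapolated value: O(h^3)
assert abs(approx(h) - 1.0) > 50 * err    # much better than the O(h^2) value
```

The extrapolation reuses already-computed grid values, so the improved accuracy indeed comes without extra cost.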
the convolution product can be represented in the separable form (cf. [173])
F ∗ G := ∑_{k=1}^{r} ∑_{m=1}^{r} β_{k_1…k_d} γ_{m_1…m_d} (f_1^{k_1} ∗ g_1^{m_1}) ⊗ ⋯ ⊗ (f_d^{k_d} ∗ g_d^{m_d}). (5.7)
Computing the 1D convolutions f_ℓ^{k_ℓ} ∗ g_ℓ^{m_ℓ} ∈ ℝ^{2n−1} in O(n log n) operations leads to the overall
linear-logarithmic complexity in n,
𝒩_{T∗T} = O(dr² n log n + #β ⋅ #γ).
In general one might have #β ⋅ #γ = O(r 2d ), which may be restrictive even for moder-
ate d.
F ∗ G = ∑_{k=1}^{r} ∑_{m=1}^{R} β_{k_1…k_d} γ_m (f_1^{k_1} ∗ g_1^m) ⊗ ⋯ ⊗ (f_d^{k_d} ∗ g_d^m). (5.8)
However, the calculation by (5.8) still scales exponentially in d, which leads to certain
limitations in the case of higher dimensions.
To get rid of this exponential scaling, it is better to perform the convolution transform
using the two-level tensor format, i. e., F ∈ 𝒯𝒞_{R_1,r} (see Definition 3.15), in such
a way that the result U = F ∗ G with G ∈ 𝒞_{R_G} is represented in the two-level Tucker
format 𝒯𝒞_{R_1 R_G, rR_G}. Recall that an explicit representation for F ∈ 𝒯𝒞_{R_1,r} is given by
F = (∑_{ν=1}^{R_1} β_ν z_ν^1 ⊗ ⋯ ⊗ z_ν^d) ×_1 F^{(1)} ×_2 F^{(2)} ⋯ ×_d F^{(d)}, (5.9)
F ∗ G = ∑_{m=1}^{R_G} γ_m (∑_{ν=1}^{R_1} β_ν z_ν^1 ⊗ ⋯ ⊗ z_ν^d) ×_1 (F^{(1)} ∗ g_1^m) ×_2 ⋯ ×_d (F^{(d)} ∗ g_d^m), (5.10)
such that the above expansion can be evaluated by the following algorithm.
columns u_ℓ^{k,m} as U_m^{(ℓ)} = [f_ℓ^1 ∗ g_ℓ^m ⋯ f_ℓ^r ∗ g_ℓ^m], all at the cost O(drR_G n log n).
(3) Build the core tensor ω = blockdiag{γ_1 β, …, γ_{R_G} β} and represent the resultant two-level
Tucker tensor in the form (storage demand is R_G + R_1 + drR_1 + drR_G n),
U = ω ×_1 U^{(1)} ×_2 ⋯ ×_d U^{(d)} ∈ 𝒯𝒞_{R_1 R_G, rR_G}.
In some cases, one may require the consequent rank reduction for the target tensor
U to the two-level format 𝒯𝒞_{R_0, r_0} with moderate rank parameters R_0 and r_0 = (r_0, …, r_0) [166].
If both convolving tensors are given in the canonical format, F ∈ 𝒞_{R_F} with coefficients
β_k, k = 1, …, R_F, and G ∈ 𝒞_{R_G} with coefficients γ_m, m = 1, …, R_G, then
F ∗ G = ∑_{k=1}^{R_F} ∑_{m=1}^{R_G} β_k γ_m (f_1^k ∗ g_1^m) ⊗ ⋯ ⊗ (f_d^k ∗ g_d^m), (5.11)
leading to the reduced cost that scales linearly in dimensionality parameter d and
linear-logarithmically in n,
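The canonical-times-canonical convolution (5.11) reduces to R_F R_G batches of d one-dimensional convolutions; a minimal sketch with a brute-force check on small random data (sizes are illustrative):

```python
import numpy as np

def full_conv(A, B):
    """Full d-dimensional discrete convolution via zero-padded FFT."""
    shape = [a + b - 1 for a, b in zip(A.shape, B.shape)]
    return np.real(np.fft.ifftn(np.fft.fftn(A, shape) * np.fft.fftn(B, shape)))

def canonical_conv(beta, f, gamma, g):
    """F*G = sum_{k,m} beta_k gamma_m (f1^k * g1^m) x ... x (fd^k * gd^m),
    assembled in full form for verification; f[l] is (RF, n), g[l] is (RG, n)."""
    d, out = len(f), 0.0
    for k in range(len(beta)):
        for m in range(len(gamma)):
            term = np.convolve(f[0][k], g[0][m])      # 1D convolutions only
            for l in range(1, d):
                term = np.multiply.outer(term, np.convolve(f[l][k], g[l][m]))
            out = out + beta[k] * gamma[m] * term
    return out

rng = np.random.default_rng(1)
d, n, RF, RG = 3, 5, 2, 3
beta, gamma = rng.standard_normal(RF), rng.standard_normal(RG)
f = [rng.standard_normal((RF, n)) for _ in range(d)]
g = [rng.standard_normal((RG, n)) for _ in range(d)]

def assemble(c, v):
    # full canonical tensor (d = 3 hardcoded here)
    return sum(c[k] * np.multiply.outer(np.multiply.outer(v[0][k], v[1][k]), v[2][k])
               for k in range(len(c)))

U = canonical_conv(beta, f, gamma, g)
assert np.allclose(U, full_conv(assemble(beta, f), assemble(gamma, g)))
```

The check relies on the separability of convolution: the convolution of rank-1 terms factors into a tensor product of 1D convolutions, which is exactly what makes the O(dR_F R_G n log n) cost possible.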
the product of the interaction potential V(x) with the electron orbitals, V(x)ψ(x), or
by some related terms. In this way, we make an a priori assumption on the existence
of low-rank approximation to the corresponding tensors. In general, this assumption
is not easy to justify. However, it works well in practice.
ρ(x) = e^{−2‖x‖} and V(x)ψ(x) = e^{−‖x‖}/‖x‖ with V(x) = 1/‖x‖, x ∈ ℝ³,
Without loss of generality, we introduce one and the same scaling function
for all spatial dimensions ℓ = 1, . . . , d, where h > 0 is the mesh parameter, so that the
corresponding tensor-product basis function ϕi is defined by (5.3).
Using sinc-quadrature methods [271], we approximate the collocation coefficient
tensor G = [g_i]_{i∈ℐ} in (5.5) via the rank-(2M + 1) canonical decomposition
G ≈ ∑_{k=−M}^{M} w_k E(τ_k) with E(τ_k) = [e_i(τ_k)], i ∈ ℐ, (5.13)
e_i(τ_k) = ĝ(τ_k²) ∏_{ℓ=1}^{d} ∫_ℝ e^{−y_ℓ² τ_k²} ϕ_{i_ℓ}(y_ℓ) dy_ℓ. (5.14)
For a class of analytic functions the exponentially fast convergence in M of the above
quadrature can be proven (see [111, 163]). Notice that the quadrature points τk can be
chosen symmetrically, i. e., τk = τ−k , hence reducing the number of terms in (5.13) to
r = M + 1.
In the particular applications in electronic structure calculations, we are inter-
ested in fast convolution with the Newton or Yukawa kernels. In the case of the New-
ton kernel, g(x) = 1/‖x‖, the approximation theory can be found in [111]. In the case
of the Yukawa potential e−κ‖x‖ /‖x‖ for κ ∈ [0, ∞), we apply the generalized Laplace
transform (cf. (5.12))
g(ρ) = e^{−κ√ρ}/√ρ = (2/√π) ∫_{ℝ_+} exp(−ρτ² − κ²/4τ²) dτ, (5.15)
(3) Agglomerate by summation all terms supported on Ω^{(2)} \ Ω^{(1)} into one tensor A_2
(with maximal rank 3), approximate it with tensor rank r_2 ≤ 3, and so on, until
we end up with the tensor A_p supported on Ω^{(p)} \ Ω^{(p−1)} \ ⋯ \ Ω^{(1)}.
(4) Approximate the canonical sum A1 + ⋅ ⋅ ⋅ + Ap by a low-rank tensor.
Notice that in the sinc-quadrature approximations most of these “local” terms are
supported by only one point, say by Ω(1) , hence they are all agglomerated in the rank-1
tensor. In approximation of the classical potentials like 1/‖x‖ or e−‖x‖ /‖x‖ the usual
choice is p = 1, 2.
The simple rank recompression procedure described above allows a noticeable reduction
of the initial rank R = M + 1 appearing in the (symmetric) sinc quadratures. Numerical
examples of the corresponding rank reduction by Algorithm 5.3 are depicted
in [163], Figure 2.
Figure 5.1: Tensor rank of the sinc- and recompressed sinc-approximation for 1/‖x‖ (left). Conver-
gence history for the O(h2 ) and O(h3 ) Richardson extrapolated convolution schemes (right).
Figure 5.1 (left) presents the rank parameters obtained from the sinc approximations
of g(x) = 1/‖x‖ up to threshold ε = 0.5 ⋅ 10−6 in max-norm, computed on n × n × n grids
with n = 2L+3 for the level number L = 1, . . . , 8 (upper curve), and the corresponding
values obtained by Algorithm 5.3 with p = 1 (lower curve). One observes the significant
reduction of the tensor rank.
f(x) := ∑_{ν=1}^{M} ( ∑_{k=1}^{R_0} c_{ν,k} (x − x_k)^{β_k} e^{−λ_k(x−x_k)²} )², x ∈ ℝ³, R_0 = 50, M = 4, (5.16)
with xk corresponding to the locations of the C and H atoms. We extract the “principal
exponential” approximation of the electron density, f0 , obtained by setting βk = 0
(k = 1, . . . , R0 ) in (5.16). Using the fast tensor-product convolution method, the Hartree
potential of f0 ,
V_H(x) = ∫_Ω f_0(y)/‖x − y‖ dy, x ∈ Ω = [−A, A]³,
is computed with high accuracy on a sequence of uniform n × n × n grids with n = 2^p + 1,
p = 5, 6, …, 12, and A = 9.6. The initial rank of the input tensor F = [f_0(y_i)]_{i∈ℐ}, presented
in the canonical format, is bounded by R ≤ R_0(R_0 + 1)/2 (even for simple molecules,
it normally amounts to several thousand). The collocation coefficients tensor G
in (5.5) for the Newton kernel is approximated by the sinc-method with the algebraic
rank-recompression described in Algorithm 5.3.
Note that the Hartree potential has slow polynomial decay, i. e.,
V_H(x) = O(1/‖x‖) as ‖x‖ → ∞.
However, the molecular orbitals decay exponentially. Hence, the accurate tensor approximation
is computed in a smaller box Ω̃ = [−B, B]³ ⊂ Ω, B < A.
In this numerical example, the resultant convolution product with the Newton convolving
kernel can be calculated exactly by using the analytic representation for each
individual Gaussian,
(e^{−α‖⋅‖²} ∗ 1/‖⋅‖)(x) = (α/π)^{−3/2} (1/‖x‖) erf(√α ‖x‖),
The Hartree potential V_H = f_0 ∗ 1/‖⋅‖ attains its maximum value at the origin x = 0,
that is, V_H(0) = 7.19. Figure 5.1 (right) demonstrates the accuracy O(h²) of our tensor
approximation and O(h3 ) of the corresponding improved values, due to the Richard-
son extrapolation. Here, the grid-size is given by n = nℓ = 2ℓ+4 for the level number
ℓ = 1, . . . , 7, with the finest grid-size n7 = 2048. It can be seen that beginning from the
level number ℓ = 5 (n5 = 512) the extrapolated scheme already achieves the saturation
error 10−6 of the tensor approximation related to the chosen Tucker rank r = 22. This
example demonstrates high accuracy of the Richardson extrapolation.
The numerical results on tensor product approximation of the convolution oper-
ators in the Hartree–Fock equation compared with the commonly used MOLPRO cal-
culations will be presented in the forthcoming Chapter 11.
6 Tensor decomposition for analytic potentials
Methods of separable approximation of the 3D Newton kernel (electrostatic potential
of the Hydrogen atom) using Gaussian sums have been addressed in the chemical and
mathematical literature since [38] and [39, 40]. However, these methods were based on
non-explicit heuristic approaches, not explaining how to derive such Gaussian sums
in an optimal way and with controllable accuracy. A constructive tensor-product ap-
proximation to the multivariate Newton kernel was first proposed in [96, 111] based
on the sinc approximation [271], and then efficiently implemented and analyzed for a
three-dimensional case in [30]. This tensor decomposition has already been successfully
applied to the assembled tensor-based summation of electrostatic potentials on 3D
rectangular lattices invented in [148, 153], and it was one of the basic tools in the construction
of the range-separated tensor format introduced in [24].
An alternative method for computation of the convolution transform with the
Newton kernel is based on the direct solution of the Poisson equation. The data-
sparse elliptic operator inverse based on explicit approximation to the Green function
is presented in [159].
p(z) = ∫_{ℝ_+} a(t) e^{−t²z²} dt ≈ ∑_{k=−M}^{M} a_k e^{−t_k² z²} for |z| > 0, (6.2)
Under the assumption 0 < a ≤ ‖z‖ < ∞, this quadrature can be proven to provide
the exponential convergence rate in M for a class of analytic functions p(z); see [271,
111, 163, 166]. For example, in the particular case p(z) = 1/z, which can be adapted
to the Newton kernel by substitution z = √x12 + x22 + x32 , we apply the Laplace–Gauss
transform
1/z = (2/√π) ∫_{ℝ_+} e^{−t²z²} dt.
Now for any fixed x = (x1 , x2 , x3 ) ∈ ℝ3 such that ‖x‖ > 0, we apply the sinc-quad-
rature approximation to obtain the separable expansion
p(‖x‖) = ∫_{ℝ_+} a(t) e^{−t²‖x‖²} dt ≈ ∑_{k=−M}^{M} a_k e^{−t_k²‖x‖²} = ∑_{k=−M}^{M} a_k ∏_{ℓ=1}^{3} e^{−t_k² x_ℓ²}, a_k = a(t_k). (6.4)
Under the assumption 0 < a ≤ ‖x‖ ≤ A < ∞, this approximation can be proven to
provide the exponential convergence rate in M:
|p(‖x‖) − ∑_{k=−M}^{M} a_k e^{−t_k²‖x‖²}| ≤ C e^{−β√M}.
Combining (6.1) and (6.4) and taking into account the separability of the Gaussian
functions, we arrive at the separable approximation for each entry of the tensor P,
p_i ≈ ∑_{k=−M}^{M} a_k ∫_{ℝ³} ψ_i(x) e^{−t_k²‖x‖²} dx = ∑_{k=−M}^{M} a_k ∏_{ℓ=1}^{3} ∫_ℝ ψ_{i_ℓ}^{(ℓ)}(x_ℓ) e^{−t_k² x_ℓ²} dx_ℓ.
b^{(ℓ)}(t_k) = [b_{i_ℓ}^{(ℓ)}(t_k)]_{i_ℓ=1}^{n_ℓ} ∈ ℝ^{n_ℓ} with b_{i_ℓ}^{(ℓ)}(t_k) = ∫_ℝ ψ_{i_ℓ}^{(ℓ)}(x_ℓ) e^{−t_k² x_ℓ²} dx_ℓ.
Then the 3rd-order tensor P can be approximated by the R-term canonical representa-
tion
P ≈ P_R = ∑_{k=−M}^{M} a_k ⨂_{ℓ=1}^{3} b^{(ℓ)}(t_k) = ∑_{q=1}^{R} p_q^{(1)} ⊗ p_q^{(2)} ⊗ p_q^{(3)} ∈ ℝ^{n×n×n}, (6.6)
where R = 2M + 1. For the given threshold ε > 0, M is chosen as the minimal number
such that, in the max-norm,
‖P − PR ‖ ≤ ε‖P‖.
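A stripped-down version of the construction (6.6) is easy to run. In the sketch below, the b-vectors are taken as point evaluations of the 1D Gaussians on a cell-centered grid (a simplification of the Galerkin-type integrals used in the book), and the quadrature nodes/weights are one standard choice:

```python
import numpy as np

n, A, M = 64, 5.0, 50
h = 2 * A / n
x = -A + (np.arange(n) + 0.5) * h          # cell-centered grid avoids x = 0

hq = np.pi / np.sqrt(M)                    # sinc quadrature for 1/z, as in (6.2)
k = np.arange(-M, M + 1)
t_k = np.exp(k * hq)
a_k = (2.0 / np.sqrt(np.pi)) * hq * t_k    # R = 2M + 1 canonical terms

b = np.exp(-np.outer(t_k ** 2, x ** 2))    # shape (R, n): skeleton vectors

# compare a few entries of P_R with the exact values 1/||x_i||
for (i, j, l) in [(3, 17, 40), (10, 10, 10), (55, 2, 33)]:
    pr = float(np.sum(a_k * b[:, i] * b[:, j] * b[:, l]))
    exact = 1.0 / np.sqrt(x[i] ** 2 + x[j] ** 2 + x[l] ** 2)
    assert abs(pr - exact) < 1e-3 * exact
```

Note that the storage is only Rn numbers for the n³ grid values, which is the compression discussed below.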
Figure 6.1: Examples of vectors of the canonical {p_q^{(1)}}_{q=1}^{R} (left) and Tucker {t_k^{(1)}}_{k=1}^{r_1} (right) tensor representations.
The canonical skeleton vectors are renumbered by k → q = k + M + 1, p_q^{(ℓ)} ← p_k^{(ℓ)} ∈ ℝ^n (q = 1, …, R).
In the following, we also consider the Tucker approximation to the 3rd-order ten-
sor P. Given rank parameters r = (r1 , r2 , r3 ), the rank-r Tucker tensor approximating P
is defined by the following parameterization: Tr = [ti1 i2 i3 ] ∈ ℝn×n×n (iℓ ∈ {1, . . . , n}),
T_r := ∑_{k=1}^{r} b_k t_{k_1}^{(1)} ⊗ t_{k_2}^{(2)} ⊗ t_{k_3}^{(3)} ≡ B ×_1 T^{(1)} ×_2 T^{(2)} ×_3 T^{(3)}, (6.7)
where the orthogonal side-matrices T^{(ℓ)} = [t_1^{(ℓ)} ⋯ t_{r_ℓ}^{(ℓ)}] ∈ ℝ^{n×r_ℓ}, ℓ = 1, 2, 3, define the
set of Tucker vectors, and B ∈ ℝ^{r_1×r_2×r_3} is the Tucker core tensor. Choose the truncation
set of Tucker vectors, and B ∈ ℝ is the Tucker core tensor. Choose the truncation
error ε > 0 for the canonical approximation PR obtained by the quadrature method,
then compute the best orthogonal Tucker approximation of P with tolerance O(ε) by
applying the canonical-to-Tucker algorithm [174] to the canonical tensor PR → Tr .
The latter algorithm is based on the rank optimization via ALS iteration. The rank pa-
rameter r of the resultant Tucker approximand Tr is minimized subject to the ε-error
control,
‖PR − Tr ‖ ≤ ε‖PR ‖.
Remark 6.1. Since the maximal Tucker rank does not exceed the canonical one, we
apply the approximation results for canonical tensor to derive the exponential con-
vergence in the Tucker rank for a wide class of functions p. This implies the relation
max{rℓ } = O(| log ε|2 ), which can be observed in all numerical tests implemented so
far.
Table 6.1: CPU times (Matlab) to compute with tolerance ε = 10−6 canonical and Tucker vectors of PR
for the single Newton kernel in a box.
Figure 6.1 displays several skeleton vectors of the canonical and Tucker tensor representations
for a single Newton kernel along the x-axis from a set {p_q^{(1)}}_{q=1}^{R}. Symmetry
of the tensor P_R implies that the canonical vectors p_q^{(2)} and p_q^{(3)} corresponding to the y-
and z-axes, respectively, are of the same shape as p_q^{(1)}. It is clearly seen that there are
canonical/Tucker vectors representing the long-, intermediate- and short-range con-
tributions to the total electrostatic potential. This interesting feature will be also rec-
ognized for the low-rank lattice sum of potentials (see Section 14.2).
Table 6.1 presents CPU times (sec) for generating a canonical rank-R tensor ap-
proximation of the single Newton kernel over n×n×n 3D Cartesian grid corresponding
to Matlab implementation on a terminal of the 8 AMD Opteron Dual-Core processor.
The corresponding mesh sizes are given in Angstroms. We observe the logarithmic
scaling of the canonical rank R in the grid size n, whereas the maximal Tucker rank
has the tendency to decrease for larger n. The compression rate on the largest grid in
Table 6.1, i.e., the ratio n³/(nR) for the canonical format and n³/(r³ + rn) for the
Tucker format, is of the order of 10⁸ and 10⁷, respectively.
Notice that the low-rank canonical/Tucker approximation of the tensor P is a
problem-independent task; hence the respective canonical/Tucker vectors can be precomputed
at once on a large enough 3D n × n × n grid and then stored for multiple
use. The storage size is bounded by Rn or rn + r³ in the case of canonical and Tucker
formats, respectively.
Lennard-Jones potential: p(‖x‖) = 4ϵ[(σ/‖x‖)^{12} − (σ/‖x‖)^6].
The electrostatic potential energy for the dipole–dipole interaction due to Van der
Waals forces is defined by
Dipole–dipole interaction energy: p(‖x‖) = C_0/‖x‖³.
e^{−2√(κρ)} = (√κ/√π) ∫_{ℝ_+} t^{−3/2} e^{−κ/t} e^{−ρt} dt, (6.8)
e^{−κ√ρ}/√ρ = (2/√π) ∫_{ℝ_+} e^{−κ²/4t²} e^{−ρt²} dt, (6.9)
1/√ρ = (2/√π) ∫_{ℝ_+} e^{−ρt²} dt, (6.10)
1/ρ^n = (1/(n − 1)!) ∫_{ℝ_+} t^{n−1} e^{−ρt} dt, n = 1, 2, …. (6.11)
Remark 6.2. The idea behind the low-rank tensor representation for a sum of spheri-
cally symmetric potentials on a 3D lattice can be already recognized on the continuous
level by introducing the Laplace transform of the generating kernel. For example, in
representation (6.9) with the particular choice κ = 0, given by (6.10), we can set up
ρ = x_1² + x_2² + x_3², i. e., p(‖x‖) = 1/‖x‖ (1 ≤ x_ℓ < ∞), and apply the sinc-quadrature
approximation as in (6.2)–(6.3),
p(z) = (2/√π) ∫_{ℝ_+} e^{−t²z²} dt ≈ ∑_{k=−M}^{M} a_k e^{−t_k² z²} for |z| > 0. (6.12)
Σ_L(x) = ∑_{i_1,i_2,i_3=1}^{L} 1/√((x_1 + i_1b)² + (x_2 + i_2b)² + (x_3 + i_3b)²)
Σ_L(x) = (2/√π) ∫_{ℝ_+} [ ∑_{i_1,i_2,i_3=1}^{L} e^{−[(x_1+i_1b)² + (x_2+i_2b)² + (x_3+i_3b)²]t²} ] dt
= (2/√π) ∫_{ℝ_+} ∑_{i_1=1}^{L} e^{−(x_1+i_1b)²t²} ∑_{i_2=1}^{L} e^{−(x_2+i_2b)²t²} ∑_{i_3=1}^{L} e^{−(x_3+i_3b)²t²} dt, (6.13)
where the integrand is separable. Representation (6.13) indicates that applying the
same quadrature approximation to the lattice sum integral (6.13) as that for the single
kernel (6.12) leads to the decomposition of the total sum of potentials with the same
canonical rank as for the single one.
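The separability argument in (6.13) can be verified numerically: one quadrature serves the whole lattice sum, so the canonical rank of Σ_L equals that of a single kernel (2M + 1 below). Nodes and weights are one standard sinc-quadrature choice, not the book's exact tuning:

```python
import numpy as np

M, b, L = 50, 1.0, 3
hM = np.pi / np.sqrt(M)
kk = np.arange(-M, M + 1)
t_k = np.exp(kk * hM)
a_k = (2.0 / np.sqrt(np.pi)) * hM * t_k   # weights for 1/z, as in (6.12)

x = np.array([0.1, 0.35, 0.6])            # one 3D evaluation point (x1, x2, x3)
# directional skeleton vectors: V[k, l] = sum_i exp(-t_k^2 (x_l + i b)^2)
shifts = x[None, :, None] + b * np.arange(1, L + 1)[None, None, :]
V = np.exp(-(t_k ** 2)[:, None, None] * shifts ** 2).sum(axis=2)   # (2M+1, 3)

# rank-(2M+1) evaluation of Sigma_L: rank is independent of the lattice size L
approx = float(np.sum(a_k * V[:, 0] * V[:, 1] * V[:, 2]))

direct = sum(1.0 / np.sqrt((x[0] + i1*b)**2 + (x[1] + i2*b)**2 + (x[2] + i3*b)**2)
             for i1 in range(1, L + 1)
             for i2 in range(1, L + 1)
             for i3 in range(1, L + 1))
assert abs(approx - direct) < 1e-3 * direct
```

The L³ lattice terms are absorbed into the directional sums at no cost in rank, which is the key point of the assembled lattice summation.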
ℋe Ψ = EΨ, (7.1)
which describes the energy of an N-electron molecular system in the framework of the
so-called Born–Oppenheimer approximation, implying a system with clamped nuclei.
Here, M is the number of nuclei, ZA are nuclei charges located at the distinct points aA ,
A = 1, . . . , M. Since the nuclei are much heavier than electrons, and their motion is
much slower, the nuclei and electronic parts of the energy can be considered sepa-
rately. Thus, the electronic Schrödinger equation specifies the energy of a molecular
system at a fixed nuclei geometry. The Hamiltonian (7.2) includes the kinetic energy of
electrons, the potential energy of the interaction between nuclei and electrons, and
the electron correlation energy. The electronic Schrödinger equation is a multidimen-
sional problem in ℝ3N , and it is computationally unfeasible except for the simple Hy-
drogen or Hydrogen-like atoms.
The Hartree–Fock equation is a 3D eigenvalue problem in space variables ob-
tained as a result of the minimization of the energy functional for the electronic
Schrödinger equation [277, 128]. The underlying condition for the wavefunction is
that it should be a single Slater determinant containing the products of molecular
∫_{ℝ³} φ_i φ_j = δ_ij, i, j = 1, …, N_orb, x ∈ ℝ³,
ℱ = Hc + VH − 𝒦. (7.4)
The core Hamiltonian part Hc of the Fock operator consists of the kinetic energy of
electrons specified by the Laplace operator and the nuclear potential energy of inter-
action of electrons and nuclei,
H_c(x) = −(1/2)Δ − ∑_{A=1}^{M} Z_A/‖x − a_A‖, Z_A > 0, x, a_A ∈ ℝ³, (7.5)
where M is the number of nuclei in a molecule, and ZA and aA are their charges and
positions, respectively. Here,
V_c(x) = −∑_{A=1}^{M} Z_A/‖x − a_A‖
is the nuclear potential operator. The electron correlation parts of the Fock operator
are described by the Hartree potential
V_H(x) := ∫_{ℝ³} ρ(y)/‖x − y‖ dy (7.6)
ρ(y) = 2 ∑_{i=1}^{N_orb} (φ_i(y))², x, y ∈ ℝ³, (7.7)
(𝒦φ)(x) := ∫_{ℝ³} (τ(x, y)/‖x − y‖) φ(y) dy, τ(x, y) = ∑_{i=1}^{N_orb} φ_i(x) φ_i(y), x ∈ ℝ³, (7.8)
where τ(x, y) is the density matrix. Since both operators VH and 𝒦 depend on the so-
lution of the eigenvalue problem (7.3), the nonlinear Hartree–Fock equation is solved
iteratively by using self-consistent field (SCF) iteration [238, 44].
The Hartree–Fock model is often called a mean-field approximation, since the en-
ergy of electrons in a molecule is computed with respect to the mean field created by
all electrons in a molecular system, including the target electrons.
φ_i(x) = ∑_{μ=1}^{N_b} c_{iμ} g_μ(x), i = 1, …, N_orb, x ∈ ℝ³, (7.9)
which yields the system of nonlinear equations for the coefficients matrix C = {ciμ } ∈
ℝNorb ×Nb (and the density matrix D = 2CC ∗ ∈ ℝNb ×Nb ),
where S = {sμν } is the overlap matrix for the chosen Galerkin basis, where sμν =
∫ℝ3 gμ gν dx. The Galerkin counterpart of the Fock operator
includes the core Hamiltonian H discretizing the Laplacian and the nuclear potential
operators (7.5), and the matrices J(C) and K(C) corresponding to the Galerkin projec-
tions of the operators VH and 𝒦, respectively.
In this way, one can precompute the one-electron integrals in the core Hamiltonian
H = {h_{μν}}_{μ,ν=1}^{N_b},
h_{μν} = (1/2) ∫_{ℝ³} ∇g_μ ⋅ ∇g_ν dx + ∫_{ℝ³} V_c(x) g_μ g_ν dx, 1 ≤ μ, ν ≤ N_b, (7.12)
and the so-called two-electron integrals (TEI) tensor, also known as electron repulsion
integrals,
since they depend only on the choice of the basis functions in (7.9).
Then, the solution is sought by the self-consistent fields (SCF) iteration using the
core Hamiltonian H as the initial guess, and by updating the Coulomb
J(C)_{μν} = ∑_{κ,λ=1}^{N_b} b_{μν,κλ} D_{κλ}, (7.14)
at every iteration step. The direct inversion of iterative subspaces (DIIS) method, in-
troduced in 1982 by Pulay [238], provides stable convergence of iteration. The DIIS
method is based on defining the weights of the previous solutions to be used as the
initial guess for the current step of iteration.
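The per-iteration contractions of the TEI tensor with the density matrix are single einsum calls; a sketch on a random symmetric surrogate (the exchange index convention and the 1/2 prefactor below are an assumption, since conventions vary between texts):

```python
import numpy as np

def coulomb_exchange(b, D):
    """J from eq. (7.14): J_mn = sum_{k,l} b[m,n,k,l] D[k,l].
    The exchange contraction uses one common convention (an assumption;
    prefactor and index-ordering conventions vary in the literature)."""
    J = np.einsum('mnkl,kl->mn', b, D)
    K = 0.5 * np.einsum('mknl,kl->mn', b, D)
    return J, K

rng = np.random.default_rng(2)
Nb = 4
b = rng.standard_normal((Nb,) * 4)
# impose the 8-fold TEI symmetry of real-valued basis functions
b = 0.5 * (b + b.transpose(1, 0, 2, 3))
b = 0.5 * (b + b.transpose(0, 1, 3, 2))
b = 0.5 * (b + b.transpose(2, 3, 0, 1))
C = rng.standard_normal((Nb, 2))
D = 2.0 * C @ C.T                     # density matrix D = 2 C C^T, Section 7.3
J, K = coulomb_exchange(b, D)
assert np.allclose(J, J.T) and np.allclose(K, K.T)
```

Both contractions cost O(N_b⁴) per SCF step when b is stored in full, which is exactly the bottleneck the rank-structured TEI factorizations of the following chapters remove.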
Finally, the Hartree–Fock energy (or electronic energy, [277]) is computed as
E_HF = 2 ∑_{i=1}^{N_orb} λ_i − ∑_{i=1}^{N_orb} (J̃_i − K̃_i),
where
and
K̃_i = (φ_i, 𝒦φ_i)_{L²} = ⟨C_i, KC_i⟩, i = 1, …, N_orb,
are the Coulomb and exchange integrals in the basis of Hartree–Fock orbitals φi .
Given the geometry of nuclei, the resulting ground state energy E0 of the molecule
is defined by
In the next chapters, we describe the two ab initio Hartree–Fock solvers using the
tensor-structured grid-based calculation of all quantities involved, including the rank-
structured calculation of the core Hamiltonian introduced in [156]. We briefly summa-
rize the basic approaches as follows:
– In Section 8, we describe the multilevel Hartree–Fock solver based on a nontraditional concept for the numerical solution of the eigenvalue problem, which avoids the computation of TEI. Instead, it employs the grid-based rank-structured computation of the Coulomb and exchange matrices on the fly in the course of the SCF iterations. This solver was introduced in [174, 145, 146, 187]. Though this approach eliminates the challenging computation of the two-electron integrals, it exhibits time limitations in the loops for the computation of the exchange operator 𝒦. Its MATLAB implementation is therefore not competitive with the standard packages based on analytical calculations; however, it may be well suited for parallel implementations.
– In Section 11, we present the fast TESC Hartree–Fock solver introduced in [157, 147], which is comparable in time and accuracy (in MATLAB implementation) with the benchmark packages. It is based on the efficient rank-structured calculation of the TEI tensor in a factorized form by using an algebraic “1D density fitting” scheme and the truncated Cholesky decomposition algorithm. Due to the rank-structured representation of the two-electron integrals, this tensor-based solver proved to be attractive as a starting point for the computation of excitation energies of molecules.
Note that the grid-based approaches are not restricted to Gaussian-type basis functions and may be applied to the construction of new well-separable grid-based basis functions, for example, combinations of Gaussians with plane waves and/or Slater-type orbitals.
https://doi.org/10.1515/9783110365832-008
112 | 8 Multilevel grid-based tensor-structured HF solver
For the multilevel Hartree–Fock solver [146, 187], fast and accurate evaluation of the
Galerkin matrices J(D) and K(D) is based on a certain reorganization of the standard
iterative computational scheme for the eigenvalue problem (7.10) given in Section 7.3.
Specifically, instead of precomputing the full set of two-electron integrals bμν,κλ and
computation of matrices in (7.14) and (7.15) employing the updated elements of the
density matrix D, we use the explicit integral representations for J(D) and K(D). In
particular, the Galerkin representation of the Hartree operator (the Coulomb matrix)
is now calculated by the grid-based quadrature integration of
In turn, as proposed in [145], we represent the matrix entries of the exchange operator K(D) by the following loops. For a = 1, . . . , Norb , compute the convolution integrals

Waν (x) = ∫_{ℝ3} [gν (y) ∑_{κ=1}^{Nb} Cκa gκ (y)] / ‖x − y‖ dy, ν = 1, . . . , Nb , (8.3)
Finally, the entries of the exchange matrix are given by sums over all orbitals,

K(C)μν = ∑_{a=1}^{Norb} Kμν,a , μ, ν = 1, . . . , Nb . (8.5)
with the mesh-size h = 2b/n. Define the set of piecewise constant basis functions {ϕi }, i ∈ ℐ := {1, . . . , n}3 , associated with the respective grid-cells in ω3,n (indicator functions), and the corresponding set {χj }, j ∈ 𝒥 := {1, . . . , n − 1}3 , of tensor-product continuous piecewise linear polynomials in each spatial variable. We denote the corresponding finite element spaces as
Now the basis set {gμ } is supposed to satisfy the following properties:
– The Galerkin approximation error over the reduced basis set {gμ } is physically admissible, presupposing sufficient approximation quality.
– Each basis function gμ (x) ∈ H01 (Ω) can be represented (approximated) by an RG-term separable expansion in x = (x1 , x2 , x3 ) with a moderate number of terms RG ,

gμ (x) = ∑_{k=1}^{RG} g(1)μ,k (x1 ) g(2)μ,k (x2 ) g(3)μ,k (x3 ), μ = 1, . . . , Nb . (8.8)
– Using the orthogonal Tucker vectors computed for simplified problems, whose
Tucker rank is supposed to be weakly dependent on the particular molecule and
the grid parameters [173, 174, 186].
All these concepts still require further theoretical and numerical analysis.
The main advantage of the low tensor rank approximating basis sets is the linear scaling of the resultant algorithms in the univariate grid size n, which already allows employing huge n × n × n grids in ℝ3 (specifically, n ≤ 2 ⋅ 10⁴ for the current computations in the framework of the multilevel Hartree–Fock solver). This could be beneficial in the FEM-DFT computations applied to large molecular clusters.
8.1.3 Tensor computation of the Galerkin integrals in matrices J(D) and K(D)
The beneficial feature of our method is that the functions and operators involved in the computational scheme for the Coulomb and exchange matrices (8.1)–(8.5) are efficiently evaluated using (approximate) low-rank tensor-product representations in the discretized basis sets {Gμ } and {Xμ } at a cost that scales linear-logarithmically in n, O(n log n).
To that end, we introduce some interpolation/prolongation operators interconnecting the continuous functions on Ω and their discrete representation on the grid via the coefficient tensors in ℝℐ (or in ℝ𝒥 ). Note that the coefficient space of tri-tensors is

𝕍n = ℝℐ := V1 ⊗ V2 ⊗ V3 ,

where {yi } is the set of cell-centered points with respect to the grid ω3,n . Furthermore, for functions f ∈ L2 (Ω), we define the L2 -projection by
Using the discrete representations above, we are able to rewrite all functional and integral transforms in (8.1)–(8.5) in terms of tensor operations in 𝕍n . In particular, for the continuous targets, the function-times-function and the L2 -scalar product can be discretized by tensor operations as

f ⋅ g → F ⊙ G ∈ 𝕍n and ⟨f , g⟩ → h3 ⟨F, G⟩, with F = 𝒫C (f ), G = 𝒫C (g),

and the convolution transform as

f ∗ g → F ∗T G ∈ 𝕍n , with F = 𝒫C (f ) ∈ 𝕍n , G = 𝒫0 (g) ∈ 𝕍n ,
where the tensor operation ∗T stands for the tensor-structured convolution transform in 𝕍n described in [166] (see also [186, 174] for applications of the fast ∗T transform in electronic structure calculations). We notice that under certain assumptions on the regularity of the input functions (see Section 5), the tensor-product convolution ∗T can be proven to provide an approximation error of order O(h2 ), whereas the two-grid version via Richardson extrapolation leads to the improved error bound O(h3 ) (cf. [166]).
The tensor-structured calculation of multidimensional convolution integral operators with the Newton kernel has been introduced and implemented in [174, 187, 145]; see also [108].
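The source of the O(n log n) cost of ∗T can be illustrated by a minimal self-contained sketch, in which NumPy stands in for the MATLAB implementation and the sizes and data are purely illustrative: for rank-1 (separable) tensors, the full 3D convolution coincides with the outer product of three univariate convolutions.

```python
import numpy as np

# rank-1 (separable) tensors A = a1 x a2 x a3 and B = b1 x b2 x b3
n = 8
rng = np.random.default_rng(0)
a = [rng.standard_normal(n) for _ in range(3)]
b = [rng.standard_normal(n) for _ in range(3)]
A = np.einsum('i,j,k->ijk', *a)
B = np.einsum('i,j,k->ijk', *b)

# benchmark: full 3D convolution via zero-padded 3D FFT, O(n^3 log n)
s = [2*n - 1]*3
full = np.real(np.fft.ifftn(np.fft.fftn(A, s) * np.fft.fftn(B, s)))

# tensor-product convolution: one 1D convolution per mode, O(n log n) each
fact = np.einsum('i,j,k->ijk', *[np.convolve(a[l], b[l]) for l in range(3)])

assert np.allclose(full, fact)
```

For canonical tensors of ranks RA and RB , the same factorization applies termwise, producing RA·RB rank-1 terms whose rank is then reduced by truncation.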
Representations (8.1)–(8.2) for the Coulomb operator can now be rewritten (approximately) in terms of the discretized basis functions by using tensor operations:

ρ ≈ Θ := ∑_{a=1}^{Norb} ∑_{κ,λ=1}^{Nb} Cκa Cλa Gκ ⊙ Gλ , where Gκ = 𝒫C (gκ ),

implying

VH = ρ ∗ g ≈ Θ ∗T PN , where PN = 𝒫0 (g), g = 1/‖⋅‖, (8.11)
with PN ∈ 𝕍n being the collocation tensor for the Coulomb potential. This implies the
tensor representation of the Coulomb matrix,
The separability property of the basis functions ensures that rank(Gμ ) ≤ RG , whereas the tensors Θ and PN are approximated by low-rank tensors. Hence, in our method, the corresponding tensor operations are implemented using fast multilinear algebra equipped with rank optimization (tensor truncation) [173, 174, 186].
8.2 Numerics on three-dimensional convolution operators | 117
Numerical examples of other rank decompositions of the electron density (not including the calculation of the three-dimensional convolution operator) have been presented in [52, 81]. The tensor product convolution was introduced in [173, 174] and also discussed in [108, 109, 166].
Likewise, the tensor representations (8.3)–(8.5) for the exchange operator realized in [145] now read as follows:

Waν ≈ ϒaν := [Gν ⊙ ∑_{κ=1}^{Nb} Cκa Gκ ] ∗T PN , ν = 1, . . . , Nb , (8.13)
finally providing the entries of the exchange matrix by summation over all orbitals,

K(D)μν = ∑_{a=1}^{Norb} χμν,a , μ, ν = 1, . . . , Nb . (8.15)
Again, the auxiliary tensors and respective algebraic operations have to be imple-
mented with the truncation to low-rank tensor formats.
where pℓ,k = 0, 1, . . . is the polynomial degree, and the points (A1,k , A2,k , A3,k ) ∈ ℝ3 specify the positions of the nuclei in a molecule.
The molecule is embedded in a fixed computational box Ω = [−b, b]3 ⊂ ℝ3 , as in Figure 11.1.¹ For a given discretization parameter n ∈ ℕ, we use the equidistant n × n × n tensor grid ω3,n = {xi }, i ∈ ℐ := {1, . . . , n}3 , with the mesh-size h = 2b/(n + 1).
1 In the case of small to moderate size molecules, we usually use a computational box of size 40³ bohr³.
Figure 8.1: Approximation of the Gaussian-type basis function by a piecewise constant function.
The Gaussian-type basis functions are used for the representation of orbitals (8.9). In calculations of the integral terms, the separable basis functions gk (x), x ∈ ℝ3 , are approximated by sampling their values at the centers of the discretization intervals, as in Figure 8.1, using products of univariate piecewise constant basis functions, gk (x) ≈ ḡk (x) = ∏_{ℓ=1}^{3} ḡ(ℓ)k (xℓ ), yielding their rank-1 tensor representation,

gk → Gk = g(1)k ⊗ g(2)k ⊗ g(3)k ∈ ℝ^{n×n×n} , k = 1, . . . , Nb . (8.16)
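The sampling behind (8.16) can be sketched as follows (a minimal NumPy illustration with assumed box size, grid size, and Gaussian exponent): the rank-1 tensor stores only 3n numbers, yet reproduces all n³ grid values of the Gaussian.

```python
import numpy as np

b, n = 7.0, 64                      # half box size and grid size (assumed values)
h = 2*b/n
x = -b + h*(np.arange(n) + 0.5)     # cell-centered sampling points

alpha = 1.3                         # Gaussian exponent (assumed value)
g1 = np.exp(-alpha*x**2)            # one univariate factor; 3n numbers in total
Gk = np.einsum('i,j,k->ijk', g1, g1, g1)   # rank-1 tensor G_k with n^3 entries

# reference: sample g(x) = exp(-alpha*||x||^2) directly on the full 3D grid
X, Y, Z = np.meshgrid(x, x, x, indexing='ij')
G_full = np.exp(-alpha*(X**2 + Y**2 + Z**2))
assert np.allclose(Gk, G_full)
```

All subsequent tensor operations act only on the univariate factors g(ℓ)k, which is the origin of the linear scaling in n.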
For the grid-based computation of the Hartree potential

VH (x) := ∫_{ℝ3} ρ(y) / ‖x − y‖ dy,

we use the discrete tensor representation of basis functions (8.16). Then the electron density is approximated by using 1D Hadamard products of the skeleton vectors in rank-1 tensors (instead of products of Gaussians),

ρ ≈ Θ = 2 ∑_{a=1}^{Norb} ∑_{k=1}^{Nb} ∑_{m=1}^{Nb} ca,k ca,m (g(1)k ⊙ g(1)m ) ⊗ (g(2)k ⊙ g(2)m ) ⊗ (g(3)k ⊙ g(3)m ) ∈ ℝ^{n×n×n} .
Further, the representation of the Newton convolving kernel 1/‖x − y‖ by a canonical rank-RN tensor [30] is used (see Section 6.1 for details):

PN → PR = ∑_{q=1}^{RN} p(1)q ⊗ p(2)q ⊗ p(3)q ∈ ℝ^{n×n×n} . (8.17)
Since large ranks make tensor operations inefficient, the multigrid canonical-to-Tucker and Tucker-to-canonical algorithms (see Sections 3.3.3 and 3.5) should be applied for reducing the rank of the convolution product.
Finally, the entries of the Coulomb matrix Jkm are computed by 1D scalar products of the canonical vectors of VH with the Hadamard products of the rank-1 tensors representing the Galerkin basis:

Jkm ≈ ⟨Gk ⊙ Gm , VH ⟩, k, m = 1, . . . , Nb .
The cost of the 3D tensor product convolution is O(n log n) instead of O(n3 log n) for the standard benchmark 3D convolution using the 3D FFT. Table 8.1 shows CPU times (sec) for the MATLAB computation of VH for the H2 O molecule [174] on a SUN station using a cluster with 4 Intel Xeon E7-8837/32 cores/2.67 GHz and 1024 GB storage (times for the 3D FFT for n ≥ 4096 are obtained by extrapolation). One easily notices the cubic scaling of the 3D FFT time under dyadic increase of the grid size n, and the approximately linear-logarithmic scaling of the 3D convolution on the same grids (see the C ∗ C row). The C2T row shows the time for the canonical-to-Tucker rank reduction.
Following [166], we apply the Richardson extrapolation technique (see [218]) to obtain higher-accuracy approximations of order O(h3 ) without extra computational cost. The numerical gain of using an extrapolated solution is achieved due to the fact that the approximation error O(h3 ) on a single grid would require the univariate grid size n1 = n^{3/2} ≫ n. The corresponding Richardson extrapolant V(n)H,Rich approximating VH (x) over a pair of nested grids ω3,n and ω3,2n , defined on the “coarse” n⊗3 -grid, is given by

V(n)H,Rich = (4 ⋅ V(2n)H − V(n)H )/3 in the grid points of ω3,n .
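The effect of the extrapolation weights (4 ⋅ V(2n) − V(n))/3 can be illustrated on a model one-dimensional second-order approximation (a central difference of a known derivative, not the actual Hartree potential computation): the weights cancel the leading h² error term.

```python
import numpy as np

# model setting: a second-order (O(h^2)) approximation of a known quantity
f, x0 = np.sin, 0.7
exact = np.cos(x0)

def approx(h):
    # central difference: error ~ C*h^2 plus higher-order terms
    return (f(x0 + h) - f(x0 - h)) / (2*h)

h = 0.1
v_h, v_h2 = approx(h), approx(h/2)
v_rich = (4*v_h2 - v_h) / 3        # same weights as (4*V^(2n) - V^(n))/3

assert abs(v_rich - exact) < abs(v_h2 - exact) / 10
```

The extrapolated value is computed from two already available grid solutions, so the accuracy gain comes essentially for free.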
The next numerical results show the accuracy of the tensor-based calculations using
n×n×n 3D Cartesian grids with respect to the corresponding output from the MOLPRO
package [299].
Table 8.1: Times (sec) for the 3D tensor product convolution vs. convolution by 3D FFT in computation
of VH for H2 O molecule.
Figure 8.2: Left: Absolute error in tensor computation of the Coulomb matrix for CH4 and C2 H6
molecules.
Figure 8.3: Left: Absolute approximation error (blue line: ≈10−6 au) in the tensor-product computa-
tion of the Hartree potential of C2 H6 , measured in the grid line Ω = [−5, 7] × {0} × {0}. Right: Times
versus n in MATLAB for computation of VH for C2 H6 molecule.
Figure 8.2 demonstrates the accuracy (∼10−5 ) of the calculation of the Coulomb matrix
for CH4 and C2 H6 molecules using the Richardson extrapolation on a sequence of grids
ω3,n with n = 4096 and n = 8192.
Figure 8.3 (left) shows the accuracy in the calculation of the Hartree potential (in comparison with the benchmark calculations from MOLPRO) for the C2 H6 molecule computed on n × n × n grids of size n = 4096 and n = 8192 (dashed lines). The solid line in Figure 8.3 shows the accuracy of the Richardson extrapolation of the results from the two grids of size n = 4096 and n = 8192. One can observe a substantial improvement in accuracy due to the Richardson extrapolation. Figure 8.3 (right) shows the CPU times versus n in MATLAB, indicating linear complexity scaling in the univariate grid size n. See also Figure 8.4, illustrating the accuracy for the exchange matrix K = Kex .
Figure 8.4: L∞ -error in Kex = K for the density of H2 O and the pseudodensity of CH3 OH.

In a similar way, the algorithm for the 3D grid-based tensor-structured calculation of the 6D integrals in the exchange potential operator was introduced in [145], Kkm = ∑_{a=1}^{Norb} Kkm,a with

Kkm,a := ∫_{ℝ3} ∫_{ℝ3} gk (x) [φa (x)φa (y) / |x − y|] gm (y) dx dy, k, m = 1, . . . , Nb .
The contributions from the ath orbital are approximated by the tensor ansatz

Kkm,a ≈ ⟨Gk ⊙ [∑_{μ=1}^{Nb} cμa Gμ ], [Gm ⊙ ∑_{ν=1}^{Nb} cνa Gν ] ∗T PR ⟩.
Here, the tensor product convolution is first calculated for each orbital a, and then the scalar products in the canonical format yield the contributions of the ath orbital to the entries of the exchange Galerkin matrix. The algorithm for the tensor calculation of the exchange matrix is described in detail in [145].
These algorithms were introduced in the first tensor-structured Hartree–Fock solver using the 3D grid-based evaluation of the Coulomb and exchange matrices at 1D cost at every step of the self-consistent field (SCF) iteration [146, 187].
8.3 Multilevel rank-truncated self-consistent field iteration
and the Nb × Norb matrices Ck+1 contain the respective Norb orthonormal eigenvectors u1 , . . . , uNorb . We denote by C̃k+1 ∈ ℝ^{Nb ×Nb} the matrix representing the full set of orthogonal eigenvectors in (8.19).
We use the particular choice of F̃k , k = 0, 1, . . ., via the DIIS-algorithm (cf. [238]),
with the starting value F̃0 = F(C0 ) = H, where the matrix H corresponds to the core
Hamiltonian.
In [146, 187], a modification of the standard DIIS iteration was proposed by carrying out the iteration on a sequence of successively refined grids with grid-dependent stopping criteria. The multilevel implementation provides robust convergence from the zero initial guess for the Hartree and exchange operators. The coarse-to-fine grid iteration, in turn, accelerates the solution process dramatically due to the low cost of the coarse grid calculations.
The principal feature of the tensor-truncated iteration lies in the fast update of the Fock matrix F(C) by using tensor-product multilinear algebra of 3-tensors combined with rank truncation. Moreover, the multilevel implementation provides a simple scheme for constructing a good initial guess on the fine grid levels.
For each fixed discretization, we use the original version of the DIIS scheme (cf. [128]), defined by the following choice of the residual error vectors (matrices):

Ei := [C̃^T_{i+1} F(Ci ) C̃_{i+1} ]|_{1≤μ≤Norb ; Norb +1≤ν≤Nb} ∈ ℝ^{Norb ×(Nb −Norb )} (8.20)

for the iteration numbers i = 0, 1, . . . , k, which should vanish on the exact solutions of the Hartree–Fock Galerkin equation due to the orthogonality property. Hence, a stopping criterion applies to the residual error vectors Ei , i = 0, 1, 2, . . .. Here the subindexes μ and ν specify the relevant range of entries in the coefficients of the molecular orbitals C̃_{i+1} .
The minimizing coefficient vector c̃ := (c0 , . . . , ck )T ∈ ℝ^{k+1} is computed by solving the constrained quadratic minimization problem for the respective cost functional (the averaged residual error vector over the previous iterands):

f (c̃) := (1/2) ‖∑_{i=0}^{k} ci Ei ‖²_F ≡ (1/2) ⟨Bc̃, c̃⟩ → min, provided that ∑_{i=0}^{k} ci = 1,
where B = {⟨Ei , Ej ⟩F }_{i,j=0}^{k} is the Gram matrix of the residuals, and 1 = (1, . . . , 1)T ∈ ℝ^{k+1} . Minimization with the Lagrange multiplier ξ leads to the linear augmented system of equations

Bc̃ − ξ 1 = 0,
⟨1, c̃⟩ = 1. (8.21)
F̃k = ∑_{i=0}^{k−1} c_i^{opt} F̃i + c_k^{opt} F(Ck ), k = 0, 1, 2, . . . , (8.22)
where the minimizing coefficients c_i^{opt} = c̃i (i = 0, 1, . . . , k) solve the linear system (8.21). For k = 0, the first sum in (8.22) is assumed to be zero, hence providing c_0^{opt} = 1 and F̃0 = F(C0 ).
Recall that if the stopping criterion on Ck , k = 1, . . ., is not satisfied, then one
updates F̃k by (8.22) and solves the eigenvalue problem (8.18) for Ck+1 .
Note that in practice one can use the averaged residual vector only on a reduced
subsequence of iterands, Ek , Ek−1 , . . . , Ek−k0 , k − k0 > 0. In our numerical examples
below, we usually set k0 = 4.
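The coefficient computation in (8.21) can be sketched as follows (a minimal NumPy illustration with toy residual matrices; in the solver the Ei come from (8.20)): the Gram matrix of the residuals and the normalization constraint form a small augmented (KKT) system for c̃ and the multiplier ξ.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4
m = k + 1
E = [rng.standard_normal((3, 5)) for _ in range(m)]   # toy residuals E_0..E_k

# Gram matrix B_ij = <E_i, E_j>_F of the residual matrices
B = np.array([[np.sum(Ei*Ej) for Ej in E] for Ei in E])

# augmented system:  B c - xi*1 = 0,  <1, c> = 1
M = np.zeros((m + 1, m + 1))
M[:m, :m] = B
M[:m, m] = -1.0                    # column for the Lagrange multiplier xi
M[m, :m] = 1.0                     # normalization row <1, c> = 1
rhs = np.zeros(m + 1)
rhs[m] = 1.0
c = np.linalg.solve(M, rhs)[:m]

# the averaged residual cannot exceed the best single residual
avg = sum(c[i]*E[i] for i in range(m))
assert abs(c.sum() - 1.0) < 1e-10
assert np.linalg.norm(avg) <= min(np.linalg.norm(Ei) for Ei in E) + 1e-10
```

Since the unit coefficient vectors are feasible, the minimizer is never worse than reusing any single previous iterate.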
In this section, we describe the resultant numerical algorithm. Recall that the discrete
nonlinear Fock operator is specified by a matrix
where H corresponds to the core Hamiltonian (fixed in our scheme), and the discrete
Hartree and exchange operators are given by tensor representations (8.12) and (8.4),
respectively.
First, we describe the unigrid tensor-truncated DIIS scheme [146, 187].
‖Ck+1 − Ck ‖F ≤ ε.
(2) For p = 0, apply the unigrid Algorithm U_DIIS with n = n0 , εp = ε0 , and re-
turn the number of iterations k0 , matrix Ck0 +1 , and a sequence of Fock matrices
F̃0 , F̃1 , . . . , F̃k0 .
(3) For p = 1, . . . , M, apply successively Algorithm U_DIIS(kp−1 + 1), with the input parameters np := 2^p n0 , εp := ε0 2^{−2p} , and C_{kp−1 +1} . Keep continuous numbering of the DIIS iterations through all levels, such that the maximal iteration number at level p is given by

kp = ∑_{q=0}^{p} m_q ,

where m_q is the number of DIIS iterations performed at level q.
Assume that the number of multigrid DIIS iterations at each level is bounded by the constant I0 . Then the total cost of Algorithm M_DIIS does not exceed twice the cost at the finest level n = nM , 2WM = O(I0 Nb^3 r0 Norb n).
Proof. The rank bound rank(Gk ) = 1 implies rank(∑_{m=1}^{Nb} cma Gm ) ≤ Nb . Hence, the numerical cost to compute the tensor-product convolution ϒaν in (8.3) amounts to
Since the initial canonical rank of ϒaν is estimated by rank(ϒaν ) ≤ Nb RN , the multigrid rank reduction algorithm, having linear scaling in rank(ϒaν ) (see Section 3), provides the complexity bound O(r0 Nb RN np ). Hence, the total cost to compute the scalar products in χμν,a (see (8.4)) can be estimated by
which completes the first part of our proof. The second assertion follows from the linear scaling in np of the unigrid algorithm, which implies the following bound:
Remark 8.2. In the case of large molecules and RG = rank(Gμ ) ≥ 1, further optimization of the algorithm up to O(RN Nb^2 np ) complexity may be possible on the basis of rank reduction applied to the rank-RG Nb orbitals and by using an iterative eigenvalue solver instead of the currently employed direct solver via matrix diagonalization, or by using direct minimization schemes [263].
Figure 8.5: Multilevel convergence of the DIIS iteration applied to the all electron case of H2 O (left),
and convergence in the energy in n (right).
problem. The minimization of the Frobenius norm of the virtual block of the Fock operator evaluated on the eigenvectors of the consequent iterations, C̃k , C̃k−1 , . . ., is utilized for the DIIS scheme.
The multilevel solution of the nonlinear eigenvalue problem (8.18) is realized via the SCF iteration on a sequence of uniformly refined grids, beginning from the initial coarse grid, say, with n0 = 64, and proceeding on the dyadically refined grids np = n0 2^p , p = 1, . . . , M. We use the grid-dependent termination criterion εnp := ε0 2^{−2p} , keeping a continuous numbering of the iterations.
Figure 8.5 (left) shows the convergence of the iterative scheme in the case of the H2 O molecule. Figure 8.5 (right) illustrates the convergence in the total Hartree–Fock energy, reaching an absolute error of about 10^{−4} , which implies a relative error of 9 ⋅ 10^{−6} in the case of grid size n = 1024. The total energy is calculated by

EHF = 2 ∑_{a=1}^{Norb} λa − ∑_{a=1}^{Norb} (J̃a − K̃a ),

with J̃a = ⟨ψa , VH ψa ⟩_{L2} and K̃a = ⟨ψa , 𝒱ex ψa ⟩_{L2} being the so-called Coulomb and exchange integrals, respectively, computed in the molecular orbital basis ψa (a = 1, . . . , Norb ).
The detailed discussion of the multilevel DIIS iteration, including various numer-
ical tests, can be found in [187, 146].
9 Grid-based core Hamiltonian
In this section, following [156], we discuss the grid-based method for calculating the core Hamiltonian part in the Fock operator (7.4),

ℋ = −(1/2) Δ + Vc ,

with respect to the Galerkin basis {gm (x)}_{1≤m≤Nb} , x ∈ ℝ3 , where Vc (x) is given by (7.4), and Δ represents the 3D Laplacian subject to Dirichlet boundary conditions.
(I1 w)(xℓ ) := ∑_{iℓ =1}^{N} w(x_{iℓ }) ξ_{iℓ }(xℓ ), x_{iℓ } ∈ ω3,N , ℓ = 1, 2, 3.
This leads to the separable grid-based approximation (9.1) of the initial Gaussian-type basis functions gk (x), using the piecewise linear representation ḡk (x), x ∈ ℝ3 , constructed on the N × N × N Cartesian grid (see [41] for the general theory of finite element methods).
https://doi.org/10.1515/9783110365832-009
Figure 9.1: Using hat functions ξi (x1 ) for a single-mode basis function gk (x1 ), yielding the piecewise linear representation ḡk (x1 ) of the continuous function gk (x1 ).
The accuracy of this approximation is of order ‖akm − ākm ‖ = O(h2 ), where h is the mesh size (see [156], Theorem A.4, and the numerics in Section 9.3).
Recall that the Laplace operator applies to a separable function η(x), x = (x1 , x2 , x3 ) ∈ ℝ3 , having a representation η(x) = η1 (x1 )η2 (x2 )η3 (x3 ), as follows:

Δη = η1″ η2 η3 + η1 η2″ η3 + η1 η2 η3″ ,

which ensures the standard Kronecker rank-3 tensor representation of the respective Galerkin FEM stiffness matrix AΔ in the tensor basis {ξi (x1 )ξj (x2 )ξk (x3 )}, i, j, k = 1, . . . , N,

AΔ = A(1) ⊗ S(2) ⊗ S(3) + S(1) ⊗ A(2) ⊗ S(3) + S(1) ⊗ S(2) ⊗ A(3) .
Here, the 1D stiffness and mass matrices A(ℓ) , S(ℓ) ∈ ℝ^{N×N} , ℓ = 1, 2, 3, are given by

A(ℓ) := {⟨∇(ℓ) ξi (xℓ ), ∇(ℓ) ξj (xℓ )⟩}_{i,j=1}^{N} = (1/h) tridiag{−1, 2, −1},
S(ℓ) := {⟨ξi , ξj ⟩}_{i,j=1}^{N} = (h/6) tridiag{1, 4, 1},

respectively, where ∇(ℓ) = d/dxℓ . Since the {ξi }_{i=1}^{N} are the same for all modes ℓ = 1, 2, 3, for simplicity of notation we further denote A(ℓ) = A1 and S(ℓ) = S1 .
Lemma 9.1 (Galerkin matrix AG , [156]). Assume that the basis functions {ḡk (x)}, x ∈ ℝ3 , k = 1, . . . , Nb , are rank-1 separable, i. e., ḡk (x) = ḡ(1)k (x1 ) ḡ(2)k (x2 ) ḡ(3)k (x3 ). Then the matrix entries are given by

akm = ⟨A1 g(1)k , g(1)m ⟩⟨S1 g(2)k , g(2)m ⟩⟨S1 g(3)k , g(3)m ⟩
    + ⟨S1 g(1)k , g(1)m ⟩⟨A1 g(2)k , g(2)m ⟩⟨S1 g(3)k , g(3)m ⟩
    + ⟨S1 g(1)k , g(1)m ⟩⟨S1 g(2)k , g(2)m ⟩⟨A1 g(3)k , g(3)m ⟩
    = ⟨AΔ Gk , Gm ⟩, (9.4)

where g(ℓ)k , g(ℓ)m ∈ ℝ^N (k, m = 1, . . . , Nb ) are the vectors of collocation coefficients of the rank-1 basis functions.
Indeed, for the first factor in (9.4),

⟨∇(1) ḡ(1)k (x1 ), ∇(1) ḡ(1)m (x1 )⟩ = ⟨∑_{i=1}^{N} g_{k,i} ∇(1) ξi (x1 ), ∑_{j=1}^{N} g_{m,j} ∇(1) ξj (x1 )⟩
= ∑_{i=1}^{N} ∑_{j=1}^{N} g_{k,i} g_{m,j} ⟨∇(1) ξi (x1 ), ∇(1) ξj (x1 )⟩ = ⟨A1 g(1)k , g(1)m ⟩,

and, similarly,

⟨ḡ(1)k , ḡ(1)m ⟩ = ⟨S1 g(1)k , g(1)m ⟩,

hence

akm = ⟨AΔ Gk , Gm ⟩,
so that, in matrix form, AG = G^T AΔ G ∈ ℝ^{Nb ×Nb} .
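Identity (9.4) can be checked numerically with a small sketch (assumed sizes, random rank-1 data, NumPy standing in for MATLAB): the factored evaluation via 1D quadratic forms agrees with the quadratic form of the assembled Kronecker rank-3 stiffness matrix.

```python
import numpy as np

N, h = 6, 0.5
A1 = (1/h)*(2*np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))   # 1D stiffness
S1 = (h/6)*(4*np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))   # 1D mass

# Kronecker rank-3 representation of the 3D stiffness matrix
A_delta = (np.kron(np.kron(A1, S1), S1)
           + np.kron(np.kron(S1, A1), S1)
           + np.kron(np.kron(S1, S1), A1))

rng = np.random.default_rng(2)
gk = [rng.standard_normal(N) for _ in range(3)]   # rank-1 collocation vectors
gm = [rng.standard_normal(N) for _ in range(3)]
Gk = np.einsum('i,j,k->ijk', *gk).ravel()
Gm = np.einsum('i,j,k->ijk', *gm).ravel()

# factored evaluation of a_km: three products of 1D quadratic forms
q = lambda M, u, v: u @ M @ v
akm = (q(A1, gk[0], gm[0])*q(S1, gk[1], gm[1])*q(S1, gk[2], gm[2])
       + q(S1, gk[0], gm[0])*q(A1, gk[1], gm[1])*q(S1, gk[2], gm[2])
       + q(S1, gk[0], gm[0])*q(S1, gk[1], gm[1])*q(A1, gk[2], gm[2]))

assert np.isclose(akm, Gm @ A_delta @ Gk)
```

The factored form never assembles the N³ × N³ matrix, which is the point of the representation.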
Lemma 9.1 implies that in the case of basis functions having ranks larger than one,

gm (x) = ∑_{p=1}^{Rm} ηp (x), Rm ≥ 1, (9.6)

where each ηp (x) is a rank-1 separable function, representation (9.4) takes the following form:

akm = ∑_{p=1}^{Rk} ∑_{q=1}^{Rm} [⟨A1 g(1)_{k,p} , g(1)_{m,q} ⟩⟨S1 g(2)_{k,p} , g(2)_{m,q} ⟩⟨S1 g(3)_{k,p} , g(3)_{m,q} ⟩
    + ⟨S1 g(1)_{k,p} , g(1)_{m,q} ⟩⟨A1 g(2)_{k,p} , g(2)_{m,q} ⟩⟨S1 g(3)_{k,p} , g(3)_{m,q} ⟩
    + ⟨S1 g(1)_{k,p} , g(1)_{m,q} ⟩⟨S1 g(2)_{k,p} , g(2)_{m,q} ⟩⟨A1 g(3)_{k,p} , g(3)_{m,q} ⟩].

For the finite-difference discretization of the Laplacian, the 1D mass matrix S1 is replaced by the plain Euclidean scalar product, so that in the rank-1 case

akm = ⟨A1 g(1)k , g(1)m ⟩⟨g(2)k , g(2)m ⟩⟨g(3)k , g(3)m ⟩
    + ⟨g(1)k , g(1)m ⟩⟨A1 g(2)k , g(2)m ⟩⟨g(3)k , g(3)m ⟩
    + ⟨g(1)k , g(1)m ⟩⟨g(2)k , g(2)m ⟩⟨A1 g(3)k , g(3)m ⟩
    = ⟨AΔ,FD Gk , Gm ⟩,

and, analogously, with the corresponding matrix AΔ,d ,

akm = ⟨AΔ,d Gk , Gm ⟩
9.2 Nuclear potential operator by direct tensor summation
Newton kernel PR in the bounding box, translated and restricted according to the coordinates of the nuclei in the box. The approach is applicable, for example, in the tensor-based calculation of the nuclear potential operator describing the Coulombic interaction of the electrons with the nuclei in a molecular system in a box or in a (cubic) unit cell. It is defined by the function Vc (x) in the scaled unit cell Ω = [−b/2, b/2]3 ,

Vc (x) = ∑_{ν=1}^{M0} Zν / ‖x − aν ‖ , Zν > 0, x, aν ∈ Ω ⊂ ℝ3 , (9.8)

where M0 is the number of nuclei in Ω, and aν and Zν represent their coordinates and charges, respectively.
We start with approximating the non-shifted 3D Newton kernel 1/‖x‖ on the auxiliary extended box Ω̃ = [−b, b]3 by its projection onto the basis set {ψi } of piecewise constant functions defined on the uniform 2n × 2n × 2n tensor grid Ω2n with the mesh size h, as described in Section 6.1. This defines the “reference” rank-R canonical tensor

P̃R = ∑_{q=1}^{R} p(1)q ⊗ p(2)q ⊗ p(3)q ∈ ℝ^{2n×2n×2n} . (9.9)
The shift-and-windowing transform 𝒲ν maps the reference tensor onto its sub-tensor of size n × n × n for ν = 1, . . . , M0 by

𝒲ν P̃R := P̃R (iν + n/2 : iν + 3n/2; jν + n/2 : jν + 3n/2; kν + n/2 : kν + 3n/2) ∈ ℝ^{n×n×n} . (9.10)
With this notation, the total electrostatic potential Vc (x) in the computational box Ω is approximately represented by a direct canonical tensor sum

Pc = ∑_{ν=1}^{M0} Zν 𝒲ν P̃R = ∑_{ν=1}^{M0} Zν ∑_{q=1}^{R} 𝒲(1)ν p(1)q ⊗ 𝒲(2)ν p(2)q ⊗ 𝒲(3)ν p(3)q ∈ ℝ^{n×n×n} , (9.11)

with the rank bound

rank(Pc ) ≤ M0 R. (9.12)
Remark 9.3. The rank estimate (9.12) for the sum of arbitrarily positioned electrostatic potentials in a box (unit cell), Rc = rank(Pc ) ≤ M0 R, is usually too pessimistic. Our numerical tests for moderate size molecules indicate that the rank of the (M0 R)-term canonical sum in (9.11) can be reduced considerably. This rank optimization can be implemented by the multigrid version of the canonical rank-reduction algorithm, canonical-Tucker-canonical [174] (see also Section 3.3). The resultant canonical tensor will be denoted by P̂c .
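The shift-and-windowing construction (9.10)–(9.11) operates only on the 1D canonical vectors. A minimal sketch (random surrogate vectors in place of the actual Newton-kernel decomposition, toy shifts and charges):

```python
import numpy as np

n, R = 16, 3
rng = np.random.default_rng(3)
p = rng.standard_normal((R, 3, 2*n))   # surrogate canonical vectors on the 2n-grid

nuclei = [((0, 1, 2), 1.0), ((3, 0, 1), 6.0)]   # (grid shift, charge Z_nu); toy data

def window(vec, s):
    # restriction of one 2n-vector to the n-window selected by the shift s
    return vec[s + n//2 : s + n//2 + n]

# canonical sum (9.11): only 1D vectors are windowed; M0*R rank-1 terms
Pc = np.zeros((n, n, n))
for (si, sj, sk), Z in nuclei:
    for q in range(R):
        Pc += Z*np.einsum('i,j,k->ijk', window(p[q, 0], si),
                          window(p[q, 1], sj), window(p[q, 2], sk))

# reference: assemble the full 2n-tensor, then window and sum it directly
P_full = sum(np.einsum('i,j,k->ijk', *p[q]) for q in range(R))
Pc_ref = sum(Z*P_full[si + n//2: si + n//2 + n,
                      sj + n//2: sj + n//2 + n,
                      sk + n//2: sk + n//2 + n] for (si, sj, sk), Z in nuclei)
assert np.allclose(Pc, Pc_ref)
```

Only O(M0 R n) numbers are ever stored or moved, while the assembled reference costs O(n³) per term.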
Gμ = [gμ (x1 (i), x2 (j), x3 (k))]_{i,j,k=1}^{n} ∈ ℝ^{n×n×n} ,

obtained by sampling gμ (x) at the midpoints (x1 (i), x2 (j), x3 (k)) of the grid cells indexed by (i, j, k). Suppose, for simplicity, that it is a rank-1 canonical tensor, rank(Gμ ) = 1, i. e.,

Gμ = g(1)μ ⊗ g(2)μ ⊗ g(3)μ ∈ ℝ^{n×n×n} ,

with the canonical vectors g(ℓ)μ ∈ ℝ^n associated with the modes ℓ = 1, 2, 3.
The sum of potentials in a box, Vc (x) (9.8), is represented in the given basis set (9.13) by a matrix Vg = [vkm ] ∈ ℝ^{Nb ×Nb} . The entries of the nuclear potential operator matrix are calculated (approximated) by the simple tensor operation (see [156, 147])

vkm = ⟨Gk ⊙ Gm , Pc ⟩, (9.14)

where

Gk ⊙ Gm := (g(1)k ⊙ g(1)m ) ⊗ (g(2)k ⊙ g(2)m ) ⊗ (g(3)k ⊙ g(3)m )

denotes the Hadamard (entrywise) product of the tensors representing the basis functions (9.13), which reduces to 1D products. The scalar product ⟨⋅, ⋅⟩ in (9.14) also reduces to 1D scalar products due to the separation of variables.
We notice that the approximation error ε > 0 caused by the separable representation of the nuclear potential is controlled by the rank parameter Rc = rank(Pc ) ≈ CR, where C depends only weakly on the number of nuclei M0 . Now letting rank(Gm ) = 1 implies that each matrix element can be computed with linear complexity in n, O(Rn). The exponential convergence of the canonical approximation in the rank parameter R allows the optimal choice R = O(|log ε|), making the overall complexity bound O(|log ε| n) almost independent of M0 .
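The reduction of ⟨Gk ⊙ Gm , Pc ⟩ to 1D scalar products can be sketched as follows (random surrogate data; in the actual computation the canonical vectors come from (9.11)):

```python
import numpy as np

n, R = 32, 5
rng = np.random.default_rng(4)
P = rng.standard_normal((R, 3, n))     # canonical vectors of a rank-R tensor P_c
gk = rng.standard_normal((3, n))       # 1D factors of the rank-1 tensors G_k, G_m
gm = rng.standard_normal((3, n))

# <G_k (Hadamard) G_m, P_c> via 1D scalar products only: O(R*n) operations
vkm = sum(np.prod([np.dot(gk[l]*gm[l], P[q, l]) for l in range(3)])
          for q in range(R))

# reference: assemble the full n^3 tensors and contract, O(R*n^3)
Gk = np.einsum('i,j,k->ijk', *gk)
Gm = np.einsum('i,j,k->ijk', *gm)
Pc = sum(np.einsum('i,j,k->ijk', *P[q]) for q in range(R))
assert np.isclose(vkm, np.sum(Gk*Gm*Pc))
```

The factored route is exactly the O(Rn) evaluation discussed above; the full contraction serves only as a consistency check.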
Remark 9.4. It should be noted that, since we remain within the concept of global basis functions for the Galerkin approximation of the HF eigenvalue problem, the sizes of the grids used in the discretized representation of these basis functions can be different in the calculation of the kinetic and potential parts of the Fock operator. The corresponding choice is controlled only by the respective approximation error and by the numerical efficiency.
Finally, we note that the Galerkin tensor representation of the identity operator leads to the mass matrix S = {skm }, where skm = ⟨Gk , Gm ⟩.
To conclude this section, we note that the error bound ‖Vg − VG ‖ ≤ Ch2 can be proven along the lines of the discussion in [166].
9.3 Numerical verification for the core Hamiltonian
for a single Gaussian with sufficiently large α > 0 and using large N × N × N Cartesian
grids. Functions are discretized with respect to the basis set (9.1) in the computational
box [−b, b]3 with b = 14.6 au ≈ 8 Å.
For a single Gaussian, we compare 𝒥h computed as in Lemma 9.1 with the exact expression

𝒥 = ∫_{ℝ3} ∇g(x) ⋅ ∇g(x) dx = 3 J1 J01² ,

where

J1 = 4α² ∫_{−∞}^{∞} x² e^{−2αx²} dx = √(π/2) √α , J01 = ∫_{−∞}^{∞} e^{−2αx²} dx = √π / √(2α) .
Table 9.1 shows the approximation error |𝒥 − 𝒥h | versus the grid size, where 𝒥h corresponds to the grid-based evaluation of the matrix element on the corresponding grid for α = 2500, 4 ⋅ 10⁴ , and 1.2 ⋅ 10⁵ , which exceed the largest exponents α in the conventional Gaussian sets for the hydrogen (α = 1777), carbon (α = 6665), oxygen (α = 11 720), and mercury (α = 10⁵ ) atoms.
Computations confirm the results of Theorem A.4 in [156] on the error bound O(h2 ). It can be seen that the errors reduce by a distinct factor of 4 on the dyadically refined
Table 9.1: Approximation error |𝒥 − 𝒥h | for the grid-based evaluation of the Laplacian Galerkin matrix entry for a Gaussian g(x) = e^{−α‖x‖²} , x ∈ ℝ3 , N = 2^p − 1.
grids. Therefore, in spite of the sharp “needles” of the Gaussians due to large α, the Richardson extrapolation [218] (RE column) on a sequence of large grids provides a higher accuracy of order O(h3 )–O(h4 ).
In Table 9.1, the largest grid size N = 2¹⁹ − 1 corresponds to the computational box Ω ⊂ ℝ3 with a huge number of entries of order 2⁵⁷ ≈ 10¹⁷ . The corresponding mesh size is of order h ∼ 10⁻⁵ Å. Computing times in MATLAB range from several milliseconds up to 1.2 sec for the largest grid.
Notice that the integral ⟨g, g⟩ = ∫_{ℝ3} e^{−2α‖x‖²} dx = J01 (α)³ involved in the calculation of the mass matrix Sg is approximated with the same accuracy.
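The O(h²) behavior reported in Table 9.1 can be reproduced in a small sketch (moderate α = 1 rather than the large exponents of the table, and assumed box size), using the factored form of Lemma 9.1 with the 1D matrices A1 and S1:

```python
import numpy as np

def J_grid(alpha, b, N):
    """Grid value of the Laplacian Galerkin entry for g = exp(-alpha*||x||^2)
    via the factored representation of Lemma 9.1 (all three modes coincide)."""
    h = 2*b/(N + 1)
    x = -b + h*np.arange(1, N + 1)          # interior nodes of [-b, b]
    g = np.exp(-alpha*x**2)                 # 1D collocation coefficients
    A1 = (1/h)*(2*np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))
    S1 = (h/6)*(4*np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))
    return 3*(g @ A1 @ g)*(g @ S1 @ g)**2

alpha, b = 1.0, 8.0
J_exact = 3*np.sqrt(np.pi/2)*np.sqrt(alpha)*(np.pi/(2*alpha))   # 3*J1*J01^2
e1 = abs(J_grid(alpha, b, 255) - J_exact)
e2 = abs(J_grid(alpha, b, 511) - J_exact)
assert e2 < e1/2        # error decays roughly as O(h^2)
```

Halving the mesh size reduces the error by about a factor of 4, in line with the error bound of Theorem A.4 in [156].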
In the following, we consider an example of the grid-based approximation to the Schrödinger equation for the hydrogen atom (see [156]); that is, we verify the proposed algorithms for the Hartree–Fock equation in the simplest case of the hydrogen atom,

ℋψ = λψ, ℋ = −(1/2) Δ + 1/‖x‖ , x ∈ ℝ3 , (9.15)
Example 9.1. Consider the traditional expansion of the solution using the ten s-type primitive Gaussian functions from the cc-pV6Z basis set [234, 265]:

ψ(x) ≈ ∑_{k=1}^{Nb} ck φk (x), Nb = 10, x ∈ ℝ3 ,
F = ⟨ℋ ḡk , ḡm ⟩ := −(1/2) ⟨Δḡk , ḡm ⟩ + ⟨(1/‖x‖) ḡk , ḡm ⟩, k, m = 1, . . . , Nb ,

with respect to the Galerkin basis {ḡk }. We choose the appropriate size of the computational box as b ≈ 8 Å and discretize {ḡk } using an N × N × N Cartesian grid, obtaining the canonical rank-1 tensor representation Gk of the basis functions. Then, the kinetic energy and the nuclear potential parts of the Fock operator are computed by (9.4) and (9.14).
Table 9.2, line (1), presents the numerical errors in energy, |λ − λh |, for the grid-based calculations using the cc-pV6Z basis set of Nb = 10 Gaussians generated by MOLPRO [299], providing an accuracy of order ∼10⁻⁶ . Notice that this accuracy is already achieved at the grid size N = 8192; hence, further grid refinement does not improve the results.
Example 9.2. Here, we study the effect of basis optimization by adding an auxiliary basis function to the Gaussian basis set from the previous example, thus increasing the number of basis functions to Nb = 11. Line (2) in Table 9.2 shows the improvement in accuracy for the basis augmented by a rank-1 approximation to the
Table 9.2: Examples 9.1–9.3 for hydrogen atom: |λ − λh | vs. grid size N3 for (1) the discretized basis
of Nb = 10 Gaussians, (2) 11 basis functions consisting of Gaussians augmented by a rank-1 func-
tion φ0 , (3) discretized single rank-Rb Slater function.
(1) |λ − λh | 4.1 ⋅ 10−4 1.0 ⋅ 10−4 2.7 ⋅ 10−5 7.5 ⋅ 10−6 2.4 ⋅ 10−6 1.0 ⋅ 10−6
(2) |λ − λh | 1.5 ⋅ 10−5 7.2 ⋅ 10−6 2.7 ⋅ 10−6 1.1 ⋅ 10−6 8.0 ⋅ 10−7 7.8 ⋅ 10−7
(3) |λ − λh | 1.0 ⋅ 10−4 2.7 ⋅ 10−5 6.8 ⋅ 10−6 1.7 ⋅ 10−6 4.3 ⋅ 10−7 –
Slater function given by the grid representation of φ0 = e−(|x1 |+|x2 |+|x3 |) . Augmenting by
a piecewise linear hat function of the type ξi centered at the origin gives similar results
as for φ0 .
Example 9.3. In this example, we present computations with controlled accuracy using a single rank-Rb basis function generated by the sinc-approximation to the Slater function. Using the Laplace transform

G(ρ) = e^{−2√(αρ)} = (√α/√π) ∫₀^∞ τ^{−3/2} exp(−α/τ − ρτ) dτ,

the Slater function can be represented as a rank-R canonical tensor by computing the sinc-quadrature decomposition [161, 163] and setting ρ = x1² + x2² + x3²:

G(ρ) ≈ (√α/√π) ∑_{k=−L}^{L} w_k τ_k^{−3/2} exp(−α/τ_k) ∏_{ℓ=1}^{3} exp(−τ_k x_ℓ²),

so that R = 2L + 1.
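A small Python sketch of this decomposition (our own illustration, not the book's code): the substitution τ = e^u turns the Laplace integral into one amenable to the trapezoidal (sinc) rule, and each quadrature node τ_k contributes one separable term; the step h and truncation L below are illustrative choices.

```python
import math

def slater_sinc(rho, alpha=1.0, L=120, h=0.25):
    """Sinc-quadrature (trapezoidal rule after tau = e^u) approximation of
    G(rho) = exp(-2*sqrt(alpha*rho)) by a sum of 2L+1 exponentials in rho."""
    s = 0.0
    for k in range(-L, L + 1):
        tau = math.exp(k * h)
        w = h * tau                      # quadrature weight for the node tau_k
        s += w * tau ** -1.5 * math.exp(-alpha / tau) * math.exp(-rho * tau)
    return math.sqrt(alpha / math.pi) * s

# each term exp(-rho*tau_k) with rho = x1^2 + x2^2 + x3^2 factorizes into a
# product of three univariate Gaussians exp(-tau_k*x_l^2), i.e., a canonical
# rank-(2L+1) representation of the Slater function
for rho in (0.5, 1.0, 2.0, 5.0):
    exact = math.exp(-2.0 * math.sqrt(rho))
    assert abs(slater_sinc(rho) - exact) / exact < 1e-6
```

The almost exponential convergence in L mirrors the error behavior reported for the grid-based Slater basis in Table 9.2.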
Table 6.3 in [156] presents the Richardson extrapolation for Examples 9.1 and 9.3. Owing to the stable convergence rate of order O(h²), the Richardson extrapolation (RE) further improves the accuracy up to O(h³). It can be seen in Table 6.3 of [156] that the Richardson extrapolation for the results of Example 9.3 gives accuracy of order 10⁻⁷, beginning from the grid size 4096. Note that with the choice L = 60, the accuracy is improved by one order of magnitude compared to that obtained for the standard Gaussian basis set in Example 9.1.
Table 9.3 presents numerical examples of the grid-based approximation to the Galerkin matrices for the Laplace operator AG and the nuclear potential VG using (9.4) and (9.14) for the C2H5OH molecule. The mesh size of the N × N × N Cartesian grid ranges over the dyadic sequence shown in Table 9.3. The relative errors are defined as

Er(AG) = ‖Ag − AG‖ / ‖Ag‖,  Er(VG) = ‖Vg − VG‖ / ‖Vg‖.
The quadratic convergence of both quantities along the line of dyadic grid refinement is in good agreement with the theoretical error estimates O(h²). Therefore, the employment of the Richardson approximation, providing the error

E_{Ri,2h,h} = Er( (4·VG,h − VG,2h) / 3 ),

suggests further improvement of the accuracy up to order O(h⁴) for the Laplace operator. The “RE” lines in Table 9.3 demonstrate the results of the Richardson extrapolation applied to the corresponding quantities at the adjacent grids.
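The effect of the combination (4·VG,h − VG,2h)/3 can be seen on synthetic data with a known O(h²) leading error term (the model coefficients below are illustrative, not taken from the book's experiments):

```python
def richardson(v_h, v_2h):
    # the combination (4*v_h - v_2h)/3 cancels the O(h^2) leading error term
    return (4.0 * v_h - v_2h) / 3.0

exact = 1.0
def approx(h):
    # model quantity with O(h^2) discretization error plus an O(h^3) tail
    return exact + 0.3 * h**2 + 0.1 * h**3

h = 0.1
err_h  = abs(approx(h) - exact)                              # = 0.0031
err_re = abs(richardson(approx(h), approx(2 * h)) - exact)   # O(h^3) remainder
assert err_re < err_h / 10
```

On a pair of dyadically refined grids the same one-line combination is applied entrywise to the computed Galerkin matrices.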
Note that for the grid-based representation of the collective nuclear potential Pc ,
the univariate grid size n can be noticeably smaller than the size of the grid used for
the piecewise linear discretization for the Laplace operator.
Table 9.3: Ethanol (C2H5OH): accuracy Er(AG) and Er(VG) of the Galerkin matrices AG and VG corresponding to the Laplace and the nuclear potential operators, respectively, using the discretized basis of 123 primitive Gaussians (from the cc-pVDZ set [75, 265]).

p            13         14         15         16         17
N³ = 2^{3p}  8192³      16384³     32768³     65536³     131072³
Er(AG)       0.032      0.0083     0.0021     5.2·10⁻⁴   1.3·10⁻⁴
RE           –          4.0·10⁻⁴   3.3·10⁻⁵   6.0·10⁻⁶   5.0·10⁻⁸
Er(VG)       0.024      0.0083     0.0011     3.1·10⁻⁴
RE           –          0.0031     0.0013     5.9·10⁻⁵
Figure 9.2 displays the nuclear potential for the molecule C2H5OH (ethanol) computed in a box [−b, b]³ with b = 16 au. We show two cross-sections of the 3D function at the level x = 0.0625 au and of the permuted function at the level y = −0.3125 au. It can be seen from the left figure that the three non-hydrogen atoms with the largest charges (two carbon atoms with Z = 6 and one oxygen atom with Z = 8) are placed in the plane x = 0. The right figure shows the location close to one of the hydrogen atoms.
The error ε > 0 arising due to the separable approximation of the nuclear po-
tential is controlled by the rank parameter of the nuclear potential RP = rank(Pc ).
Figure 9.2: Nuclear potential Pc for the C2H5OH molecule, shown for the cross sections along the x-axis at the level x = 0.0625 au and along the y-axis at the level y = 1.6 au.

Now letting rank(Gm) = Rm implies that each matrix element is computed with linear complexity in n, O(Rk Rm RP n). The almost exponential convergence of the rank approximation in RP allows the choice RP = O(|log ε|).
The maximum computational time for AG with N³ = 131072³ is on the order of a hundred seconds in MATLAB. For the coarser grid with N³ = 8192³, CPU times are in the range of several seconds for both AG and VG.
Comprehensive error estimates for the grid-based calculations of the core Hamil-
tonian are formulated in [156], where a number of numerical experiments for various
molecules is presented as well.
10 Tensor factorization of grid-based two-electron
integrals
10.1 General introduction
The efficient tensor-structured method for the grid-based calculation of the two-electron integrals (TEI) tensor was introduced by V. Khoromskaia, B. Khoromskij, and R. Schneider in 2012 (see [157]). In this chapter, following [157, 150], we describe the fast algorithm for the grid-based computation of the fourth-order TEI tensor in the form of a Cholesky factorization, using the grid-based algebraic 1D “density fitting” scheme applied to the products of basis functions. It is worth noting that the described approach does not require calculation of the full TEI matrix; it relies only on the computation of a few selected columns evaluated by using the 1D density fitting factorizations (see Remark 10.3).
Imposing the low-rank tensor representation of the product basis functions and the Newton convolving kernel, all discretized on a large n × n × n Cartesian grid, the 3D integral transforms are calculated in O(n log n) complexity. This scheme provides storage for TEI of the order of O(Nb³) in the number of basis functions Nb.
The TEI tensor, also known as the Fock integrals or electron repulsion integrals,
is the principal ingredient in electronic and molecular structure calculations. In par-
ticular, the corresponding coefficient tensor arises in ab initio Hartree–Fock (HF) cal-
culations, in post Hartree–Fock models (MP2, CCSD, Jastrow factors, etc.), and in the
core Hamiltonian appearing in FCI-DMRG calculations [6, 298, 241, 128].
Given the finite basis set {gμ }1≤μ≤Nb , gμ ∈ H 1 (ℝ3 ), the associated fourth-order two-
electron integrals tensor B = [bμνκλ ] ∈ ℝNb ×Nb ×Nb ×Nb is defined entrywise by

bμνκλ := ∫_{ℝ³} ∫_{ℝ³} (gμ(x) gν(x) gκ(y) gλ(y)) / ‖x − y‖ dx dy,  μ, ν, κ, λ ∈ {1, . . . , Nb}.  (10.1)
The fast and accurate evaluation and effective storage of the fourth-order TEI tensor B of size Nb⁴ is a challenging computational problem, since it includes multiple 3D convolutions of the Newton kernel 1/‖x − y‖, x, y ∈ ℝ³, with strongly varying product-basis functions. Hence, in the limit of large Nb, the efficient numerical treatment and
storage of the TEI tensor is considered as one of the central tasks in electronic structure
calculations [247].
The traditional analytical integration using the representation of electronic orbitals in a Gaussian-type basis is the foundation of most ab initio quantum chemical packages. Hence, the choice of a basis set {gμ}1≤μ≤Nb is essentially restricted by the requirement of “analytic” integrability for efficient computation of the tensor entries represented by 6D integrals in (10.1). This approach possesses intrinsic limitations owing to the rigid constraint to Gaussian-type basis functions, which may become
unstable and redundant for higher accuracy, larger molecules, or when considering
heavy nuclei.
It is known in quantum chemistry simulations [17, 298, 303] that, in the case of compact molecules, the (pivoted) incomplete Cholesky factorization of the Nb² × Nb² TEI matrix unfolding exhibits a separation rank of order O(Nb). The grid representation of the basis functions admits efficient tensor operations, like the scalar, Hadamard, and convolution products, with linear 1D complexity O(n).
On the one hand, this weak dependence on the grid size is the ultimate payoff for generality, in the sense that rather general approximating basis sets may be used instead of analytically integrable Gaussians. On the other hand, the approach also offers structural simplicity of implementation, since the topology of the molecule is captured without any physical insight, only by the algebraically determined rank parameters of the fully grid-based numerical scheme.
Due to the O(n log n) complexity of the algorithms, there are rather weak practical restrictions on the grid size n, allowing calculations on very large n × n × n 3D Cartesian grids in the range n ∼ 10³–10⁵, thereby avoiding grid refinement. The corresponding mesh sizes enable high resolution, of the order of the size of atomic nuclei. For storage-consuming operations, the numerical expense can be reduced to the logarithmic level O(log n) by using the QTT representation of the discretized 3D basis functions and their convolutions.
In [157] it is shown that the rank-O(Nb ) Cholesky decomposition of the TEI ma-
trix B, combined with the canonical-QTT data compression of long vectors, allows the
reduction of the asymptotic complexity of grid-based tensor calculations in HF and
some post-HF models. Alternative approaches to optimization of the HF, MPx, CCSD,
and other post-HF models can be based on using physical insight to sparsify the TEI
tensor B by zeroing-out all “small” elements [298, 241, 6, 268, 311].
Gμ = [gμ(x1(i), x2(j), x3(k))]_{i,j,k=1}^{n} ∈ ℝ^{n×n×n},  μ = 1, . . . , Nb,
obtained by sampling of gμ (x) over the midpoints (x1 (i), x2 (j), x3 (k)) of the grid-cells
with index (i, j, k). Given the discretized basis function Gμ , (μ = 1, . . . , Nb ), we assume
(without loss of generality) that it is a rank-1 tensor, rank(Gμ ) = 1, i. e.,
Gμ = gμ^{(1)} ⊗ gμ^{(2)} ⊗ gμ^{(3)} ∈ ℝ^{n×n×n}  (10.3)

with the skeleton vectors gμ^{(ℓ)} ∈ ℝⁿ, ℓ = 1, 2, 3, obtained as projections of the basis
functions gμ (x) on the uniform grid. Then the entries of B can be represented by using
the tensor scalar product over the “grid” indices

bμνκλ ≈ ⟨Gμ ⊙ Gν, PN ∗ (Gκ ⊙ Gλ)⟩,  (10.4)
where PN denotes the rank-RN canonical tensor representation of the Newton potential 1/‖x‖ (see Section 6.1). We recall that ∗ stands for the 3D tensor convolution (5.11), and ⊙ denotes the 3D Hadamard product (2.37).
The element-wise accuracy of the tensor representation (10.4) is estimated by
O(h2 ), where h = 2b/n is the step-size of the Cartesian grid [166]. The Richardson
extrapolation reduces the error to O(h3 ).
It is worth emphasizing that in our scheme the n⊗3 tensor Cartesian grid does not depend on the positions of the nuclei in a molecule. Consequently, a simultaneous rotation and translation of the nuclei positions preserves the asymptotic approximation error at the level of O(h²).

Lemma 10.1 (Symmetry of TEI [157]). The TEI tensor B = [bμνκλ] satisfies the symmetry relations

bμνκλ = bνμκλ = bμνλκ = bκλμν.
The result is a direct consequence of definition (10.1) and symmetry of the convo-
lution product. The above symmetry relation allows reducing the number of precom-
puted entries in the full TEI tensor to Nb4 /8. This property is also mentioned in [291].
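The 8-fold symmetry can be checked on a tiny 1D analogue of (10.1), where the Newton kernel is replaced by an arbitrary symmetric kernel K(i, j); the grid, basis vectors, and kernel below are illustrative stand-ins:

```python
# 1D toy analogue of (10.1): b_{mu,nu,ka,la} = sum_{ij} g_mu(i) g_nu(i) K(i,j) g_ka(j) g_la(j)
n, Nb = 6, 3
g = [[(mu + 1.0) / (1.0 + (i - mu) ** 2) for i in range(n)] for mu in range(Nb)]
K = [[1.0 / (abs(i - j) + 1.0) for j in range(n)] for i in range(n)]  # symmetric kernel

def b(mu, nu, ka, la):
    return sum(g[mu][i] * g[nu][i] * K[i][j] * g[ka][j] * g[la][j]
               for i in range(n) for j in range(n))

for mu in range(Nb):
    for nu in range(Nb):
        for ka in range(Nb):
            for la in range(Nb):
                v = b(mu, nu, ka, la)
                tol = 1e-9 * (1.0 + abs(v))
                assert abs(v - b(nu, mu, ka, la)) < tol   # mu <-> nu
                assert abs(v - b(mu, nu, la, ka)) < tol   # ka <-> la
                assert abs(v - b(ka, la, mu, nu)) < tol   # (mu,nu) <-> (ka,la)
```

The three independent symmetries reduce the number of entries to be precomputed by the factor 8 claimed above.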
Let us introduce the 5th-order tensors

G = [Gμ ⊙ Gν] ∈ ℝ^{n×n×n×Nb×Nb},  H = [PN ∗ (Gμ ⊙ Gν)] ∈ ℝ^{n×n×n×Nb×Nb}.  (10.5)
Then (10.4) is equivalent to the contracted product representation over n⊗3 -grid in-
dexes

bμνκλ = ⟨G( : , : , : , μ, ν), H( : , : , : , κ, λ)⟩,  (10.6)
where the right-hand part is recognized as the discrete counterpart of the Galerkin
representation (10.1) in the full product basis. When using the full grid calculations,
the total storage cost for the n × n × n product-basis tensor G and its convolution H amounts to 3 Nb(Nb + 1)/2 · n and 3 RN Nb(Nb + 1)/2 · n, respectively. The numerical cost of the Nb² tensor-product convolutions to compute H is estimated by O(RN Nb² n log n) [166]. Based on
representation (10.6), each entry in the TEI tensor B of size Nb4 can be calculated with
the cost O(RN n), which might be too expensive for large grid sizes n. Thus, a direct tensor calculation of TEI seems infeasible except for small molecules, even when using the QTT tensor representation of the basis functions, as was shown in [157].
10.3 Redundancy-free factorization of the TEI matrix B
Remark 10.2. If the separation rank of a basis set is larger than 1, then the complexity of the scalar products in (10.6) increases quadratically in the rank parameter. However, the use of basis functions with rank parameter greater than one (say, Slater-type functions) can be motivated by the reduction of the basis size Nb, which has a fourth-order effect on the complexity.
G(ℓ) ≅ U(ℓ) V(ℓ)ᵀ  such that  ‖G(ℓ) − U(ℓ) V(ℓ)ᵀ‖F ≤ ε,  ℓ = 1, 2, 3,  (10.9)

with an orthogonal matrix U(ℓ) ∈ ℝ^{n×Rℓ} and a matrix V(ℓ) ∈ ℝ^{Nb²×Rℓ}, where Rℓ is the corresponding matrix ε-rank. Here, U(ℓ) and V(ℓ) represent the so-called left and right redundancy-free basis sets, where only the grid-dependent part U(ℓ) is to be used in the convolution products.
Since the direct SVD of the large rectangular matrices G(ℓ) ∈ ℝ^{n×Nb²} can be prohibitively expensive even for moderate size molecules (n ≥ 2¹³, Nb ≥ 200), the five-step algorithm was introduced in [157, 150], which reduces the computational and storage costs to compute the low-rank approximation G(ℓ) ≅ U(ℓ) V(ℓ)ᵀ with the guaranteed tolerance ε > 0; see Algorithm 1.
Numerical experiments show that the Frobenius error of these rank decompositions decays exponentially in the rank parameter Rℓ:

‖G(ℓ) − U(ℓ) V(ℓ)ᵀ‖F ≤ C e^{−γℓ Rℓ},  ℓ = 1, 2, 3,  γℓ > 0.
Figure 10.1 illustrates the exponential decay in singular values of G(ℓ) for several mod-
erate size molecules.
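The rank behavior can be reproduced qualitatively in a few lines. The sketch below (assuming NumPy; the 1D Gaussians and grid are illustrative stand-ins for the molecular data of Figure 10.1) forms the side matrix G(1) of all Nb² product columns and computes its ε-rank by a direct SVD:

```python
import numpy as np

# illustrative 1D skeleton vectors of Nb Gaussians on an n-point grid
n, Nb, eps = 64, 8, 1e-6
x = np.linspace(-4.0, 4.0, n)
g = np.array([np.exp(-0.5 * (k + 1) * x**2) for k in range(Nb)])   # shape (Nb, n)

# side matrix G^(1): all Nb^2 product columns g_mu * g_nu, shape (n, Nb^2)
G = np.array([g[mu] * g[nu] for mu in range(Nb) for nu in range(Nb)]).T

s = np.linalg.svd(G, compute_uv=False)
R = int(np.sum(s > eps * s[0]))          # eps-rank of the "1D density fitting"

# strong redundancy: the Nb^2 = 64 products here contain only 15 distinct
# Gaussian widths, so the eps-rank stays far below Nb^2
assert R <= 15 < Nb**2
```

For realistic molecules the decay of the singular values is algebraically determined by the geometry, but the qualitative picture (R ≪ Nb²) is the same.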
Figure 10.1: Singular values of G(ℓ) for ℓ = 1, 2, 3: NH3 (left), glycine (middle), and alanine (right) molecules with the numbers Nb and Norb equal to 48, 5; 170, 20; and 211, 24, respectively.

Step (3) in Algorithm 1 requires access to the full matrix G(ℓ). However, when this matrix allows a data-sparse representation, the respective matrix–vector multiplications can be implemented with reduced cost. For example, given the low-rank QTT
representation of the column vectors in G(ℓ) , the matrix–matrix product at Step (3) can
be implemented in O(Nb2 Rℓ log n) operations. Notice that the QTT ranks of the column
vectors are estimated in numerical experiments by O(1) for all molecular systems con-
sidered so far, see also [68] concerning the QTT rank estimate of the Gaussian.
Another advantageous feature is the perfectly parallel structure of the matrix–vector multiplication procedure at Step (3). Here, the algebraically optimized separation ranks Rℓ are mostly determined by the geometry of a molecule, whereas the number Nb² − Rℓ indicates the measure of redundancy in the product basis set. In numerical experiments we observe Rℓ ≤ Nb and Rℓ ≪ n for large n.
Figure 10.2, left, represents the ε-ranks Rℓ, ℓ = 1, 2, 3, and RB, computed for some compact molecules with ε = 10⁻⁶. We observe that the Cholesky rank RB of B (see Section 10.3.2) is a multiple of Nb with a factor ∼6 (see also Figure 10.3). Remarkably, the RHOSVD separation ranks Rℓ ≤ Nb remain very weakly dependent on Nb, depending primarily on the topology of a molecule.
Figure 10.2 (right) provides average QTT ranks of column vectors in U(1) ∈ ℝ^{n×R1} for NH3, H2O2, N2H4, and C2H5OH molecules. Again, surprisingly, the rank portraits appear to be nearly the same for different molecules, and the average rank over all indexes m = 1, . . . , R1 is a small constant, about r0 ≃ 7. More detailed results are listed in Table 10.1.

Figure 10.2: Left: ε-ranks Rℓ and RB for HF, NH3, H2O2, N2H4, and C2H5OH molecules versus the number of basis functions Nb = 34, 48, 68, 82, and 123, respectively. Right: average QTT ε-ranks of column vectors in U(1) ∈ ℝ^{n×Rℓ} for NH3, H2O2, N2H4, and C2H5OH molecules, ε = 10⁻⁶.

Table 10.1: Average QTT ε-ranks of U(1) and V(1) in the G(1)-factorization, ε = 10⁻⁶.

Molecules   NH3   H2O2   N2H4   C2H5OH
Now we are in a position to represent the TEI matrix B in the factorized form using
a reduced set of convolving functions. First, we recall that using the scalar product
representation of n × n × n arrays, we can rewrite the discretized integrals (10.1) in
terms of tensor operations as in (10.4), (10.5). Then using representations (10.7) and
(10.8) for each fixed multiindex μνκλ, we arrive at the following tensor factorization of
B [157]:
B = ∑_{k=1}^{RN} ⊙_{ℓ=1}^{3} G(ℓ)ᵀ (p(ℓ)k ∗n G(ℓ)),  (10.10)
where p(ℓ)k, ℓ = 1, 2, 3, are the column vectors in the side matrices of the rank-RN canonical tensor representation PN of the Newton kernel 1/‖x‖ [166]. Substitution of the side matrix decomposition (10.9) into (10.10) leads to the redundancy-free factorized ε-approximation of the matrix B [157]:
B = ∑_{k=1}^{RN} ⊙_{ℓ=1}^{3} G(ℓ)ᵀ (p(ℓ)k ∗n G(ℓ)) ≅ ∑_{k=1}^{RN} ⊙_{ℓ=1}^{3} V(ℓ) Mk(ℓ) V(ℓ)ᵀ =: Bε,  (10.11)
where V(ℓ) represents the corresponding right redundancy-free basis, and

Mk(ℓ) = U(ℓ)ᵀ (p(ℓ)k ∗n U(ℓ)) ∈ ℝ^{Rℓ×Rℓ},  k = 1, . . . , RN,  (10.12)

stands for the Galerkin convolution matrix in the left redundancy-free basis U(ℓ), ℓ = 1, 2, 3. We notice that equation (10.12) includes only Rℓ ≪ Nb² convolution products.
The computational scheme for convolution matrices Mk(ℓ) is described in Algorithm 2.
Inspection of Algorithm 2 shows that the storage demand for representations (10.11)–(10.12) can be estimated by RN ∑_{ℓ=1}^{3} Rℓ² + Nb² ∑_{ℓ=1}^{3} Rℓ and O((RG + RN)n), respectively. The following lemma proves the complexity and error estimates for the tensor representations (10.11)–(10.12). Given the ε-truncated SVD-based left-orthogonal decomposition of G(ℓ), G(ℓ) ≅ U(ℓ) V(ℓ)ᵀ, ℓ = 1, 2, 3, with n × Rℓ and Nb² × Rℓ matrices U(ℓ) (orthogonal) and V(ℓ), respectively, we denote RG = max Rℓ.
Lemma 10.4 ([157, 150]). Given ε > 0, the redundancy-free factorized ε-approximations to the matrix B (10.11) and to the convolution matrix (10.12) exhibit the following properties:

(A) The storage demand for factorizations (10.11) and (10.12) is estimated by

RN ∑_{ℓ=1}^{3} Rℓ² + Nb² ∑_{ℓ=1}^{3} Rℓ  and  O((RG + RN)n),

respectively.

(B) The ε-rank of the matrix Bε is bounded by

rank(Bε) ≤ min{Nb², RN ∏_{ℓ=1}^{3} Rℓ}.  (10.13)

(C) Denote Aℓ(k) = G(ℓ)ᵀ (p(ℓ)k ∗n G(ℓ)). Then we have the following error estimate in the Frobenius norm:

‖B − Bε‖F ≤ 6ε max_ℓ ‖G(ℓ)‖F ∑_{k=1}^{RN} max_ℓ ‖Aℓ(k)‖F² ‖p(ℓ)k‖F.  (10.14)
Proof. (A) Using the Galerkin-type representation of the TEI tensor B as in (10.6), we obtain

B = mat(B) = ∑_{k=1}^{RN} ⊙_{ℓ=1}^{3} G(ℓ)ᵀ [p(ℓ)k ∗n G(ℓ)].
Plugging the truncated SVD factorization of G(ℓ) into the right-hand side leads to the
desired representation
Bε = ∑_{k=1}^{RN} ⊙_{ℓ=1}^{3} V(ℓ) U(ℓ)ᵀ [p(ℓ)k ∗n (U(ℓ) V(ℓ)ᵀ)]
   = ∑_{k=1}^{RN} ⊙_{ℓ=1}^{3} V(ℓ) [U(ℓ)ᵀ (p(ℓ)k ∗n U(ℓ))] V(ℓ)ᵀ
   = ∑_{k=1}^{RN} ⊙_{ℓ=1}^{3} V(ℓ) Mk(ℓ) V(ℓ)ᵀ.  (10.15)
The storage cost for the RHOSVD-type factorization (10.15) of the Nb² × Nb² matrix B is bounded by RN ∑_{ℓ=1}^{3} Rℓ² + Nb² ∑_{ℓ=1}^{3} Rℓ, independently of the grid size n.
The computational complexity at this step is dominated by the cost of the reduced
T
Cholesky algorithm applied to the matrix G(ℓ) G(ℓ) that computes truncated SVD of the
side matrices G(ℓ) at the cost O(RG (Nb2 +n)) and by the total cost of convolution products
in (10.12), O(RN RG n log n).
(B) Using the rank properties of Hadamard product of matrices, it is easy to see
that (10.15) implies the direct ε-rank estimate for the matrix Bε as in (10.13), where Rℓ ,
ℓ = 1, 2, 3 characterizes the effective rank in “1D density fitting”.
(C) The error bound can be derived along the line of [174], Theorem 2.5(d), related
to the RHOSVD error analysis. Indeed, the approximation error can be represented
explicitly by
B − Bε = ∑_{k=1}^{RN} (⊙_{ℓ=1}^{3} G(ℓ)ᵀ (p(ℓ)k ∗n G(ℓ)) − ⊙_{ℓ=1}^{3} V(ℓ) U(ℓ)ᵀ (p(ℓ)k ∗n U(ℓ) V(ℓ)ᵀ)).
Denote Ãℓ(k) = V(ℓ) U(ℓ)ᵀ (p(ℓ)k ∗n U(ℓ) V(ℓ)ᵀ). Then for each fixed k = 1, . . . , RN, we have

‖Aℓ − Ãℓ‖ ≤ 2ε ‖p(ℓ)k‖ ‖G(ℓ)‖  (10.16)

because of the stability in the Frobenius norm, ‖U(ℓ) V(ℓ)ᵀ‖ ≤ ‖G(ℓ)‖. Now, for fixed k,
we obtain
A1 ⊙ A2 ⊙ A3 − Ã1 ⊙ Ã2 ⊙ Ã3 = A1 ⊙ A2 ⊙ A3 − Ã1 ⊙ A2 ⊙ A3
                              + Ã1 ⊙ A2 ⊙ A3 − Ã1 ⊙ Ã2 ⊙ A3
                              + Ã1 ⊙ Ã2 ⊙ A3 − Ã1 ⊙ Ã2 ⊙ Ã3.
The proof of Lemma 10.4 is constructive and outlines the way to an efficient implementation of (10.11), (10.12). Some numerical results on the performance of the corresponding black-box algorithm are shown in Sections 10.3.3 and 11.4.
The RHOSVD factorization (10.11), (10.12) is reminiscent of the exact Galerkin representation (10.6) in the right redundancy-free basis, whereas the matrices Mk(ℓ) play the role of “directional” Galerkin projections of the Newton kernel onto the left redundancy-free basis. This factorization can be applied directly to the fast calculation of the reduced Cholesky decomposition of the matrix B considered in the next section.
Finally, we point out that our RHOSVD-type factorization can be viewed as the algebraic tensor-structured counterpart of the density fitting scheme commonly used in quantum chemistry [3, 217, 237]. We notice that in our approach the “1D density fitting” is implemented independently for each space dimension, reducing the ε-ranks of the dominating directional bases to the lowest possible value. The robust error control in the proposed basis optimization approach is based on a purely algebraic SVD-like procedure that allows eliminating the redundancy in the product basis set up to a given precision ε > 0.
Further storage reduction can be achieved by the quantized-TT (QTT) approxima-
tion of the column vectors in U (ℓ) and V (ℓ) in (10.12). Specifically, the required storage
amounts to O((RG + RN ) log n) reals.
In some cases the representation (10.11) may provide the direct low-rank decom-
position of the matrix B. In fact, suppose that Rℓ ≤ Cℓ |log ε|Norb with constants Cℓ ≤ 1,
ℓ = 1, 2, 3. Then the ε-rank of the matrix B is bounded by
rank(Bε) ≤ min{Nb², RN |log ε|³ Norb³ ∏_{ℓ=1}^{3} Cℓ}.  (10.18)
Indeed, in accordance with [157], we have the rank estimate rank(Bε) ≤ min{Nb², RN ∏_{ℓ=1}^{3} Rℓ}, which proves the statement.
Rank estimate (10.13) outlines the way to efficient implementation of (10.11),
(10.12). Here, the algebraically optimized directional separation ranks Rℓ , ℓ = 1, 2, 3,
are only determined by the entanglement properties of a molecule, whereas the num-
bers Nb2 − Rℓ indicate the measure of redundancy in the product basis set. Normally,
we have Rℓ ≪ n and Rℓ ≤ Nb , ℓ = 1, 2, 3. The asymptotic bound Rℓ ≤ Cℓ |log ε|Norb
can be seen in Figure 10.1. One can observe that in the case of glycine molecule, the
first mode-rank is much smaller than others, indicating the flattened shape of the
molecule. However, the a priori rank estimate (10.13) looks too pessimistic compared to the results of numerical experiments, although in the case of flattened or extended molecules (where some of the directional ranks are small), this estimate provides a much lower bound.
The Hartree–Fock calculations for the moderate size molecules are usually based on
the incomplete Cholesky decomposition [303, 130, 17] applied to the symmetric and
positive definite TEI matrix B,

B ≈ L Lᵀ,  L ∈ ℝ^{Nb²×RB},  (10.19)
where the separation rank RB ≪ Nb2 is of order O(Nb ). This decomposition can be ef-
ficiently computed by using the precomputed (off-line step) factorization of B as in
(10.11), which requires only a small number of adaptively chosen column vectors in B,
[157]. The detailed computational scheme is presented in Algorithm 3.
In this section, we describe the economical computational scheme introduced
in [157, 150], providing the O(Nb )-rank truncated Cholesky factorization of the TEI
matrix B with complexity O(Nb3 ). This approach requires only computation of the se-
lected columns in B, without the need to compute the whole TEI matrix. The Cholesky
scheme requires only O(Nb ) adaptively chosen columns in B, calculated on-line using
the results of redundancy-free factorization (10.11).
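The column-oriented structure of such a scheme can be sketched in plain Python as a generic pivoted Cholesky with on-demand column access. This is an illustrative stand-in for Algorithm 3 (the function names are ours, and a toy low-rank matrix replaces the TEI unfolding):

```python
import math

def pivoted_cholesky(get_col, diag, tol):
    """Pivoted Cholesky of a PSD matrix given by column access only.
    Only O(rank) columns of the matrix are ever requested."""
    N = len(diag)
    d = list(diag)                  # residual diagonal
    vecs = []                       # Cholesky vectors
    while max(d) > tol:
        p = max(range(N), key=lambda i: d[i])   # pivot: largest residual diagonal
        col = get_col(p)
        v = [col[i] - sum(w[i] * w[p] for w in vecs) for i in range(N)]
        piv = math.sqrt(v[p])
        vecs.append([vi / piv for vi in v])
        d = [d[i] - vecs[-1][i] ** 2 for i in range(N)]
    return vecs

# toy PSD matrix B = W W^T of exact rank 3 (stand-in for the TEI unfolding)
N, r = 12, 3
W = [[math.sin(0.7 * (i + 1) * (s + 1)) for s in range(r)] for i in range(N)]
B = [[sum(W[i][s] * W[j][s] for s in range(r)) for j in range(N)] for i in range(N)]

L = pivoted_cholesky(lambda j: [B[i][j] for i in range(N)],
                     [B[i][i] for i in range(N)], tol=1e-10)
err = max(abs(B[i][j] - sum(w[i] * w[j] for w in L))
          for i in range(N) for j in range(N))
assert len(L) <= r + 1 and err < 1e-6
```

The essential point, mirrored above, is that the factorization terminates after O(rank) steps while touching only the adaptively pivoted columns.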
Further, the complexity can be reduced to O(Norb² Nb) using the quantized representation of the Cholesky vectors.
We denote the long indexes in the N × N (N = Nb2 ) matrix unfolding B by

i = μ + (ν − 1)Nb,  j = κ + (λ − 1)Nb,  i, j ∈ IN := {1, . . . , N}.
Lemma 10.5 ([157]). The unfolding matrix B is symmetric and positive semidefinite.
Proof. The symmetry is enforced by the definition (see Lemma 10.1). The positive semi-definiteness follows from the observation that the matrix B can be viewed as the Galerkin matrix ⟨(−Δ)⁻¹ui, uj⟩, i, j ∈ IN, in the finite product basis set {ui} = {gμ gν}, where (−Δ)⁻¹ is the inverse of the Laplacian operator, self-adjoint and positive definite in H¹(ℝ³), subject to the homogeneous Dirichlet boundary conditions as x → ∞.
‖B − L Lᵀ‖ ≤ Cε,  L ∈ ℝ^{N×RB}.
Based on the previous observation, we postulate a rather general ε-rank estimate (in electronic structure calculations this conventional fact traces back to [17]); see the numerics in Figure 10.3.
Remark 10.6. Given a fixed truncation error ε > 0, for the Gaussian-type AO basis
functions, we have RB = rank(LLT ) ≤ CNb , where the constant C > 0 is independent
of Nb .
and

B(i, i) = ∑_{k=1}^{RN} ⊙_{ℓ=1}^{3} V(ℓ)(i, : ) Mk(ℓ) V(ℓ)ᵀ( : , i),
Table 10.2: Average QTT ranks of the Cholesky vectors vs. Norb for some molecules.

Molecule            HF     H2O    NH3    H2O2   N2H4   C2H5OH
Norb                5      5      5      9      9      13
rQTT                12     13.6   15     21     24     37
kchol = rQTT/Norb   2.4    2.7    3      2.3    2.6    2.85
Hypothesis 10.7. The structural complexity of the Cholesky factor L of the matrix B in
the QTT representation is characterized by the rank parameter

rQTT(L) ≤ kchol Norb,  kchol ≈ 3.
Figure 10.4: (Left): Average QTT ranks of the column vectors in L, rQTT (L), and in the vectorized coeffi-
cient matrix, rQTT (C), for several compact molecules. The “constant” lines at the level 2.35–2.85 indi-
cate the corresponding ratios rQTT (L)/Norb and rQTT (C)/Norb for the respective molecule. (Right): QTT
ranks of skeleton vectors in factorization (10.11)–(10.12) for H2 O, N2 H4 , C2 H5 OH, C2 H5 NO2 (glycine),
C3 H7 NO2 (alanine) calculations, with Norb equal to 5, 9, 13, 20, and 24, respectively.
(see Table 10.1). In particular, the average QTT ranks of the reduced higher-order SVD factors V(ℓ) ∈ ℝ^{Nb²×Rℓ} in the rank factorization of the initial product basis tensors G(ℓ), ℓ = 1, 2, 3, exhibit almost the same rank scaling, rQTT(V(ℓ)) ≤ 3Norb, as the factor kchol ≈ 3 in the Cholesky decomposition of the matrix B (see Table 10.1). Hence, the QTT representation complexity for the factor V(ℓ) in (10.11) can be reduced to

10 Norb² RG ≈ (1/10) Nb² RG.
Figure 10.4 illustrates QTT-ranks behavior versus Norb for skeleton vectors in fac-
torization (10.11) for some compact molecules with different numbers of electron
orbitals Norb .
11 Fast grid-based Hartree–Fock solver by factorized TEI
i ∈ ℐ := {1, . . . , n}³, with the mesh size h = 2b/(n + 1); see Figure 11.1. For the set of “global” separable Galerkin basis functions {gk}, k = 1, 2, . . . , Nb, we define approximating functions ḡk := I1 gk, k = 1, . . . , Nb, by linear tensor-product interpolation via the set of product “local” basis functions {ξi} = {ξi1(x1) ξi2(x2) ξi3(x3)}, i ∈ ℐ, associated with the respective grid cells in ω3,n. The local basis functions are chosen as piecewise linear (hat functions) for the tensor calculation of the Laplace operator [156], or piecewise constant for the factorized calculation of the two-electron integrals [157] and the direct tensor calculation of the nuclear potential operator Vc [156]. Recall that the linear interpolant I1 = I1 × I1 × I1 is a product of 1D interpolation operators, ḡk(ℓ) = I1 gk(ℓ), ℓ = 1, 2, 3, where I1 : C⁰([−b, b]) → Wh := span{ξi}_{i=1}^{n} is defined over the set of (piecewise linear or piecewise constant) local basis functions by (I1 w)(xℓ) := ∑_{i=1}^{n} w(xℓ,i) ξi(xℓ), xℓ,i ∈ ω3,n. This leads to the separable grid-based approximation of the initial basis functions gk(x),

gk(x) ≈ ḡk(x) = ∏_{ℓ=1}^{3} ḡk(ℓ)(xℓ) = ∏_{ℓ=1}^{3} ∑_{i=1}^{n} gk(ℓ)(xℓ,i) ξi(xℓ),  (11.1)
Gk = gk^{(1)} ⊗ gk^{(2)} ⊗ gk^{(3)},  k = 1, . . . , Nb,  (11.2)
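The separable interpolation (11.1) and the rank-1 sampling (11.2) can be mimicked in a few lines of Python (an illustrative Gaussian g(x) = exp(−‖x‖²) on an arbitrary grid; not the book's implementation):

```python
import math

b, n = 5.0, 64
h = 2 * b / n
xs = [-b + (i + 0.5) * h for i in range(n)]            # cell midpoints

# rank-1 skeleton vector of g(x) = exp(-|x|^2) = prod_l exp(-x_l^2):
# the same 1D vector serves all three modes l = 1, 2, 3
g1 = [math.exp(-t * t) for t in xs]

def interp1(t):
    # piecewise linear interpolation in one variable (hat functions)
    i = min(max(int((t + b) / h - 0.5), 0), n - 2)
    w = (t - xs[i]) / h
    return (1 - w) * g1[i] + w * g1[i + 1]

# separable interpolant g_bar(x) = prod_l interp1(x_l), cf. (11.1)
pt = (0.3, -0.7, 1.1)
exact = math.exp(-sum(t * t for t in pt))
approx = interp1(pt[0]) * interp1(pt[1]) * interp1(pt[2])
assert abs(approx - exact) < 10 * h * h                 # O(h^2) interpolation error
```

Only three vectors of length n are stored for the 3D function, which is the storage gain behind the rank-1 representation (11.2).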
The grid size used for the factorized calculation of the two-electron integrals by piecewise constant basis functions can be much smaller than the grid size n required for the calculation of both Ag and Vg, since J and K are integral operators. Thus, the discretization step-size for the grid representation of the Galerkin basis is specified only by the accuracy needs for the particular part of the Fock operator of interest.
which can be computed by using simple multilinear algebra with rank-1 tensor Gk .
The exact Galerkin matrix Ag is approximated using (11.2) as in [156], Ag ≈ AG = {akm },
k, m = 1, . . . Nb , with

akm = ⟨AΔ Gk, Gm⟩,  k, m = 1, . . . , Nb,  (11.3)

where AΔ is the discrete Laplacian in the Kronecker form (11.4),
which should be calculated with large grid-size n that resolve the sharp Gaussian basis
functions.
To overcome the limitations caused by the large mode size n of the target tensors,
the QTT tensor format [167, 165] can be used for calculation of the Laplace part in the
Fock operator [147]. This allows calculation of the multidimensional functions and
operators in logarithmic complexity O(log n). For the Laplace operator
AΔ = Δ1^{(1)} ⊗ I^{(2)} ⊗ I^{(3)} + I^{(1)} ⊗ Δ1^{(2)} ⊗ I^{(3)} + I^{(1)} ⊗ I^{(2)} ⊗ Δ1^{(3)},  (11.4)

the rank-2 TT representation reads

ΔTT = [Δ1  I] ⊗b [[I, 0], [Δ1, I]] ⊗b [[I], [Δ1]],  (11.5)
where the sign ⊗b (sometimes also denoted by ⋈) means the matrix product of block
core matrices with blocks being multiplied by means of the tensor product. Suppose
that n = 2^L. Then the quantized representation of Δ1 takes the form [142, 170]

Δ1Q = [I  J  Jᵀ] ⊗b [[I, J, Jᵀ], [0, 0, J], [0, Jᵀ, 0]]^{⊗b(L−2)} ⊗b [[2I − J − Jᵀ], [−J], [−Jᵀ]],  (11.6)
where L equals the number of virtual dimensions in the quantized format, and

I = [1 0; 0 1],  J = [0 1; 0 0].
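The Kronecker-sum structure (11.4) can be checked directly for a small n (a NumPy sketch with an unscaled 1D stencil Δ1 = tridiag(−1, 2, −1); the mesh-size scaling is omitted here):

```python
import numpy as np

n = 8
I = np.eye(n)
D1 = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
      - np.diag(np.ones(n - 1), -1))                    # 1D stencil Delta_1

# Kronecker-sum structure (11.4) of the 3D Laplacian
A = (np.kron(np.kron(D1, I), I) + np.kron(np.kron(I, D1), I)
     + np.kron(np.kron(I, I), D1))

# on a separable (rank-1) vector u = x (x) y (x) z each Kronecker term
# acts on exactly one factor, which is what tensor formats exploit
x = np.sin(np.linspace(0.0, 1.0, n))
y = np.cos(np.linspace(0.0, 2.0, n))
z = np.linspace(1.0, 2.0, n)
u = np.kron(np.kron(x, y), z)
Au_ref = (np.kron(np.kron(D1 @ x, y), z) + np.kron(np.kron(x, D1 @ y), z)
          + np.kron(np.kron(x, y), D1 @ z))
assert np.allclose(A @ u, Au_ref)
```

The QTT forms (11.5)–(11.6) compress each 1D factor further, to O(log n) storage.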
For the discretized representation (11.2) of the basis functions, the entries of the matrix AG = {akm}, k, m = 1, . . . , Nb, are calculated as

akm = ⟨ΔQTT (Qk^{(1)} ⊗ Qk^{(2)} ⊗ Qk^{(3)}), Qm^{(1)} ⊗ Qm^{(2)} ⊗ Qm^{(3)}⟩,  (11.7)

where the matrix ΔQTT is obtained by plugging the QTT Laplace representation (11.6) into (11.5), and the tensor Qk^{(ℓ)}, ℓ = 1, 2, 3, is the quantized representation of the vector gk^{(ℓ)} ∈ ℝⁿ.
Table 11.1: H2O: approximation error err(AG) for the discrete Laplacian Galerkin matrix and CPU times (MATLAB) vs. the grid size, using QTT-based calculations.

p           15         16         17         18         19         20
n³          32767³     65535³     131071³    262143³    524287³    1048575³
err(AG)     0.0027     6.8·10⁻⁴   1.7·10⁻⁴   4.2·10⁻⁵   1.0·10⁻⁵   2.6·10⁻⁶
RE          –          1.0·10⁻⁵   8.3·10⁻⁸   2.6·10⁻⁹   3.3·10⁻¹⁰  0
time (sec)  12.8       17.4       25.7       42.6       77         135
Table 11.1 demonstrates weak dependence of the calculation time on the size of the
3D Cartesian grid. In the case of water molecule, it shows the approximation error for
the Laplacian matrix err(AG ) = ‖AMolpro − AG ‖ represented in the discretized basis of
Nb = 41 Cartesian Gaussians, where AMolpro is the result of analytical computations
with the same Gaussian basis from MOLPRO program [299]. Time is given for MATLAB
implementation. The line “RE” in Table 11.1 represents the approximation error for
the discrete Laplacian AG obtained by the Richardson extrapolation on two adjacent
grids, where the grid size is given by n = 2^p, p = 1, . . . , 20. The QTT ranks of the canonical vectors gk^{(ℓ)} are bounded by a small constant. The approximation order O(h²) can be observed.
Vc(x) = ∑_{ν=1}^{M0} Zν / ‖x − aν‖,  Zν > 0,  x, aν ∈ Ω ⊂ ℝ³,  (11.8)
P̃R = ∑_{q=1}^{R} pq^{(1)} ⊗ pq^{(2)} ⊗ pq^{(3)} ∈ ℝ^{2n×2n×2n},  (11.9)

and

Pc = ∑_{ν=1}^{M0} Zν ∑_{q=1}^{R} 𝒲ν^{(1)} pq^{(1)} ⊗ 𝒲ν^{(2)} pq^{(2)} ⊗ 𝒲ν^{(3)} pq^{(3)} ∈ ℝ^{n×n×n}.  (11.10)
Then for a given tensor representation of the basis function as a rank-1 canonical ten-
sor (11.2), the sum Vc (x) of potentials in a box as in (11.8) is represented in a given
basis set by a matrix Vg ≈ VG = {vkm } ∈ ℝNb ×Nb whose entries are calculated by simple
tensor operations [156, 147]:

vkm = ⟨Gk ⊙ Gm, Pc⟩,  k, m = 1, . . . , Nb.  (11.11)
Note that for the grid-based representation of the core potential, Vc (x), Pc , the
univariate grid size n can be noticeably smaller than the size of the grid used for the
piecewise linear discretization for the Laplace operator.

11.4 Coulomb and exchange operators by factorized TEI
J(D)μν = ∑_{κ,λ=1}^{Nb} bμν,κλ Dκλ.  (11.12)
Vectorizing matrices J = vec(J) and D = vec(D) and taking into account the rank struc-
ture in TEI matrix B, we arrive at the simple matrix representation for the Coulomb
matrix

J = B D ≈ L (Lᵀ D),  (11.13)

which is evaluated in O(RB Nb²) operations.
A similar direct treatment of the exchange matrix requires a permutation of indices in the TEI tensor, which diminishes the advantages of the low-rank structure in the matrix B. Introducing the permuted tensor B̃ = permute(B, [2, 3, 1, 4]) and the respective unfolding matrix B̃ = mat(B̃), we then obtain

vec(K) = K = B̃ D.  (11.15)
The direct calculation by (11.15) amounts to O(RB Nb³) operations. However, using the rank-Norb decomposition of the density matrix D = 2CC^T reduces the cost to O(RB Norb Nb²) via the representation

K(D)μν = − ∑_{i=1}^{Norb} ( ∑_{λ} Lμλ Cλi ) ( ∑_{κ} Lκν Cκi )^T,

where Lμν ∈ ℝ^{RB} denote the fibers of reshape(L, [Nb, Nb, RB]) ∈ ℝ^{Nb×Nb×RB}, the Nb × Nb × RB folding of the Cholesky factor L.
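The two evaluations can be compared on synthetic data: a direct contraction with the permuted TEI tensor, as behind (11.15), against the factorized O(RB Norb Nb²) form built from the Nb × Nb unfoldings Lk of the Cholesky columns. A hedged NumPy sketch (toy sizes; the overall sign and the factor 2 from D = 2CC^T in the text's convention are omitted here):

```python
import numpy as np

rng = np.random.default_rng(1)
Nb, RB, Norb = 6, 8, 3
Lk = rng.standard_normal((RB, Nb, Nb))       # Nb x Nb x RB folding of L
# synthetic TEI tensor consistent with B = L L^T: b_{mn,ls} = sum_k Lk(m,n) Lk(l,s)
b4 = np.einsum('kmn,kls->mnls', Lk, Lk)
C = rng.standard_normal((Nb, Norb))          # occupied orbital coefficients
D = C @ C.T                                  # (unscaled) density matrix

# exchange via the permuted tensor/unfolding: K(mu,nu) = sum b(la,mu,nu,si) D(la,si)
K_ref = np.einsum('lmns,ls->mn', b4, D)

# factorized evaluation, O(RB*Norb*Nb^2): K = sum_k (Lk^T C)(Lk C)^T
K_fast = np.zeros((Nb, Nb))
for L_k in Lk:
    K_fast += (L_k.T @ C) @ (L_k @ C).T
```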
Figure 11.2: Approximation accuracy for the Coulomb matrix of the glycine molecule using TEI computed on the grids with n³ = 32 768³ (left) and n³ = 65 536³ (right).
Figure 11.2 presents the error in computation of the Coulomb matrix for the glycine amino acid (Nb = 170) using TEI computed on the grids n³ = 32 768³ (left) and n³ = 65 536³ (right). The numerical error scales quadratically in the grid size, O(h²), and can be improved to O(h³) by the Richardson extrapolation. The observed decay ratio 1 : 4 indicates the applicability of the Richardson extrapolation to the results on a pair of dyadically refined grids.
Figure 11.3: Left: the error in the density matrix for the amino acid alanine (Nb = 210) for the TEI computed with n³ = 131 072³. Right: the error in the exchange matrix for H2 O2 (Nb = 68) computed by TEI using the grid of size n³ = 131 072³.
Figure 11.3 (left) demonstrates the error in computation of the density matrix of the alanine molecule (Nb = 210) using TEI computed on the grid with n³ = 131 072³. Figure 11.3 (right) displays the error in the exchange matrix computation for the H2 O2 molecule (Nb = 68) using TEI with n³ = 131 072³.
11.5 Algorithm of the black-box HF solver | 163
The Galerkin discretization leads to the nonlinear eigenvalue problem

F(C)C = SCΛ,  (11.16)

with the overlap matrix S for the chosen Galerkin basis (7.10) and the Fock operator F(C) = H + J(C) − K(C), where the matrices J(C) and K(C) depend on the solution matrix C. To solve the eigenvalue problem (11.16), we start the self-consistent field (SCF) iteration with F(C) = H and with zero matrices for both the Coulomb J and exchange K operators. In the course of the SCF iteration, we control the residual (11.18), computed as the maximum norm of the difference in the virtual part of the eigenvectors from two consecutive iterations. The iteration may be terminated when this value becomes smaller than a given ε-threshold, or the number of iterations may be predefined. Since the iteration times are negligibly small, we usually use a predefined number of iterations.
The first step is defining the global Galerkin basis. In what follows, for comparison with the MOLPRO output, we discretize the rank-1 basis functions given as products of polynomials with Gaussians. We choose the appropriate grid sizes in advance, according to the desired accuracy of the calculations. In general, one can set an nx × ny × nz 3D Cartesian grid, but in our current calculations we use a cubic box with equal size n in every space variable. As already noted, the univariate grid size n of the n × n × n 3D Cartesian grid can be chosen differently in the calculation of the discretized Laplacian, the nuclear potential operator, and the two-electron integrals tensor. Using finer (larger) grids needs more CPU time; therefore, there is a trade-off between the required accuracy and the computational cost.
Given the coordinates of nuclei and the Galerkin basis, the black-box HF solver
performs the following computation steps.
(1) Choose the grid size n and the ε-threshold for rank truncation. Set up the grid
representation of the basis functions.
(2) Compute the nuclear energy shift Enuc by (7.17).
(3) Compute the core Hamiltonian H by the three-dimensional grid-based calculation
of the Galerkin matrix AG for the Laplacian by (11.3) or (11.7) and for the nuclear
potential operator VG by (11.11).
(4) Using grid-based “1D density fitting”, compute the factorized TEI matrix in a form
of low-rank Cholesky decomposition B = LLT by (10.11), (10.12).
(5) Set up the input data for the SCF iteration:
– the threshold ε for the residual (alternatively, a maximal number of iterations);
– the number Mopt specifying the design of the DIIS scheme [238];
– the initial Coulomb and exchange matrices, J = 0 and K = 0.
(6) Start the SCF iteration for solving the nonlinear eigenvalue problem:
– solve the linear spectral problem (11.16) with the current Fock matrix
F = ½ AG − VG + J − K;
– update the residual (11.18) (difference in the virtual parts of the eigenvectors);
– update the matrices J(C) and K(C) by computing (11.12) and (11.14);
– compute the ground-state energy E0,it at the current iteration.
When the residual arrives at the given ε (or when the maximal iteration number is reached), the iteration is terminated.
(7) Compute the ground-state energy E0,n .
(8) Calculate the MP2 corrections by factorizations introduced in [150]; see Sec-
tion 11.8.
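The computation steps above can be sketched as a toy dense SCF loop. This NumPy illustration assumes standard closed-shell conventions (D = 2CC^T, F = H + J − ½K) and synthetic input matrices; it mirrors steps (5)–(7) in structure only, not the book's grid-based operators:

```python
import numpy as np

def scf(H, S, b4, Norb, maxit=25):
    """Toy dense SCF loop for F(C) C = S C Lambda (standard RHF conventions)."""
    Nb = H.shape[0]
    w, Uo = np.linalg.eigh(S)                    # Loewdin orthogonalization S^{-1/2}
    Sih = Uo @ np.diag(w ** -0.5) @ Uo.T
    J = K = np.zeros((Nb, Nb))                   # step (5): zero initial J and K
    for _ in range(maxit):                       # step (6): SCF iteration
        F = H + J - 0.5 * K                      # current Fock matrix
        lam, Ct = np.linalg.eigh(Sih @ F @ Sih)  # generalized problem in std. form
        C = Sih @ Ct[:, :Norb]                   # occupied (lowest) eigenvectors
        D = 2.0 * C @ C.T                        # density matrix
        J = np.einsum('mnls,ls->mn', b4, D)      # Coulomb update, cf. (11.12)
        K = np.einsum('mlsn,ls->mn', b4, D)      # exchange update (permuted TEI)
    E0 = 0.5 * np.sum(D * (2.0 * H + J - 0.5 * K))   # step (7): electronic energy
    return E0, C, lam

# synthetic input (hypothetical sizes; not a real molecule)
rng = np.random.default_rng(6)
Nb, Norb = 8, 2
A = rng.standard_normal((Nb, Nb))
S = np.eye(Nb) + 0.01 * (A + A.T)                # well-conditioned overlap matrix
H = rng.standard_normal((Nb, Nb)); H = H + H.T   # stand-in core Hamiltonian
Lk = rng.standard_normal((4, Nb, Nb))
Lk = 0.5 * (Lk + np.transpose(Lk, (0, 2, 1)))    # symmetric unfoldings
b4 = np.einsum('kmn,kls->mnls', Lk, Lk)          # synthetic low-rank TEI tensor
E0, C, lam = scf(H, S, b4, Norb)
```

By construction, the occupied eigenvectors returned by each iteration are S-orthonormal, which is the invariant a correct generalized eigensolve must preserve.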
11.6 Ab initio ground state energy calculations for compact molecules | 165
Figure 11.4: The largest molecules considered for numerical examples (below): amino acids glycine
C2 H5 NO2 (left) and alanine C3 H7 NO2 (right). The ball-stick picture of molecules is generated by the
MOLDEN program [258].
Table 11.2: Times for one SCF iteration in the tensor-based Hartree–Fock solver (step 6) in MATLAB
implementation.
For small and moderate size molecules, the solver in MATLAB works in one run from the first step to the end of the SCF iteration, using 3D Cartesian grids for TEI calculations of size up to n³ = 131 072³. The total computation time usually does not exceed several minutes; see Table 11.2, which shows the times for one SCF iteration of the fast TESC Hartree–Fock solver in MATLAB.
For larger molecules (amino acids, see Figure 11.4), accurate calculations with grids exceeding n³ = 65 536³ need an off-line precomputation of TEI, which requires less than one hour of MATLAB calculations. The CPU time for TEI calculations depends mostly on the number of basis functions rather than on the size of the grid. The grid size is mainly limited by the available storage of the computer: the storage demand for the first step in TEI calculations (factorization of the side matrices G^{(ℓ)} ∈ ℝ^{n×Nb²}, ℓ = 1, 2, 3) is estimated by O(3nNb²), whereas for the second step of TEI calculations (Cholesky decomposition of the TEI matrix B), it is bounded by O(Nb³).
For the core Hamiltonian calculations, finer grids are required, with mesh size about h = 3.5 · 10⁻⁵ au (∼1.8 · 10⁻⁵ Å). This corresponds to large 3D Cartesian grids with n³ = 65 535³ and n³ = 1 048 576³ entries, respectively. In the following examples, we present calculations of the ground-state energy for several compact molecules.
Figure 11.5 shows the convergence of the SCF iterations for the glycine amino acid (Nb = 170, left) and the water molecule (Nb = 41, right) using the factorized representation of TEI precomputed with n³ = 131 072³. The black line shows the convergence of the residual computed as the maximum norm of the difference of the eigenvectors from two consecutive iterations, ‖C(1, : )it−1 − C(1, : )it‖∞. The green line presents the difference between the lowest eigenvalue computed by the grid-based solver and the respective eigenvalue from the MOLPRO calculations, Δλ1,it = |λ1,Molpro − λ1,it|. The red line is the difference in the ground-state energy with respect to the MOLPRO results, ΔE0,it = |E0,Molpro − E0,it|.
Figures 11.6–11.8 demonstrate the convergence of the ground-state energy versus the self-consistent field iteration for the glycine amino acid (Nb = 170), NH3 (Nb = 48), and water (Nb = 41) molecules. The left figures show the convergence history over 70 iterations; the right figures zoom in on the last 30 iterations. The black line corresponds to E0,Molpro computed by MOLPRO for the same Gaussian basis.
Figure 11.9 presents the output of the solver for the alanine molecule. Figure 11.10 presents the last 30 + k iterations of the convergence of the ground-state energy for the H2 O2 molecule. The red, green, and blue lines correspond to the grid sizes n³ = 32 768³, 65 536³, and 131 072³, respectively.
Table 11.3: Glycine, basis of 170 Gaussians (cc-pVDZ): error in the ground-state energy versus the mesh size h. The MOLPRO result is E0,Molpro = −282.8651.

p            13          15           16           17
n³ = 2^{3p}  8192³       32 767³      65 535³      131 072³
h            0.0039      9.7 · 10⁻⁴   4.9 · 10⁻⁴   2.5 · 10⁻⁴
E0,n         −282.8679   −282.8655    −282.8654    −282.8653
er(E0)       0.0024      3.5 · 10⁻⁴   2.2 · 10⁻⁴   2.2 · 10⁻⁴
Figure 11.6: Convergence of the ground-state energy for the glycine molecule (left), with the grid size for the TEI calculation n³ = 131 072³; zoom of the last 30 iterations (right).
Figure 11.7: Convergence of the ground-state energy for the NH3 molecule (left), with the TEI grid size n³ = 131 072³; zoom of the last 30 iterations (right).
Figure 11.8: Convergence of the ground-state energy for the H2 O molecule (left), with the TEI grid size n³ = 131 072³; zoom of the last 30 iterations (right).
168 | 11 Fast grid-based Hartree–Fock solver by factorized TEI
Figure 11.9: Left: SCF iteration for the alanine molecule (Nb = 211) with TEI computed on the grid n³ = 32 768³. Right: convergence of E0,it at the last 30 iterations.
Table 11.3 presents the error in the ground-state energy for the glycine molecule, er(E0) = E0,n − E0,Molpro, versus the mesh size of the grid used for calculating the TEI tensor. Notice that the absolute error of calculations with grid-based TEI changes only mildly for grids with size n ≥ 65 535, remaining at the level of about 10⁻⁴ hartree. This corresponds to a relative error of the order of 10⁻⁷. Figure 11.11 demonstrates the absolute error in the density matrix for some molecules.
Figure 11.11: Absolute error of the density matrix for NH3 molecule (left) and alanine amino acid
(right) compared with MOLPRO output.
Several basis functions (e. g., Gaussians) taken for a single atom as the "initialization basis" are duplicated for the lattice atoms, thus creating the basis set for the whole molecular system. For model problems, we construct artificial structures using Hydrogen atoms, for example, in the form of a 4 × 4 × 2 lattice, using the Hydrogen molecule H2 as the "initiating" building block, with the distance between atoms 1.5 Å. For such a lattice system, one can then apply the fast Hartree–Fock solver. Figure 11.13 shows a slice of the nuclear potential calculated for the slab of Hydrogen atoms. Figure 11.14 shows the output of the Hartree–Fock eigenvalue problem solver for a cluster of 4 × 4 × 2 Hydrogen atoms. The left figure shows the convergence of the ground-state energy, and the right one demonstrates the lower part of the spectrum {λμ}, μ = 1, . . . , Nb, where every line corresponds to one λμ.
Tensor Hartree–Fock calculations do not have special requirements on the posi-
tions of nuclei on the 3D grid; the nuclei in the investigated molecular systems may
have an arbitrary position in (x, y, z)-coordinates in the computational box.
Solving the ab initio Hartree–Fock problem for larger clusters of Hydrogen-like atoms by using block-circulant and Toeplitz structures in the framework of the linearized Fock operator is considered in [151, 154]. The reformulation of the nonlinear Hartree–Fock equation for periodic molecular systems, based on the Bloch theory [37], has been addressed in the literature for more than forty years, and nowadays there are several implementations, mostly relying on the analytic treatment of the arising integral operators [72, 235, 88].
Figure 11.13: Left: cross-section of the nuclear potential for the 8 × 4 × 1 cluster of H atoms. Right: convergence of the residual in the SCF iteration.
Figure 11.14: Convergence of the ground-state energy for the 4 × 4 × 2 cluster of H atoms (left) and a part of its spectrum (right).
Mathematical analysis of spectral problems for PDEs with periodic-type coefficients was an attractive topic in the recent decade; see [46, 47, 45, 77] and the references therein.
In [154], a new grid-based tensor approach to the approximate solution of the elliptic eigenvalue problem for 3D lattice-structured systems is introduced and analyzed: the linearized Hartree–Fock equation, considered over a spatial L1 × L2 × L3 lattice for both periodic and non-periodic problem settings, is discretized in the basis of localized Gaussian-type orbitals. In the periodic case, the Galerkin system matrix obeys a three-level block-circulant structure that allows FFT-based diagonalization, whereas for finite extended systems in a box (Dirichlet boundary conditions) this matrix allows a perturbed block-Toeplitz representation providing fast matrix–vector multiplication and low storage size.
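The FFT-based diagonalization in the periodic case can be illustrated on a one-level block-circulant matrix (the lattice case uses a three-level analogue). A NumPy sketch with toy block sizes: a DFT in the block index decouples the matrix into L independent m × m blocks, whose eigenvalues together form the full spectrum:

```python
import numpy as np

rng = np.random.default_rng(2)
L, m = 5, 3                                  # L lattice sites, m x m blocks
blocks = rng.standard_normal((L, m, m))      # first block column C_0, ..., C_{L-1}

# assemble the full block-circulant matrix A with A[i, j] = C_{(i - j) mod L}
A = np.zeros((L * m, L * m))
for i in range(L):
    for j in range(L):
        A[i*m:(i+1)*m, j*m:(j+1)*m] = blocks[(i - j) % L]

# FFT-based block diagonalization: hat{C}_j = sum_l C_l exp(-2*pi*1j*j*l/L),
# so the spectrum of A is the union of the spectra of the L small blocks
Chat = np.fft.fft(blocks, axis=0)            # FFT along the block index
eig_fft = np.concatenate([np.linalg.eigvals(Chat[j]) for j in range(L)])
eig_ref = np.linalg.eigvals(A)               # reference: dense diagonalization
```

The cost drops from O((Lm)³) for the dense eigensolve to L small m × m eigenproblems plus FFTs.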
The above-mentioned grid-based tensor techniques offer twofold benefits: (a) the entries of the Fock matrix are computed by 1D operations using low-rank tensors represented on a 3D grid; (b) in the periodic case, the low-rank tensor structure in the diagonal blocks of the Fock matrix in the Fourier space reduces the conventional 3D FFT to the product of 1D FFTs.
Lattice-type systems in a box with Dirichlet boundary conditions are treated numerically by the tensor solver in the same way as single molecules, which makes possible calculations on rather large L1 × L2 × L3 lattices due to the reduced numerical cost for 3D problems. The numerical simulations for both box-type and periodic L × 1 × 1 lattice chains in a 3D rectangular "tube" with L up to several hundred confirm the theoretical complexity bounds for the block-structured eigenvalue solvers in the limit of large L; see [154].
11.8 MP2 calculations by factorized TEI | 171
reduce the storage consumption and CPU times by a factor of about 10 in both TEI and MP2 calculations.
The efficiency of the MP2 energy correction algorithm was tested in [150] for some compact molecules, including the glycine and alanine amino acids. Due to the factorized tensor representations of the involved multidimensional data arrays, the MP2 calculation times turned out to be rather moderate compared to those for the TEI tensor, ranging from one second for the water molecule to approximately four minutes for the glycine molecule. The numerical accuracy is controlled by the given threshold ε > 0 due to stable tensor-rank reduction algorithms.
In what follows, we describe the main ingredients of the computational scheme in-
troduced in [150], which reduces the cost by using low-rank tensor decompositions of
arising multidimensional data arrays.
Let C = {Cμi} ∈ ℝ^{Nb×Nb} be the coefficient matrix representing the Hartree–Fock molecular orbitals (MO) in the atomic orbitals (AO) basis set {gμ}1≤μ≤Nb (obtained in the Hartree–Fock calculations). First, one has to transform the TEI tensor B = [bμνλσ] computed in the initial AO basis set to that represented in the MO basis,

B → V = [viajb]:  viajb = ∑_{μ,ν,λ,σ=1}^{Nb} Cμi Cνa Cλj Cσb bμνλσ,  a, b ∈ Ivir, i, j ∈ Iocc,  (11.19)

where Iocc := {1, . . . , Norb}, Ivir := {Norb + 1, . . . , Nb}, with Norb denoting the number of occupied orbitals. In what follows, we shall use the notation

ℐ := (Ivir × Iocc) × (Ivir × Iocc) ⊂ Ib^{⊗4}.
for i ∈ Iocc . The latter conditions (nonzero homo lumo gap) will be assumed in the
following.
Introduce the so-called doubles amplitude tensor T,

T = [tiajb]:  tiajb = (2viajb − vibja)/(εa + εb − εi − εj),  a, b ∈ Ivir; i, j ∈ Iocc;
then the MP2 perturbation takes the form of a scalar product of rank-structured tensors,

EMP2 = −⟨V ⊙ T, 1⟩,  (11.20)

where the summation is restricted to the subset of indices ℐ, and 1 denotes the rank-1 all-ones tensor. Define the reciprocal "energy" tensor

E = [eabij] := [1/(εa + εb − εi − εj)],  a, b ∈ Ivir; i, j ∈ Iocc,  (11.21)

and the partially transposed tensor

V̄ = [v̄iajb] := [vibja].
Now the doubles amplitude tensor T will be further decomposed into the sum T = T^{(1)} + T^{(2)}, where each term on the right-hand side will be treated separately.
In this section, we show that the rank-RB, RB = O(Nb), approximation of the symmetric TEI matrix B ≈ LL^T with the Cholesky factor L ∈ ℝ^{Nb²×RB} leads to a low-rank representation of the tensor V and an RB-term decomposition of T. This reduces the asymptotic complexity of the MP2 calculations to O(Nb³ Norb) and also provides certain computational benefits; in particular, it reduces the storage costs.
Lemma 11.1 ([150]). Given the rank-RB Cholesky decomposition of the matrix B, the matrix unfolding V = [via;jb] allows a rank decomposition with rank ≤ RB. Moreover, the tensor V̄ = [vibja] enables an RB-term decomposition of mixed form.
where

viajb = ∑_{μ,ν,λ,σ=1}^{Nb} Cμi Cνa Cλj Cσb bμνλσ
      ≈ ∑_{k=1}^{RB} ∑_{μ,ν,λ,σ=1}^{Nb} Cμi Cνa Cλj Cσb Lk(μ; ν) Lk(σ; λ)
      = ∑_{k=1}^{RB} ( ∑_{μ,ν=1}^{Nb} Cμi Cνa Lk(μ; ν) ) ( ∑_{λ,σ=1}^{Nb} Cλj Cσb Lk(σ; λ) )
      = ∑_{k=1}^{RB} (Ci^T Lk Ca)(Cb^T Lk^T Cj).  (11.23)
This proves the first statement. Furthermore, the partially transposed tensor V̄ := [vibja] allows an RB-term decomposition derived similarly to (11.23):

v̄iajb = vibja = ∑_{k=1}^{RB} (Ci^T Lk Cb)(Ca^T Lk^T Cj).  (11.24)
Lemma 11.2 ([150]). Suppose that the so-called homo lumo gap is estimated by

min_{a∈Ivir, i∈Iocc} |εa − εi| ≥ δ/2 > 0.
Then the entries of the tensor E admit the separable approximation

eabij ≈ ∑_{p=−M}^{M} cp e^{−αp(εa+εb−εi−εj)},  αp > 0,  (11.25)

where the corresponding rank-RE approximand ERE satisfies

‖E − ERE‖F ≤ O(ε).

Indeed, we have

1/(x1 + x2 + x3 + x4) = ∫_0^∞ e^{−t(x1+x2+x3+x4)} dt ≈ ∑_{p=−M}^{M} cp e^{−αp(x1+x2+x3+x4)}

for xi ≥ 0 such that ∑ xi > δ, which converges exponentially in M (see [111, 93]). This proves the statement.
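The exponential-sum approximation behind (11.25) can be reproduced with a simple trapezoid quadrature after the substitution t = e^u in the integral above, giving points αp = e^{ph} and weights cp = h e^{ph}. This is one standard choice; the parameters M and h below are illustrative, not the optimized quadratures of the cited literature:

```python
import numpy as np

def exp_sum_coeffs(M, h):
    """Quadrature data for 1/rho ~ sum_p c_p exp(-alpha_p * rho):
    trapezoid rule on 1/rho = int exp(u - rho*e^u) du (substitution t = e^u)."""
    p = np.arange(-M, M + 1)
    alpha = np.exp(p * h)        # alpha_p = e^{p h}
    c = h * alpha                # c_p = h * e^{p h}
    return c, alpha

c, alpha = exp_sum_coeffs(M=160, h=0.1)
rho = np.linspace(1.0, 100.0, 50)                # denominators bounded away from 0
approx = np.exp(-np.outer(rho, alpha)) @ c       # sum_p c_p exp(-alpha_p * rho)
rel_err = np.max(np.abs(approx * rho - 1.0))     # relative error over [1, 100]
```

The number of terms needed for a target accuracy grows only logarithmically in the ratio of the largest to smallest admissible denominator, which is the mechanism behind the O(|log ε|) separation ranks quoted in the text.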
Notice that the matrix V exhibits an exponential decay of its singular values (observed in numerical experiments; see Figure 11.15), which means that the approximation error ε > 0 can be achieved with the separation rank RV = O(|log ε|). Figure 11.15 illustrates the exponential convergence in the rank parameter for the low-rank approximation of the matrices V and E = [eab;ij].
Figure 11.15: Singular values of the matrix unfoldings V (left) and E (right) for some compact molecules, including the amino acids glycine (C2 H5 NO2) and alanine (C3 H7 NO2). The numbers in brackets indicate the size of the matrix, that is, Norb Nvirt, for the corresponding molecule.
Lemmas 11.1 and 11.2 result in the following complexity bound: the Hadamard product V ⊙ T and the resultant functional EMP2 can be evaluated at the expense O(RE RB Nocc Nvir). Indeed, the first term in the splitting T = T^{(1)} + T^{(2)} is represented by rank-structured tensor operations,

T^{(1)} = 2V ⊙ E = 2[t^{(1)}_{iajb}],
where

t^{(1)}_{iajb} = ∑_{p=1}^{RE} cp ∑_{k=1}^{RB} (e^{αp εi} Ci^T Lk e^{−αp εa} Ca)(e^{−αp εb} Cb^T Lk^T e^{αp εj} Cj),  (11.26)

and Lk = Lk(:, :) stands for the Nb × Nb matrix unfolding of the Cholesky vector L(:, k).
Then the numerical complexity of this rank-(RE RB) separable approximation is estimated by the multiple of RE with the corresponding cost for the treatment of the tensor V, that is, O(RE RB Nocc Nvir). Furthermore, the RB-term decomposition of V̄ := [vibja] (see (11.24)) again leads to the summation over an (RE RB)-term representation of the second term in the splitting of T,

T^{(2)} = [t^{(2)}_{iajb}] = V̄ ⊙ E,

where

t^{(2)}_{iajb} = ∑_{p=1}^{RE} cp ∑_{k=1}^{RB} (e^{αp εi} Ci^T Lk e^{−αp εa} Cb)(e^{−αp εb} Ca^T Lk^T e^{αp εj} Cj).  (11.27)
Table 11.4: MP2 correction to the ground-state energy (in hartree) for some compact molecules, including the amino acids glycine (C2 H5 NO2) and alanine (C3 H7 NO2).
Table 11.4 presents the effect of the MP2 correction for several compact molecules. In most cases, this correction amounts to about 0.4 % of the total energy.
The tensor-structured factorization of the TEI matrix B makes it possible to reduce the overall cost of the MP2 calculations to O(Nb² Nvir Norb) by using the QTT approximation of the long column vectors in the Cholesky factor L. Figure 10.4 (left) indicates that the average QTT ranks of the column vectors in the Cholesky factor and of the vectorized density matrix C ∈ ℝ^{Nb×Nb} remain almost the same (they depend only on the entanglement properties of a molecule), and they can be estimated by about 3Norb.
This hidden structural property implies that the computation and storage cost for the matrix V = LV LV^T involved in Algorithm 4 (the most expensive part of the MP2 calculation) can be reduced to O(Norb²) at the main step in (11.23), that is, computing Ci^T Lk Ca, instead of O(Nb²), thus indicating the reduced redundancy in the AO basis in the case of compact molecules. Since the QTT rank enters the storage cost for QTT vectors quadratically, we conclude that

(3Norb)² ≤ C Nb²,

where the constant C is estimated by C ≈ 0.1, taking into account that the typical relation Nb ≈ 10 · Norb holds in the case of Gaussian-type basis sets.
Further reduction of the numerical complexity can be based on taking into ac-
count the more specific properties of the matrix unfolding V when using a physical
insight to the problem (say, flat or extended molecules, multiple symmetries, lattice
type or periodic structures, accounting data sparsity, etc.).
Other methods for high-accuracy energy calculations are based on coupled-cluster techniques, which require much larger computational resources; see, for example, [260, 13, 249].
12 Calculation of excitation energies of molecules
12.1 Numerical solution of the Bethe–Salpeter equation
Recently, the computation of excitation energies and absorption spectra for molecules and surfaces of solids has attracted much interest due to the related promising applications, in particular, in the development of sustainable energy technologies. The traditional methods for computer simulation of excitation energies for molecular systems require large computational facilities. Therefore, there is a steady need for new algorithmic approaches for calculating the absorption spectra of molecules with less computational cost and a good potential for application to larger systems. The tensor-based approach seems to present a good alternative to conventional methods.
One of the well established ab initio methods for computation of excited states is
based on the solution of the Bethe–Salpeter equation (BSE) [252, 126], which in turn
is based on the Green’s function formalism and many-body perturbation theory, pro-
viding calculation of the excitation energies in a self-consistent way [224, 259, 194,
245]. The BSE method leads to a challenging computational task of solving a large
eigenvalue problem for a fully populated (dense) matrix, which, in general, is non-
symmetric. Another commonly used approach for computation of the excitation ener-
gies is based on time-dependent density functional theory (TDDFT) [251, 107, 51, 274,
56, 248].
The size of the BSE matrix scales quadratically, 𝒪(Nb²), in the size Nb of the atomic orbitals basis sets commonly used in ab initio electronic structure calculations. The direct diagonalization, of 𝒪(Nb⁶) complexity, becomes prohibitive even for moderate-size molecules with atomic orbitals basis sets of size Nb ≈ 100. Therefore, an approach that relies entirely on multiplications of the governing BSE matrix, or of its approximation, with vectors in the framework of some iterative procedure is the only feasible strategy. In turn, fast matrix–vector computations can be based on the use of low-rank matrix representations, since such data structures allow efficient storage and basic linear algebra operations with linear complexity scaling in the matrix size.
An efficient method was introduced in [23] for the approximate numerical solution of the BSE eigenvalue problem by using low-rank approximation, which leads to a relaxation of the numerical costs from O(N⁶) down to O(N²). It is based on the construction of a simplified problem via a diagonal plus rank-structured representation of the system matrix, so that the related spectral problem can be solved iteratively. Then model reduction via projection onto a reduced basis is constructed by using a representative set of eigenvectors of the simplified system matrix. A further enhancement, based on a block-diagonal plus low-rank approximation of the BSE matrix for accuracy improvement, was presented in [25].
https://doi.org/10.1515/9783110365832-012

The particular construction of the BSE system matrix in [23] is based on the non-interacting Green's function in terms of eigenfunctions and eigenvalues of the Hartree–Fock operator introduced in [243, 244], where it was applied to the simple
H2 molecule in the minimal basis of two Slater functions, and where the system ma-
trix entries are evaluated analytically. In [23] it was shown that this computational
scheme for solving the BSE becomes practically applicable to moderate size com-
pact molecules when using the tensor-structured Hartree–Fock calculations [147, 152]
yielding efficient representation of the two-electron integrals (TEI) in the molecular
orbitals basis in a form of a low-rank Cholesky factorization [157, 150].
The low-rank representation of the TEI tensor stipulates the beneficial structure of the BSE matrix blocks, thus enabling efficient numerical algorithms for the solution of large structured eigenvalue problems. The simplified block decomposition of the BSE system matrix is characterized by a separation rank of order O(Nb), which enables compact storage and fast matrix–vector multiplications in the framework of iterations on a subspace for the computation of a few (lowest/largest) eigenvalues. To reduce the error of the diagonal plus low-rank approximation, it was proposed in [25] to represent the static screened interaction part of the BSE matrix by a small fully populated sub-block with adaptively chosen size.
In [25], efficient iterative schemes are introduced for computing several tens of the smallest-in-modulus eigenvalues for both the BSE problem and its Tamm–Dancoff approximation (TDA). The most efficient subspace iteration is based on application of the matrix inverse, which for the considered matrix formats can be evaluated in structured form by using the Sherman–Morrison–Woodbury formula [269]. The numerical experiments show that the method is economical (at least up to small amino acids), where the numerical cost for computing several hundred eigenvalues decreases by orders of magnitude. Usually, the smallest-in-modulus eigenvalues of the BSE problem are of most interest in applications.
B ≈ LL^T,  L ∈ ℝ^{Nb²×RB},  RB = O(Nb),  (12.1)
V = [viajb]:  a, b ∈ ℐv, i, j ∈ ℐo,  (12.3)
V̂ = [v̂turs]:  r, s ∈ ℐv, t, u ∈ ℐo.  (12.4)
In what follows, {Ci} and {Ca} denote the sets of occupied and virtual orbitals, respectively.
Denote the associated matrix by V = [via,jb] ∈ ℝ^{Nov×Nov} in case (12.3), and similarly by V̂ = [v̂tu,rs] ∈ ℝ^{No²×Nv²} in case (12.4). The straightforward computation of the matrix V by the above representations accounts for the dominating impact on the overall numerical cost, of order O(Nb⁵), in the evaluation of the block entries of the BSE matrix.
Recall that the rank-RB, RB = O(Nb), approximation of the matrix B ≈ LL^T with the Nb² × RB Cholesky factor L allows introducing the low-rank representation of the tensor V and then reducing the asymptotic complexity of the calculations to O(Nb⁴) [150]; see Section 11.8, Lemma 11.1. A similar factorization can be derived in the case of (12.4). The following statement is a slight modification of Lemma 11.1.

Lemma 12.1. Let the rank-RB Cholesky decomposition of the matrix B be given by (12.1). Then the RB-term representation of the matrix V = [via;jb] takes the form

V = LV LV^T,  LV ∈ ℝ^{Nov×RB},  LV(ia; k) = Ci^T Lk Ca.  (12.5)
Lemma 12.1 provides the upper bound on rank(V) in the representation (12.5), which might be reduced by SVD-based ε-rank truncation. It can be shown that the ε-rank of the matrix V remains of the same magnitude as that of the TEI matrix B obtained by its ε-rank truncated Cholesky factorization (see the numerical illustration in Section 12.4).
Numerical tests in [150] (see also Sections 10 and 11.8) indicate that the singular values of the TEI matrix B decay exponentially as

σk ≤ C e^{−γk/Nb},  (12.6)

where the constant γ > 0 in the exponential depends weakly on the molecule configuration. If we define RB(ε) as the minimal number satisfying the condition

∑_{k=RB(ε)+1}^{RB} σk² ≤ ε²,  (12.7)

then estimate (12.6) leads to the ε-rank bound RB(ε) ≤ C Nb |log ε|, which will be postulated in the following.
Note that the matrix rank RV(ε) increases only logarithmically in ε, similarly to the bound for RB(ε). This can be formulated as the following lemma (see [23]).

Lemma 12.2. For given ε > 0, there exist a rank-r approximation Vr of the matrix V and a constant C > 0 not depending on ε such that r = RV(ε) ≤ RB(ε) and ‖V − Vr‖ ≤ Cε.
12.3 Tensor factorization of the BSE matrix blocks | 183

Define the diagonal matrix

Δε = Io ⊗ diag{εa : a ∈ ℐv} − diag{εi : i ∈ ℐo} ⊗ Iv,

where Io and Iv are the identity matrices on the respective index sets. It is worth noting
that if the so-called homo lumo gap of the system is positive, i. e.,
εa − εi > δ > 0, a ∈ ℐv , i ∈ ℐo ,
where χ0(ω) is the matrix form of the so-called Lehmann representation of the response function. In turn, the inverse matrix of χ0(ω) is known to have the form

χ0^{−1}(ω) = − ( Δε   0  )  + ω ( 1    0 )
               (  0   Δε )      ( 0   −1 ),

implying

χ0(0) = − ( Δε^{−1}     0    )
          (    0     Δε^{−1} ).
Define the rank-1 matrix 1 ⊗ dε, where 1 ∈ ℝ^{Nov} is the all-ones vector, and dε = diag{Δε^{−1}} ∈ ℝ^{Nov} is the diagonal vector of Δε^{−1}. In this notation, the matrix Z = [zpq,rs] takes the compact form

Z = Io ⊗ Iv + V ⊙ (1 · dε^T).  (12.8)
Introducing the inverse matrix Z^{−1}, we finally define the so-called static screened interaction matrix by

W = Z^{−1}V,  provided that a, b ∈ ℐv, i, j ∈ ℐo,

where V is calculated by (12.3). Lemma 12.1 suggests the existence of a low-rank factorization of the matrix W defined above.
Lemma 12.3 ([23]). Let the matrix Z defined by (12.8) over the index set a, b ∈ ℐv , i, j ∈ ℐo
be invertible. Then the rank of the respective matrix W = Z −1 V is bounded by
rank(W) ≤ rank(V) ≤ RB .
F ( xn )   (  A   B  ) ( xn )      ( I    0 ) ( xn )
  ( yn ) ≡ ( B*   A* ) ( yn ) = ωn ( 0   −I ) ( yn ),  (12.10)
determining the excitation energies ωn and the respective excited states. Here, the matrix blocks are defined in the index notation by (see (46a) and (46b) in [243] for more detail)

A = Δε + V − Ŵ,  B = Ṽ − W̃ = V − W̃,

where

Ṽ = [ṽiajb] := [viabj] = [viajb],

and hence Ṽ coincides with V in (12.3) due to the symmetry properties. Here, W̃ = [w̃ia,jb] = [wib,aj] is defined by permutation. The ε-rank structure in the matrix blocks A and B, resulting from the corresponding factorizations of V, has been analyzed in [23].
Solutions of equation (12.10) can be grouped in pairs: excitation energies ωn with eigenvectors (xn, yn) and de-excitation energies −ωn with eigenvectors (xn*, yn*).
The block structure in the matrices A and B is inherited from the symmetry of the TEI matrix V, via,jb = v*ai,bj, and of the matrix W, wia,jb = w*bj,ai. In particular, it is known from the literature that the matrix A is Hermitian and the matrix B is (complex) symmetric (since via,bj = vjb,ai and wib,aj = wja,bi), which we presuppose in the matrix construction. The literature concerning the discussion of the skew-symmetric (Hamiltonian) block structure in the BSE matrix can be found in [23].
12.4 The reduced basis approach using low-rank approximations | 185
In the following discussion, we confine ourselves to the case of real spin orbitals; that is, the matrices A and B remain real. The dimension of the matrix in (12.10) is 2No Nv × 2No Nv, where No and Nv denote the numbers of occupied and virtual orbitals, respectively. In general, No Nv is asymptotically of size O(Nb²), so that the spectral problem (12.10) may be computationally expensive. Indeed, the direct eigenvalue solver for (12.10) via diagonalization becomes infeasible due to the O(Nb⁶) complexity scaling. Furthermore, the numerical cost for the calculation of the matrix elements based on the precomputed TEI integrals from the Hartree–Fock equation scales as O(Nov²) = O(Nb⁴), where the low-rank structure in the matrix V can be exploited.
The challenging computational tasks arise in the case of lattice-structured com-
pounds, where the number of basis functions increases proportionally to the lattice
size L × L × L, that is Nb ≈ Nb,0 L3 , which quickly leads to intractable problems even for
small lattices.
where the rank of the second summand does not exceed R_V. Hence, the linear system solve W = Z⁻¹V can be implemented by algorithms tailored to the DPLR (diagonal plus low-rank) structure by adapting the Sherman–Morrison–Woodbury formula.
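The Sherman–Morrison–Woodbury solve for a diagonal plus low-rank system can be sketched in a few lines. The numpy snippet below is an illustrative stand-in (the function name `dplr_solve` and the dense test matrices are ours, not from [23, 25]): only the small r × r core matrix is factorized densely, so the cost is O(n r²) instead of O(n³).

```python
import numpy as np

def dplr_solve(d, P, Q, v):
    """Solve (diag(d) + P @ Q.T) w = v via Sherman-Morrison-Woodbury.

    Only the small r x r core matrix K = I + Q.T diag(d)^{-1} P
    is formed and factorized densely."""
    Dinv_v = v / d                      # diag(d)^{-1} v
    Dinv_P = P / d[:, None]             # diag(d)^{-1} P
    r = P.shape[1]
    K = np.eye(r) + Q.T @ Dinv_P        # small core matrix
    # w = D^{-1} v - D^{-1} P K^{-1} Q^T D^{-1} v
    return Dinv_v - Dinv_P @ np.linalg.solve(K, Q.T @ Dinv_v)

# quick consistency check against a dense solve on a well-conditioned example
rng = np.random.default_rng(0)
n, r = 40, 3
d = rng.uniform(2.0, 3.0, n)
P = 0.1 * rng.standard_normal((n, r))
Q = 0.1 * rng.standard_normal((n, r))
v = rng.standard_normal(n)
w = dplr_solve(d, P, Q, v)
```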
The computational cost for setting up the full BSE matrix F in (12.10) can be estimated by O(N_ov²), which includes the cost O(N_ov R_B) for generating the matrix V and the dominating cost O(N_ov²) for setting up Ŵ.
We further rewrite the spectral problem (12.10) in the equivalent form
$$F_1 \begin{pmatrix} x_n \\ y_n \end{pmatrix} \equiv \begin{pmatrix} A & B \\ -B^* & -A^* \end{pmatrix} \begin{pmatrix} x_n \\ y_n \end{pmatrix} = \omega_n \begin{pmatrix} x_n \\ y_n \end{pmatrix}. \qquad (12.14)$$
A → A_0 := Δε + V − Ŵ_r and B → B_0 := V − W̃_r, (12.16)
respectively. Here, we assume that the matrix V is already represented in the low-rank
format in the form (12.13).
The modified auxiliary problem reads
$$F_0 \begin{pmatrix} u_n \\ v_n \end{pmatrix} \equiv \begin{pmatrix} A_0 & B_0 \\ -B_0^* & -A_0^* \end{pmatrix} \begin{pmatrix} u_n \\ v_n \end{pmatrix} = \lambda_n \begin{pmatrix} u_n \\ v_n \end{pmatrix}. \qquad (12.17)$$
This structured eigenvalue problem is much simpler than (12.10) since the matrix
blocks A0 and B0 , defined in (12.16), are composed of diagonal and low-rank matrices.
Figures 12.1 and 12.2 illustrate the structure of A0 and B0 submatrices in a BSE system
matrix.
Given the set of m_0 eigenpairs {(λ_n, ψ_n)} computed for the modified (simplified) problem (12.17), we solve the full eigenvalue problem for the reduced matrix obtained by the Galerkin projection of the initial equation onto the problem-adapted small basis set {ψ_n} of size m_0, ψ_n ∈ ℝ^{2N_ov}, n = 1, . . . , m_0. Here, the λ_n are the eigenvalues of F_0 closest to zero.
Figure 12.1: The diagonal plus low-rank structure of A0 block in the modified BSE system matrix.
Figure 12.2: The low-rank structure of the block B0 in the modified BSE matrix.
Define a matrix G_1 whose columns are the vectors of the reduced basis, and then compute the stiffness and mass matrices M_1 = G_1^T F_1 G_1 and S_1 = G_1^T G_1 by projection of the initial BSE matrix F_1 onto the reduced basis spanned by the columns of G_1, leading to the reduced generalized eigenvalue problem
M1 y = γn S1 y, y ∈ ℝm0 . (12.18)
A0 u = λn u, (12.19)
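The Galerkin projection step (12.18) amounts to a few lines of linear algebra. The numpy sketch below (function name ours, not the authors' implementation) projects F_1 onto a given, possibly non-orthogonal, basis G_1 and solves the reduced generalized eigenvalue problem:

```python
import numpy as np

def reduced_basis_eigs(F1, G1):
    """Galerkin projection of F1 x = omega x onto the columns of G1,
    solving the reduced problem M1 y = gamma S1 y as in (12.18)."""
    M1 = G1.T @ F1 @ G1          # stiffness matrix
    S1 = G1.T @ G1               # mass (Gram) matrix of the basis
    gamma, Y = np.linalg.eig(np.linalg.solve(S1, M1))
    order = np.argsort(gamma.real)
    # Ritz values and Ritz vectors lifted back to the full space
    return gamma[order], (G1 @ Y)[:, order]

# sanity check: if G1 spans an exact invariant subspace, the Ritz values
# reproduce the corresponding exact eigenvalues
F1 = np.diag(np.arange(1.0, 7.0))
G1 = np.eye(6)[:, :3]
gamma, X = reduced_basis_eigs(F1, G1)
```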
Table 12.1: The error |γ1 − ω1 | vs. the size of reduced basis, m0 .
m0 5 10 20 30 40 50
Table 12.2: Accuracy (in eV) for the first eigenvalue, |γ1 − ω1|, vs. ε-ranks for V, Ŵ, and W̃.
Matrix blocks in the auxiliary equation (12.17) are obtained by a rather rough ε-rank approximation to the initial system matrix. However, we observe a much smaller approximation error γ_n − ω_n when solving the projected reduced basis system (12.18) compared with that for the auxiliary equation (12.17); see Figures 12.3 and 12.4. Numerical tests indicate that the difference γ_n − ω_n behaves nearly quadratically in the rank truncation parameter ε; see [23] for a more detailed discussion.
In the case of a symmetric matrix, the above-mentioned effect of "quadratic" convergence can be justified by a well-known property: the error in an approximate eigenvalue computed by the Rayleigh quotient of a perturbed eigenvector (here, the vectors ψ_n of the reduced basis) is quadratic in the perturbation of the eigenvector, which is of order O(ε). This beneficial property may explain the efficiency of the reduced basis approach in this particular application.
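This quadratic effect is easy to demonstrate numerically. The following self-contained numpy example (illustrative, not from the book) perturbs an exact eigenvector by O(ε) and observes that the Rayleigh-quotient eigenvalue error decays like O(ε²):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
A = np.diag(np.arange(1.0, n + 1.0))    # symmetric matrix with known spectrum
x = np.zeros(n)
x[0] = 1.0                               # exact eigenvector for the eigenvalue 1
errors = []
for eps in (1e-1, 1e-2, 1e-3):
    p = rng.standard_normal(n)
    p -= (p @ x) * x                     # perturbation orthogonal to x
    p *= eps / np.linalg.norm(p)         # vector perturbation of size O(eps)
    y = (x + p) / np.linalg.norm(x + p)
    errors.append(abs(y @ A @ y - 1.0))  # Rayleigh-quotient eigenvalue error
# the vector error is O(eps), but the eigenvalue error shrinks like O(eps^2)
```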
In the BSE formulation based on the Hartree–Fock molecular orbital basis, we may have a slight perturbation of the symmetry in the matrix block Ŵ; that is, the
Figure 12.3: Comparison of m0 = 30 lower eigenvalues for the reduced and exact BSE systems vs. ε
in the case of Glycine amino acid.
Figure 12.4: Comparison of m0 = 30 lower eigenvalues for the reduced and exact BSE systems for
H2 O molecule: ε = 0.6, left; ε = 0.1, right.
above argument does not apply directly. However, we observe the same quadratic error decay in all numerical experiments implemented so far. It is also worth noting that, due to the symmetry features of the eigenproblem, the approximation computed by the reduced basis approach is always an upper bound of the true excitation energies obtained from the full BSE model. Again, this is a simple consequence of the variational properties of the Ritz values, which are upper bounds on the smallest eigenvalues of symmetric matrices. The "upper bound" character is also clearly visible in Figures 12.3 and 12.4.
Table 12.2 shows numerics for the molecular systems H2O (360 × 360), N2H4 (1430 × 1430), and C2H5OH (2860 × 2860), where the BSE matrix size is given in brackets. It demonstrates the quadratic decay of the error |γ1 − ω1| in the lowest excitation energy with respect to the approximation error |λ1 − ω1| for the modified auxiliary BSE problem (12.17). The error is controlled by the tolerance ε > 0 in the rank truncation procedure applied to the BSE submatrices V, Ŵ, and W̃; see [23] for a detailed discussion.
Figure 12.5: Visualizing the first m0 BSE eigenvectors for the H32 chain with NW = 554 (left) and
Glycine amino acid molecule with NW = 880 (right).
N_W ≈ C_W √(2 R_V N_ov), (12.20)
where the constant CW is close to 1. The approximation error introduced due to the
corresponding matrix truncation can be controlled by the choice of the constant CW .
A → Â := Δε + V − Ŵ_{N_W}, (12.22)
whereas the modified block B_0 remains the same as in (12.16). The corresponding structure of the simplified matrix Â is illustrated in Figure 12.6.
Figure 12.6: Diagonal plus low-rank plus reduced-block structure of the matrix Â.
This construction guarantees that the storage and matrix–vector multiplication complexity for the simplified matrix block Â remains of the same order as that for the matrix V, characterized by a low ε-rank. Table 12.3 demonstrates how the ratio N_W/N_ov decreases with increasing problem size.
F_0 → F̂ by replacing A_0 → Â in (12.17),
F̂ ψ_n = λ̂_n ψ_n, (12.23)
defined by the low-rank plus block-diagonal approximation F̂ to the initial BSE matrix F. The corresponding eigenvalues γ̂_n of the modified reduced system (12.23) are computed by direct solution of the small reduced eigenvalue problem
M̂ q_n = γ̂_n Ŝ q_n, q_n ∈ ℝ^{m_0}, (12.24)
M̂ = Ĝ^T F Ĝ, Ŝ = Ĝ^T Ĝ ∈ ℝ^{m_0 × m_0}.
Table 12.4 illustrates the decrease of the approximation error of the simplified and reduced BSE problems by an order of magnitude.
Table 12.4: Accuracies (in eV) of eigenvalues for the reduced BSE problem via the simple low-rank approximation, |ω1 − γ1|, and for the block-diagonal plus low-rank approximation to the BSE matrices, |ω1 − γ̂1|, with ε = 0.1.
Proposition 12.4 ([25]). The numerical results indicate the important property observed
for all molecular systems tested so far: the close to zero eigenvalues λ̂k and γ̂k provide
lower and upper bounds for the exact BSE eigenvalues ωk ; that is,
λ̂k ≤ ωk ≤ γ̂k , k = 1, 2, . . . , m0 ≤ m0 .
The upper bound via the eigenvalues γ̂_k can be explained by the variational form of the reduced problem setting. However, understanding the lower bound property obtained from the output λ̂_k of the simplified system remains an interesting open problem.
Figure 12.7 demonstrates the two-sided error estimates declared in Proposition 12.4. Here, the black line represents the eigenvalues for the auxiliary problem (12.17), but with the modified matrix F̂, whereas the blue line represents the eigenvalues of the reduced equation (12.24) of type (12.18) with the Galerkin matrices M̂ and Ŝ. We observe a considerable decrease of the approximation error for both the simplified and reduced problems with the diagonal plus low-rank plus small-block approach for the submatrix A, as compared with the error of the straightforward diagonal plus low-rank approach presented in Figures 12.3 and 12.4.
Figure 12.7: Two-sided bounds for the BSE excitation energies for the H32 chain (left) and C2 H5 NO2
molecule (right).
Figure 12.8: Two-sided error bounds: The errors (in eV) in m0 smallest eigenvalues for simplified and
reduced schemes; N2 H4 molecule (left) and Glycine amino acid C2 H5 NO2 (right).
Figure 12.8 presents examples of the upper and lower bounds, i.e., λ̂_k − ω_k and ω_k − γ̂_k, for the whole sets of m_0 ≤ 250 eigenvalues for larger molecules. We observe that the lower bound is violated only by a few larger excitation energies, at a level below the truncation error ε.
We conclude that the reduced basis approach, based on the modified auxiliary matrix M̂ via the reduced-block approximation (12.22), provides considerably better accuracies ω_k − γ̂_k than the accuracies for γ_k corresponding to the matrix M_0. Table 12.4 compares the accuracies |ω1 − γ1| for the first eigenvalues of the reduced BSE problem based on the straightforward low-rank approximation from equation (12.18) with the accuracies |ω1 − γ̂1| resulting from the combined block plus low-rank approximation, all computed for several molecules.
A_0^{-1} = Δε^{-1} − Δε^{-1} P (I + Q^T Δε^{-1} P)^{-1} Q^T Δε^{-1}. (12.26)
Here, the 2r × 2r core matrix K = (I + Q^T Δε^{-1} P)^{-1} is small and can be computed explicitly at the expense 𝒪(r³ + r² N_ov). Hence, the matrix–vector product A_0^{-1} u_n requires multiplication by the diagonal matrix Δε^{-1} and by the low-rank matrix in the second summand. This amounts to the overall cost 𝒪(N_ov r). To invert the matrix F_0 in the simplified BSE, we first derive its block LU decomposition,
$$F_0 = \begin{pmatrix} A_0 & B_0 \\ -B_0^T & -A_0^T \end{pmatrix} = \begin{pmatrix} A_0 & 0 \\ -B_0^T & I \end{pmatrix} \begin{pmatrix} I & A_0^{-1} B_0 \\ 0 & S \end{pmatrix}, \qquad S = -A_0^T + B_0^T A_0^{-1} B_0. \qquad (12.27)$$
To solve a system
$$F_0 \begin{pmatrix} z \\ y \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix},$$
we perform the forward and backward substitutions
z̃ = A_0^{-1} u,  ỹ = v + B_0^T z̃,  y = S^{-1} ỹ,  z = z̃ − A_0^{-1} B_0 y. (12.28)
Note that A_0^{-1} B_0 is a low-rank matrix and can be precomputed in advance. The action of A_0^{-1} is given by (12.26), so we now address the inversion of the Schur complement. Plugging (12.26) into S, we again obtain a diagonal plus low-rank representation, (12.29), so that S^{-1} admits an explicit Sherman–Morrison–Woodbury-type formula, (12.30).
Keeping intermediate results in these calculations, we can trade memory against CPU time. The computational cost of (12.29), and then of (12.30), is again bounded by 𝒪(r² N_ov), whereas the implementation of (12.28) takes 𝒪(r N_ov) operations.
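The forward/backward substitution (12.28) can be prototyped in a few lines. The sketch below uses dense numpy solves for A_0 and S in place of the structured 𝒪(N_ov r) inversions; the function name and test matrices are ours:

```python
import numpy as np

def bse_solve(A0, B0, u, v):
    """Solve F0 [z; y] = [u; v] with F0 = [[A0, B0], [-B0.T, -A0.T]]
    via the block LU factorization (12.27) and substitutions (12.28)."""
    S = -A0.T + B0.T @ np.linalg.solve(A0, B0)   # Schur complement
    z_t = np.linalg.solve(A0, u)                  # z~ = A0^{-1} u
    y_t = v + B0.T @ z_t                          # y~ = v + B0^T z~
    y = np.linalg.solve(S, y_t)                   # y = S^{-1} y~
    z = z_t - np.linalg.solve(A0, B0 @ y)         # z = z~ - A0^{-1} B0 y
    return z, y

# verify against a dense solve of the full 2n x 2n system
rng = np.random.default_rng(5)
n = 20
A0 = 5.0 * np.eye(n) + 0.3 * rng.standard_normal((n, n))
B0 = 0.3 * rng.standard_normal((n, n))
u, v = rng.standard_normal(n), rng.standard_normal(n)
z, y = bse_solve(A0, B0, u, v)
```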
Table 12.5: Times (s) for eigenvalue problem solvers applied to simplified TDA matrix A0 (“−” means
that iterations did not converge).
Precomputation of intermediate matrices and their use in the structured matrix inversion are shown in Algorithms 1 and 2 in [25]. Table 12.5 compares CPU times (sec) for the full eig solver and the rank-structured iteration for the TDA problem (12.15) in the Matlab implementation [25]. The rank-truncation threshold is ε = 0.1; the number of computed eigenvalues is m_0 = 30. The bottom line shows the CPU times (sec) of the eigs procedure applied with the inverse matrix–vector product A_0^{-1}u, marked by "inv". The other lines show results of the corresponding algorithms, which use the traditional product A_0 u (with A_0 in diagonal plus low-rank form). Notice that the results for the Matlab version of
LOBPCG by [190] are presented for comparison. We see that the inverse-based method
is superior in all tests.
Notice that an initial guess for the subspace iteration applied to the full BSE can be constructed by replicating the eigenvectors computed in the TDA model. It provides a rather accurate approximation to the exact eigenvectors of the initial BSE system (12.14). In [23] it was shown numerically that a TDA approximation error |μ_n − ω_n| of order 10⁻² eV is achieved for the compact and extended molecules presented in Table 12.5.
Table 12.6 compares CPU times (sec) for the full eig-solver, and the rank-structured
eigs-iteration applied to the inverse of simplified rank-structured BSE system (12.17);
see [25] for more detail.
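The large speedup of eigs(inv(F_0)) over full diagonalization rests on the cheap structured inverse. A plain inverse power iteration with an SMW-based inverse applied in O(nr) per step illustrates the idea; this is a numpy stand-in under simplifying assumptions (symmetric A_0 = D + PPᵀ), not the authors' Matlab code:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 400, 4
d = np.concatenate([[0.5], np.linspace(2.0, 10.0, n - 1)])  # isolated lowest entry
P = 0.05 * rng.standard_normal((n, r))
A0 = np.diag(d) + P @ P.T            # symmetric diagonal plus low-rank matrix

# precompute the SMW pieces once:
# A0^{-1} = D^{-1} - D^{-1} P (I + P^T D^{-1} P)^{-1} P^T D^{-1}
DinvP = P / d[:, None]
W = DinvP @ np.linalg.inv(np.eye(r) + P.T @ DinvP)

def a0_inv(u):
    """Apply A0^{-1} in O(n r) operations."""
    return u / d - W @ (DinvP.T @ u)

# inverse power iteration: repeated A0^{-1}-products converge to the
# eigenvector of the smallest eigenvalue of the SPD matrix A0
x = rng.standard_normal(n)
for _ in range(60):
    x = a0_inv(x)
    x /= np.linalg.norm(x)
lam_min = x @ (d * x) + np.sum((P.T @ x) ** 2)   # Rayleigh quotient x^T A0 x
```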
Table 12.6: Times (s) for the simplified rank-structured BSE matrix F0.

No, Nb           5, 41   9, 82   13, 123   16, 128   20, 170   24, 192   24, 211
BSE matrix size  360²    1314²   2860²     3584²     6000²     8064²     8976²
eig(F0)          0.08    4.2     33.7      68.1      274       649       903
eigs(inv(F0))    0.13    0.28    0.7       0.77      2.2       2.3       3.9
Ŵ_{N_W} = blockdiag(W_b, diag(w_2)), where w_2 contains the elements on the diagonal of Ŵ that do not belong to W_b. Then, given the matrix inverse
Δε_W^{-1} = blockdiag((Δε_1 − W_b)^{-1}, (Δε_2 − diag(w_2))^{-1}), (12.31)
all steps requiring multiplication with Δε^{-1} in Algorithms 1 and 2 in [25] can be substituted by (12.31).
by (12.31). The numerical complexity of the new inversion scheme is estimated in the
following lemma.
Lemma 12.5 ([25], Complexity of the reduced-block algorithm). Suppose that the rank parameters in the decompositions of V and W̃ do not exceed r and that the block size N_W is chosen from equation (12.20). Then the rank-structured plus reduced-block representations of the inverse matrices Â^{-1} and F̂^{-1} can be set up with the overall cost 𝒪(N_ov^{3/2} r^{3/2} + N_ov r²). The complexity of each inversion Â^{-1}u or F̂^{-1}w is bounded by 𝒪(N_ov r).
Proof. Inversion of the N_W × N_W dense block in (12.31) requires 𝒪(N_W³) operations. Hence, condition (12.20) ensures that the cost of setting up the matrix (12.31) is bounded by 𝒪(N_ov^{3/2} r^{3/2}). After that, multiplication of (12.31) by an N_ov × r matrix requires 𝒪(N_W² r + N_ov r) = 𝒪(N_ov(r² + r)) operations. Multiplication of (12.31) by a vector is performed at 𝒪(N_W² + N_ov) = 𝒪(N_ov r) cost. The complexity of the other steps is the same as for the diagonal plus low-rank approach.
Numerical illustrations for the enhanced data sparsity via block-diagonal plus
low-rank approximation are presented in Table 12.7.
Table 12.7: Block-sparse matrices: times (s) for eigensolvers applied to the TDA and BSE systems. The bottom line shows the error (eV) for the case of the block-sparse approximation to the diagonal matrix block Â, ε = 0.1.
Notice that the performance of the diagonal plus low-rank and block-sparse plus low-rank solvers is comparable, but the second one provides better sparsity and higher accuracy in the computed eigenvalues (see Section 12.5). It is remarkable that the approach based on the inverse iteration applied to the low-rank plus reduced-block approximation outperforms the full eigenvalue solver by several orders of magnitude (see Tables 12.6 and 12.7).
The data in the previous tables correspond to the choice m_0 = 30. Figure 12.9 indicates a merely linear increase in the computational time for the eigs(inv(F̂)) solver with respect to increasing m_0.
Table 12.8: Average QTT ranks of the column vectors in LV and the m0 eigenvectors (corresponding to
the smallest eigenvalues) in the TDA problem.
No 5 8 9 13 16 20 24
QTT ranks of LV 5.4 7 9.1 12.7 14 17.5 21
QTT ranks of eigenvect. 5.3 7.6 9.1 12.7 13.6 17.2 20.9
Nov 180 448 657 1430 1792 3000 4488
Figure 12.10: QTT ranks (left) and Nov on logarithmic scale (right) vs. No .
solver:
𝒲_BSE = 𝒪(log(N_ov) r_QTT²) = 𝒪(log(N_o) N_o²), (12.32)
which is asymptotically of the same scale (but with a smaller prefactor) as that for the data-structured algorithms based on full-vector arithmetics (see Sections 12.6 and 12.7).
The high-precision Hartree–Fock calculations may require much larger GTO basis
sets so that the constant CGTO may increase considerably. In this situation, the QTT-
based tensor approach seems to outperform the algorithms in full-vector arithmetics.
An even more important consequence of (12.32) is that the rank behavior r_QTT ≈ N_o indicates that the QTT tensor-based algorithm has memory requirements and algebraic complexity of order 𝒪(log(N_o) N_o²), depending only on a fundamental physical characteristic of the molecular system, the number of occupied molecular orbitals N_o (but not on the system size N_ov). This remarkable property traces back to the similar feature observed in [157, 150]; that is, the QTT ranks of the column vectors in the low-rank Cholesky factors of the TEI matrix are proportional to N_o (about 3N_o).
Based on the previous discussion, we introduce the following hypothesis.
Hypothesis 1. Estimate (12.32) determines the irreducible lower bound on the asymp-
totic algebraic complexity of the large-scale BSE eigenvalue problems.
The CPU times for QTT calculations are comparable to, or smaller than, the times of the best Sherman–Morrison–Woodbury inversion methods in the previous sections, as demonstrated in Table 12.9 (cf. Table 12.7). Recall that the row referred to as "absolute error" in Table 12.9 represents the quantity ‖μ_qtt − μ⋆‖ = (∑_{m=1}^{m_0} (μ_qtt,m − μ⋆,m)²)^{1/2}, characterizing the total absolute error in the first m_0 eigenvalues measured in the Euclidean norm. The QTT format also provides a considerable reduction of the memory needed to store the eigenvectors.
Table 12.9: Time (s) and absolute error (eV) for QTT-DMRG eigensolvers for TDA matrix.
We now summarize the important result of this section: the lower bound 𝒪(N_o²) on the asymptotic algebraic complexity, confirmed by extensive numerical experiments, means that solving the BSE system in the QTT tensor format leads to a numerical complexity 𝒪(N_o²) that depends explicitly on the number of electrons in the system. This seems to be the asymptotically optimal cost for solving large-scale BSE eigenvalue problems.
Notice that in recent years eigenvalue problem solvers for large structured matrices have been widely discussed in the linear algebra community [20, 19, 22]. Tensor-structured approximation of elliptic equations with quasi-periodic coefficients has been considered in [180, 181].
13 Density of states for a class of rank-structured matrices
In this section, we discuss a new numerical approach to the approximation of the density of states (DOS) of large rank-structured symmetric matrices. This approach was recently introduced in [27] for estimating the optical spectra of molecules in the framework of BSE and TDA calculations; see the discussion in Section 12.1. In this application, block-diagonal plus low-rank matrix structures arise in the representation of the symmetric TDA matrix. Here, we sketch the techniques for fast DOS calculation applied to a general class of rank-structured matrices.
Several methods for calculating the density of states were originally developed in condensed matter physics [74, 301, 285, 73, 296], and the topic is now also considered in the numerical linear algebra community [290, 98, 283]. We refer to a recent survey on the commonly used methodology for approximating the DOS of large matrices of general structure [204]. The traditional methods for approximating the DOS are usually based on a polynomial or fractional-polynomial interpolation of the exact DOS function, regularized by Gaussians or Lorentzians, and on the subsequent computation of traces of certain matrix-valued functions, for example, matrix resolvents or polynomials evaluated at a large set of interpolation points within the spectral interval of interest. The trace calculations are typically executed by heuristic stochastic sampling over a large number of random vectors [204].
The sizes of matrices arising in quantum chemistry and molecular dynamics computations are usually large, scaling polynomially in the size of the molecular system, whereas the DOS of these matrices often exhibits a very complicated shape. Hence, the traditional approaches mentioned above become prohibitively expensive. Moreover, algorithms based on polynomial-type or trigonometric interpolants have poor approximation properties when the spectrum of a matrix exhibits gaps or highly oscillating non-regular shapes, as is often the case in electronic structure calculations. Furthermore, stochastic sampling relies on Monte Carlo-type error estimates characterized by slow convergence rates and, as a result, by low accuracy.
The method presented in [27] to approximate the DOS of the Tamm–Dancoff approximation (TDA) Hamiltonian applies to a class of rank-structured matrices, in particular, to the block-diagonal plus low-rank BSE/TDA matrix structures described in [23, 25]. It is based on the Lorentzian blurring [124], such that the most computationally expensive part of the calculation is reduced to the evaluation of traces of shifted matrix inverses. A fast method is presented for calculating the traces of parametric matrix resolvents at interpolation points by taking advantage of the block-diagonal plus low-rank matrix structure. This allows us to overcome the computational difficulties of the traditional schemes and avoid the need for stochastic sampling.
https://doi.org/10.1515/9783110365832-013
202 | 13 Density of states for a class of rank-structured matrices
$$\phi(t) = \frac{1}{n} \sum_{j=1}^{n} \delta(t - \lambda_j), \qquad t, \lambda_j \in [0, a], \qquad (13.1)$$
where δ is the Dirac delta, and λj ’s are the eigenvalues of the symmetric matrix A =
AT ∈ ℝn×n ordered as λ1 ≤ λ2 ≤ ⋅ ⋅ ⋅ ≤ λn .
Several classes of blurring approximations to ϕ(t) have been considered in the literature. One can replace each Dirac-δ by a Gaussian function with a small width η > 0, i.e.,
$$\delta(t) \rightsquigarrow g_\eta(t) = \frac{1}{\sqrt{2\pi}\,\eta} \exp\Big(-\frac{t^2}{2\eta^2}\Big),$$
where the choice of the regularization parameter η depends on the particular problem setting. As a result, (13.1) can be approximated by
$$\phi(t) \to \phi_\eta(t) := \frac{1}{n} \sum_{j=1}^{n} g_\eta(t - \lambda_j), \qquad (13.2)$$
on the whole energy interval [0, a]. Another option is the replacement of each Dirac-δ by a Lorentzian with a small width η > 0, i.e.,
$$\delta(t) \rightsquigarrow L_\eta(t) := \frac{1}{\pi} \frac{\eta}{t^2 + \eta^2} = \frac{1}{\pi} \operatorname{Im}\Big(\frac{1}{t - i\eta}\Big), \qquad (13.3)$$
so that
$$\phi(t) \to \phi_\eta(t) := \frac{1}{n} \sum_{j=1}^{n} L_\eta(t - \lambda_j). \qquad (13.4)$$
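A sampled version of the blurred DOS (13.2)/(13.4) is straightforward to compute on a grid. The helper names below are illustrative (not from [27]):

```python
import numpy as np

def dos_lorentzian(eigs, grid, eta):
    """Sample phi_eta(t) = (1/n) sum_j L_eta(t - lambda_j), eq. (13.4)."""
    t = grid[:, None] - np.asarray(eigs)[None, :]
    return np.mean((eta / np.pi) / (t ** 2 + eta ** 2), axis=1)

def dos_gaussian(eigs, grid, eta):
    """Sample phi_eta(t) with Gaussian blurring g_eta, eq. (13.2)."""
    t = grid[:, None] - np.asarray(eigs)[None, :]
    return np.mean(np.exp(-t ** 2 / (2 * eta ** 2)) / (np.sqrt(2 * np.pi) * eta),
                   axis=1)

# example: a small spectrum on the cell-centered grid over [0, a]
eigs = np.array([2.0, 3.0, 3.1, 7.5])
a, N = 10.0, 2 ** 12
h = a / N
grid = (np.arange(N) + 0.5) * h
phi = dos_lorentzian(eigs, grid, 0.05)
```

Since each blurred δ integrates to one, the grid sum h·Σ phi stays close to 1 when the eigenvalues lie well inside [0, a].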
Both functions ϕη (t) and Lη (t) are continuous. Hence, they can be discretized by
sampling on a fine grid Ωh over [0, a], which is assumed to be the uniform cell-centered
N-point grid with the mesh size h = a/N.
In what follows, we focus on the case of Lorentzian blurring. First, we consider the class of matrices that can be accurately approximated by a block-diagonal plus low-rank ansatz (see [23, 25]), which allows an efficient explicit representation of the shifted inverse matrix.
The numerical illustrations below represent the DOS for the H2O molecule broadened by Gaussians (13.2). The data correspond to the reduced basis approach via rank-structured approximation applied to the symmetric TDA model [23, 25], described by the symmetric matrix block A of the full BSE system matrix; see Section 12. Figure 13.1 (left) represents the DOS for H2O computed using the exact TDA spectrum (blue) and its approximation based on the simplified model via a low-rank approximation to A (red), whereas the right figure shows the relative error. This suggests that the DOS of the initial matrix A of general structure can be accurately approximated by the DOS calculated for its structured diagonal plus low-rank approximation.
Figure 13.1: DOS for H2 O. Exact TDA vs. simplified TDA (left); zoom of the small spectral interval
(right).
Let us briefly illustrate another example of DOS functions, arising in stochastic homogenization theory. The numerical examples below have been implemented in [158].
Spectral properties of randomly generated elliptic operators play an important role in the analysis of averaged quantities in stochastic homogenization. Here, we follow [158] and present the average behavior of the density of the spectrum for a family of randomly generated 2D elliptic operators {A_m} over a large sequence of stochastic realizations. The DOS provides important spectral characteristics of the differential operator, accumulating crucial information on the static and dynamical properties of the complex physical or molecular system. In particular, the numerics below demonstrate the convergence of the DOS to the sample-average function in the limit of a large number of stochastic realizations with a fixed size of the so-called representative volume element L; see [158] for more detail.
Figure 13.2: Density of states for a number of stochastic processes M = 1, 2, . . . , 20 with L = 4 (left) and L = 8 (right) for λ = 0.5, n0 = 8, and α = 0.25.
Figure 13.2 represents the DOS for a sequence of M = 1, 2, . . . , 20 stochastic realizations on an L × L lattice with L = 4, 8 (from left to right), corresponding to fixed model parameters. The numerical experiments show that the DOS of the stochastic operator is represented by rather complicated functions whose numerical approximation may be a challenging task.
$$\phi(t) \to \phi_\eta(t) := \frac{1}{n\pi} \sum_{j=1}^{n} \operatorname{Im}\Big(\frac{1}{(t - \lambda_j) - i\eta}\Big) = \frac{1}{n\pi} \operatorname{Im} \operatorname{Trace}\big[(tI - A - i\eta I)^{-1}\big]. \qquad (13.5)$$
$$\phi_\eta(t) := \frac{1}{n\pi} \sum_{j=1}^{n} \frac{\eta}{(t - \lambda_j)^2 + \eta^2} = \frac{\eta}{n\pi} \operatorname{Trace}\big[\big((tI - A)^2 + \eta^2 I\big)^{-1}\big]. \qquad (13.6)$$
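The agreement of the Lorentzian sum, the complex resolvent trace (13.5), and the real squared-matrix trace (13.6) can be cross-checked numerically on a small symmetric matrix (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
n, eta, t = 60, 0.2, 1.234
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                       # symmetric test matrix
lam = np.linalg.eigvalsh(A)

# sum of Lorentzians over the spectrum, eq. (13.4)
phi_sum = (eta / (n * np.pi)) * np.sum(1.0 / ((t - lam) ** 2 + eta ** 2))
# imaginary part of the complex resolvent trace, eq. (13.5)
R = np.linalg.inv((t - 1j * eta) * np.eye(n) - A)
phi_res = np.imag(np.trace(R)) / (n * np.pi)
# real-arithmetic squared-matrix form, eq. (13.6)
M = (t * np.eye(n) - A) @ (t * np.eye(n) - A) + eta ** 2 * np.eye(n)
phi_sq = (eta / (n * np.pi)) * np.trace(np.linalg.inv(M))
```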
The advantage of representations (13.5) and (13.6) is that in both cases computing the DOS in the form ϕ_η(t) avoids requiring explicit information on the matrix spectrum. Indeed, the initial task reduces to approximating the trace of the matrix resolvent. The calculation of (13.8) for f_1(A) and f_2(A), given by (13.7), reduces to solving linear systems of the form (13.9) or (13.10). These linear systems have to be solved for many target points t = t_k ∈ [a, b] in the course of a chosen interpolation scheme, and for the subset of the spectrum of interest.
In the case of rank-structured matrices A, the solution of equations (13.9) or (13.10) can be implemented at lower cost. However, even in this favorable situation, one requires a relatively large number m_r of stochastic realizations to obtain a satisfactory mean-value approximation. Indeed, following the central limit theorem, the convergence rate is expected to be of order O(1/√m_r) in the limit of a large number of stochastic realizations. On the other hand, with a limited number of interpolation points, polynomial-type interpolation schemes applied to highly non-regular shapes, as shown, for example, in Figure 13.1 (left), can only provide poor resolution and are unlikely to reveal spectral gaps and the many local peaks of interest.
both at the cost O(n) up to some logarithmic factor. For numerical efficiency, the rank
parameter R is supposed to be small compared with the matrix size, that is, R ≪ n.
Remark 13.2. Definition 13.1 applies, in particular, to the following classes of matrices E in (13.11):
(A) E = blockdiag{B0, D0}, which arises when using the low-rank BSE matrix structure as in [23, 25] (see Section 12.1).
(B) E is the multilevel block circulant matrix arising in Hartree–Fock calculations for slightly perturbed periodic lattice-structured systems [154].
(C) E represents the homogenized matrix for the FEM-Galerkin approximation to elliptic operators with quasi-periodic coefficients arising, for example, in geometric/stochastic homogenization theory; see [181, 158].
In what follows, we use the notation 1_m for a length-m vector of all ones. The following simple result, which generalizes Theorem 3.1 in [27] to a more general class of matrices, describes an efficient numerical scheme for calculating traces of the rank-structured matrices specified by Definition 13.1 and asserts that the corresponding cost is estimated by O(nR²).
Lemma 13.3. For the matrix A of the form (13.11), the trace of the matrix inverse A^{-1} can be calculated explicitly by
$$\operatorname{trace}[A^{-1}] = \operatorname{trace}[E^{-1}] - 1_n^T (U \odot U) 1_R,$$
where
$$U = E^{-1} P K^{-1/2} \in \mathbb{R}^{n \times R}, \qquad K = I_R + P^T E^{-1} P.$$
Proof. The proof follows arguments similar to those in Theorem 3.1 of [27]. The analysis relies on the particular favorable structure of the matrix E described in Definition 13.1. Indeed, we use the direct trace representation for both the rank-R matrix and the inverse matrix E^{-1}. The argument is based on the simple observation that the trace of a rank-R matrix UV^T, where U, V ∈ ℝ^{n×R}, U = [u_1, . . . , u_R], V = [v_1, . . . , v_R], u_k, v_k ∈ ℝ^n, can be calculated in terms of the skeleton vectors by
$$\operatorname{trace}[UV^T] = \sum_{k=1}^{R} \langle u_k, v_k \rangle = 1_n^T (U \odot V) 1_R. \qquad (13.12)$$
Applying the Sherman–Morrison–Woodbury formula to A = E + PQ^T yields
$$A^{-1} = E^{-1} - UV^T = E^{-1} - E^{-1} P K^{-1} Q^T E^{-1}, \qquad U = E^{-1} P K^{-1}, \quad V = E^{-1} Q,$$
and the assertion follows from (13.12).
We notice that the price to pay for real arithmetic in equation (13.10) is the computation with squared matrices, which, however, does not deteriorate the asymptotic complexity, since there is no increase of the rank parameter in the rank-structured representation of the target matrix; see Lemma 13.4, which is the respective modification of Theorem 3.2 in [27]. In what follows, we denote by [U, V] the concatenation of two matrices of compatible size.
Lemma 13.4. Given the matrix B(t) = (tI − A)² + η²I, where A is defined by (13.11), the trace of the real-valued matrix resolvent B^{-1}(t) can be calculated explicitly by
$$\operatorname{trace}[B^{-1}(t)] = \operatorname{trace}[\hat{E}^{-1}(t)] - 1_n^T (\hat{U} \odot \hat{V}) 1_{2R}, \qquad (13.13)$$
with
$$\hat{U} = \hat{E}^{-1} \hat{P} K^{-1}(t) \in \mathbb{R}^{n \times 2R} \quad \text{and} \quad \hat{V} = \hat{E}^{-1} \hat{Q} \in \mathbb{R}^{n \times 2R},$$
where
$$\hat{E}(t) = (\eta^2 + t^2) I - 2tE + E^2,$$
and the rank-2R matrices P̂, Q̂ are represented via concatenation of the corresponding rank-R factors, such that the small core matrix K(t) ∈ ℝ^{2R×2R} takes the form K(t) = I_{2R} + Q̂^T Ê^{-1}(t) P̂. The numerical cost is estimated by O(nR²) up to a low-order term.
Proof. Given the block-diagonal plus low-rank matrix A in the form (13.11), we obtain
$$B = (tI - A)^2 + \eta^2 I = \hat{E} + \hat{P}\hat{Q}^T, \qquad (13.14)$$
where the block-diagonal matrix Ê and the rank-2R matrix P̂Q̂^T are defined as above. We apply the Sherman–Morrison–Woodbury scheme to the structured matrix B; then Lemma 13.3 implies the desired representation. Now we take into account that Ê is a matrix polynomial in E of degree 2; then the assumptions on the trace properties of E prove the complexity bound.
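Lemma 13.4 can be mirrored in code: the squared resolvent is again a (block-)diagonal plus rank-2R matrix. The explicit concatenation used below, P̂ = [GP, P] and Q̂ = [−Q, −GQ + Q(PᵀQ)] with G = tI − E, is one possible choice (the text leaves the factors implicit), and a diagonal E is assumed for simplicity:

```python
import numpy as np

def trace_resolvent_real(e, P, Q, t, eta):
    """trace[((tI - A)^2 + eta^2 I)^{-1}] for A = diag(e) + P Q^T,
    entirely in real arithmetic; the rank doubles from R to 2R."""
    g = t - e                                   # diagonal of G = tI - E
    e_hat = g ** 2 + eta ** 2                   # diagonal of E_hat(t)
    # one concatenation realizing (tI - A)^2 + eta^2 I = E_hat + P_hat Q_hat^T
    P_hat = np.hstack([g[:, None] * P, P])
    Q_hat = np.hstack([-Q, -g[:, None] * Q + Q @ (P.T @ Q)])
    EinvP = P_hat / e_hat[:, None]
    EinvQ = Q_hat / e_hat[:, None]
    K = np.eye(P_hat.shape[1]) + Q_hat.T @ EinvP   # 2R x 2R core matrix
    U = EinvP @ np.linalg.inv(K)
    return np.sum(1.0 / e_hat) - np.sum(U * EinvQ)

# verify against the dense squared matrix
rng = np.random.default_rng(6)
n, r = 50, 2
e = rng.uniform(1.0, 2.0, n)
P = 0.1 * rng.standard_normal((n, r))
Q = 0.1 * rng.standard_normal((n, r))
t, eta = 0.5, 0.3
tr_fast = trace_resolvent_real(e, P, Q, t, eta)
```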
Based on Lemmas 13.3 and 13.4, the calculation of the DOS can be implemented efficiently in real arithmetic. Notice that a statement similar to Lemma 13.4 holds in the case of complex arithmetic; see the discussion of Theorem 3.1 in [27]. The following numerics demonstrate the efficiency of DOS calculations via (13.13) for the rank-structured TDA matrix, implemented in real arithmetic (MATLAB). In this case, the initial block-diagonal matrix E is given by E = blockdiag{B0, D0}, as described in Section 12.1.
Figure 13.3 illustrates that, using only the structure-based trace representation (13.13) in Lemma 13.4, we obtain an approximation that perfectly resolves the DOS function in the example of the H2O molecule. See [27] for numerical examples for several moderate-size molecules.
Figure 13.3: Left: DOS for H2O vs. its recovery by using the trace of matrix resolvents; Right: zoom in the small energy interval.
Figure 13.4 shows the rescaled CPU time, that is, T/R, where T denotes the total CPU time for computing the DOS by the algorithm implementing (13.13). We applied the algorithm to different system sizes n (i.e., the sizes of the TDA matrices considered in Section 12.1), varying from n = 180 to n = 4488. In all cases, the N-point representation grid with fixed N = 2^14 was used. This indicates that the numerical performance of the algorithm is even better than the theoretical complexity O(nR²); see more numerics in [27].
parameters r_qtt² log N ≪ N, where the average QTT rank r_qtt is a small rank parameter depending on the truncation error ϵ > 0.
In the following numerical examples, we use a sampling vector defined on a fine grid of size N ≈ 2^14. We fix the QTT truncation error to ϵ_QTT = 0.04 (if not explicitly indicated otherwise). For ease of interpretation, we set the pre-factor in (13.1) equal to 1. It is worth noting that the QTT-approximation scheme is applied to the full TDA spectrum. Our results demonstrate that the QTT approximant renders good resolution over the whole range of energies (in eV), including large "zero gaps".
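The QTT compression of a grid function can be sketched with the basic TT-SVD sweep (successive reshapes and truncated SVDs); `tt_ranks` below is an illustrative helper, not the book's DMRG-based solver:

```python
import numpy as np

def tt_ranks(v, tol):
    """Ranks of a TT-SVD (QTT) compression of a length-2^d vector:
    split off one binary mode at a time and truncate singular values
    below tol * s_max."""
    d = int(np.log2(v.size))
    assert 2 ** d == v.size, "vector length must be a power of two"
    ranks = []
    C, r = v.reshape(1, -1).astype(float), 1
    for _ in range(d - 1):
        C = C.reshape(2 * r, -1)                  # unfold the next binary mode
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = max(1, int(np.sum(s > tol * s[0])))   # truncated rank
        ranks.append(r)
        C = s[:r, None] * Vt[:r]                  # carry the remainder
    return ranks
```

A sine sampled on a uniform grid compresses to QTT ranks of at most 2, while a generic random vector retains much larger ranks; this mirrors the rank behavior observed for the smooth parts of the DOS.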
Figure 13.5: DOS for H2 O molecule via Lorentzians (blue) and its QTT approximation (red) (left). Zoom
in the small energy interval (right).
Figure 13.5 (left) represents the TDA DOS (blue line) for H2O computed via the Lorentzian blurring with the parameter η = 0.4 and the corresponding QTT tensor approximation with average rank 9.4 (red line) to the discretized function ϕη(t). For this example, the number of eigenvalues is given by n = NBSE /2 = 180. Figure 13.5 (right) provides
a zoom of the corresponding DOS and its QTT approximant within the small energy
interval [0, 40] eV.
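The Lorentzian blurring used here admits a direct dense evaluation. The following minimal sketch computes the discretized ϕη(t) on an N = 2^14 grid; the eigenvalues below are a synthetic stand-in for the TDA spectrum, which is not reproduced in the text:

```python
import numpy as np

def dos_lorentzian(eigs, grid, eta):
    """Blurred DOS: phi_eta(t) = (1/n) * sum_j (eta/pi) / ((t - lam_j)^2 + eta^2)."""
    diff = grid[:, None] - eigs[None, :]          # shape (N, n)
    return ((eta / np.pi) / (diff**2 + eta**2)).mean(axis=1)

# synthetic spectrum standing in for the n = 180 TDA eigenvalues
rng = np.random.default_rng(0)
eigs = np.sort(rng.uniform(2.0, 58.0, size=180))
t = np.linspace(0.0, 60.0, 2**14)                 # N = 2^14 representation grid
phi = dos_lorentzian(eigs, t, eta=0.4)
mass = float(np.sum(phi) * (t[1] - t[0]))         # each Lorentzian integrates to ~1
print(phi.shape, round(mass, 2))
```

Since every Lorentzian carries unit mass, the rectangle-rule integral of ϕη over the interval is close to 1, up to the heavy Lorentzian tails cut off at the interval ends.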
This means that for a fixed η, the QTT rank remains rather modest relative to the molecular size. This observation confirms the QTT rank estimates in Section 13.6. The
moderate size of QTT ranks in Figure 13.5 clearly demonstrates the potential of QTT
interpolation for modeling DOS of large lattice-type clusters.
We observe several gaps with complicated shapes in the spectral densities (see Figures 13.5 and 13.6), indicating that polynomial, rational, or trigonometric interpolation can be applied only on small energy sub-intervals, but not on the whole interval [0, a]. It is remarkable that the QTT approximant resolves the DOS function well on the whole energy interval, including the nearly zero values within the spectral gaps (hardly possible for polynomial/rational interpolation).
13.5 Interpolation of the DOS function by using the QTT format
samples of the target N-vector1 with a small pre-factor Cs, usually satisfying Cs ≤ 10, that is independent of the fine interpolation grid size N = 2^d; see, for example, [183]. This cost estimate seems promising for extended or lattice-type molecular systems, which require large spectral intervals and, as a result, a large interpolation grid of size N. Here, the QTT rank parameter rqtt naturally depends on the required truncation threshold ε > 0, characterizing the L2-error between the exact DOS and its QTT interpolant. The QTT tensor interpolation adaptively reduces the number of functional calls, that is, M < N, if the QTT rank parameters (or the threshold ε > 0) are chosen to satisfy condition (13.15). The expression on the right-hand side of (13.15) provides a rather accurate estimate of the number of functional evaluations.
To complete this discussion, we present numerical tests on the low-rank QTT tensor interpolation applied to the long vector discretizing the Lorentzian-DOS on a large representation grid.
Figure 13.6 represents the results of the QTT interpolating approximation to the discretized DOS function for the NH3 molecule. We use the QTT cross approximation algorithm based on [167, 230, 256] and implemented in the MATLAB TT-toolbox [232]. Here, we set ε = 0.08, η = 0.1, and N = 2^14, providing rQTT = 9.8; see [27] for more numerical examples.
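The book's experiments use the adaptive cross approximation of the MATLAB TT-toolbox. As a simplified stand-in, the dense TT-SVD sweep below illustrates how a 2^d vector is quantized into a 2 × 2 × ⋯ × 2 tensor and how its QTT ranks are read off; the function name and the per-step truncation heuristic are ours, not from the toolbox:

```python
import numpy as np

def qtt_ranks(v, eps=0.04):
    """TT-SVD sweep over the quantized (2 x 2 x ... x 2) reshape of a 2^d vector.
    Returns the QTT ranks r_1, ..., r_{d-1} for relative accuracy eps."""
    d = int(np.log2(v.size))
    assert v.size == 2**d
    delta = eps * np.linalg.norm(v) / np.sqrt(d - 1)   # per-step truncation level
    ranks, r, rest = [], 1, v.reshape(2, -1)
    for _ in range(d - 1):
        u, s, vt = np.linalg.svd(rest, full_matrices=False)
        # number of singular values needed so the discarded tail norm <= delta
        tails = np.sqrt(np.cumsum(s[::-1]**2))[::-1]
        keep = max(1, int(np.sum(tails > delta)))
        ranks.append(keep)
        r = keep
        rest = (np.diag(s[:keep]) @ vt[:keep]).reshape(r * 2, -1)
    return ranks

# Lorentzian-DOS-like vector on an N = 2^14 grid
t = np.linspace(0.0, 60.0, 2**14)
eigs = np.linspace(1.0, 59.0, 180)
phi = ((0.4 / np.pi) / ((t[:, None] - eigs)**2 + 0.4**2)).sum(axis=1) / 180
print(max(qtt_ranks(phi)))   # modest rank, far below the worst-case middle rank 2^7
```

The dense sweep costs O(N) memory and is only feasible for moderate N; the cross algorithms referenced above reach the same ranks from O(r² log N) samples.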
1 In our application, this is the functional N-vector corresponding to the representation of the DOS via matrix resolvents in (13.6).
Figure 13.6: QTT ACA interpolation of DOS for the NH3 molecule (left) and its error on the whole spectrum (right).
Figure 13.7 (see [27]) illustrates the logarithmic increase in the number of samples required for the QTT interpolation of the DOS (for the H2O molecule) represented on grids of size N = 2^d with different quantics dimensions d = 11, 12, . . . , 16. The rank truncation threshold is chosen as ϵ = 0.05, and the regularization parameter is η = 0.2. In this example, the effective pre-factor in (13.15) is estimated by Cs ≤ 10. This pre-factor characterizes the average number of samples required for the recovery of each of the rqtt² log N representation parameters involved in the QTT tensor ansatz.
We observe that the QTT tensor interpolant recovers the complicated shape of the
exact DOS with a high precision. The logarithmic asymptotic complexity scaling M =
O(log N) (i. e., the number of functional calls required for the QTT tensor interpolation)
vs. the grid size N can be observed in Figure 13.7 (blue line) for large representation
grids.
13.6 Upper bounds on the QTT ranks of DOS function
$$\phi_\eta(t)\;\to\;\mathbf{p} = \mathbf{p}_\eta = \frac{1}{n}\sum_{j=1}^{n} \mathbf{g}_{\eta,j} \in \mathbb{R}^{N},$$
Lemma 13.5 ([27]). Assume that the effective support of the shifted Gaussians gη (t − λj ),
j = 1, . . . , n, is included in the computational interval [−a, a]. Then the QTT ε-rank of the
vector pη is bounded by a constant C = O(|log η|) > 0, which depends only logarithmically on the regularization parameter η.
Proof. The main argument of the proof is similar to that in [148, 68]: the sum of dis-
cretized Gaussians, each represented in Fourier basis, can be expanded with merely
the same number m0 of Fourier harmonics as the individual Gaussian function (uni-
form basis).
Given the exponent parameter η, we first estimate the number of essential Fourier coefficients of the Gaussian vectors gη,j,
taking into account their exponential decay. Notice that m0 depends only logarithmi-
cally on η. Since each Fourier harmonic has the exact rank-2 QTT representation (see
Section 4.2), we arrive at the desired bound.
A similar QTT rank bound can be derived for the case of Lorentzian blurred DOS.
Indeed, we observe that the Fourier transform of the Lorentzian in (13.3) is given by [27]
$$\mathcal{F}(L_\eta(t)) = e^{-|k|\eta}.$$
This leads to the logarithmic bound in the number m0 of essential Fourier coefficients
in the Lorentzian vectors.
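This exponential decay is easy to check numerically. The sketch below compares low- and high-frequency FFT coefficients of a sampled Lorentzian; the finite interval and discrete transform only approximate the continuous transform, and the interval length is an illustrative choice:

```python
import numpy as np

eta, a, N = 0.4, 200.0, 2**14
t = np.linspace(-a, a, N, endpoint=False)
lorentz = (eta / np.pi) / (t**2 + eta**2)

# scaled FFT magnitudes approximate |F(L_eta)(k)| = exp(-|k| eta)
coef = np.abs(np.fft.rfft(lorentz)) * (2 * a / N)
k = np.fft.rfftfreq(N, d=2 * a / N) * 2 * np.pi   # angular frequencies

lo = coef[k < 5].mean()                   # low-frequency content
hi = coef[(k > 20) & (k < 25)].mean()     # high-frequency content
print(lo > 10 * hi)                        # high frequencies are strongly suppressed
```

With η = 0.4, the coefficients around k ≈ 20 are already smaller than e^{-8}, so only O(|log η| log(1/ε)) harmonics carry essential mass.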
Table 13.1: QTT ranks of Lorentzians-DOS for TDA matrices of some molecules with parameters
ε = 0.04, η = 0.4, N = 16 384.
Table 13.1 shows that the average QTT tensor rank of Lorentzians-DOS for various TDA
matrices remains almost independent of the molecular size, which confirms previous
observations. A weak dependence of the rank parameter on the molecular geometry can also be observed.
14 Tensor-based summation of long-range potentials on finite 3D lattices
In Chapter 9 we described the method for the direct tensor summation of the electrostatic potentials used in the calculation of the nuclear potential operator for molecules [156], which reduces the volume summation of the potentials to one-dimensional rank-structured operations. However, the rank of the resulting canonical tensor increases linearly in the number of potentials, and this growth may become crucial for larger multi-particle systems. Favorably, the tensor approach often suggests new concepts for the solution of classical problems.
In this chapter, we discuss the assembled tensor method for summation of the
long-range potentials on finite rectangular L × L × L lattices introduced recently by
the authors in [148, 153]. This technique requires only O(L) computational work for the calculation of the collective electrostatic potential of large lattice systems and O(L²) for the computation of their interaction energy, instead of O(L³ log L) when using the traditional Ewald summation techniques. Surprisingly, the assembled tensor summation technique does not increase the tensor rank: the rank of the tensor for the collective potential of large 3D lattice clusters equals the rank of a single 3D reference potential.
The approach was initiated by our former numerical observations in [173, 146] that the
Tucker tensor rank of a sum of Slater potentials placed at nodes of a three-dimensional
finite lattice remains the same as the rank of a single Slater function.
A single three-dimensional potential function (the electrostatic potential 1/‖x‖ or another type of interaction generated by a radial basis function) sampled on a large N × N × N representation grid in a bounding box is approximated with guaranteed precision by a low-rank Tucker/canonical reference tensor. This tensor provides the values of the discretized potential at any point of this fine auxiliary 3D grid, but needs only O(N) storage. Then each 3D singular kernel function involved in the summation is represented on the same grid by a shift of the reference tensor along the lattice vectors. The directional vectors of the Tucker/canonical tensor defining the full lattice sum are assembled by 1D summation of the corresponding univariate skeleton vectors specifying the shifted tensors. The lattice nodes are not required to coincide exactly with the grid points of the global N × N × N representation grid, since the accuracy of the resulting tensor sum is well controlled due to the easy availability of large grid sizes N (i.e., fine resolution).
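The low-rank reference tensor rests on a separable approximation of the kernel. As a rough illustration, a sinc-type (trapezoid) quadrature for 1/ρ = (2/√π)∫₀^∞ e^{−ρ²t²} dt with the substitution t = e^u yields a sum of Gaussians that factorizes over the coordinates; the quadrature parameters below are untuned choices of our own, not the book's formula (6.3):

```python
import numpy as np

# trapezoid (sinc) quadrature: 1/rho = (2/sqrt(pi)) * int e^{-rho^2 e^{2u}} e^u du
h = 0.1
u = h * np.arange(-500, 101)                 # quadrature nodes, deliberately oversampled
w = (2.0 / np.sqrt(np.pi)) * h * np.exp(u)   # quadrature weights
tt = np.exp(u)                               # Gaussian exponents t_k = e^{u_k}

def newton_sep(x, y, z):
    """Separable approximation of 1/||x||: a weighted sum of products of 1D Gaussians."""
    gx = np.exp(-(tt * x)**2)
    gy = np.exp(-(tt * y)**2)
    gz = np.exp(-(tt * z)**2)
    return float(np.sum(w * gx * gy * gz))

for p in [(1.0, 0.0, 0.0), (1.0, 2.0, 2.0), (0.3, 0.4, 0.0)]:
    r = sum(c * c for c in p) ** 0.5
    print(abs(newton_sep(*p) - 1.0 / r))     # small separable approximation error
```

Each quadrature term is a rank-1 Gaussian, so the whole kernel becomes a canonical tensor; tuned quadratures reach the same accuracy with far fewer terms.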
The key advantage of the assembled tensor method is that the summation of potentials is implemented within the skeleton vectors of the generating canonical tensor, thus not affecting the resulting tensor rank; the number of canonical vectors representing the total tensor sum remains the same as for a single reference kernel. For a sum of electrostatic potentials over an L × L × L lattice embedded in a box, the required storage scales linearly in the one-dimensional grid size, that is, as O(N), whereas the numerical cost is estimated by O(NL). An important benefit of this summation technique is that the resultant low-rank tensor representation of the total sum of potentials can be evaluated at any grid point at the cost O(1).
In the case of periodic boundary conditions, the tensor approach leads to further simplifications. Indeed, the respective lattice summation is reduced to 1D operations on short canonical vectors of size n = N/L, which is the restriction (projection) of the global N-vectors onto the unit cell. Here, n denotes merely the number of grid points per unit cell. In this case, the storage and computational costs are reduced to O(n) and O(Ln), respectively, whereas the traditional FFT-based approach scales at least cubically in L, that is, O(L³ log L), and in N. Notice that due to the low cost of the tensor method in the limit of large lattice size L, the conditionally convergent sums in the periodic setting can be regularized by subtraction of the constant term, which can be evaluated numerically by the Richardson extrapolation on a sequence of lattice parameters L, 2L, 4L, etc. (see Section 14.3). Hence, in the new framework, the analytic treatment of the conditionally convergent sums is no longer required.
We notice that the numerical treatment of long-range potentials in large lattice-type systems has long been considered a computational challenge (see [72, 235, 199] and [295, 207, 47, 208, 253]). Tracing back to the Ewald summation techniques [79], the
development of lattice-sum methods has led to a number of established algorithms
for evaluating long-range electrostatic potentials of multiparticle systems; see for ex-
ample [58, 236, 278, 139, 63, 205] and references therein. These methods usually com-
bine the original Ewald summation approach with the fast Fourier transform (FFT).
The commonly used Ewald summation algorithms [79] are based on a certain specific local-global analytical decomposition of the interaction potential. In the case of electrostatic potentials, the Newton kernel is represented by
$$\frac{1}{r} = \frac{\tau(r)}{r} + \frac{1 - \tau(r)}{r},$$
where the traditional choice of the cutoff function τ is the complementary error function
$$\tau(r) = \operatorname{erfc}(r) := \frac{2}{\sqrt{\pi}} \int_r^{\infty} e^{-t^2}\,dt.$$
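A minimal numerical check of this splitting; γ below is an illustrative splitting parameter (the text takes τ(r) = erfc(r), i.e., γ = 1):

```python
import math

def short_range(r, gamma=1.0):
    """Real-space part: erfc(gamma*r)/r decays like exp(-(gamma*r)^2)."""
    return math.erfc(gamma * r) / r

def long_range(r, gamma=1.0):
    """Reciprocal-space part: erf(gamma*r)/r is smooth, -> 2*gamma/sqrt(pi) at r=0."""
    return math.erf(gamma * r) / r

r = 2.5
print(short_range(r) + long_range(r), 1.0 / r)   # the split reproduces 1/r exactly
```

Since erf + erfc = 1, the two parts sum to 1/r identically; the point of the split is that each part converges fast in its own (real or reciprocal) space.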
The Ewald summation techniques were shown to be particularly attractive for computing the potential energies and forces of many-particle systems with long-range interaction potentials under periodic boundary conditions. They are based on the spatial separation of a sum of potentials into two parts: the short-range part is treated in real space, while the long-range part (whose sum converges in the reciprocal space) requires grid-based FFT calculations with an irreducible O(L³ log L) computational work.
It is worth noting that the presented tensor method is applicable to lattice sums generated by a rather general class of radial basis functions that allow an efficient local-plus-separable approximation. In particular, along with Coulombic systems, it can be applied to a wide class of commonly used interaction potentials, for example, to the Slater, Yukawa, Stokeslet, Lennard-Jones, or van der Waals interactions. In all these cases, the existence of a low-rank grid-based tensor approximation can be proved, and this approximation can be constructed numerically by analytic-algebraic methods as in the case of the Newton kernel; see the detailed discussion in [153, 171].
The tensor approach is also advantageous in other functional operations with the lattice potential sums represented on a 3D grid, such as integration, differentiation, or force and energy calculations, using tensor arithmetics of 1D complexity [174, 146, 152, 240, 24]. Notice that the summation cost in the Tucker/canonical formats, O(LN), can be reduced to a logarithmic scale in the grid size, O(L log N), by using the low-rank quantized tensor approximation (QTT) [167] of the long canonical/Tucker vectors, as suggested and analyzed in [148].
ΩL = B1 × B2 × B3
$$v_k(x) = \sum_{k_1,k_2,k_3=0}^{L-1}\;\sum_{\nu=1}^{M_0} \frac{Z_\nu}{\|x - a_\nu(k_1,k_2,k_3)\|}, \quad x \in \Omega_k, \tag{14.1}$$
Let ΩNL be the NL × NL × NL uniform grid on ΩL with the same mesh size h as above, and introduce the corresponding space of piecewise constant basis functions of dimension NL³. In this construction, we have
NL = Ln. (14.2)
In practice, the computational box ΩL and the grid size NL can be taken larger than in (14.2) by some "dummy" distance with grid size N0, so that
NL = Ln + 2N0. (14.3)
Similarly to (11.9), we employ the rank-R reference tensor defined on the auxiliary box Ω̃L obtained by scaling ΩL with the factor 2,
$$\widetilde{\mathbf{P}}_{L,R} = \sum_{q=1}^{R} \mathbf{p}^{(1)}_q \otimes \mathbf{p}^{(2)}_q \otimes \mathbf{p}^{(3)}_q \;\in\; \mathbb{R}^{2N_L \times 2N_L \times 2N_L}, \tag{14.4}$$
and let 𝒲ν(ki ) , i = 1, 2, 3, be the directional windowing operators associated with the
lattice vector k. The next theorem proves the storage and numerical costs for the lat-
tice sum of single potentials, each represented by a canonical rank-R tensor, which
corresponds to the choice of M0 = 1 and a1 = 0 in (14.1). The ΩL -windowing operator
𝒲 = 𝒲(k) (tracing onto NL × NL × NL window) is rank-1 separable:
Theorem 14.1 ([148]). Given a canonical rank-R tensor representation (14.4) of a single long-range potential, the projected tensor of the interaction potential vcL(x), x ∈ ΩL, representing the collective potential sum over L³ charges of a rectangular lattice, can be presented by the rank-R canonical tensor
$$\mathbf{P}_{c_L} = \sum_{q=1}^{R} \Big(\sum_{k_1=0}^{L-1} \mathcal{W}_{(k_1)}\mathbf{p}^{(1)}_q\Big) \otimes \Big(\sum_{k_2=0}^{L-1} \mathcal{W}_{(k_2)}\mathbf{p}^{(2)}_q\Big) \otimes \Big(\sum_{k_3=0}^{L-1} \mathcal{W}_{(k_3)}\mathbf{p}^{(3)}_q\Big). \tag{14.5}$$
The numerical cost and storage size are estimated by O(RLNL) and O(RNL), respectively, where NL is the univariate grid size as in (14.2).
Proof. We fix the index ν = 1 in (14.1) and consider only the remaining sum over the lattice, defined on the complete domain ΩL,
$$v_{c_L}(x) = \sum_{k_1,k_2,k_3=0}^{L-1} \frac{Z}{\|x - b\,k\|}, \quad x \in \Omega_L. \tag{14.6}$$
Then the projected tensor representation of vcL(x) takes the form (setting Z = 1)
$$\mathbf{P}_{c_L} = \sum_{k_1,k_2,k_3=0}^{L-1} \mathcal{W}_{\nu(k)}\widetilde{\mathbf{P}}_{L,R} = \sum_{k_1,k_2,k_3=0}^{L-1}\sum_{q=1}^{R} \mathcal{W}_{(k)}\big(\mathbf{p}^{(1)}_q \otimes \mathbf{p}^{(2)}_q \otimes \mathbf{p}^{(3)}_q\big) \;\in\; \mathbb{R}^{N_L \times N_L \times N_L},$$
where p(ℓ)q, ℓ = 1, 2, 3, are the vectors of the reference tensor (14.4), and the 3D shift vector is defined by k ∈ ℤ^{L×L×L}. Now the above summation can be represented by
$$\mathbf{P}_{c_L} = \sum_{q=1}^{R}\sum_{k_1,k_2,k_3=0}^{L-1} \mathcal{W}_{(k_1)}\mathbf{p}^{(1)}_q \otimes \mathcal{W}_{(k_2)}\mathbf{p}^{(2)}_q \otimes \mathcal{W}_{(k_3)}\mathbf{p}^{(3)}_q. \tag{14.7}$$
To simplify the large sum over the full 3D lattice, we use the following property of a sum of canonical tensors with equal ranks R and with two coinciding factor matrices: the concatenation in the third mode ℓ can be reduced to the point-wise summation ("assembling") of the respective canonical vectors,
$$C^{(\ell)} = [\mathbf{a}^{(\ell)}_1 + \mathbf{b}^{(\ell)}_1, \ldots, \mathbf{a}^{(\ell)}_R + \mathbf{b}^{(\ell)}_R], \tag{14.8}$$
thus preserving the same rank parameter R for the resulting sum. Notice that, for each
fixed q, the inner sum in (14.7) satisfies the above property. By repeatedly applying this
property to all canonical tensors for q = 1, . . . , R, the 3D sum (14.7) can be simplified to
a rank-R tensor obtained by 1D summations only:
$$\mathbf{P}_{c_L} = \sum_{q=1}^{R} \Big(\sum_{k_1=0}^{L-1} \mathcal{W}_{(k_1)}\mathbf{p}^{(1)}_q\Big) \otimes \Big(\sum_{k_2,k_3=0}^{L-1} \mathcal{W}_{(k_2)}\mathbf{p}^{(2)}_q \otimes \mathcal{W}_{(k_3)}\mathbf{p}^{(3)}_q\Big)
= \sum_{q=1}^{R} \Big(\sum_{k_1=0}^{L-1} \mathcal{W}_{(k_1)}\mathbf{p}^{(1)}_q\Big) \otimes \Big(\sum_{k_2=0}^{L-1} \mathcal{W}_{(k_2)}\mathbf{p}^{(2)}_q\Big) \otimes \Big(\sum_{k_3=0}^{L-1} \mathcal{W}_{(k_3)}\mathbf{p}^{(3)}_q\Big).$$
The cost can be estimated by following the standard properties of canonical tensors.
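The assembling argument can be verified directly on a small random example; this is a sketch with hypothetical sizes, and windowing is modeled as slicing a shifted segment out of the reference vectors on the doubled grid:

```python
import numpy as np

R, n, L, NL = 4, 8, 3, 8 * 3   # rank, cell grid, lattice size, total grid N_L = n*L
rng = np.random.default_rng(1)
# reference canonical vectors of length 2*N_L (the scaled box), one set per mode
p = [rng.standard_normal((R, 2 * NL)) for _ in range(3)]

def window(vec, k):
    """W_(k): trace the reference vector, shifted by k cells, onto the N_L window."""
    return vec[NL - k * n : 2 * NL - k * n]

# direct sum of L^3 shifted rank-R tensors: rank would naively grow like R*L^3
direct = np.zeros((NL, NL, NL))
for k1 in range(L):
    for k2 in range(L):
        for k3 in range(L):
            for q in range(R):
                direct += np.einsum('i,j,k->ijk', window(p[0][q], k1),
                                    window(p[1][q], k2), window(p[2][q], k3))

# assembled rank-R sum: only 1D summations, the rank stays R
assembled = sum(
    np.einsum('i,j,k->ijk',
              sum(window(p[0][q], k) for k in range(L)),
              sum(window(p[1][q], k) for k in range(L)),
              sum(window(p[2][q], k) for k in range(L)))
    for q in range(R))
print(np.allclose(direct, assembled))
```

The two results agree because the triple lattice sum factorizes over the modes for each fixed q, which is exactly the content of the theorem.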
Figure 14.2: Assembled canonical vectors for a sum of electrostatic potentials for a cluster of 20 × 30 × 4 Hydrogen atoms in a rectangular box of size ∼55.4 × 33.6 × 22.4 au³. Top left-right: vectors along the x- and y-axes, respectively; bottom left: vectors along the z-axis. Bottom right: the resulting sum of 2400 nuclei potentials at the middle cross-section with z = 11.2 au.
Remark 14.2. For the general case M0 > 1, the weighted summation over M0 charges leads to a low-rank tensor representation with rank(PcL) ≤ M0 R.

Figure 14.3: Assembled canonical vectors in x-, y-, and z-axes for a sum of 1 572 864 nuclei potentials.

Figure 14.3 shows the assembled canonical vectors in the x-, y-, and z-axes for a sum of 1 572 864 nuclei potentials for a cluster of 192 × 128 × 64 Hydrogen atoms in a box of size ≈ 19.8 × 13.4 × 7 nm³.
The canonical tensor representation (14.5) dramatically reduces the numerical costs and storage consumption. Figure 14.4 compares the direct and assembled tensor summation methods (grid size of a unit cell n = 256). Contrary to the direct canonical summation of the nuclear potentials on a 3D lattice, which scales at least linearly in the size of the cubic lattice, as NL L³ (blue line), the CPU time for the directionally agglomerated canonical summation in a box via (14.5) scales as NL L (red line).
Table 14.1 presents the times for the assembled computation of the sum of potentials positioned at the nodes of L × L × L lattice clusters. Approximate sizes of the finite clusters are given in nanometers. The table shows that the computation time for the tensor approach scales logarithmically in the cluster size. We refer to [148] for a more detailed presentation of numerical experiments.
Figure 14.5 compares the tensor sum obtained by the assembled canonical vec-
tors with the results of direct tensor sum for the same configuration. The absolute dif-
ference of the corresponding sums for a cluster of 16 × 16 × 2 cells (here a cluster of
512 Hydrogen atoms) is close to machine accuracy ∼10−14 .
Table 14.1: CPU times (sec) vs. the lattice size for the assembled calculation of the sum PcL over the L × L × L clusters. Approximate sizes of the finite clusters are given in nanometers.

L                    | 32     | 64      | 128       | 256
Total L³             | 32 768 | 262 144 | 2 097 152 | 16 777 216
Cluster size (nm³)   | 3.8³   | 7³      | 13.4³     | 26.2³
Summation time (sec) | 0.2    | 0.27    | 0.83      | 3.87
Figure 14.5: Left: The electrostatic potential of the cluster of 16 × 16 × 2 Hydrogen atoms in a box
(512 atoms). Right: the absolute error of the assembled tensor sum on this cluster by (14.5) with
respect to the direct tensor summation (11.10).
$$\mathbf{T}_{c_L} = \sum_{m=1}^{r} b_m \Big(\sum_{k_1\in\mathcal{K}} \mathcal{W}_{(k_1)}\tilde{\mathbf{t}}^{(1)}_{m_1}\Big) \otimes \Big(\sum_{k_2\in\mathcal{K}} \mathcal{W}_{(k_2)}\tilde{\mathbf{t}}^{(2)}_{m_2}\Big) \otimes \Big(\sum_{k_3\in\mathcal{K}} \mathcal{W}_{(k_3)}\tilde{\mathbf{t}}^{(3)}_{m_3}\Big). \tag{14.10}$$
The numerical cost and storage are estimated by O(3rLNL) and O(3rNL), respectively.
$$\mathbf{T}_{c_L} = \sum_{k_1,k_2,k_3\in\mathcal{K}} \mathcal{W}_{(k)}\widetilde{\mathbf{T}}_{L,r}
= \sum_{m=1}^{r} b_m \Big(\sum_{k_1\in\mathcal{K}} \mathcal{W}_{(k_1)}\tilde{\mathbf{t}}^{(1)}_{m_1}\Big) \otimes \Big(\sum_{k_2\in\mathcal{K}} \mathcal{W}_{(k_2)}\tilde{\mathbf{t}}^{(2)}_{m_2}\Big) \otimes \Big(\sum_{k_3\in\mathcal{K}} \mathcal{W}_{(k_3)}\tilde{\mathbf{t}}^{(3)}_{m_3}\Big).$$
Remark 14.4. In the general case M0 > 1, the weighted summation over M0 charges leads to the rank-Rc canonical tensor representation on the "reference" domain Ω̃L, which can be used to obtain the rank-Rc representation of the sum over the whole L × L × L lattice (cf. Remark 14.2 and Theorem 14.1):
$$\mathbf{P}_{c_L} = \sum_{q=1}^{R_c} \Big(\sum_{k_1\in\mathcal{K}} \mathcal{W}_{(k_1)}\tilde{\mathbf{p}}^{(1)}_q\Big) \otimes \Big(\sum_{k_2\in\mathcal{K}} \mathcal{W}_{(k_2)}\tilde{\mathbf{p}}^{(2)}_q\Big) \otimes \Big(\sum_{k_3\in\mathcal{K}} \mathcal{W}_{(k_3)}\tilde{\mathbf{p}}^{(3)}_q\Big). \tag{14.11}$$
Likewise, the rank-rc Tucker approximation of a lattice potential sum vcL can be computed in the form [153]
$$\mathbf{T}_{c_L} = \sum_{m=1}^{r_0} b_m \Big(\sum_{k_1\in\mathcal{K}} \mathcal{W}_{(k_1)}\tilde{\mathbf{t}}^{(1)}_{m_1}\Big) \otimes \Big(\sum_{k_2\in\mathcal{K}} \mathcal{W}_{(k_2)}\tilde{\mathbf{t}}^{(2)}_{m_2}\Big) \otimes \Big(\sum_{k_3\in\mathcal{K}} \mathcal{W}_{(k_3)}\tilde{\mathbf{t}}^{(3)}_{m_3}\Big). \tag{14.12}$$
Table 14.2: MATLAB calculations: time (sec.) vs. the total number of potentials L3 for the assem-
bled Tucker representation of the lattice sum TcL on the fine NL × NL × NL grid with the mesh size
h = 0.0034 Å.
Table 14.3: Times in MATLAB for computation of the 3D FFT for a sequence of n3 grids. Times for grids
n ≥ 2048 are estimated by extrapolation.
The previous construction applies to uniformly spaced positions of charges. However, the agglomerated tensor summation method in both the canonical and Tucker formats applies, with a slight modification of the windowing operator, to a non-equidistant L1 × L2 × L3 tensor lattice. Such lattice sums cannot be treated by the traditional FFT-based Ewald summation methods.
Both the Tucker and canonical tensor representations (14.10) and (14.5) dramatically reduce the numerical costs and storage consumption.1 Table 14.2 illustrates the complexity scaling O(NL L) for the computation of the L × L × L lattice sum in the Tucker format. We observe an increase of the CPU time by a factor of 4 as the lattice size doubles, confirming our theoretical estimates. For comparison, Table 14.3 presents the CPU times (sec.) for the 3D FFT; see [153], where the initial numerical examples have been presented.
Figure 14.7 shows the sum of Newton kernels on a lattice 8 × 4 × 1 and the respec-
tive Tucker summation error achieved with the rank r = (16, 16, 16) Tucker tensor de-
fined on the large 3D representation grid with the mesh size about 0.002 atomic units
(0.001 Å). Figure 14.8 represents the Tucker vectors obtained from the canonical-to-
Tucker (C2T) approximation of the assembled canonical tensor sum of potentials on
an 8 × 4 × 1 lattice. In this case, the Tucker vectors are orthogonal.
1 Note that the total number of potentials on a 256³ lattice is more than 16 million. The cluster size in every space dimension is 2(256 + 6) = 516 au, or ∼26 nanometers. (Here 2 au is the inter-atomic distance, and 6 is the gap between the lattice and the boundary of the box.)
14.3 Assembled tensor sums in a periodic setting
Figure 14.7: Left: Sum of Newton potentials on an 8 × 4 × 1 lattice generated in a volume with the 3D
grid of size 14 336 × 10 240 × 7168. Right: the absolute approximation error (about 8 ⋅ 10−8 ) of the
rank-r Tucker representation.
Figure 14.8: Several mode vectors from the C2T approximation visualized along x-, y-, and z-axes for
a sum on a 16 × 8 × 4 lattice and the resulting 3D potential (the cross-section at level z = 0).
Lemma 14.5 ([153]). The discretized potential vcL for the full sum over M0 charges can be presented by a rank-(M0 R) canonical tensor. The computational cost is estimated by O(M0 RnL), whereas the storage size is bounded by O(M0 Rn).
Figure 14.9 (left) shows the assembled canonical vectors for a lattice structure in a periodic setting. Recall that in the limit of large L, the lattice sum PcL of the Newton kernels is known to converge only conditionally. The same is true for a sum in a box. The maximum norm increases as
O(log L), O(L), and O(L²) (14.14)
for 1D, 2D, and 3D sums, respectively; see [153] for more detail. This issue is of special significance in the periodic setting dealing with the limiting case L → ∞. In the traditional Ewald-type summation techniques, the regularization of lattice sums is implemented by subtraction of analytically precomputed constants describing the asymptotic behavior in L.
To approach the limiting case, in our method, we compute PcL on a sequence of large parameters L, 2L, 4L, etc., and then apply the Richardson extrapolation as described in the following. As a result, we obtain the regularized tensor p̂L obtained by subtraction of the leading terms in (14.14) and restricted to the reference unit cell Ω0.

Figure 14.9: Periodic canonical vectors in the L × 1 × 1 lattice sum, L = 16 (left). Regularized potential sum p̂L vs. m with L = 2^m for L × L × 1 (middle) and L × L × L lattice sums (right).
Denoting the target value of the potential by pL, the extrapolation formulas for the linear (d = 2) and quadratic (d = 3) behavior take the form
$$\hat{p}_L := 2p_L - p_{2L} \quad \text{and} \quad \hat{p}_L := (4p_L - p_{2L})/3,$$
respectively.
The effect of the Richardson extrapolation is illustrated in Figure 14.9. This figure indicates that the potential sum computed at the same point as in the previous example (for L × L × 1 and L × L × L lattices) converges to the limiting values p̂L after applying the Richardson extrapolation (regularized sum).
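A synthetic sanity check of the two extrapolation formulas; the limit and growth constants below are illustrative models, not actual potential sums:

```python
def p_model_2d(L, limit=0.7, c=0.05):
    """Model for an L x L x 1 sum: linear growth in L plus a finite limit."""
    return limit + c * L

def p_model_3d(L, limit=0.7, c=0.05):
    """Model for an L x L x L sum: quadratic growth in L plus a finite limit."""
    return limit + c * L**2

L = 16
reg2 = 2 * p_model_2d(L) - p_model_2d(2 * L)        # 2 p_L - p_2L
reg3 = (4 * p_model_3d(L) - p_model_3d(2 * L)) / 3  # (4 p_L - p_2L) / 3
print(reg2, reg3)   # both recover the limiting value
```

Doubling L doubles the linear term and quadruples the quadratic term, so the two weighted differences cancel the divergent contribution exactly and leave the finite limit.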
The next statement presents the QTT-rank estimate for the Gaussian vector obtained by uniform sampling of e^{−x²/(2p²)} on a finite interval [68]; see also Section 4.2.

Proposition 14.6. Given the uniform grid −a = x0 < x1 < ⋯ < xN = a, xi = −a + hi, N = 2^L, on the interval [−a, a], the vector g = [gi] ∈ ℝ^N defined by its elements gi = e^{−xi²/(2p²)}, i = 0, . . . , N − 1, and a fixed ε > 0, assume that e^{−a²/(2p²)} ≤ ε. Then there exists a QTT ε-approximation g_r of g such that
$$\operatorname{rank}_{QTT}(\mathbf{g}_r) \le c \log\Big(\frac{p}{\varepsilon}\Big).$$
The next lemma proves the important result that the QTT rank of a weighted sum of regularly shifted bumps (see, for example, Figure 14.9, left) does not exceed the product of the QTT rank of an individual sample and the weighting factor.
$$\mathbf{x}_k(i) = \begin{cases} \mathbf{x}_0(:) & \text{for } i \in I_k, \\ 0 & \text{for } i \in I \setminus I_k, \end{cases} \tag{14.15}$$
Notice that Lemma 14.7 provides a constructive algorithm and a rigorous proof of the low-rank QTT decomposition for a certain class of Bloch functions [37] and Wannier-type functions.
Figure 14.10 (left) illustrates the shapes of the assembled canonical vectors modulated by sine harmonics.
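The rank-preservation effect for shifted bumps can be observed numerically. The sketch below compares the maximal ε-ranks of the dyadic unfoldings (which bound the QTT ranks) of a single Gaussian bump and of its L-fold replication on a chain; the grid size, bump width, and tolerance are illustrative choices:

```python
import numpy as np

def qtt_unfolding_ranks(v, tol=1e-6):
    """eps-ranks of all dyadic unfoldings of a 2^d vector (QTT rank profile)."""
    d = int(np.log2(v.size))
    scale = np.linalg.norm(v)
    return [np.linalg.matrix_rank(v.reshape(2**i, -1), tol=tol * scale)
            for i in range(1, d)]

d = 13
x = np.linspace(0.0, 1.0, 2**d, endpoint=False)
bump = np.exp(-((x - 0.5) / 0.02)**2)                # single Gaussian bump
L = 8                                                 # replicate on an L x 1 x 1 chain
chain = sum(np.exp(-((x - (l + 0.5) / L) / 0.02)**2) for l in range(L))

r1 = max(qtt_unfolding_ranks(bump))
rL = max(qtt_unfolding_ranks(chain))
print(r1, rL)   # the replicated sum keeps nearly the same rank profile
```

The upper unfoldings of the chain become nearly rank-1 because of the periodic placement, so the maximal rank stays close to that of a single bump, in line with the lemma.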
Figure 14.10: Canonical vectors of the lattice sum modulated by a sine function (left). Right: QTT ranks of the canonical vectors of a 3D Newton kernel discretized on cubic grids of size n³ = 16 384³, 32 768³, 65 536³, and 131 072³.
The following lemma estimates the bounds for the average QTT ranks of the assembled vectors in PcL in the periodic setting.
Lemma 14.8 ([153]). For a given tolerance ε > 0, suppose that the set of Gaussian functions S := {gk = e^{−tk²‖x‖²}}, k = 0, 1, . . . , M, representing the canonical vectors in the tensor decomposition PR, is specified by the parameters in (6.3), and set e^{−tk²‖x‖²} = e^{−‖x‖²/(2pk²)}. Let us split the set S into two subsets S = Sloc ∪ Sglob according to the effective support aε(gk) = √2 pk log^{1/2}(1/ε) (locally supported functions satisfy aε(gk) ≤ b). Then the QTT rank of each canonical vector vq, q = 1, . . . , R in (14.5), where R = M + 1, corresponding to Sloc obeys the rank bound, uniform in L,
$$r_{QTT} \le C \log(1/\varepsilon),$$
whereas for the vectors corresponding to Sglob,
$$r_{QTT} \le C \log(L/\varepsilon).$$
justifying the uniform bound pk ≤ C, and then the rank estimate rQTT ≤ C log(1/ε) in
view of Proposition 14.6. Now we apply Lemma 14.7 to obtain the uniform in L rank
bound.
For the globally supported functions in Sglob, we have bL ≥ aε ≃ pk log^{1/2}(1/ε) ≥ b. Hence, we consider all these functions on the maximal support of the size of the super-cell bL and set a = bL. Using the trigonometric representation as in the proof of Lemma 2 in [68], we conclude that, for each fixed k, the shifted Gaussians gk,ℓ(x) = e^{−tk²‖x−ℓb‖²} (ℓ = 1, . . . , L) can be approximated by the shifted trigonometric series
$$G_r(x - b\ell) = \sum_{m=0}^{M} C_m\, p\, e^{-\frac{\pi^2 m^2 p^2}{2a^2}} \cos\Big(\frac{\pi m (x - b\ell)}{a}\Big), \quad a = bL,$$
Theorem 14.9 ([153]). The tensor representation of vcL for the full lattice sum generated by a single charge can be presented by the rank-R QTT-canonical tensor
$$\mathbf{P}_{c_L} = \sum_{q=1}^{R} \Big(\mathcal{Q}\sum_{k_1=1}^{L} \mathcal{W}_{\nu(k_1)}\mathbf{p}^{(1)}_q\Big) \otimes \Big(\mathcal{Q}\sum_{k_2=1}^{L} \mathcal{W}_{\nu(k_2)}\mathbf{p}^{(2)}_q\Big) \otimes \Big(\mathcal{Q}\sum_{k_3=1}^{L} \mathcal{W}_{\nu(k_3)}\mathbf{p}^{(3)}_q\Big), \tag{14.16}$$
where 𝒬p(ℓ)q denotes the QTT tensor approximation of the canonical vector p(ℓ)q. Here the QTT rank of each canonical vector is bounded by rQTT ≤ C log(L/ε). The computational cost and storage are estimated by O(RL rQTT³) and O(R log²(L/ε)), respectively.
Figure 14.11: Left: QTT ranks of the assembled canonical vectors vs. L for the fixed grid size N³ = 16 384³. Right: average QTT ranks over the R canonical vectors vs. log L for the 3D evaluation of the L × 1 × 1 chain of Hydrogen atoms on N × N × N grids, N = 2048, 4096, 8192, 16 384.
We consider the sum of canonical tensors on a lattice with defects located at S sources. The canonical rank of the resultant tensor may increase by a factor of S. The effective rank of the perturbed sum can be reduced by using the RHOSVD approximation via the Can → Tuck → Can algorithm (see [174]). This approach provides a compressed tensor with canonical rank quadratically proportional to the rank of the respective Tucker approximation to the sum with defects.
Here, for the reader's convenience, we briefly recall the basics of the RHOSVD and the C2T decomposition described in detail in Section 3.3.2. In what follows, we focus on the stability conditions for the RHOSVD approximation and their applicability in the summation of spherically symmetric interaction potentials. The canonical rank-R tensor representation (2.13) can be written as a rank-(R, R, R) Tucker tensor by introducing the diagonal Tucker core tensor ξ := diag{ξ1, . . . , ξR} ∈ ℝ^{R×R×R} such that ξν1,ν2,ν3 = 0 except for ν1 = ν2 = ν3 with ξν,ν,ν = ξν, ν = 1, . . . , R (see Figure 3.12):
$$\mathbf{A} = \boldsymbol{\xi} \times_1 A^{(1)} \times_2 A^{(2)} \times_3 A^{(3)}. \tag{14.17}$$
Given the rank parameter r = (r1, r2, r3), to define the reduced rank-r HOSVD-type Tucker approximation to the tensor in (2.13), we set nℓ = n and suppose for definiteness that n ≤ R, so that the SVD of the side-matrix A^(ℓ) is given by
$$A^{(\ell)} = Z^{(\ell)} D_\ell {V^{(\ell)}}^{T} = \sum_{k=1}^{n} \sigma_{\ell,k}\, \mathbf{z}^{(\ell)}_k {\mathbf{v}^{(\ell)}_k}^{T}, \quad \mathbf{z}^{(\ell)}_k \in \mathbb{R}^n,\ \mathbf{v}^{(\ell)}_k \in \mathbb{R}^R.$$
Given rank parameters r1, . . . , rℓ < n, introduce the truncated SVD of the side-matrix A^(ℓ), Z0^(ℓ) Dℓ,0 V0^(ℓ)T (ℓ = 1, 2, 3), where Dℓ,0 = diag{σℓ,1, σℓ,2, . . . , σℓ,rℓ}, and Z0^(ℓ) ∈ ℝ^{n×rℓ} and V0^(ℓ) ∈ ℝ^{R×rℓ} represent the orthogonal factors given by the respective sub-matrices in the SVD factors of A^(ℓ). Here we recall the definition of the RHOSVD tensor approximation (see Section 3.3): the RHOSVD approximation of A, further denoted by A⁰_(r), is defined as the rank-r Tucker tensor obtained by the projection of A in the form (14.17) onto the orthogonal matrices of the dominating singular vectors in Z0^(ℓ) (ℓ = 1, 2, 3).
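A compact sketch of the RHOSVD projection with hypothetical sizes; the point is that only the n × R side matrices are ever decomposed, never the full n³ array:

```python
import numpy as np

def rhosvd(side, xi, r):
    """Reduced HOSVD of a canonical tensor sum_q xi_q * a1_q x a2_q x a3_q:
    truncated SVD of each n x R side matrix, then projection of the rank-1 terms."""
    Z = []
    for A in side:                          # columns of A are canonical vectors
        u, _, _ = np.linalg.svd(A, full_matrices=False)
        Z.append(u[:, :r])                  # dominating left singular vectors
    # Tucker core: each rank-1 term projected onto the three orthogonal bases
    core = np.einsum('q,iq,jq,kq->ijk', xi,
                     Z[0].T @ side[0], Z[1].T @ side[1], Z[2].T @ side[2])
    return core, Z

n, R, r = 40, 15, 6
rng = np.random.default_rng(2)
# side matrices of exact rank r, so the truncation is lossless in this toy case
side = [np.linalg.qr(rng.standard_normal((n, r)))[0] @
        rng.standard_normal((r, R)) for _ in range(3)]
xi = rng.standard_normal(R)

core, Z = rhosvd(side, xi, r)
full = np.einsum('q,iq,jq,kq->ijk', xi, *side)
approx = np.einsum('abc,ia,jb,kc->ijk', core, *Z)
print(np.linalg.norm(full - approx) / np.linalg.norm(full))  # tiny: exact rank r
```

The cost is dominated by three SVDs of n × R matrices, O(nR min(n, R)), instead of anything cubic in n.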
The stability of the RHOSVD approximation is formulated in the following assertion.

Lemma 14.10 ([174]). Let the canonical decomposition (2.13) satisfy the stability condition
$$\sum_{\nu=1}^{R} \xi_\nu^2 \le C\|\mathbf{A}\|^2. \tag{14.18}$$
$$\mathbf{U}_s = \sum_{m=1}^{r_s} b_{s,m}\, \mathbf{u}^{(1)}_{s,m_1} \otimes \mathbf{u}^{(2)}_{s,m_2} \otimes \mathbf{u}^{(3)}_{s,m_3}, \quad s = 1, \ldots, S. \tag{14.19}$$
$$\mathbf{U}_0 \;\to\; \hat{\mathbf{U}} = \mathbf{U}_0 + \sum_{s=1}^{S} \mathbf{U}_s, \tag{14.20}$$
which implies the simple upper rank estimates for the best Tucker approximation of Û:
$$\hat{r}_\ell \le r_{0,\ell} + \sum_{s=1}^{S} r_{s,\ell} \quad \text{for } \ell = 1, 2, 3.$$
If the number of perturbed cells S is large enough, then the numerical computations
with the Tucker tensor of rank ̂rℓ become prohibitive, and the rank reduction proce-
dure is required.
In the case of the Tucker sum (14.20), we define the assembled side matrices Û^(ℓ) by concatenation of the directional side-matrices of the individual tensors Us, s = 0, 1, . . . , S:
$$\hat{U}^{(\ell)} = \big[\mathbf{u}^{(\ell)}_1 \cdots \mathbf{u}^{(\ell)}_{r_{0,\ell}},\ \mathbf{u}^{(\ell)}_1 \cdots \mathbf{u}^{(\ell)}_{r_{1,\ell}},\ \ldots,\ \mathbf{u}^{(\ell)}_1 \cdots \mathbf{u}^{(\ell)}_{r_{S,\ell}}\big] \in \mathbb{R}^{\,n\times(r_{0,\ell}+\sum_{s=1}^{S} r_{s,\ell})}, \quad \ell = 1, 2, 3. \tag{14.21}$$
Given the rank parameter r = (r1, r2, r3), introduce the truncated SVD of Û^(ℓ),
$$\hat{U}^{(\ell)} \approx Z_0^{(\ell)} D_{\ell,0} {V_0^{(\ell)}}^{T},$$
where Dℓ,0 = diag{σℓ,1, σℓ,2, . . . , σℓ,rℓ}. Here, instead of a fixed rank parameter, the truncation threshold ε > 0 can be chosen.
The stability criterion for the RHOSVD approximation, as in Lemma 14.10, allows a natural extension to the generalized RHOSVD approximation applied to a sum of Tucker tensors in (14.20).
The following theorem proven in [153] provides an error estimate for the general-
ized RHOSVD approximation, converting a sum of Tucker tensors to a single Tucker
tensor with fixed rank bounds or subject to the given tolerance ε > 0.
Theorem 14.11 (Tucker-sum-to-Tucker). Given the sum of Tucker tensors (14.20) and the rank truncation parameter r = (r1, . . . , rd):
(a) Let σℓ,1 ≥ σℓ,2 ≥ ⋯ ≥ σℓ,min(n,R) be the singular values of the ℓ-mode side-matrix Û^(ℓ) ∈ ℝ^{n×R} (ℓ = 1, 2, 3) defined in (14.21). Then the generalized RHOSVD approximation U⁰_(r), obtained by the projection of Û onto the dominating singular vectors Z0^(ℓ) of the Tucker side-matrices Û^(ℓ) ≈ Z0^(ℓ) Dℓ,0 V0^(ℓ)T, exhibits the error estimate
$$\big\|\hat{\mathbf{U}} - \mathbf{U}^0_{(r)}\big\| \le |\hat{\mathbf{U}}| \sum_{\ell=1}^{d} \Big(\sum_{k=r_\ell+1}^{\min(n,\hat{r}_\ell)} \sigma_{\ell,k}^2\Big)^{1/2}, \quad \text{where } |\hat{\mathbf{U}}|^2 = \sum_{s=0}^{S} \|\mathbf{U}_s\|^2. \tag{14.22}$$
If, in addition, the stability condition $\sum_{s=0}^{S} \|\mathbf{U}_s\|^2 \le C\|\hat{\mathbf{U}}\|^2$ holds, then
$$\big\|\hat{\mathbf{U}} - \mathbf{U}^0_{(r)}\big\| \le C\big\|\hat{\mathbf{U}}\big\| \sum_{\ell=1}^{d} \Big(\sum_{k=r_\ell+1}^{\min(n,\hat{r}_\ell)} \sigma_{\ell,k}^2\Big)^{1/2}.$$
The resultant Tucker tensor U_{(r)}^0 can be used as the initial guess for the ALS iteration computing the best Tucker ε-approximation of a sum of Tucker tensors.
Figure 14.12 (left) visualizes the result of assembled Tucker summation of the three-dimensional grid-based Newton potentials on a 16 × 16 × 1 lattice with a vacancy and an impurity, each of 2 × 2 × 1 lattice size. Figure 14.12 (right) shows the corresponding Tucker vectors along the x-axis, which distinctly display the local shapes of the vacancies and impurities.
Figure 14.12: Left: assembled grid-based Tucker sum of 3D Newton potentials on a lattice 16 × 16 × 1 with an impurity and a vacancy, both of size 2 × 2 × 1. Right: the Tucker vectors along the x-axis.

Though rectangular structures with lattice-type vacancies and impurities are the most representative structures in crystalline-type systems, in many practically interesting cases the physical lattice may have a non-rectangular geometry that does not fit exactly the tensor-product structure of the canonical/Tucker data arrays; for example, hexagonal or parallelepiped-type lattices. Here, following [153], we discuss how to apply tensor summation methods to certain classes of non-rectangular geometries and show a few numerical examples demonstrating the required (minor) modifications of the basic assembled summation schemes.
It is worth noting that most interesting lattice structures (say, arising in crystalline modeling) inherit a number of spatial symmetries, which allow us first to classify and then to simplify the computational schemes for each particular case of symmetry. In this regard, we mention the following class of lattice topologies, which can be efficiently treated by our tensor summation techniques:
– The target lattice ℒ can be split into the union of several (few) sub-lattices ℒ = ⋃ ℒ_q such that each sub-lattice ℒ_q allows a 3D rectangular grid structure. Numerically, this reduces to summation of the tensors corresponding to each of the ℒ_q.
– Defects in the target composite lattice may be distributed over rectangular subdomains (clusters) represented on a coarser scale.
For such lattice topologies, the assembled tensor summation algorithm applies independently to each rectangular sub-lattice ℒ_q, and the target tensor is then obtained as a direct sum of the tensors associated with the ℒ_q, followed by the subsequent rank reduction procedure. An example of such a geometry is given by the hexagonal lattice presented in Figure 14.13 (rectangular in the third axis), which can be split into a union of two rectangular sub-lattices ℒ_1 (red) and ℒ_2 (blue).
Numerically, it is implemented by summation of two tensors via concatenation of the canonical vectors corresponding to the "blue" and "green" lattices, both living on the same fine 3D Cartesian grid.
Figure 14.14: Left: Sum of potentials over the hexagonal lattice of the type shown in Figure 14.13.
Right: rotated view.
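The summation over two rectangular sub-lattices by concatenation of canonical vectors can be sketched as follows; the factors here are random stand-ins for the two sub-lattice sums, and the check confirms that concatenating the factor matrices reproduces the sum of the two canonical tensors (hence the doubled rank):

```python
import numpy as np

rng = np.random.default_rng(1)
n, R = 32, 5

def canonical_full(A, B, C):
    """Assemble the full tensor sum_q a_q (x) b_q (x) c_q."""
    return np.einsum('iq,jq,kq->ijk', A, B, C)

# canonical factors for the two sub-lattice sums (e.g. "blue"/"green")
A1, B1, C1 = (rng.standard_normal((n, R)) for _ in range(3))
A2, B2, C2 = (rng.standard_normal((n, R)) for _ in range(3))

# concatenation of the canonical vectors gives the rank-2R sum
A = np.concatenate([A1, A2], axis=1)
B = np.concatenate([B1, B2], axis=1)
C = np.concatenate([C1, C2], axis=1)

T_sum = canonical_full(A, B, C)
assert np.allclose(T_sum, canonical_full(A1, B1, C1) + canonical_full(A2, B2, C2))
```

After concatenation, the rank-reduction procedure (canonical-to-Tucker) would be applied to compress the doubled rank.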
The following numerical results basically reproduce those in [153]. Figure 14.14 (left and right) shows the resulting potential sum for the hexagonal lattice structure composed of a sum of 7 × 7 × 1 "blue" and 7 × 7 × 1 "green" potentials. The rank of the tensor representing the sum is two times larger than the rank of the single reference Newton kernel.
In the case of regularly positioned vacancies, as in Figure 14.15, which shows the result of assembled canonical summation of the grid-based Newton potentials on a 24 × 24 × 1 lattice with 6 × 6 × 1 vacancies (a two-level lattice), the resulting tensor rank is only two times larger than the rank of a single Newton potential.
Figure 14.16 illustrates the situation when defects are located in a compact subdomain. It presents the result of the assembled canonical sum of the Newton potentials on L-shaped (left) and O-shaped (right) lattices. The resulting potential sum for the L-shaped lattice is the difference between a full 24 × 18 × 1 lattice and a sublattice of size 12 × 9 × 1. For the O-shape, the resultant tensor is obtained as the difference between the full lattice sums over the 12 × 12 × 1 and the central 6 × 6 × 1 clusters. In both cases, the total canonical tensor rank is two times larger than the rank of the single reference potential.
For composite lattice geometries, one can use the canonical-to-Tucker transform to reduce the canonical rank. In the case of complicated geometries, the Tucker reference tensor for the Newton kernel may be preferable. For example, in the case of the O-shaped domain, the maximal Tucker rank of the resultant tensor is 25, whereas the respective ranks for the rectangular compounds are 17 and 15.
Since the lattice is not necessarily aligned with the 3D representation grid, it is easy to assemble potentials centered independently of the lattice nodes, for example, for modeling lattices with insertions having other inter-atomic displacements compared with the main lattice. Figure 14.17 presents the result of assembled canonical summation of 3D grid-based Newton potentials on a 12 × 12 × 1 lattice with an impurity of size 2 × 2 × 1 whose interatomic distances differ from those of the main lattice. The impurity potentials are determined on the same fine N_L × N_L × N_L representation grid.
14.6 Interaction energy of the long-range potentials on finite lattices

Figure 14.17: Left: assembled canonical summation of 3D grid-based Newton potentials on a lattice 10 × 10 × 1 with an impurity of size 2 × 2 × 1. Right: the vertical projection.

The interaction energy of the long-range electrostatic potentials of the charges Z_k located at the lattice points x_k, k ∈ 𝒦, is defined as

E_L = (1/2) ∑_{k,j∈𝒦, k≠j} Z_k Z_j / ‖x_j − x_k‖, i.e., for ‖x_j − x_k‖ ≥ b. (14.23)
Notice that the local density approximation for the long-range and short-range energy functionals has been addressed in [279].
The tensor summation scheme can be directly applied to this computational problem. For this discussion, we assume that all charges are equal, that is, Z_k = Z. First, notice that the rank-R reference tensor h^{−3}P̃ defined in (14.4) approximates the Coulomb potential 1/‖x‖ in Ω̃_L with high accuracy O(h²) (for ‖x‖ ≥ b, as required for the energy expression) on the fine 2n × 2n × 2n representation grid with mesh size h. Likewise, the tensor h^{−3}P_{cL} approximates the potential sum v_{cL}(x) on the same fine representation grid, including the lattice points x_k.
We evaluate the energy expression (14.23) by using tensor sums as in (14.5), but now applied to a small sub-tensor of the rank-R canonical reference tensor P̃, that is, P̃_L := [P̃|_{x_k}] ∈ ℝ^{2L×2L×2L}, obtained by tracing P̃ on the accompanying lattice ℒ̃_L = {x_k} ⊂ Ω̃_L of the double size 2L × 2L × 2L. Here, P̃|_{x_k} denotes the tensor entry corresponding to the kth lattice point designating the atomic center x_k.
We are interested in the computation of the rank-R tensor P̂^c_L = [P^c_L|_{x_k}]_{k∈𝒦} ∈ ℝ^{L×L×L}, where P^c_L|_{x_k} denotes the tensor entry corresponding to the kth lattice point on ℒ_L. The tensor P̂^c_L can be computed at the expense O(L²) by

P̂^c_L = ∑_{q=1}^{R} ( ∑_{k_1∈𝒦} 𝒲_{(k_1)} p̃^{(1)}_{L,q} ⊗ ∑_{k_2∈𝒦} 𝒲_{(k_2)} p̃^{(2)}_{L,q} ⊗ ∑_{k_3∈𝒦} 𝒲_{(k_3)} p̃^{(3)}_{L,q} ).
This leads to the representation of the energy sum (14.23) (with accuracy O(h²)) in the form

E_{L,T} = (Z² h^{−3}/2) (⟨P̂^c_L, 1⟩ − ∑_{k∈𝒦} P̃|_{x_k=0}),

where the first term in brackets represents the full canonical tensor lattice sum restricted to the k-grid composing the lattice ℒ_L, whereas the second term introduces the correction at the singular points x_j − x_k = 0. Here, 1 ∈ ℝ^{L×L×L} is the all-ones tensor. By using the rank-1 tensor P^0_L = P̃|_{x_k=0} 1, the correction term can be represented by the simple tensor operation

∑_{k∈𝒦} P̃|_{x_k=0} = ⟨P^0_L, 1⟩.
Hence, the energy is computed as

E_L ≈ E_{L,T} = (Z² h^{−3}/2) (⟨P̂^c_L, 1⟩ − ⟨P^0_L, 1⟩), (14.24)
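A minimal 1D analogue of the energy computation (14.23)–(14.24) can be checked numerically; the regularized kernel below is a hypothetical stand-in for the grid-based Newton kernel, and the test verifies that subtracting the self-interaction term N p(0) from the assembled potential values reproduces the direct pairwise sum:

```python
import numpy as np

# 1D toy analogue of (14.24): the interaction energy of equal charges Z on a
# lattice equals Z^2/2 * (sum_k P^c(x_k) - N * p(0)), where P^c is the
# assembled potential sum evaluated at the lattice points.
Z, L, b0 = 1.0, 8, 1.0
x = np.arange(L) * b0                      # lattice points
p = lambda r: 1.0 / np.sqrt(r**2 + 0.1)    # regularized kernel, p(0) finite

# direct double sum over distinct pairs
E_direct = 0.5 * Z**2 * sum(p(x[j] - x[k])
                            for j in range(L) for k in range(L) if j != k)

# assembled potential at lattice points, then self-interaction correction
P_c = np.array([sum(p(xk - x)) for xk in x])   # includes the j = k term p(0)
E_tensor = 0.5 * Z**2 * (P_c.sum() - L * p(0))

assert np.isclose(E_direct, E_tensor)
```

In the 3D tensor scheme, P_c.sum() corresponds to ⟨P̂^c_L, 1⟩ and the correction L·p(0) to ⟨P^0_L, 1⟩.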
Table 14.4: Comparison of times for the full (T_full, O(L⁶)) and tensor-based (T_tens.) calculation of the interaction energy sum for the lattice electrostatic potentials.
15 Range-separated tensor format for many-particle systems

P(x) = ∑_{ν=1}^{N} Z_ν p(‖x − s_ν‖), Z_ν ∈ ℝ, s_ν, x ∈ Ω = [−b, b]³, (15.1)
leads to a computationally intensive numerical task. Indeed, the generating radial basis function p(‖x‖) is allowed to have a slow polynomial decay in 1/‖x‖, so that each individual term in (15.1) contributes essentially to the total potential at each point in Ω, implying O(N) complexity for the straightforward summation at every fixed target x ∈ ℝ³. Moreover, in general, the function p(‖x‖) has a singularity or a cusp at the origin x = 0. Typical examples of the radial basis function p(‖x‖) are given by the Newton kernel 1/‖x‖, the Slater kernel e^{−λ‖x‖}, the Yukawa kernel e^{−λ‖x‖}/‖x‖, and other Green's kernels (see examples in Section 15.3.1).
An important ingredient of the RS approach is the splitting of a single reference potential, say p(‖x‖) = 1/‖x‖, into a sum of localized and long-range low-rank canonical tensors represented on the grid Ω_n. In this regard, it can be shown that the explicit sinc-based canonical tensor decomposition of the generating reference kernel p(‖x‖) by a sum of Gaussians implies a distinct separation of its long- and short-range parts.

Such range separation techniques can be gainfully applied to the summation of a large number of generally distributed potentials in (15.1). Indeed, the sum of the long-range contributions can be represented by a single tensor living on the grid Ω_n ⊂ Ω by using the canonical-to-Tucker transform [174], which returns this part in the form of a low-rank Tucker tensor. Hence, the smooth long-range contribution to the overall sum is represented on the fine n × n × n grid Ω_n in O(n) storage via a global canonical or Tucker tensor with a separation rank that depends only weakly (logarithmically) on the number of particles N. This important feature is confirmed by numerical tests for large clusters of generally distributed potentials in 3D; see Section 15.2.
In turn, the short-range contribution to the total sum is constructed by using a single low-rank reference tensor with a small local support, selected from the "short-range" canonical vectors in the tensor decomposition of p(‖x‖). To that end, the whole set of N short-range clusters is represented by replication and rescaling of the small-size localized reference tensor, thus reducing the storage to the O(1)-parametrization of the reference canonical tensor plus the list of coordinates and charges of the particles. Representation of the short-range part over the n × n × n grid needs O(N n) computational work for an N-particle system. Such a cumulated sum of the short-range components allows "local operations" in the RS-canonical format, making it particularly efficient for tensor multilinear algebra.
The RS tensor formats provide a tool for the efficient numerical treatment of interaction potentials in many-particle systems, which, in some aspects, can be considered as an alternative to the well-established multipole expansion method [103]. The particular benefit is the low-parametric representation of the collective interaction potential on a large 3D Cartesian grid in the whole computational domain at linear cost O(n), thus outperforming grid-based summation techniques based on the full-grid O(n³) representation in the volume. Both the global and local summation schemes are easy to implement. The prototype algorithms in MATLAB, applied on a laptop, allow computing the electrostatic potential of large many-particle systems on fine grids of size up to n³ = 10¹².
15.1 Tensor splitting of the kernel into long- and short-range parts

with

𝒯_l := {t_k | k = 0, 1, …, R_l} and 𝒯_s := {t_k | k = R_l + 1, …, M}. (15.2)
The set 𝒯_l includes the quadrature points t_k condensed "near" zero, hence generating the long-range Gaussians (low-pass filters), while 𝒯_s accumulates the sequence of "large" sampling points t_k, increasing as M → ∞ with the upper bound C₀² log²(M), corresponding to the short-range Gaussians (high-pass filters). The quasi-optimal choice of the constant C₀ ≈ 3 was determined numerically in [30]. We further denote

𝒦_l := {k | k = 0, 1, …, R_l} and 𝒦_s := {k | k = R_l + 1, …, M}.
where

P_{R_s} = ∑_{t_k∈𝒯_s} p_k^{(1)} ⊗ p_k^{(2)} ⊗ p_k^{(3)}, P_{R_l} = ∑_{t_k∈𝒯_l} p_k^{(1)} ⊗ p_k^{(2)} ⊗ p_k^{(3)}. (15.3)
or

(B) 𝒯_s := {t_k : a_k ∫_{B_σ} e^{−t_k² x²} dx ≤ δ} ⇔ R_l = min{k : a_k ∫_{B_σ} e^{−t_k² x²} dx ≤ δ}. (15.5)
Clearly, the sphere B_σ can be substituted by a small box of the corresponding size.
The quantitative estimates on the value of R_l can be easily calculated by using the explicit equation (6.3) for the quadrature parameters. For example, in the case C₀ = 3 and a(t) = 1, criterion (A) implies that R_l solves the equation

(3 R_l log M / M)² σ² = log(h_M/δ).
Criteria (15.4) and (15.5) can be slightly modified depending on the particular application to many-particle systems. For example, in electronic structure calculations, the parameter σ can be associated with the typical inter-atomic distance in the molecular system of interest (the van der Waals distance).
Figures 15.1 and 15.2 illustrate the splitting (15.2) for the tensor P_R computed on the n × n × n grid with the parameters R = 20, R_l = 12, and R_s = 8. Figure 15.1 shows the long-range canonical vectors from P_{R_l} in (15.3), whereas Figure 15.2 displays the short-range part described by P_{R_s}. Following criterion (A) with δ ≈ 10⁻⁴, the effective support for this splitting is determined by σ = 0.9. The complete Newton kernel simultaneously resolves both the short- and long-range behavior, whereas the function values of the tensor P_{R_s} vanish exponentially fast away from the effective support, as can be seen in Figure 15.2.
Inspection of the quadrature point distribution in (6.3) shows that the short- and long-range subsequences are nearly equally balanced, so that one can expect approximately

R_s ≈ R_l = M/2. (15.6)
Figure 15.1: Long-range canonical vectors for n = 1024, R = 20, Rl = 12, and the corresponding
potential.
Figure 15.2: Short-range canonical vectors for n = 1024, R = 20, Rs = 8, and the corresponding
potential.
The optimal choice may depend on the particular application specified by the separa-
tion parameter σ > 0 and the required accuracy.
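The splitting can be illustrated on the 1D Gaussian integral representation 1/ρ = (2/√π)∫₀^∞ e^{−ρ²t²} dt. The sketch below uses a plain trapezoidal rule in t as a simplified stand-in for the sinc quadrature (6.3) (grid, weights, and the split index R_l are illustrative); it checks that the long- and short-range parts recombine to the full kernel and that the short-range part is negligible away from the origin:

```python
import numpy as np

# quadrature in t for 1/rho = (2/sqrt(pi)) * int_0^inf exp(-rho^2 t^2) dt
h, M = 0.05, 400
t = h * np.arange(M + 1)                      # quadrature points t_k
a = np.full(M + 1, 2 * h / np.sqrt(np.pi))    # weights a_k
a[0] *= 0.5                                   # trapezoidal end-point weight

def gauss_sum(rho, idx):
    """Partial Gaussian sum over the selected quadrature indices."""
    return np.sum(a[idx] * np.exp(-(t[idx] * rho) ** 2))

R_l = 20                                      # split index: k <= R_l -> long-range
long_idx = np.arange(R_l + 1)
short_idx = np.arange(R_l + 1, M + 1)

for rho in (0.5, 1.0, 2.0):
    total = gauss_sum(rho, np.arange(M + 1))
    # long- and short-range parts recombine to the full kernel
    assert np.isclose(total, gauss_sum(rho, long_idx) + gauss_sum(rho, short_idx))
    assert abs(total - 1.0 / rho) < 1e-6

# far from the origin, the short-range part is negligible
assert gauss_sum(3.0, short_idx) < 1e-3
```

Each quadrature term e^{−t_k²x²} separates over the coordinates, which is what yields the canonical tensors P_{R_l} and P_{R_s} in (15.3).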
The main advantage of the range separation in the splitting (15.3) of the canonical tensor P_R is the opportunity for independent tensor representations of the two sub-tensors P_{R_s} and P_{R_l}, which leads to a simultaneous reduction of their complexity and storage demands. Indeed, the effective local support characterized by σ > 0 includes a much smaller number of grid points n_s ≪ n compared with the global grid size. Hence, the storage cost Stor(P_{R_s}) for the canonical tensor representation of the short-range part is estimated by

Stor(P_{R_s}) ≤ R_s n_s ≪ R n.

Likewise, since the long-range vectors are smooth, they can be represented on a coarser grid with n_l ≪ n points, so that

Stor(P_{R_l}) ≤ R_l n_l ≪ R n.
15.2 Tensor summation of range-separated potentials

The summation of a large number of potentials at general locations in the 3D volume leads to the bottleneck computational problem in the modeling of large stationary and dynamical N-particle systems.
One of the main limitations for the use of direct grid-based canonical/Tucker approximations to large potential sums is the strong increase of the tensor rank, proportional to the number of particles N₀ in the system. Figures 15.3 and 15.5 show the Tucker ranks for the electrostatic potential of a protein-type system consisting of N₀ = 783 atoms.
Figure 15.3: The directional Tucker ranks computed by RHOSVD for a protein-type system with
n = 1024 (left) and n = 512 (right).
Given the generating kernel p(‖x‖), we consider the problem of efficiently calculating the weighted sum of a large number of single potentials located at a set 𝒮 of separably distributed points (sources) s_ν ∈ ℝ³, ν = 1, …, N₀, embedded into the fixed bounding box Ω = [−b, b]³:

P₀(x) = ∑_{ν=1}^{N₀} z_ν p(‖x − s_ν‖), z_ν ∈ ℝ. (15.7)
The function p(‖x‖) is allowed to have slow polynomial decay in 1/‖x‖ so that each
individual source contributes essentially to the total potential at each point in Ω.
A family of point sets {𝒮_1, …, 𝒮_M} is called uniformly σ∗-separable if (15.8) holds for every set 𝒮_m, m = 1, 2, …, M, independently of the number of particles in the set 𝒮_m.
Figure 15.4: Inter-particle distances in an ascendant order for protein-type structure with 500 parti-
cles (left); zoom for the first 100 smallest inter-particle distances (right).
Figure 15.4 (left) shows the inter-particle distances in ascending order for a protein-type structure including 500 particles. The total number of distances equals N(N − 1)/2, where N is the number of particles. Figure 15.4 (right) indicates that the number of particles with small inter-particle distances is very moderate. In particular, for this example, the number of pairs with inter-particle distances less than 1 Å is about 0.04 % (≈110) of the total number of 2.495 ⋅ 10⁵ distances.
For ease of presentation, we further confine ourselves to the case of electrostatic potentials described by the Newton kernel p(‖x‖) = 1/‖x‖.
First, we describe the tensor summation method for calculating the collective interaction potential of a multi-particle system that includes only the long-range contribution from the generating kernel. We introduce the n × n × n rectangular grid Ω_n in Ω = [−b, b]³ and the auxiliary 2n × 2n × 2n grid on the accompanying domain Ω̃ = 2Ω of double size. Conventionally, the canonical rank-R tensor representing the Newton kernel (projected onto the n × n × n grid) is denoted by P_R ∈ ℝ^{n×n×n}; see (6.6).
Consider the splitting (15.3) applied to the reference canonical tensor P_R and to its extended version P̃_R = [p̃_R(i_1, i_2, i_3)], i_ℓ ∈ I_ℓ, ℓ = 1, 2, 3, such that

P̃_R = P̃_{R_s} + P̃_{R_l} ∈ ℝ^{2n×2n×2n}.

For technical reasons, we further assume that the tensor grid Ω_n is fine enough that all charge centers 𝒮 = {s_ν} specifying the total electrostatic potential in (15.7) belong to the set of grid points, that is, s_ν = (s_{ν,1}, s_{ν,2}, s_{ν,3})^T = h(j_1^{(ν)}, j_2^{(ν)}, j_3^{(ν)})^T ∈ Ω_h with some indices 1 ≤ j_1^{(ν)}, j_2^{(ν)}, j_3^{(ν)} ≤ n.
The total electrostatic potential P₀(x) in (15.7) is represented by a projected tensor P₀ ∈ ℝ^{n×n×n}, which can be constructed by a direct sum of shift-and-windowing transforms of the reference tensor P̃_R (see Chapter 14 for more detail):

P₀ = ∑_{ν=1}^{N₀} z_ν 𝒲_ν(P̃_R) = ∑_{ν=1}^{N₀} z_ν 𝒲_ν(P̃_{R_s} + P̃_{R_l}) =: P_s + P_l. (15.10)
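The shift-and-windowing construction in (15.10) can be sketched in 1D: the reference kernel vector sampled on the double-size grid is windowed to the target grid for each source and accumulated with the charges (the regularized kernel and the grid sizes are illustrative assumptions):

```python
import numpy as np

n, h = 64, 0.1
x = h * np.arange(n)                              # target grid Omega_n
x_ref = h * (np.arange(2 * n) - n)                # double-size grid, centered
p_ref = 1.0 / np.sqrt(x_ref**2 + h**2)            # regularized reference kernel

sources = [10, 25, 40]                            # grid indices j^(nu) of centers
charges = [1.0, -2.0, 0.5]

P0 = np.zeros(n)
for j, z in zip(sources, charges):
    # windowing: extract the n samples of p_ref shifted to center j
    P0 += z * p_ref[n - j : 2 * n - j]

# direct evaluation agrees with the windowed assembly
P_direct = sum(z * 1.0 / np.sqrt((x - h * j) ** 2 + h**2)
               for j, z in zip(sources, charges))
assert np.allclose(P0, P_direct)
```

In 3D, the same windowing is applied to each canonical vector of P̃_R separately, so the assembly never forms full n × n × n arrays per particle.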
Notice that the Tucker rank of the full tensor sum P₀ increases almost proportionally to the number N₀ of particles in the system; see Figure 15.5, which presents the singular values of a side matrix of the canonical tensor P₀. Likewise, the canonical rank of the tensor P₀ exhibits the pessimistic bound ≤ R N₀.
To overcome this difficulty, in what follows, we consider the global tensor decomposition of only the long-range part in the tensor P₀, defined by

P_l = ∑_{ν=1}^{N₀} z_ν 𝒲_ν(P̃_{R_l}) = ∑_{ν=1}^{N₀} z_ν 𝒲_ν( ∑_{k∈𝒦_l} p̃_k^{(1)} ⊗ p̃_k^{(2)} ⊗ p̃_k^{(3)} ). (15.11)
The initial canonical rank of the tensor P_l equals R_l N₀ and, again, may increase dramatically for a large number of particles N₀. Since, by construction, the tensor P_l approximates a rather smooth function on the domain Ω, one may expect that the large initial rank can be reduced considerably to some value R∗ that remains almost independent of N₀. The same beneficial property can be expected for the Tucker rank of P_l. The principal ingredient of our tensor approach is the rank reduction of the initial canonical sum P_l by application of RHOSVD and the multigrid accelerated canonical-to-Tucker transform [174].
Figure 15.5: Mode-1 singular values of the side matrix in the full potential sum vs. the number of
particles N0 = 200, 400, 774 and grid-size n: n = 512 (left), n = 1024 (right).
To simplify the exposition, we suppose that the tensor entries in P_l are computed by collocation of Gaussian sums at the centers of the grid cells. This provides a representation very close to that obtained by (6.6). We consider the Gaussian in the normalized form G_p(x) = e^{−x²/(2p²)}, so that the relation e^{−t_k² x²} = e^{−x²/(2p_k²)} holds, that is, we set p_k = 1/(√2 t_k), with t_k = k h_M, k = 0, 1, …, M, where h_M = C₀ log M/M. Now criterion (B) on the bound of the L¹-norm (see (15.5)) reads

a_k ∫_a^∞ e^{−x²/(2p_k²)} dx ≤ ε/2 < 1, a_k = h_M.
The following theorem proves an important result justifying the efficiency of range-separated formats applied to a class of radial basis functions p(r): the Tucker ε-rank of the long-range part of the accumulated sum of potentials computed in the bounding box Ω = [−b, b]³ remains almost uniformly bounded in the number of particles N₀ (but depends on the size b of the domain).
Theorem 15.2 ([24]). Let the long-range part P_l of the total interaction potential (see (15.11)) correspond to the choice of the splitting parameter in (15.6) with M = O(log² ε). Then the total ε-rank r₀ of the Tucker approximation to the canonical tensor sum P_l is bounded almost uniformly in the number of particles N₀, with a constant depending on the box size b.
Proof. The proof can be sketched by the following steps. First, we represent all shifted Gaussian functions contributing to the total sum in a fixed set of basis functions by using truncated Fourier series. Second, we prove that on the "long-range" index set k ∈ 𝒯_l the parameter p_k remains uniformly bounded in N₀ from below, implying a uniform bound on the number of terms in the ε-truncated Fourier series. Finally, we take into account that the summation of elements represented in the fixed Fourier basis set does not enlarge the Tucker rank, but only affects the Tucker core. The dependence on b appears in explicit form.
Specifically, let us consider the rank-1 term in the splitting (15.3) with the maximal index k ∈ 𝒯_l. Taking into account the asymptotic choice M = log² ε (see (6.5)), where ε > 0 is the accuracy of the sinc quadrature, relation (15.6) implies

max_{k∈𝒯_l} t_k = R_l h_M = (M/2) C₀ log(M)/M ≈ log(M) = 2 log(|log(ε)|). (15.12)
Now we consider the Fourier transform of the univariate Gaussian on [−b, b],

G_p(x) = e^{−x²/(2p²)} = ∑_{m=0}^{M} α_m cos(πmx/b) + η, with |η| = |∑_{m=M+1}^{∞} α_m cos(πmx/b)| < ε,

where

α_m = (∫_{−b}^{b} e^{−x²/(2p²)} cos(πmx/b) dx) / |C_m|², with |C_m|² = ∫_{−b}^{b} cos²(πmx/b) dx = 2b if m = 0 and b otherwise.
The number m₀ of essential Fourier coefficients then satisfies

m₀ ≥ (√2 b/(π p)) log^{1/2}( p/((1 + |C_M|²) ε) ) = (√2 b/(π p)) log^{1/2}( p/((1 + b) ε) ).

On the other hand, (15.12) implies
which requires only double the number of terms compared with the single Gaussian analyzed above. To compensate for the possible increase in |∑_ν η_ν|, we refine ε → ε/N₀. These estimates also apply to all Gaussian functions present in the long-range sum, since they have larger values of p_k than p_{R_l}. Indeed, in view of (15.6), the number of summands in the long-range part is of order R_l = M/2 = O(log² ε). Combining these arguments with (15.13) proves the resulting estimate.
Figure 15.6 illustrates the very fast decay of the Fourier coefficients for the "long-range" discrete Gaussians sampled on an n-point grid (left) and the slow decay of the Fourier coefficients for the "short-range" Gaussians (right). In the latter case, almost all coefficients remain essential, resulting in a full-rank decomposition. The grid size is chosen as n = 1024.
Figure 15.6: Fourier coefficients of the long- (left) and short-range (right) discrete Gaussians.
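The effect shown in Figure 15.6 is easy to reproduce numerically: for a wide ("long-range") Gaussian only a few Fourier coefficients are essential, whereas for a narrow ("short-range") one nearly all of them are (widths, grid, and the threshold below are illustrative):

```python
import numpy as np

n, b = 1024, 10.0
x = np.linspace(-b, b, n, endpoint=False)

def essential_coeffs(p, eps=1e-4):
    """Count Fourier coefficients above eps relative to the largest one."""
    g = np.exp(-x**2 / (2 * p**2))
    c = np.abs(np.fft.rfft(g))
    return int(np.sum(c > eps * c.max()))

wide = essential_coeffs(p=2.0)     # long-range Gaussian: few coefficients
narrow = essential_coeffs(p=0.02)  # short-range Gaussian: nearly all of them
assert wide < narrow
```

This is the mechanism behind Theorem 15.2: summing long-range Gaussians in a fixed small Fourier basis cannot enlarge the Tucker rank beyond the basis size.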
Remark 15.3. Notice that for fixed σ > 0, the σ-separability of the point distributions (see Definition 15.1) implies that the volume of the computational box [−b, b]³ should increase proportionally to the number of particles N₀, i.e., b = O(N₀^{1/3}). Hence, Theorem 15.2 indicates that the number of entries in the Tucker core of size r₁ × r₂ × r₃ can be estimated by CN₀. This asymptotic cost remains of the same order in N₀ as that for the short-range part of the potential sum.
Figure 15.7 (left) illustrates that the singular values of side matrices for the long-
range part (by choosing Rl = 12) exhibit fast exponential decay with a rate indepen-
dent of the number of particles N0 = 214, 405, 754. Figure 15.7 (right) zooms into the
first 50 singular values, which are almost identical for different values of N0 . The fast
decay in these singular values guarantees the low-rank RHOSVD-based Tucker decom-
position of the long-range part in the potential sum.
Table 15.1 shows the Tucker ranks of the sums of the long-range components of the electrostatic potentials for N-particle clusters. The Newton kernel is generated on a grid with n³ = 1024³ points in a computational box of volume 40³ ų, with accuracy ε = 10⁻⁴ and canonical rank 21. Particle clusters with 200, 400, and 782 atoms are taken as part of a protein-like multiparticle system. The clusters of size 1728 and 4096 correspond to lattice structures of sizes 12 × 12 × 12 and 16 × 16 × 16, with randomly generated charges. The line "RS-canonical rank" shows the resulting rank after the canonical-to-Tucker and Tucker-to-canonical transforms with ε_C2T = 4 ⋅ 10⁻⁵ and ε_T2C = 4 ⋅ 10⁻⁶. Figure 15.8 shows the accuracy of the RS-canonical tensor approximation for a multiparticle cluster of 400 particles at the middle section of the computational box.
Figure 15.7: Mode-1 singular values of side matrices for the long range part (Rl = 12) in the total
potential vs. the number of particles N (left), and zoom of the first singular values (right).
Table 15.1: Tucker ranks and the RS canonical rank of the multiparticle potential sum vs. the number
of particles N for varying parameters Rℓ and Rs . Grid size n3 = 10243 .
Figure 15.8: Top: the potential sum at the middle plane of a cluster with 400 atoms (left) and the
error of the RS-canonical approximation (right). Bottom: long-range part of a sum (left); short range
part of a sum (right).
Figure 15.9: Left: example of a potential surface at level z = 0 for a sum of N₀ = 200 particles computed using only their long-range parts with R_l = 12. Right: decay of the singular values of the side matrices for the canonical tensor representing sums of long-range parts for R_l = 10, 11, and 12.
Table 15.2 demonstrates almost linear scaling of the CPU time in the number of particles and in the univariate grid size n of the n × n × n representation grid. The last column shows the resulting ranks of the side matrices U^{(ℓ)} in the canonical tensor U (see (15.16)).

Table 15.2: Times (sec) for the canonical-to-Tucker rank reduction vs. the number of particles N and grid size n³.

The asymptotically optimal complexity scaling of the RS decomposition and the required storage is the main motivation for applications of the RS tensor format.
Remark 15.4. The second class (of all positive vectors) ensures the stability of the RHOSVD for problems such as (15.7) in the case of all positive (negative) weights; see Lemma 14.10 and the discussion thereafter.
The idea of how to get rid of the "curse of ranks", the critical bottleneck in applying tensor methods to problems such as (15.7), is suggested by the result of Theorem 15.2 on the almost uniform bound (in the number of particles N₀) of the Tucker rank for the long-range part of a multi-particle potential. Thanks to this beneficial property, the new range-separated (RS) tensor formats were introduced in [24]. They are based on the aggregated composition of global low-rank canonical/Tucker tensors with locally supported canonical tensors living on non-intersecting index subsets embedded into the large multi-index set ℐ = I₁ × ⋯ × I_d, I_ℓ = {1, …, n}. Such a parametrization attempts to represent large multidimensional arrays with a storage cost linearly proportional to the number of cumulated inclusions (sub-tensors).
The structure of the range-separated canonical/Tucker tensor formats is specified by a composition of local and global low-parametric representations, which provide good approximation features in applications to the grid-based representation of many-particle interaction potentials with multiple singularities.
Figure 15.10: Schematic illustration of effective supports of the cumulated canonical tensor (left);
short-range canonical vectors for k = 1, . . . , 11, presented in logarithmic scale (right).
Definition 15.5 (Cumulated canonical tensors, [24]). Given the index set ℐ, a set of multi-indices (sources) 𝒥 = {j^{(ν)} := (j_1^{(ν)}, j_2^{(ν)}, …, j_d^{(ν)})}, ν = 1, …, N₀, j_ℓ^{(ν)} ∈ I_ℓ, and the width-index parameter γ ∈ ℕ such that the γ-vicinity of each point j^{(ν)} ∈ 𝒥, that is, 𝒥_γ^{(ν)} := {j : |j − j^{(ν)}| ≤ γ}, does not intersect any of the others:

𝒥_γ^{(ν)} ∩ 𝒥_γ^{(ν′)} = ⌀, ν ≠ ν′.

A cumulated canonical tensor (CCT) is then defined as the sum

Û = ∑_{ν=1}^{N₀} c_ν U_ν with rank(U_ν) ≤ R₀, (15.14)

where the rank-R₀ canonical tensors U_ν = [u_j] vanish beyond the γ-vicinity of j^{(ν)}:

u_j = 0 for j ∉ 𝒥_γ^{(ν)}. (15.15)
The separation criterion in Definition 15.5 leads to a rather "aggressive" strategy for selecting the short-range part P_{R_s} in the reference canonical tensor P_R, allowing an easy implementation of the cumulated canonical tensor (the non-overlapping case). However, in some cases, this may lead to an overestimation of the Tucker/canonical rank in the long-range tensor component. To relax the criterion in Definition 15.5, we propose a "soft" strategy that allows including a few (i.e., O(1) for large N₀) neighboring particles in the local vicinity 𝒥_γ^{(ν)} of the source point s_ν, which can be achieved by increasing the overlap parameter γ > 0. This allows controlling the bound on the rank parameter of the long-range tensor almost uniformly in the system size N₀. The following example illustrates this issue.
Example 15.6. Assume that the separation distance equals σ∗ = 0.8 Å, corresponding to the example in Figure 15.4 (right), and that the given computational threshold is ε = 10⁻⁴. Then we find from Figure 15.10 (right) that the "aggressive" criterion in Definition 15.5 leads to choosing R_s = 10, since the value of the canonical vector with k = 11 at the point x = σ∗ is about 10⁻³. Hence, in order to control the required rank parameter R_l, we have to extend the overlap area to a larger parameter σ∗ and, hence, a larger γ. This leads to a small O(1)-overlap between the supports of the short-range tensor components, but without an asymptotic increase in the total complexity.
Table 15.3 presents the Tucker ranks r = (r₁, r₂, r₃) for the long-range parts of N₀-particle potentials. The reference Newton kernel is approximated on a 3D grid of size 2048³ with rank R = 29 and accuracy ε_𝒩 = 10⁻⁵. Here, the Tucker tensor is computed with the stopping criterion ε_T2C = 10⁻⁵ in the ALS iteration. It can be seen that, for fixed R_l, the Tucker ranks increase only moderately with the system size N₀.
Table 15.3: Tucker ranks r = (r1 , r2 , r3 ) for the long-range parts of N0 -particle potentials.
N0 / Rl 8 9 10 11 12 13
Definition 15.7 (Uniform CCT tensors, [24]). A CCT tensor in (15.14) is called uniform if all components U_ν are generated by a single rank-R₀ tensor U₀ = ∑_{m=1}^{R₀} μ_m û_m^{(1)} ⊗ ⋯ ⊗ û_m^{(d)} such that U_ν|_{𝒥_δ^{(ν)}} = U₀.
Now, we are in a position to define the range separated canonical and Tucker ten-
sor formats in ℝn1 ×⋅⋅⋅×nd . The RS canonical format is defined as follows.
15.2 Tensor summation of range-separated potentials | 257
Definition 15.8 (RS-canonical tensors, [24]). The RS-canonical tensor format specifies
the class of d-tensors A ∈ ℝn1 ×⋅⋅⋅×nd , which can be represented as a sum of a rank-R
canonical tensor U ∈ ℝn1 ×⋅⋅⋅×nd and a (uniform) cumulated canonical tensor generated
by U0 with rank(U0 ) ≤ R0 as in Definition 15.7 (or more generally in Definition 15.5):
A = ∑_{k=1}^{R} ξ_k u_k^{(1)} ⊗ ⋅⋅⋅ ⊗ u_k^{(d)} + ∑_{ν=1}^{N0} c_ν Uν ,    (15.16)
where diam(supp Uν ) ≤ 2γ. Given a grid point i ∈ ℐ , we denote by ℒ(i) the set of indices
which label all short-range tensors Uν including the grid-point i within their effective
support.
Given i ∈ ℐ , denote by u_{iℓ}^{(ℓ)} the row-vector with index iℓ in the side matrix U^{(ℓ)} ∈ ℝ^{nℓ×R},
and let ξ = (ξ1 , . . . , ξR ). Then the ith entry of the RS-canonical tensor A = [a_i ] can be
calculated as a sum of long- and short-range contributions by
a_i = (⊙_{ℓ=1}^{d} u_{iℓ}^{(ℓ)}) ξ^T + ∑_{ν∈ℒ(i)} c_ν Uν (i).
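The two-term structure above is what makes point evaluation cheap: the long-range canonical part costs O(dR) per entry, and the short-range sum runs over the O(1) local list ℒ(i). A minimal NumPy sketch (function and argument names are ours, not from the book):

```python
import numpy as np

def rs_entry(i, xi, U, local_terms):
    """Evaluate one entry a_i of an RS-canonical tensor.

    i           : tuple of d grid indices (i_1, ..., i_d)
    xi          : weights (xi_1, ..., xi_R) of the long-range canonical part
    U           : list of d side matrices U^(l) of shape (n_l, R)
    local_terms : list of (c_nu, U_nu_value) pairs for nu in L(i), where
                  U_nu_value stands for the entry U_nu(i) of a short-range tensor
    """
    # long-range part: Hadamard product of the d row-vectors, then dot with xi
    rows = np.ones(len(xi))
    for ell, idx in enumerate(i):
        rows *= U[ell][idx, :]
    a_i = float(rows @ xi)
    # short-range part: O(1) local corrections from the list L(i)
    for c_nu, u_val in local_terms:
        a_i += c_nu * u_val
    return a_i
```

The long-range evaluation touches only one row of each side matrix, so the cost per entry is O(dR) regardless of the grid size n.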
Proof. Definition 15.8 implies that each RS-canonical tensor is uniquely defined by
the following parametrization: rank-R canonical tensor U, the rank-R0 local reference
canonical tensor U0 with mode-size bounded by 2γ, and list 𝒥 of the coordinates and
weights of N0 particles. Hence, the storage cost directly follows. To justify the representation complexity, we notice that, by the well-separability assumption (see Definition 15.1), we have #ℒ(i) = O(1) for all i ∈ ℐ . This proves the complexity bounds.
Definition 15.10 (RS-Tucker tensors, [24]). The RS-Tucker tensor format specifies the
class of d-tensors A ∈ ℝn1 ×⋅⋅⋅×nd , which can be represented as a sum of a rank-r Tucker
tensor V and a (uniform) cumulated canonical tensor generated by U0 with rank(U0 ) ≤
R0 as in Definition 15.7 (or more generally in Definition 15.5):
A = β ×1 V^{(1)} ×2 V^{(2)} ⋅⋅⋅ ×d V^{(d)} + ∑_{ν=1}^{N0} c_ν Uν ,    (15.17)
where the tensor Uν , ν = 1, . . . , N0 , has local support, that is, diam(supp Uν ) ≤ 2γ.
258 | 15 Range-separated tensor format for many-particle systems
Similarly to Lemma 15.9, the corresponding statement for the RS-Tucker tensors can
be proven.
Lemma 15.11 ([24]). The storage size for RS-Tucker tensor does not exceed
Proof. In view of Definition 15.10, each RS-Tucker tensor is uniquely defined by the
following parametrization: the rank-r = (r1 , . . . , rd ) Tucker tensor V ∈ ℝn1 ×⋅⋅⋅×nd , the
rank-R0 local reference canonical tensor U0 with diam(suppU0 ) ≤ 2γ, list 𝒥 of the
coordinates of N0 centers of particles, {sν }, and N0 weights {cν }. This proves the com-
plexity bounds.
(C) ‖Û‖ = ∑_{ν=1}^{N0} c_ν ‖Uν ‖.
If R0 = 1, that is, Û is the conventional rank-N0 canonical tensor, then property (B)
in Proposition 15.12 leads to the definition of orthogonal canonical tensors in [192].
Hence, in the case R0 > 1, we arrive at the generalization further called the block orthogonal canonical tensors.
where σℓ,k denote the singular values of the side matrices U (ℓ) ; see (3.34).
Proof. We apply the general error estimate for the RHOSVD approximation [174] to obtain

‖Û − Û_(r) ‖ ≤ C ∑_{ℓ=1}^{3} ( ∑_{k=rℓ+1}^{min(n, R0 N0)} σ_{ℓ,k}² )^{1/2} ( ∑_{ν=1}^{N0} ∑_{m=1}^{R0} c_ν² μ_m² )^{1/2},

and then take into account property (C) of Proposition 15.12 to estimate

∑_{ν=1}^{N0} ∑_{m=1}^{R0} c_ν² μ_m² = ∑_{ν=1}^{N0} c_ν² ∑_{m=1}^{R0} μ_m² ≤ C ∑_{ν=1}^{N0} c_ν² ‖Uν ‖² = C‖Û‖²,
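The reduced higher-order SVD (RHOSVD) behind this estimate obtains the Tucker factors from truncated SVDs of the side matrices alone, without ever forming the full tensor. A schematic NumPy version for d = 3 (an illustrative sketch under our own simplifications, not the algorithm of [174] verbatim):

```python
import numpy as np

def rhosvd(U, weights, ranks):
    """Rank-(r1, r2, r3) Tucker approximation of a canonical tensor
    sum_k w_k u_k^(1) x u_k^(2) x u_k^(3), built from SVDs of side matrices.

    U       : list of 3 side matrices of shape (n_l, R)
    weights : canonical weights, shape (R,)
    ranks   : target Tucker ranks (r_1, r_2, r_3)
    """
    Z = []
    for ell, r in enumerate(ranks):
        # left singular vectors of the side matrix give the Tucker factor;
        # the singular values s could drive an adaptive rank choice
        Q, s, _ = np.linalg.svd(U[ell], full_matrices=False)
        Z.append(Q[:, :r])
    # core tensor: project each side matrix onto its factor, then contract
    B = [Z[ell].T @ U[ell] for ell in range(3)]          # shapes (r_l, R)
    core = np.einsum('ak,bk,ck,k->abc', B[0], B[1], B[2], weights)
    return Z, core
```

If the target ranks equal the canonical rank, the projection is exact; truncating below it incurs exactly the tail of singular values that appears in the estimate above.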
The stability assumption in Lemma 15.13 is satisfied in the case of the constructive
canonical tensor approximation to the Newton and other types of Green’s kernels ob-
tained by sinc-quadrature based representations, where all canonical skeleton vectors
are non-negative and monotone.
Remark 15.14. In the case of higher dimensions d > 3, the local canonical tensors
can be combined with the global tensor train (TT) format [226] such that the simple
canonical-to-TT transform can be applied. In this case, the RS-TT format can be introduced as a set of tensors represented as a sum of a CCT term and a global TT tensor.
The complexity and structural analysis is completely similar to that for the RS-canonical and RS-Tucker formats.
be realized efficiently: (a) storage of a tensor; (b) real space representation on a fine
rectangular grid; (c) summation of many-particle interaction potentials represented
on the fine tensor grid; (d) computation of scalar products; and (e) computation of
gradients and forces.
Estimates on the storage complexity for the RS-canonical and RS-Tucker formats
were presented in Lemmas 15.9 and 15.11. Items (b) and (c) were addressed earlier. Cal-
culation of the scalar product of two RS-canonical tensors in the form (15.16), defined
on the same set 𝒮 of particle centers, can be reduced to the standard calculation of the
cross scalar products between all elementary canonical tensors presented in (15.16).
Hence, the numerical cost can be estimated by O(½ R(R − 1)dn + 2γRR0 N0 ).
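This cost estimate reflects that the scalar product of two canonical tensors reduces to mode-wise Gram matrices of the factor vectors, ⟨A, B⟩ = ∑_{k,m} ξ_k η_m ∏_ℓ ⟨u_k^{(ℓ)}, v_m^{(ℓ)}⟩, so the full tensors are never formed. A sketch of this factored computation (names are illustrative):

```python
import numpy as np

def canonical_dot(xi, U, eta, V):
    """<A, B> for canonical tensors A = sum_k xi_k (x)_l u_k^(l) and
    B = sum_m eta_m (x)_l v_m^(l), computed from 1D inner products only."""
    d = len(U)
    # G[k, m] accumulates prod_l <u_k^(l), v_m^(l)>
    G = np.ones((len(xi), len(eta)))
    for ell in range(d):
        G *= U[ell].T @ V[ell]          # (R, S) Gram matrix for mode ell
    return float(xi @ G @ eta)
```

The cost is O(R S d n) for ranks R and S, in contrast to O(n^d) for a full-tensor inner product.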
Here we briefly describe the model reduction approach to the problem of multi-
dimensional data fitting based on the RS tensor approximation. The problems of mul-
tidimensional scattered data modeling and data mining are known to lead to compu-
tationally intensive simulations. We refer to [42, 141, 34, 84, 129] for the discussion of
most commonly used computational approaches in this field of numerical analysis.
The mathematical problems in scattered data modeling are concerned with the
approximation of a multivariate function f : ℝd → ℝ (d ≥ 2) by using samples given at
a certain finite set 𝒳 = {x1 , . . . , xN } ⊂ ℝd of pairwise distinct points; see, e. g., [42]. The
function f may describe the surface of a solid body, the solution of a PDE, many-body
potential field, multiparametric characteristics of physical systems, or some other
multidimensional data.
In a particular problem setting, one may be interested in recovering f from a given
sampling vector f|𝒳 = (f (x1 ), . . . , f (xN )) ∈ ℝN . One of the traditional ways to tackle this
problem is based on constructing a suitable functional interpolant PN : ℝd → ℝ,
15.3 Outline of possible applications | 261
or approximating the sampling vector f|𝒳 on the set 𝒳 in the least-squares sense. We
consider the approach based on radial basis functions (RBFs), which provide the traditional tools for multivariate scattered data interpolation. The RBF interpolation approach deals with a class of interpolants PN of the form
PN (x) = ∑_{j=1}^{N} c_j p(‖x − xj ‖) + Q(x), where Q is some smooth function.    (15.20)
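For small N, the interpolant and the associated linear system can be assembled densely; this is the baseline that the RS tensor machinery is meant to accelerate. A sketch with a Gaussian RBF and the polynomial tail Q omitted (both simplifications are ours, not the book's choices):

```python
import numpy as np

def rbf_fit(X, f, p):
    """Solve A c = f with A_ij = p(||x_i - x_j||), i.e. the dense analogue
    of the interpolation system, with the polynomial part Q dropped."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.linalg.solve(p(D), f)

def rbf_eval(x, X, c, p):
    """P_N(x) = sum_j c_j p(||x - x_j||)."""
    return float(c @ p(np.linalg.norm(X - x, axis=-1)))
```

For the Gaussian kernel and pairwise distinct points the system matrix is symmetric positive definite, so the direct solve is well defined; the assembly and solve cost O(N²) memory and O(N³) time, which motivates the rank-structured alternatives discussed below.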
For our tensor-based approach, the common feature of all these function classes is
the existence of low-rank tensor approximations to the grid-based discretization of the
RBF p(‖x‖) = p(x1 , . . . , xd ), x ∈ ℝd , where we set r = ‖x‖. We can add to the above examples a few traditional RBFs commonly used in quantum chemistry, such as the Coulomb potential 1/r, the Slater function exp(−λr), the Yukawa potential
exp(−λr)/r, and the class of Matérn RBFs, traditionally applied in stochastic modeling
[219, 206]. Other examples are given by the Lennard-Jones (Van der Waals), dipole–dipole interaction, and Stokeslet potentials (see [205]), given by p(r) = 4ϵ[(σ/r)¹² − (σ/r)⁶],
p(r) = 1/r³, and the 3 × 3 matrix P(‖x‖) = I/r + (xx^T)/r³ for x ∈ ℝ³, respectively.
In the context of numerical data modeling, we shall focus on the following com-
putational tasks:
(A) Fixed coefficient vector c = (c1 , . . . , cN )T ∈ ℝN : the efficient representation and
storage of the interpolant in (15.20), sampled on a fine tensor grid in ℝd , that al-
lows the O(1)-fast point evaluation of PN in the whole volume Ω and computation
of various integral-differential operations on that interpolant, such as gradients,
forces, scalar products, convolution integrals, etc.
(B) Finding the coefficient vector c that solves the interpolation problem (15.19).
We look at problems (A) and (B) with the intent to apply the RS tensor representa-
tion to the interpolant PN (x). The point is that the representation (15.20) can be viewed
q𝒳 /h𝒳 ,Ω → max .
We choose the set of points 𝒳 as a subset of the n⊗d square grid Ωh with the mesh-size h = 1/(n − 1), such that the separation distance satisfies σ∗ = q𝒳 ≥ αh, α ≥ 1. Here,
N ≤ N0 = nd . The square grid Ωh is an example of the almost optimal point set (see the
discussion in [141]). The construction below also applies to nonuniform rectangular
grids.
Now, we are in a position to apply the RS tensor representation to the total inter-
polant PN . Let PR be the n × n × n (say, for d = 3) rank-R tensor representing the RBF
p(‖ ⋅ ‖), which allows the RS splitting by (15.3) generating the global RS representation
(15.10). Then PN can be represented by the tensor PN in the RS-Tucker (15.17) or RS-canonical (15.16) formats. The storage cost scales linearly in both N and n, O(N + dRl n).
Problem (B). The interpolation problem (15.19) reduces to solving the linear system
of equations for the unknown coefficient vector c = (c1 , . . . , cN )^T ∈ ℝN ,
with the symmetric matrix Ap,𝒳 . Here, without loss of generality, we assume that the
RBF p(‖ ⋅ ‖) is continuous. The solvability conditions for the linear system (15.21) with
the matrix Ap,𝒳 are discussed, for example, in [42]. We consider two principal cases.
Case (A). We assume that the point set 𝒳 coincides with the set of grid-points in Ωh ,
that is, N = nd . Introducing the d-tuple multi-index i = (i1 , . . . , id ) and j = (j1 , . . . , jd ),
we reshape the matrix Ap,𝒳 into the tensor form
Ap,𝒳 → A = [a(i1 , j1 , . . . , id , jd )] ∈ ⨂_{ℓ=1}^{d} ℝ^{n×n},
on the grid Ωh . Splitting the rank-R canonical tensor PR into a sum of short- and long-
range terms
PR = PRs + PRl with PRl = ∑_{k=1}^{Rl} p_k^{(1)} ⊗ ⋅⋅⋅ ⊗ p_k^{(d)}
allows representing the matrix A in the RS form as a sum of low-rank canonical tensors
A = ARs + ARl . Here, the first one corresponds to the diagonal (nearly diagonal in the
case of “soft” separation strategy) matrix by assumption on the locality of PRs . The
second matrix takes the form of Rl -term Kronecker product sum
ARl = ∑_{k=1}^{Rl} A_k^{(1)} ⊗ ⋅⋅⋅ ⊗ A_k^{(d)},
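A matrix in this Kronecker form never needs to be assembled: its action on a vector, reshaped as an n × n × n array, reduces to contractions with the small factors A_k^{(ℓ)}. A sketch for d = 3 (row-major vectorization assumed; names are ours):

```python
import numpy as np

def kron_sum_matvec(factors, x, n):
    """y = (sum_k A_k^(1) (x) A_k^(2) (x) A_k^(3)) x without forming
    the n^3-by-n^3 matrix.

    factors : list of (A1, A2, A3) triples, each an n-by-n matrix
    x       : vector of length n**3 (row-major vec of an n x n x n array)
    """
    X = x.reshape(n, n, n)
    Y = np.zeros_like(X)
    for A1, A2, A3 in factors:
        # mode-wise contraction replacing the Kronecker-product matvec
        Y += np.einsum('ia,jb,kc,abc->ijk', A1, A2, A3, X)
    return Y.reshape(-1)
```

Each term costs O(n⁴) instead of O(n⁶) for the assembled matrix, and only the 3Rl small factors are stored.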
Consider the calculation of the interaction energy (IE) for a charged multiparticle
system. In the case of lattice-structured systems, the fast tensor-based computation
scheme for IE was described in [152]. Here we follow [24].
Recall that the interaction energy of the total electrostatic potential generated by
the system of N charged particles located at xk ∈ ℝ3 (k = 1, . . . , N) is defined by the
weighted sum
EN = EN (x1 , . . . , xN ) = ½ ∑_{j=1}^{N} z_j ∑_{k=1, k≠j}^{N} z_k / ‖xj − xk ‖,    (15.22)
where zk denotes the particle charge. Letting σ > 0 be the minimal physical dis-
tance between the centers of particles, we arrive at the σ-separable systems (see
Definition 15.1). The double sum in (15.22) involves only particle positions with
‖xj − xk ‖ ≥ σ. Hence, the quantity in (15.22) is computable also for singular kernels
such as p(r) = 1/r.
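For reference, (15.22) can be evaluated directly at O(N²) cost; the RS tensor approach described next replaces this with O(N) point evaluations of the long-range tensor. Direct-sum sketch:

```python
import numpy as np

def interaction_energy(X, z):
    """E_N = 1/2 sum_j z_j sum_{k != j} z_k / ||x_j - x_k||, formula (15.22).

    X : (N, 3) array of particle centers, z : (N,) array of charges
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # exclude the k = j self-terms
    return 0.5 * float(z @ ((1.0 / D) @ z))
```

For σ-separable systems all off-diagonal distances are at least σ, so no division by zero can occur off the (excluded) diagonal.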
We observe that the quantity of interest EN can be recast in terms of the intercon-
nection matrix Ap,𝒳 defined by (15.21) with p(r) = 1/r, 𝒳 = {x1 , . . . , xN },
EN = ½ ⟨(Ap,𝒳 − diag Ap,𝒳 ) z, z⟩, where z = (z1 , . . . , zN )^T .    (15.23)
Hence, EN can be calculated by using the approach already addressed in the previous
section.
To fix the idea, we recall that the reference canonical tensor PR approximating the
single Newton kernel on an n×n×n tensor grid Ωh in the computational box Ω = [−b, b]3
is represented by (6.6), where h > 0 is the fine mesh size. For ease of exposition, we
further assume that the particle centers xk are located exactly at some grid points in
Ωh (otherwise, an additional approximation error may be introduced) such that each
point xk inherits some multi-index ik ∈ ℐ , and the origin x = 0 corresponds to the
central point n0 = (n/2, n/2, n/2) on the grid. In turn, the canonical tensor P0 approxi-
mating the total interaction potential PN (x) (x ∈ Ω) for the N-particle system,
PN (x) = ∑_{k=1}^{N} z_k / ‖x − xk ‖ ⇝ P0 = Ps + Pl ∈ ℝ^{n×n×n}
Lemma 15.15 ([24]). Let the effective support of the short-range components in the ref-
erence potential PR not exceed σ > 0. Then the interaction energy EN of the N-particle
system can be calculated by using only the long-range part in the total potential sum
EN = EN (x1 , . . . , xN ) = ½ ∑_{j=1}^{N} z_j (Pl (xj ) − z_j PRl (x = 0)).    (15.24)
Proof. Similarly to [152], where the case of lattice-structured systems was analyzed, we
show that the interior sum in (15.22) can be obtained from the tensor P0 traced onto
the centers of particles xk , where the term corresponding to xj = xk is removed:
∑_{k=1, k≠j}^{N} z_k / ‖xj − xk ‖ ⇝ P0 (xj ) − z_j PR (x = 0).
Here, the value of the reference canonical tensor PR (see (6.6)) is evaluated at the origin
x = 0, i. e., corresponding to the multi-index n0 = (n/2, n/2, n/2). Hence, we arrive at
the tensor approximation
EN ⇝ ½ ∑_{j=1}^{N} z_j (P0 (xj ) − z_j PR (x = 0)).    (15.25)
Now, we split P0 into the long-range part (15.11) and the remaining short-range po-
tential to obtain P0 (xj ) = Ps (xj ) + Pl (xj ), and the same for the reference tensor PR . By
assumption, the short-range part Ps (xj ) at point xj in (15.25) consists only of the local
term, Ps (xj ) = z_j PRs (x = 0). Due to the corresponding cancellations in the right-
hand side of (15.25), we find that EN depends only on Pl , leading to the final tensor
representation in (15.24).
We arrive at the linear complexity scaling O(dRl N) taking into account the O(dRl )
cost of the point evaluation for the canonical tensor Pl .
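A sketch of the resulting O(dRl N) evaluation of (15.24), assuming the long-range canonical factors, the particle grid indices, and the reference value PRl (x = 0) are precomputed (all names are illustrative, not the book's code):

```python
import numpy as np

def energy_from_long_range(z, idx, xi, U, p_self):
    """E_N = 1/2 sum_j z_j (P_l(x_j) - z_j * P_Rl(0)), formula (15.24).

    z      : particle charges
    idx    : integer grid indices (i1, i2, i3) of each particle center
    xi, U  : canonical weights / side matrices of the long-range tensor P_l
    p_self : scalar P_Rl(x = 0), the reference long-range value at the origin
    """
    E = 0.0
    for zj, (i1, i2, i3) in zip(z, idx):
        # O(d R_l) point evaluation of the canonical tensor P_l at x_j
        pl = float((U[0][i1] * U[1][i2] * U[2][i3]) @ xi)
        E += 0.5 * zj * (pl - zj * p_self)
    return E
```

Only rows of the three side matrices are touched per particle, which is the source of the linear scaling in N.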
Table 15.4 presents the errors of the energy computation by (15.25) using the RS tensor format with Rl = 14 and Rs = 13.
Table 15.4: Absolute and relative errors in the interaction energy of N-particle clusters computed by
RS-tensor approximation with Rl = 14 (Rs = 13).
Table 15.5: Error in the interaction energy of clusters of N particles computed by the RS tensor ap-
proach (Rs = 10).
Table 15.6 shows the results for several clusters of particles generated by random assignment of charges zj to finite lattices of sizes 8³, 12³, 16 × 16 × 8, and 16³. The Newton
kernel is approximated with ε𝒩 = 10−4 on the grid of size 4096³ with the rank R = 25.
Computation of the interaction energy was performed using only the long-range part
with Rl = 12. For the rank reduction, the multigrid C2T algorithm is applied [174], with
the rank truncation parameters εC2T = 10−5 and εT2C = 10−6 . The box size is about
40 × 40 × 40 atomic units with mesh size h = 0.0098.
Table 15.6: Errors in the interaction energy of clusters of N particles computed by RS tensor approxi-
mation with the long-range rank parameter Rl = 12 (Rs = 13).
Table 15.6 illustrates that the relative accuracy of energy calculations by using the RS
tensor format remains of order 10−3 , almost independently of the cluster size. The Tucker
ranks increase only slightly with the system size N. The computation time for the tensor Pl remains almost constant, whereas the point evaluation time for this tensor
(with pre-computed data) increases linearly in N (see Lemma 15.15).
G_k^{(ℓ)} = u_k^{(1)} ⊗ ⋅⋅⋅ ⊗ ∇ℓ u_k^{(ℓ)} ⊗ ⋅⋅⋅ ⊗ u_k^{(d)},
The Ewald summation technique for force calculations was presented in [64, 133]. In
principle, it is possible to construct the RS tensor representation for this vector field
directly by using the radial basis function p(r) = 1/r 2 .
However, here we describe an alternative approach based on numerical differentiation of the energy functional, using the RS tensor representation of the N-particle
interaction potential on a fine spatial grid. The differentiation in the RS-tensor format with
respect to xj is based on the explicit representation (15.24), which can be rewritten in
the form
EN (x1 , . . . , xN ) = ÊN (x1 , . . . , xN ) − ½ (∑_{j=1}^{N} z_j²) PRl (x = 0),    (15.27)
where ÊN (x1 , . . . , xN ) = ½ ∑_{j=1}^{N} z_j Pl (xj ) denotes the “non-calibrated” interaction energy
with the long-range tensor component Pl . In the following discussion, for definiteness,
we set j = N. Since the second term in (15.27) does not depend on the particle positions,
it can be omitted in calculation of variations in EN with respect to xN . Hence, we arrive
at the representation for the first difference in direction ei , i = 1, 2, 3,
The straightforward implementation of the above relation for the three directions
e1 = (1, 0, 0)^T , e2 = (0, 1, 0)^T , and e3 = (0, 0, 1)^T reduces to four calls of the basic
procedure for computing the tensor Pl , corresponding to four different dispositions
of the points x1 , . . . , xN , leading to a cost of order O(dRn).
However, the factor four can be reduced to merely one by taking into account that the
two canonical/Tucker tensors Pl computed for the particle positions (x1 , . . . , xN−1 , xN ) and
(x1 , . . . , xN−1 , xN − h e) differ only in a small part (since the positions x1 , . . . , xN−1 remain fixed).
This requires only minor modifications compared with repeating the full calculation
of ÊN (x1 , . . . , xN ).
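The differencing step can be sketched with a generic energy callable; we use central rather than one-sided differences for accuracy, which is a variant of, not identical to, the scheme described above, and the helper names are ours:

```python
import numpy as np

def force_on_particle(energy, X, j, h=1e-4):
    """F_j ~ -grad_{x_j} E, approximated by central differences of the
    energy functional (e.g. the non-calibrated RS energy E_hat).

    energy : callable mapping an (N, 3) position array to a float
    X      : (N, 3) particle positions
    j      : index of the particle whose force is computed
    """
    F = np.zeros(3)
    for i in range(3):
        Xp, Xm = X.copy(), X.copy()
        Xp[j, i] += h           # displaced position x_j + h e_i
        Xm[j, i] -= h           # displaced position x_j - h e_i
        F[i] = -(energy(Xp) - energy(Xm)) / (2.0 * h)
    return F
```

Since only the position of particle j changes between the two energy calls, an RS-based implementation can reuse the precomputed contributions of the remaining N − 1 particles, as discussed above.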
− ∇ ⋅ (ϵ∇u) + κ² u = ρf in Ω,    (15.28)
where u denotes the target electrostatic potential of a protein, and ρf = ∑_{k=1}^{N} z_k δ(‖x − xk ‖) is the scaled singular charge distribution supported at the points xk in Ωm , where δ
is the Dirac delta. Here, ϵ = 1 and κ = 0 in Ωm , whereas in the solvent region Ωs , we
have κ ≥ 0 and ϵ ≤ 1. The boundary conditions on the external boundary 𝜕Ω can be
specified depending on the particular problem setting. For definiteness, we impose
the simplest Dirichlet boundary condition u|𝜕Ω = 0. The interface conditions on the
interior boundary Γ = 𝜕Ωm arise from the dielectric theory:
[u] = 0,  [ϵ ∂u/∂n] = 0 on Γ.    (15.29)
The practically useful solution methods for the PBE are based on regularization
schemes aimed at removing the singular component from the potentials in the govern-
ing equation. Among others, we consider one of the most commonly used approaches
based on the additive splitting of the potential only in the molecular region Ωm (see
[209]). To that end, we introduce the additive splitting
u = ur + us , where us = 0 in Ωs ,
− ϵm Δus = ρf in Ωm ; us = 0 on Γ. (15.30)
Now, equation (15.28) can be transformed to an equation for the regular potential ur :
−∇ ⋅ (ϵ∇ur ) + κ² ur = 0 in Ω,    (15.31)
[ur ] = 0,  [ϵ ∂ur /∂n] = −ϵm ∂us /∂n on Γ.
To facilitate solving equation (15.30) with singular data, we define the singular potential U in the free space by
−ϵm ΔU = ρf in ℝ³,
and set U^s = U|_{Ωm} in Ωm , U^s = 0 in Ωs . Then us = U^s + uh , where the harmonic correction uh solves
Δuh = 0 in Ωm ; uh = −U^s on Γ.
Proposition 15.16. Let the effective support of the short-range components in the reference potential PR not exceed σ/2. Then the interface conditions in the regularized
formulation (15.31) of the PBE depend only on the low-rank long-range component in the
free-space electrostatic potential of the system. The numerical cost to build up the inter-
face conditions on Γ in (15.31) does not depend on the number of particles N.
https://doi.org/10.1515/9783110365832-016
272 | Bibliography
[22] P. Benner, H. Faßbender, and C. Yang. Some remarks on the complex J-symmetric
eigenproblem. Preprint, Max Planck Institute Magdeburg, MPIMD/15-12, July 2015,
http://www2.mpi-magdeburg.mpg.de/preprints/2015/12/
[23] P. Benner, V. Khoromskaia, and B. N. Khoromskij. A reduced basis approach for calculation
of the Bethe–Salpeter excitation energies using low-rank tensor factorizations. Mol. Phys.,
114 (7–8), 1148–1161, 2016.
[24] P. Benner, V. Khoromskaia, and B. N. Khoromskij. Range-separated tensor formats for
numerical modeling of many-particle interaction potentials. arXiv:1606.09218 (39 pp.), 2016.
[25] P. Benner, S. Dolgov, V. Khoromskaia, and B. N. Khoromskij. Fast iterative solution of the
Bethe–Salpeter eigenvalue problem using low-rank and QTT tensor approximation. J. Comput.
Phys., 334, 221–239, 2017.
[26] P. Benner, V. Khoromskaia, B. N. Khoromskij, C. Kweyu, and M. Stein. Application of the
range-separated tensor format in solution of the Poisson–Boltzmann equation. Manuscript,
2017.
[27] P. Benner, V. Khoromskaia, B. N. Khoromskij, and C. Yang. Computing the density of states for
optical spectra by low-rank and QTT tensor approximation. arXiv:1801.03852, 2017.
[28] P. Benner, V. Khoromskaia, and B. N. Khoromskij. Range-separated tensor format for
many-particle modeling. SIAM J. Sci. Comput., 40 (2), A1034–A1062, 2018.
[29] A. Bensoussan, J.-L. Lions, and G. Papanicolaou. Asymptotic Analysis for Periodic Structures.
North-Holland, Amsterdam, 1978.
[30] C. Bertoglio, and B. N. Khoromskij. Low-rank quadrature-based tensor approximation of the
Galerkin projected Newton/Yukawa kernels. Comput. Phys. Commun., 183 (4), 904–912, 2012.
[31] G. Beylkin and M. J. Mohlenkamp. Numerical operator calculus in higher dimensions. Proc.
Natl. Acad. Sci. USA, 99, 10246–10251, 2002.
[32] G. Beylkin and M. J. Mohlenkamp. Algorithms for numerical analysis in high dimension. SIAM
J. Sci. Comput., 26 (6), 2133–2159, 2005.
[33] G. Beylkin, M. J. Mohlenkamp, and F. Pérez. Approximating a wavefunction as an
unconstrained sum of Slater determinants. J. Math. Phys., 49, 032107, 2008.
[34] G. Beylkin, J. Garcke, and M. J. Mohlenkamp, Multivariate regression and machine learning
with sums of separable functions. SIAM J. Sci. Comput., 31 (3), 1840–1857, 2009.
[35] F. A. Bischoff, E. F. Valeev. Computing molecular correlation energies with guaranteed
precision. J. Chem. Phys., 139 (11), 114106, 2013.
[36] T. Blesgen, V. Gavini, and V. Khoromskaia. Tensor product approximation of the electron
density of large aluminium clusters in OFDFT. J. Comput. Phys., 231 (6), 2551–2564, 2012.
[37] A. Bloch. Les théorèmes de M. Valiron sur les fonctions entières et la théorie
de l'uniformisation. Ann. Fac. Sci. Univ. Toulouse, 17 (3), 1–22, 1925, ISSN 0240-2963.
[38] S. F. Boys, G. B. Cook, C. M. Reeves, and I. Shavitt. Automatic fundamental calculations of
molecular structure. Nature, 178, 1207–1209, 1956.
[39] D. Braess. Nonlinear Approximation Theory. Springer-Verlag, Berlin, 1986.
[40] D. Braess. Asymptotics for the approximation of wave functions by exponential-sums.
J. Approx. Theory, 83, 93–103, 1995.
[41] S. Brenner and R. Scott. The Mathematical Theory of Finite Element Methods. Springer, Berlin,
1994.
[42] M. D. Buhmann. Radial Basis Functions. Cambridge University Press, Cambridge, 2003.
[43] H. J. Bungartz, and M. Griebel. Sparse grids. Acta Numer., 1–123, 2004.
[44] E. Cancés and C. Le Bris. On the convergence of SCF algorithms for the Hartree–Fock
equations. ESAIM: M2AN, 34 (4), 749–774, 2000.
[45] E. Cancés and C. Le Bris. Mathematical modeling of point defects in materials science. Math.
Models Methods Appl. Sci., 23, 1795–1859, 2013.
[46] E. Cancés, A. Deleurence, and M. Lewin. A new approach to the modeling of local defects in
crystals: the reduced Hartree–Fock case. Commun. Math. Phys., 281, 129–177, 2008.
[47] E. Cancés, V. Ehrlacher, and Y. Maday. Periodic Schrödinger operator with local defects and
spectral pollution. SIAM J. Numer. Anal., 50 (6), 3016–3035, 2012.
[48] E. Cancés, V. Ehrlacher, and T. Leliévre. Greedy algorithms for high-dimensional
non-symmetric linear problems. ESAIM Proc. 41, 95–131, 2013.
[49] J. D. Carroll and J. Chang. Analysis of individual differences in multidimensional scaling via an
N-way generalization of ‘Eckart–Young’ decomposition. Psychometrika 35, 283–319, 1970.
[50] J. D. Carroll, S. Pruzansky, and J. B. Kruskal. CANDELINC: A general approach to
multidimensional analysis of many-way arrays with linear constraints on parameters.
Psychometrika, 45, 3–24, 1980.
[51] M. E. Casida. Time-dependent density-functional response theory for molecules. In
D. P. Chong, ed., Recent Advances in Density Functional Methods, Part I, World Scientific,
Singapore, 155–192, 1995.
[52] S. R. Chinnamsetty, M. Espig, W. Hackbusch, B. N. Khoromskij, and H.-J. Flad. Kronecker
tensor product approximation in quantum chemistry. J. Chem. Phys., 127, 084110, 2007.
[53] P. G. Ciarlet and C. Le Bris, eds. Handbook of Numerical Analysis, vol. X, Computational
Chemistry. Elsevier, Amsterdam, 2003.
[54] A. Cichocki and Sh. Amari. Adaptive Blind Signal and Image Processing: Learning Algorithms
and Applications. Wiley, New York, 2002.
[55] A. Cichocki, N. Lee, I. Oseledets, A. H. Pan, Q. Zhao, and D. P. Mandic. Tensor networks
for dimensionality reduction and large-scale optimization: Part 1 low-rank tensor
decompositions. Found. Trends Mach. Learn., 9 (4–5), 249–429, 2016.
[56] C. Cramer and D. Truhlar. Density functional theory for transition metals and transition metal
chemistry. Phys. Chem. Chem. Phys., 11 (46), 10757–10816, 2009.
[57] W. Dahmen, R. Devore, L. Grasedyck, E. Süli. Tensor-sparsity of solutions to high-dimensional
elliptic partial differential equations. Found. Comput. Math., 16 (4), 813–874, 2016.
[58] T. Darden, D. York, and L. Pedersen. Particle mesh Ewald: an O(N log N) method for Ewald
sums in large systems. J. Chem. Phys., 98, 10089–10091, 1993.
[59] L. De Lathauwer. Signal Processing Based on Multilinear Algebra. PhD thesis, Katholeke
Universiteit Leuven, 1997.
[60] L. De Lathauwer, B. De Moor, and J. Vandewalle. On the best rank-1 and rank-(R1 , . . . , RN )
approximation of higher-order tensors. SIAM J. Matrix Anal. Appl., 21, 1324–1342, 2000.
[61] L. De Lathauwer, B. De Moor, and J. Vandewalle. A multilinear singular value decomposition.
SIAM J. Matrix Anal. Appl., 21, 1253–1278, 2000.
[62] V. De Silva and L.-H. Lim. Tensor rank and the ill-posedness of the best low-rank
approximation problem. SIAM J. Matrix Anal. Appl., 30 (3), 1084–1127, 2008.
[63] M. Deserno and C. Holm. How to mesh up Ewald sums. I. A theoretical and numerical
comparison of various particle mesh routines. J. Chem. Phys., 109 (18), 7678–7693, 1998.
[64] M. Deserno and C. Holm. How to mesh up Ewald sums. II. A theoretical and numerical
comparison of various particle mesh routines. J. Chem. Phys., 109 (18), 7694–7701, 1998.
[65] S. Dolgov. Tensor Product Methods in Numerical Simulation of High-Dimensional
Dynamical Problems. PhD thesis, University of Leipzig, 2014.
http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-151129
[66] S. V. Dolgov, and B. N. Khoromskij. Two-level Tucker-TT-QTT format for optimized tensor
calculus. SIAM J. Matrix Anal. Appl., 34 (2), 593–623, 2013.
[67] S. Dolgov, and B. N. Khoromskij. Simultaneous state-time approximation of the chemical
master equation using tensor product formats. Numer. Linear Algebra Appl., 22 (2), 197–219,
2015.
[110] W. Hackbusch. Tensor Spaces and Numerical Tensor Calculus. Springer, Berlin, 2012.
[111] W. Hackbusch and B. N. Khoromskij. Low-rank Kronecker product approximation to
multi-dimensional nonlocal operators. Part I. Separable approximation of multi-variate
functions. Computing, 76, 177–202, 2006.
[112] W. Hackbusch and B. N. Khoromskij. Low-rank Kronecker-product approximation to
multi-dimensional nonlocal operators. Part II. HKT representation of certain operators.
Computing, 76, 203–225, 2006.
[113] W. Hackbusch and B. N. Khoromskij. Tensor-product approximation to operators and
functions in high dimension. J. Complex., 23, 697–714, 2007.
[114] W. Hackbusch and B. N. Khoromskij. Tensor-product approximation to multi-dimensional
integral operators and Green’s functions. SIAM J. Matrix Anal. Appl., 30 (3), 1233–1253, 2008.
[115] W. Hackbusch, and S. Kühn. A new scheme for the tensor representation. J. Fourier Anal.
Appl., 15, 706–722, 2009.
[116] W. Hackbusch, B. N. Khoromskij, and E. E. Tyrtyshnikov. Hierarchical Kronecker tensor-product
approximations. J. Numer. Math., 13, 119–156, 2005.
[117] W. Hackbusch, B. N. Khoromskij, and E. Tyrtyshnikov. Approximate iteration for structured
matrices. Numer. Math., 109, 365–383, 2008.
[118] W. Hackbusch, B. N. Khoromskij, S. Sauter, and E. Tyrtyshnikov. Use of tensor formats in
elliptic eigenvalue problems. Numer. Linear Algebra Appl., 19 (1), 133–151, 2012.
[119] W. Hackbusch, and R. Schneider. Tensor spaces and hierarchical tensor representations.
In S. Dahlke, W. Dahmen, et al., eds, Lecture Notes in Computer Science and Engineering,
vol. 102, Springer, Berlin, 2014.
[120] N. Hale and L. N. Trefethen. Chebfun and numerical quadrature. Sci. China Math., 55 (9),
1749–1760, 2012.
[121] H. Harbrecht, M. Peters, and R. Schneider. On the low-rank approximation by the pivoted
Cholesky decomposition. Appl. Numer. Math., 62 (4), 428–440, 2012.
[122] R. J. Harrison, G. I. Fann, T. Yanai, Z. Gan, and G. Beylkin. Multiresolution quantum chemistry:
basic theory and initial applications. J. Chem. Phys., 121 (23), 11587–11598, 2004.
[123] D. R. Hartree. The Calculation of Atomic Structure. Wiley, New York, 1957.
[124] R. Haydock, V. Heine, and M. J. Kelly. Electronic structure based on the local atomic
environment for tight-binding bands. J. Phys. C, Solid State Phys., 5, 2845–2858, 1972.
[125] M. Head-Gordon, J. A. Pople, and M. Frisch. MP2 energy evaluation by direct methods. Chem.
Phys. Lett., 153 (6), 503–506, 1988.
[126] L. Hedin. New method for calculating the one-particle Green’s function with application to the
electron–gas problem. Phys. Rev. 139, A796, 1965.
[127] T. Helgaker, P. Jørgensen, and N. Handy. A numerically stable procedure for calculating
Møller–Plesset energy derivatives, derived using the theory of Lagrangians. Theor. Chim.
Acta, 76, 227–245, 1989.
[128] T. Helgaker, P. Jørgensen, and J. Olsen. Molecular Electronic-Structure Theory. Wiley, New
York, 1999.
[129] J. S. Hesthaven, G. Rozza, and B. Stamm. Certified Reduced Basis Methods for Parametrized
Partial Differential Equations. Springer, Berlin, 2016.
[130] N. Higham. Analysis of the Cholesky decomposition of a semi-definite matrix. In M. G. Cox and
S. J. Hammarling, eds, Reliable Numerical Computations, Oxford University Press, Oxford,
pp. 161–185, 1990.
[131] F. L. Hitchcock. The expression of a tensor or a polyadic as a sum of products. J. Math. Phys.,
6, 164–189, 1927.
[132] F. L. Hitchcock. Multiple invariants and generalized rank of a p-way matrix or tensor. J. Math.
Phys., 7, 39–79, 1927.
[133] R. W. Hockney and J. W. Eastwood. Computer Simulation Using Particles. IOP, Bristol, 1988.
[134] E. G. Hohenstein, R. M. Parrish, and T. J. Martinez. Tensor hypercontraction density fitting.
Quartic scaling second- and third-order Møller–Plesset perturbation theory. J. Chem. Phys.,
137, 044103, 2012.
[135] M. Holst, N. Baker, and F. Wang. Adaptive multilevel finite element solution of the
Poisson–Boltzmann equation: algorithms and examples. J. Comput. Chem., 21, 1319–1342,
2000.
[136] S. Holtz, T. Rohwedder, and R. Schneider. On manifold of tensors of fixed TT-rank. Numer.
Math., 120 (4), 701–731, 2012.
[137] S. Holtz, T. Rohwedder, and R. Schneider. The alternating linear scheme for tensor
optimization in the tensor train format. SIAM J. Sci. Comput., 34 (2), A683–A713, 2012.
[138] T. Huckle, K. Waldherr, and T. Schulte-Herbrüggen. Computations in quantum tensor
networks. Linear Algebra Appl., 438, 750–781, 2013.
[139] P. H. Hünenberger. Lattice-sum methods for computing electrostatic interactions in molecular
simulations. AIP Conf. Proc., 492, 17, 1999.
[140] M. Ishteva, L. De Lathauwer, P.-A. Absil, and S. Van Huffel. Differential-geometric Newton
method for the best rank-(R1 , R2 , R3 ) approximation of tensors. Numer. Algorithms, 51 (2),
179–194, 2009.
[141] A. Iske. Multiresolution Methods in Scattered Data Modeling. Springer, Berlin, 2004.
[142] V. Kazeev and B. N. Khoromskij. Explicit low-rank QTT representation of Laplace operator and
its inverse. SIAM J. Matrix Anal. Appl., 33 (3), 742–758, 2012.
[143] V. Kazeev, B. N. Khoromskij, and E. E. Tyrtyshnikov. Multilevel Toeplitz matrices generated by
tensor-structured vectors and convolution with logarithmic complexity. SIAM J. Sci. Comput.,
35 (3), A1511–A1536, 2013.
[144] V. Kazeev, M. Khammash, M. Nip, and Ch. Schwab. Direct solution of the chemical master
equation using quantized tensor trains. PLoS Comput. Biol., 10 (3), 2014.
[145] V. Khoromskaia. Computation of the Hartree–Fock exchange in the tensor-structured format.
Comput. Methods Appl. Math., 10 (2), 1–16, 2010.
[146] V. Khoromskaia. Numerical Solution of the Hartree–Fock Equation by Multilevel
Tensor-Structured Methods. PhD dissertation, TU Berlin, 2010.
https://depositonce.tu-berlin.de/handle/11303/3016
[147] V. Khoromskaia. Black-box Hartree–Fock solver by tensor numerical methods. Comput.
Methods Appl. Math., 14 (1), 89–111, 2014.
[148] V. Khoromskaia and B. N. Khoromskij. Grid-based lattice summation of electrostatic
potentials by assembled rank-structured tensor approximation. Comput. Phys. Commun.,
185, 3162–3174, 2014.
[149] V. Khoromskaia and B. N. Khoromskij. Tucker tensor method for fast grid-based summation of
long-range potentials on 3D lattices with defects. arXiv:1411.1994, 2014.
[150] V. Khoromskaia and B. N. Khoromskij. Møller–Plesset (MP2) energy correction using tensor
factorizations of the grid-based two-electron integrals. Comput. Phys. Commun., 185, 2–10,
2014.
[151] V. Khoromskaia and B. N. Khoromskij. Tensor approach to linearized Hartree–Fock equation
for lattice-type and periodic systems. Preprint 62/2014, Max-Planck Institute for Mathematics
in the Sciences, Leipzig. arXiv:1408.3839, 2014.
[152] V. Khoromskaia and B. N. Khoromskij. Tensor numerical methods in quantum chemistry:
from Hartree–Fock to excitation energies. Phys. Chem. Chem. Phys., 17 (47), 31491–31509,
2015.
[153] V. Khoromskaia and B. N. Khoromskij. Fast tensor method for summation of long-range
potentials on 3D lattices with defects. Numer. Linear Algebra Appl., 23, 249–271, 2016.
[154] V. Khoromskaia and B. N. Khoromskij. Block circulant and Toeplitz structures in the linearized
Hartree–Fock equation on finite lattices: tensor approach. Comput. Methods Appl. Math.,
17 (3), 431–455, 2017.
[155] V. Khoromskaia, B. N. Khoromskij, and R. Schneider. QTT representation of the Hartree and
exchange operators in electronic structure calculations. Comput. Methods Appl. Math., 11 (3),
327–341, 2011.
[156] V. Khoromskaia, D. Andrae, and B. N. Khoromskij. Fast and accurate 3D tensor calculation of
the Fock operator in a general basis. Comput. Phys. Commun., 183, 2392–2404, 2012.
[157] V. Khoromskaia, B. N. Khoromskij, and R. Schneider. Tensor-structured factorized calculation
of two-electron integrals in a general basis. SIAM J. Sci. Comput., 35 (2), A987–A1010,
2013.
[158] V. Khoromskaia, B. N. Khoromskij, and F. Otto. A numerical primer in 2D stochastic
homogenization: CLT scaling in the representative volume element. Preprint 47/2017,
Max-Planck Institute for Math. in the Sciences, Leipzig 2017.
[159] B. N. Khoromskij. Data-sparse elliptic operator inverse based on explicit approximation to the
Green function. J. Numer. Math., 11 (2), 135–162, 2003.
[160] B. N. Khoromskij. An Introduction to Structured Tensor-Product Representation of Discrete
Nonlocal Operators. Lecture Notes, vol. 27, Max-Planck Institute for Mathematics in the
Sciences, Leipzig, 2005.
[161] B. N. Khoromskij. Structured rank-(r1, …, rd) decomposition of function-related tensors in ℝd.
Comput. Methods Appl. Math., 6 (2), 194–220, 2006.
[162] B. N. Khoromskij. Structured data-sparse approximation to high order tensors arising from
the deterministic Boltzmann equation. Math. Comput., 76, 1292–1315, 2007.
[163] B. N. Khoromskij. On tensor approximation of Green iterations for Kohn–Sham equations.
Comput. Vis. Sci., 11, 259–271, 2008.
[164] B. N. Khoromskij. Tensor-structured preconditioners and approximate inverse of elliptic
operators in ℝd . Constr. Approx., 30, 599–620, 2009.
[165] B. N. Khoromskij. O(d log N)-quantics approximation of N-d tensors in high-dimensional
numerical modeling. Preprint 55/2009, Max-Planck Institute for Mathematics in the Sciences,
Leipzig 2009.
http://www.mis.mpg.de/publications/preprints/2009/prepr2009-55.html
[166] B. N. Khoromskij. Fast and accurate tensor approximation of a multivariate convolution with
linear scaling in dimension. J. Comput. Appl. Math., 234, 3122–3139, 2010.
[167] B. N. Khoromskij. O(d log N)-quantics approximation of N-d tensors in high-dimensional
numerical modeling. Constr. Approx., 34 (2), 257–289, 2011.
[168] B. N. Khoromskij. Introduction to tensor numerical methods in scientific computing. Lecture
Notes, Preprint 06-2011, University of Zurich, Institute of Mathematics, 2011, pp. 1–238,
http://www.math.uzh.ch/fileadmin/math/preprints/06_11.pdf
[169] B. N. Khoromskij. Tensor-structured numerical methods in scientific computing: survey on
recent advances. Chemom. Intell. Lab. Syst., 110, 1–19, 2012.
[170] B. N. Khoromskij. Tensor numerical methods for high-dimensional PDEs: basic theory and
initial applications. ESAIM, 48, 1–28, 2014.
[171] B. N. Khoromskij. Tensor Numerical Methods in Scientific Computing. Research Monograph,
De Gruyter Verlag, Berlin, 2018.
[172] B. N. Khoromskij. Operator-Dependent Approximation of the Dirac Delta by Using
Range-Separated Tensor Format. Manuscript, 2018.
[173] B. N. Khoromskij and V. Khoromskaia. Low rank Tucker-type tensor approximation to classical
potentials. Cent. Eur. J. Math., 5 (3), 523–550, 2007 (Preprint 105/2006 Max-Planck Institute
for Mathematics in the Sciences, Leipzig 2006).
[195] D. Kressner and C. Tobler. Preconditioned low-rank methods for high-dimensional elliptic PDE
eigenvalue problems. Comput. Methods Appl. Math., 11 (3), 363–381, 2011.
[196] D. Kressner, M. Steinlechner, and A. Uschmajew. Low-rank tensor methods with subspace
correction for symmetric eigenvalue problems. SIAM J. Sci. Comput., 36 (5), A2346–A2368,
2014.
[197] P. M. Kroonenberg and J. De Leeuw. Principal component analysis of three-mode data by
means of alternating least squares algorithms. Psychometrika, 45, 69–97, 1980.
[198] J. B. Kruskal. Three-way arrays: rank and uniqueness of trilinear decompositions, with
applications to arithmetic complexity and statistics. Linear Algebra Appl., 18, 95–138, 1977.
[199] K. N. Kudin and G. E. Scuseria. Revisiting infinite lattice sums with the periodic Fast Multipole
Method. J. Chem. Phys., 121, 2886–2890, 2004.
[200] L. Laaksonen, P. Pyykkö, and D. Sundholm. Fully numerical Hartree–Fock methods for
molecules, Comput. Phys. Rep., 4, 313–344, 1986.
[201] J. M. Landsberg. Tensors: Geometry and Applications. American Mathematical Society,
Providence, RI, 2012.
[202] S. Lang. Linear Algebra, 3rd edn. Springer, Berlin, 1987.
[203] C. Le Bris. Computational chemistry from the perspective of numerical analysis. Acta Numer.,
14, 363–444, 2005.
[204] L. Lin, Y. Saad, and Ch. Yang. Approximating spectral densities of large matrices. SIAM Rev.,
58, 34, 2016.
[205] D. Lindbo and A.-K. Tornberg. Fast and spectrally accurate Ewald summation for 2-periodic
electrostatic systems. J. Chem. Phys., 136, 164111, 2012.
[206] A. Litvinenko, D. Keyes, V. Khoromskaia, B. Khoromskij, and H. Matthies. Tucker Tensor
analysis of Matérn functions in spatial statistics. arXiv:1711.06874, 2017.
[207] M. Lorenz, D. Usvyat, and M. Schütz. Local ab initio methods for calculating optical band
gaps in periodic systems. I. Periodic density fitted local configuration interaction singles
method for polymers. J. Chem. Phys., 134, 094101, 2011.
[208] S. A. Losilla, D. Sundholm, and J. Juselius. The direct approach to gravitation and
electrostatics method for periodic systems. J. Chem. Phys., 132 (2), 024102, 2010.
[209] B. Z. Lu, Y. C. Zhou, M. J. Holst, and J. A. McCammon. Recent progress in numerical methods
for Poisson–Boltzmann equation in biophysical applications. Commun. Comput. Phys., 3 (5),
973–1009, 2008.
[210] Ch. Lubich. On variational approximations in quantum molecular dynamics. Math. Comput.,
74, 765–779, 2005.
[211] Ch. Lubich. From Quantum to Classical Molecular Dynamics: Reduced Models and Numerical
Analysis. Zurich Lectures in Advanced Mathematics, EMS, Zurich, 2008.
[212] Ch. Lubich and I. V. Oseledets. A projector-splitting integrator for dynamical low-rank
approximation. BIT Numer. Math., 54 (1), 171–188, 2014.
[213] Ch. Lubich, T. Rohwedder, R. Schneider, and B. Vandereycken. Dynamical approximation
of hierarchical Tucker and tensor-train tensors. SIAM J. Matrix Anal. Appl., 34 (2), 470–494,
2013.
[214] Ch. Lubich, I. V. Oseledets, and B. Vandereycken. Time integration of tensor trains. SIAM J.
Numer. Anal., 53 (2), 917–941, 2015.
[215] J. Lund and K. L. Bowers. Sinc Methods for Quadrature and Differential Equations. SIAM,
Philadelphia, 1992.
[216] F. R. Manby. Density fitting in second-order linear-r12 Møller–Plesset perturbation theory.
J. Chem. Phys., 119 (9), 4607–4613, 2003.
[217] F. R. Manby, P. J. Knowles, and A. W. Lloyd. The Poisson equation in density fitting for the
Kohn–Sham Coulomb problem. J. Chem. Phys., 115, 9144–9148, 2001.
[218] G. I. Marchuk and V. V. Shaidurov. Difference Methods and Their Extrapolations. Applications
of Mathematics, Springer, New York, 1983.
[219] H. G. Matthies, A. Litvinenko, O. Pajonk, B. L. Rosic, and E. Zander. Parametric and uncertainty
computations with tensor product representations. In: Uncertainty Quantification in Scientific
Computing, Springer, Berlin, pp. 139–150, 2012.
[220] V. Maz'ya and G. Schmidt. Approximate Approximations. Mathematical Surveys and
Monographs, vol. 141, AMS, Providence, 2007.
[221] H.-D. Meyer, F. Gatti, and G. A. Worth. Multidimensional Quantum Dynamics: MCTDH Theory
and Applications. Wiley–VCH, Weinheim, 2009.
[222] C. Møller and M. S. Plesset. Note on an approximation treatment for many-electron systems.
Phys. Rev., 46, 618, 1934.
[223] K. K. Naraparaju and J. Schneider. Generalized cross approximation for 3d-tensors. Comput.
Vis. Sci., 14 (3), 105–115, 2011.
[224] G. Onida, L. Reining, and A. Rubio. Electronic excitations: density-functional versus many-body
Green’s-function approaches. Rev. Mod. Phys., 74 (2), 601, 2002.
[225] I. V. Oseledets. Approximation of 2d × 2d matrices using tensor decomposition. SIAM J. Matrix
Anal. Appl., 31 (4), 2130–2145, 2010.
[226] I. V. Oseledets. Tensor-train decomposition. SIAM J. Sci. Comput., 33 (5), 2295–2317, 2011.
[227] I. V. Oseledets. Constructive representation of functions in low-rank tensor formats. Constr.
Approx., 37 (1), 1–18, 2013.
[228] I. V. Oseledets and S. V. Dolgov. Solution of linear systems and matrix inversion in the
TT-format. SIAM J. Sci. Comput., 34 (5), A2718–A2739, 2012.
[229] I. V. Oseledets and E. E. Tyrtyshnikov. Breaking the curse of dimensionality, or how to use
SVD in many dimensions. SIAM J. Sci. Comput., 31 (5), 3744–3759, 2009.
[230] I. V. Oseledets and E. E. Tyrtyshnikov. TT-cross approximation for multidimensional arrays.
Linear Algebra Appl., 432 (1), 70–88, 2010.
[231] I. V. Oseledets, D. V. Savostyanov, and E. E. Tyrtyshnikov. Tucker dimensionality reduction
of three-dimensional arrays in linear time. SIAM J. Matrix Anal. Appl., 30 (3), 939–956,
2008.
[232] I. V. Oseledets et al. Tensor Train Toolbox, 2014. https://github.com/oseledets/TT-Toolbox
[233] R. Parrish, E. G. Hohenstein, T. J. Martinez, and C. D. Sherrill. Tensor hypercontraction. II.
Least-squares renormalization. J. Chem. Phys., 137, 224106, 2012.
[234] K. A. Peterson, D. E. Woon, and T. H. Dunning, Jr. Benchmark calculations with correlated
molecular wave functions. IV. The classical barrier height of the H + H2 → H2 + H reaction.
J. Chem. Phys., 100, 7410–7415, 1994.
[235] C. Pisani, M. Schütz, S. Casassa, D. Usvyat, L. Maschio, M. Lorenz, and A. Erba. CRYSCOR:
a program for the post-Hartree–Fock treatment of periodic systems. Phys. Chem. Chem. Phys.,
14, 7615–7628, 2012.
[236] E. L. Pollock and J. Glosli. Comments on p(3)m, fmm and the Ewald method for large periodic
Coulombic systems. Comput. Phys. Commun., 95, 93–110, 1996.
[237] R. Polly, H.-J. Werner, F. R. Manby, and P. J. Knowles. Fast Hartree–Fock theory using density
fitting approximations. Mol. Phys., 102, 2311–2321, 2004.
[238] P. Pulay. Improved SCF convergence acceleration. J. Comput. Chem., 3, 556–560, 1982.
[239] M. Rakhuba and I. Oseledets. Fast multidimensional convolution in low-rank tensor formats
via cross approximation. SIAM J. Sci. Comput., 37 (2), A565–A582, 2015.
[240] M. Rakhuba and I. Oseledets. Grid-based electronic structure calculations: the tensor
decomposition approach. J. Comput. Phys., 312, 19–30, 2016.
[241] G. Rauhut, P. Pulay, and H.-J. Werner. Integral transformation with low-order scaling for large
local second-order Møller–Plesset calculations. J. Comput. Chem., 19, 1241–1254, 1998.
[242] H. Rauhut, R. Schneider, and Z. Stojanac. Low rank tensor recovery via iterative hard
thresholding. Linear Algebra Appl., 523, 220–262, 2017.
[243] E. Rebolini, J. Toulouse, and A. Savin. Electronic excitation energies of molecular systems
from the Bethe–Salpeter equation: Example of H2 molecule. In S. Ghosh and P. Chattaraj, eds,
Concepts and Methods in Modern Theoretical Chemistry, vol. 1: Electronic Structure and
Reactivity, p. 367, 2013.
[244] E. Rebolini, J. Toulouse, and A. Savin. Electronic excitations from a linear-response
range-separated hybrid scheme. Mol. Phys., 111, 1219, 2013.
[245] E. Rebolini, J. Toulouse, A. M. Teale, T. Helgaker, and A. Savin. Calculating excitation
energies by extrapolation along adiabatic connections. Phys. Rev. A, 91, 032519,
2015.
[246] M. Reed and B. Simon. Functional Analysis. Academic Press, San Diego, 1972.
[247] S. Reine, T. Helgaker, and R. Lindh. Multi-electron integrals. WIREs Comput. Mol. Sci., 2,
290–303, 2012.
[248] L. Reining, V. Olevano, A. Rubio, and G. Onida. Excitonic effects in solids described by
time-dependent density-functional theory. Phys. Rev. Lett., 88 (6), 66404, 2002.
[249] T. Rohwedder and R. Schneider. Error estimates for the coupled cluster method. ESAIM:
M2AN, 47 (6), 1553–1582, 2013.
[250] T. Rohwedder and A. Uschmajew. On local convergence of alternating schemes for
optimization of convex problems in the tensor train format. SIAM J. Numer. Anal., 51 (2),
1134–1162, 2013.
[251] E. Runge and E. K. U. Gross. Density-functional theory for time-dependent systems. Phys. Rev.
Lett., 52 (12), 997, 1984.
[252] E. E. Salpeter and H. A. Bethe. A relativistic equation for bound-state problems. Phys. Rev.,
84 (6), 1232–1242, 1951.
[253] G. Sansone, B. Civalleri, D. Usvyat, J. Toulouse, K. Sharkas, and L. Maschio. Range-separated
double-hybrid density-functional theory applied to periodic systems. J. Chem. Phys., 143,
102811, 2015.
[254] B. Savas and L.-H. Lim. Quasi-Newton methods on Grassmannians and multilinear
approximations of tensors. SIAM J. Sci. Comput., 32 (6), 3352–3393, 2010.
[255] D. V. Savostyanov. Fast revealing of mode ranks of tensor in canonical form. Numer. Math.,
Theory Methods Appl., 2 (4), 439–444, 2009.
[256] D. V. Savostyanov and I. V. Oseledets. Fast adaptive interpolation of multi-dimensional
arrays in tensor train format. In Multidimensional (ND) Systems, 7th International Workshop,
University of Poitiers, France, 2011, doi:10.1109/nDS.2011.6076873
[257] D. V. Savostyanov, S. V. Dolgov, J. M. Werner, and I. Kuprov. Exact NMR simulation of
protein-size spin systems using tensor train formalism. Phys. Rev. B, 90, 085139,
2014.
[258] G. Schaftenaar and J. H. Noordik. Molden: a pre- and post-processing program for molecular
and electronic structures. J. Comput.-Aided Mol. Des., 14, 123–134, 2000.
[259] W. G. Schmidt, S. Glutsch, P. H. Hahn, and F. Bechstedt. Efficient O(N²) method to solve the
Bethe–Salpeter equation. Phys. Rev. B, 67, 085307, 2003.
[260] R. Schneider. Analysis of the projected coupled cluster method in electronic structure
calculation. Numer. Math., 113 (3), 433–471, 2009.
[261] R. Schneider and A. Uschmajew. Approximation rates for the hierarchical tensor format in
periodic Sobolev spaces. J. Complex., 30 (2), 56–71, 2014.
[262] R. Schneider and A. Uschmajew. Convergence results for projected line-search methods on
varieties of low-rank matrices via Lojasiewicz inequality. SIAM J. Optim., 25 (1), 622–646,
2015.
[263] R. Schneider, Th. Rohwedder, J. Blauert, and A. Neelov. Direct minimization for calculating
invariant subspaces in density functional computations of the electronic structure. J. Comput.
Math., 27 (2–3), 360–387, 2009.
[264] U. Schollwöck. The density-matrix renormalization group in the age of matrix product states,
Ann. Phys., 326 (1), 96–192, 2011.
[265] K. L. Schuchardt, B. T. Didier, T. Elsethagen, L. Sun, V. Gurumoorthi, J. Chase, J. Li, and
T. L. Windus. Basis set exchange: a community database for computational sciences, J. Chem.
Inf. Model., 47, 1045–1052, 2007.
[266] C. Schwab and R.-A. Todor. Karhunen–Loève approximation of random fields by generalized
fast multipole methods. J. Comput. Phys., 217, 100–122, 2006.
[267] H. Sekino, Y. Maeda, T. Yanai, and R. J. Harrison. Basis set limit Hartree–Fock and density
functional theory response property evaluation by multiresolution multiwavelet basis.
J. Chem. Phys., 129, 034111, 2008.
[268] Y. Shao, L. F. Molnar, Y. Jung, J. Kussmann, C. Ochsenfeld, S. T. Brown, et al. Advances in
methods and algorithms in a modern quantum chemistry program package. Phys. Chem.
Chem. Phys., 8 (27), 3172–3191, 2006.
[269] J. Sherman and W. J. Morrison. Adjustment of an inverse matrix corresponding to a change in one
element of a given matrix. Ann. Math. Stat., 21 (1), 124–127, 1950.
[270] A. Smilde, R. Bro, and P. Geladi. Multi-Way Analysis. Wiley, New York, 2004.
[271] F. Stenger. Numerical Methods Based on Sinc and Analytic Functions. Springer-Verlag, Berlin,
1993.
[272] G. Strang. Introduction to Linear Algebra, 5th edn. Wellesley–Cambridge Press, Wellesley,
2016.
[273] G. Strang and G. J. Fix. An Analysis of the Finite Element Method. Prentice-Hall, Inc., NJ,
1973.
[274] R. E. Stratmann, G. E. Scuseria, and M. J. Frisch. An efficient implementation of
time-dependent density-functional theory for the calculation of excitation energies of large
molecules. J. Chem. Phys., 109, 8218, 1998.
[275] E. Süli and D. F. Mayers. An Introduction to Numerical Analysis. Cambridge University Press,
Cambridge, 2003.
[276] D. Sundholm, P. Pyykkö, and L. Laaksonen. Two-dimensional fully numerical molecular
calculations. X. Hartree–Fock results for He2, Li2, Be2, HF, OH−, N2, CO, BF, NO+, and CN−.
Mol. Phys., 56, 1411–1418, 1985.
[277] A. Szabo and N. Ostlund. Modern Quantum Chemistry. Dover Publication, New York, 1996.
[278] A. Y. Toukmaji and J. Board Jr. Ewald summation techniques in perspective: a survey. Comput.
Phys. Commun., 95, 73–92, 1996.
[279] J. Toulouse and A. Savin. Local density approximation for long-range or for short-range energy
functionals? J. Mol. Struct., Theochem, 762, 147, 2006.
[280] J. Toulouse, F. Colonna, and A. Savin. Long-range – short-range separation of the
electron–electron interaction in density-functional theory. Phys. Rev. A, 70, 062505, 2004.
[281] L. N. Trefethen. Spectral Methods in MATLAB. SIAM, Philadelphia, 2000.
[282] L. N. Trefethen and D. Bau III. Numerical Linear Algebra. SIAM, Philadelphia, 1997.
[283] L. N. Trefethen and M. Embree. Spectra and Pseudospectra: The Behavior of Nonnormal
Matrices and Operators. Princeton University Press, Princeton and Oxford, 2005.
[284] L. R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31,
279–311, 1966.
[285] I. Turek. A maximum-entropy approach to the density of states within the recursion method.
J. Phys. C, 21, 3251–3260, 1988.
[286] E. E. Tyrtyshnikov. Mosaic-skeleton approximations. Calcolo, 33, 47–57, 1996.