Venera Khoromskaia, Boris Khoromskij: Tensor Numerical Methods in Quantum Chemistry. De Gruyter (2018)

This document summarizes a book on tensor numerical methods in quantum chemistry. It introduces tensor formats for representing multidimensional tensors, including canonical and Tucker formats. It discusses using these tensor decompositions for grid-based approximations of functions and operators arising in quantum chemistry problems, such as the Hartree-Fock equation. The book presents algorithms for tensor factorization of quantities like two-electron integrals and examines applications like computing excitation energies and long-range interactions.


Venera Khoromskaia, Boris N. Khoromskij
Tensor Numerical Methods in Quantum Chemistry
Also of Interest
Tensor Numerical Methods in Scientific Computing
Boris N. Khoromskij, 2018
ISBN 978-3-11-037013-3, e-ISBN (PDF) 978-3-11-036591-7,
e-ISBN (EPUB) 978-3-11-039139-8

Numerical Tensor Methods. Tensor Trains in Mathematics and Computer Science
Ivan Oseledets, 2018
ISBN 978-3-11-046162-6, e-ISBN (PDF) 978-3-11-046163-3,
e-ISBN (EPUB) 978-3-11-046169-5

The Robust Multigrid Technique. For Black-Box Software
Sergey I. Martynenko, 2017
ISBN 978-3-11-053755-0, e-ISBN (PDF) 978-3-11-053926-4,
e-ISBN (EPUB) 978-3-11-053762-8

Direct and Large-Eddy Simulation
Bernard J. Geurts, 2018
ISBN 978-3-11-051621-0, e-ISBN (PDF) 978-3-11-053236-4,
e-ISBN (EPUB) 978-3-11-053182-4

Richardson Extrapolation. Practical Aspects and Applications
Zahari Zlatev, Ivan Dimov, István Faragó, Ágnes Havasi, 2017
ISBN 978-3-11-051649-4, e-ISBN (PDF) 978-3-11-053300-2,
e-ISBN (EPUB) 978-3-11-053198-5
Venera Khoromskaia, Boris N. Khoromskij

Tensor Numerical
Methods in Quantum
Chemistry

Mathematics Subject Classification 2010
65F30, 65F50, 65N35, 65F10

Authors
Dr. Venera Khoromskaia
Max-Planck Institute for
Mathematics in the Sciences
Inselstr. 22-26
04103 Leipzig
Germany
[email protected]

DrSci. Boris N. Khoromskij
Max-Planck Institute for
Mathematics in the Sciences
Inselstr. 22-26
04103 Leipzig
Germany
[email protected]

ISBN 978-3-11-037015-7
e-ISBN (PDF) 978-3-11-036583-2
e-ISBN (EPUB) 978-3-11-039137-4

Library of Congress Control Number: 2018941005

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2018 Walter de Gruyter GmbH, Berlin/Boston


Typesetting: VTeX UAB, Lithuania
Printing and binding: CPI books GmbH, Leck
Cover image: Venera Khoromskaia and Boris N. Khoromskij, Leipzig, Germany

www.degruyter.com
Contents
1 Introduction | 1

2 Rank-structured formats for multidimensional tensors | 9


2.1 Some notions from linear algebra | 9
2.1.1 Vectors and matrices | 9
2.1.2 Matrix–matrix multiplication. Change of basis | 11
2.1.3 Factorization of matrices | 13
2.1.4 Examples of rank decomposition for function related matrices | 15
2.1.5 Reduced SVD of a rank-R matrix | 18
2.2 Introduction to multilinear algebra | 18
2.2.1 Full format dth order tensors | 19
2.2.2 Canonical and Tucker tensor formats | 23
2.2.3 Tucker tensor decomposition for full format tensors | 27
2.2.4 Basic bilinear operations with rank-structured tensors | 33

3 Rank-structured grid-based representations of functions in ℝ^d | 37


3.1 Super-compression of function-related tensors | 37
3.1.1 Prediction of approximation theory: O(log n) ranks | 38
3.1.2 Analytic methods of separable approximation of multivariate
functions and operators | 39
3.1.3 Tucker decomposition of function-related tensors | 43
3.2 Multigrid Tucker tensor decomposition | 53
3.2.1 Examples of potentials on lattices | 60
3.2.2 Tucker tensor decomposition as a measure of randomness | 62
3.3 Reduced higher order SVD and canonical-to-Tucker transform | 62
3.3.1 Reduced higher order SVD for canonical target | 63
3.3.2 Canonical-to-Tucker transform via RHOSVD | 66
3.3.3 Multigrid canonical-to-Tucker algorithm | 69
3.4 Mixed Tucker-canonical transform | 73
3.5 On Tucker-to-canonical transform | 76

4 Multiplicative tensor formats in ℝ^d | 79


4.1 Tensor train format: linear scaling in d | 79
4.2 O(log n)-quantics (QTT) tensor approximation | 82
4.3 Low-rank representation of functions in quantized tensor spaces | 84

5 Multidimensional tensor-product convolution | 87


5.1 Grid-based discretization of the convolution transform | 87
5.2 Tensor approximation to discrete convolution on uniform grids | 90

5.3 Low-rank approximation of convolving tensors | 92


5.4 Algebraic recompression of the sinc approximation | 94
5.5 Numerical verification on quantum chemistry data | 95

6 Tensor decomposition for analytic potentials | 99


6.1 Grid-based canonical/Tucker representation of the Newton kernel | 99
6.2 Low-rank representation for the general class of kernels | 102

7 The Hartree–Fock equation | 105


7.1 Electronic Schrödinger equation | 105
7.2 The Hartree–Fock eigenvalue problem | 106
7.3 The standard Galerkin scheme for the Hartree–Fock equation | 107
7.4 Rank-structured grid-based approximation of the Hartree–Fock
problem | 109

8 Multilevel grid-based tensor-structured HF solver | 111


8.1 Calculation of the Hartree and exchange operators | 111
8.1.1 Agglomerated representation of the Galerkin matrices | 112
8.1.2 On the choice of the Galerkin basis functions | 114
8.1.3 Tensor computation of the Galerkin integrals in matrices J(D)
and K(D) | 115
8.2 Numerics on three-dimensional convolution operators | 117
8.3 Multilevel rank-truncated self-consistent field iteration | 121
8.3.1 SCF iteration by using modified DIIS scheme | 122
8.3.2 Unigrid and multilevel tensor-truncated DIIS iteration | 123

9 Grid-based core Hamiltonian | 129


9.1 Tensor approach for multivariate Laplace operator | 129
9.2 Nuclear potential operator by direct tensor summation | 132
9.3 Numerical verification for the core Hamiltonian | 136

10 Tensor factorization of grid-based two-electron integrals | 141


10.1 General introduction | 141
10.2 Grid-based tensor representation of TEI in the full product basis | 143
10.3 Redundancy-free factorization of the TEI matrix B | 145
10.3.1 Grid-based 1D density fitting scheme | 145
10.3.2 Redundancy-free factorization of the TEI matrix B | 148
10.3.3 Low-rank Cholesky decomposition of the TEI matrix B | 152
10.4 On QTT compression to the Cholesky factor L | 154

11 Fast grid-based Hartree–Fock solver by factorized TEI | 157


11.1 Grid representation of the global basis functions | 157

11.2 3D Laplace operator in O(n) and O(log n) complexity | 159


11.3 Nuclear potential operator in O(n) complexity | 160
11.4 Coulomb and exchange operators by factorized TEI | 161
11.5 Algorithm of the black-box HF solver | 163
11.6 Ab initio ground state energy calculations for compact molecules | 165
11.7 On Hartree–Fock calculations for extended systems | 168
11.8 MP2 calculations by factorized TEI | 171
11.8.1 Two-electron integrals in a molecular orbital basis | 172
11.8.2 Separation rank estimates and numerical illustrations | 173
11.8.3 Complexity bounds, sketch of algorithm, QTT compression | 176

12 Calculation of excitation energies of molecules | 179


12.1 Numerical solution of the Bethe–Salpeter equation | 179
12.2 Prerequisites from Hartree–Fock calculations | 180
12.3 Tensor factorization of the BSE matrix blocks | 182
12.4 The reduced basis approach using low-rank approximations | 185
12.5 Approximating the screened interaction matrix in a reduced-block
format | 190
12.6 Inverse iteration for diagonal plus low-rank matrix | 194
12.7 Inversion of the block-sparse matrices | 196
12.8 Solving BSE spectral problems in the QTT format | 198

13 Density of states for a class of rank-structured matrices | 201


13.1 Regularized density of states for symmetric matrices | 202
13.2 General overview of commonly used methods | 204
13.3 Computing trace of a rank-structured matrix inverse | 205
13.4 QTT approximation of DOS via Lorentzians: rank bounds | 209
13.5 Interpolation of the DOS function by using the QTT format | 211
13.6 Upper bounds on the QTT ranks of DOS function | 213

14 Tensor-based summation of long-range potentials on finite 3D lattices | 215
14.1 Assembled tensor summation of potentials on finite lattices | 217
14.2 Assembled summation of lattice potentials in Tucker tensor format | 222
14.3 Assembled tensor sums in a periodic setting | 224
14.4 QTT ranks of the assembled canonical vectors in the lattice sum | 227
14.5 Summation of long-range potentials on 3D lattices with defects | 230
14.5.1 Sums of potentials on defected lattices in canonical format | 231
14.5.2 Tucker tensor format in summation on defected lattices | 232
14.5.3 Numerical examples for non-rectangular and composite
lattices | 233
14.6 Interaction energy of the long-range potentials on finite lattices | 237

15 Range-separated tensor format for many-particle systems | 241


15.1 Tensor splitting of the kernel into long- and short-range parts | 243
15.2 Tensor summation of range-separated potentials | 245
15.2.1 Quasi-uniformly separable point distributions | 246
15.2.2 Low-rank representation to the sum of long-range terms | 247
15.2.3 Range-separated canonical and Tucker tensor formats | 254
15.3 Outline of possible applications | 260
15.3.1 Multidimensional data modeling | 260
15.3.2 Interaction energy for many-particle systems | 263
15.3.3 Gradients and forces | 266
15.3.4 Regularization scheme for the Poisson–Boltzmann
equation | 268

Bibliography | 271

Index | 287
1 Introduction
All truths are easy to understand once they are discovered;
the point is to discover them.
Galileo Galilei

This research monograph describes novel tensor-structured numerical methods in application to problems of computational quantum chemistry. Numerical modeling of
the electronic structure of molecules and molecular clusters poses a variety of com-
putationally challenging problems caused by the multi-dimensionality of the govern-
ing physical equations. Traditional computer algorithms for the numerical solution of integro-differential equations usually operate on a discretized representation of the multivariate functions, yielding n^d data entries for functions in ℝ^d. This exponential increase in the amount of data with every added dimension of the mathematical space was described by Richard Bellman [18] as the “curse of dimensionality” (1961). The phenomenon can hardly be eliminated by data-sparse techniques and grid refinement, or by parallelization and high-performance computing.
The tensor numerical methods that reduce, or even break, the curse of dimensionality are based on a “smart” rank-structured tensor representation of the multivariate functions and operators using n × ⋯ × n Cartesian grids. In this book, we discuss how the basic algebraic tensor decompositions, originating in chemometrics and signal processing, recently revolutionized numerical analysis. However, the multilinear algebra of tensors is not the only cornerstone for tensor numerical methods. The earlier results [93, 94] on the theory of low-rank tensor-product approximation of multivariate functions and operators, based in particular on sinc quadrature techniques, provided a significant background for starting the advanced approaches in scientific computing. Thus, tensor-structured numerical methods resulted from bridging modern multilinear algebra and the nonlinear approximation theory for multivariate functions and operators.
Tensors are simply multidimensional arrays of real (complex) numbers. For example, vectors are one-dimensional tensors with n entries, while an n × n matrix is a two-dimensional tensor of size n^2. A third-order tensor can be generated, for example, by sampling a function of three spatial variables on an n × n × n 3D Cartesian grid in a volume box; then the number of entries is n^3. The storage size for a tensor of order d grows exponentially in d as n^d, provoking the curse of dimensionality. The work required for computations with such a tensor is also of the order of O(n^d). Therefore, it is preferable to find a more efficient way to represent multidimensional arrays.
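As a rough illustration of the n^d growth (a minimal NumPy sketch, not from the book; the grid size and the sampled Gaussian are arbitrary choices):

```python
import numpy as np

# Sample f(x, y, z) = exp(-(x^2 + y^2 + z^2)) on an n x n x n Cartesian grid.
n = 64
x = np.linspace(-1.0, 1.0, n)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
T = np.exp(-(X**2 + Y**2 + Z**2))   # full-format third-order tensor

print(T.shape)   # (64, 64, 64): n^3 entries in full format
print(T.size)    # 262144 = 64**3
# Storage grows as n^d: doubling n multiplies the entry count by 2^d = 8.
```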
A rank-structured representation of tensors reduces the multidimensional array to a sum of tensor products of vectors in d dimensions. In the case of equal mode size n, for example, the storage and computation costs scale as O(dnR), where R is

https://doi.org/10.1515/9783110365832-001

the number of summands. This idea has been well known since it was proposed by Frank L. Hitchcock in 1927 [131] in the form of so-called “canonical tensors”. Thus, the canonical tensor format allows us to avoid the curse of dimensionality. It can be seen as a discrete analogue of the representation of a multivariate function by a sum of separable functions. The main problem of the canonical tensor format is the absence of stable algorithms for computing it from a full-size tensor.
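A minimal sketch of the canonical format described above (illustrative NumPy code with arbitrary sizes; the factor matrices here are random, whereas in applications they come from separable approximations):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, R = 50, 3, 4

# A canonical (rank-R) tensor is stored as d factor matrices of size n x R:
#   T[i, j, k] = sum_r U[i, r] * V[j, r] * W[k, r].
U = rng.standard_normal((n, R))
V = rng.standard_normal((n, R))
W = rng.standard_normal((n, R))

# Reconstruct the full tensor only to check the format (never done in practice).
T_full = np.einsum("ir,jr,kr->ijk", U, V, W)

storage_canonical = d * n * R   # 600 numbers, O(dnR)
storage_full = n ** d           # 125000 numbers, O(n^d)
print(storage_canonical, storage_full)
```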
The Tucker tensor decomposition was invented in 1966 by Ledyard R. Tucker and was used in principal component analysis problems in psychometrics, chemometrics, and signal processing for calculating the amount of correlation in experimental data. Usually these data contain a rather moderate number of dimensions, the data sizes in every dimension are not large, and the accuracy issues are not significant. The main advantage of the Tucker tensor format is the existence of stable algorithms for the tensor decomposition, based on the higher-order singular value decomposition (HOSVD) introduced by Lieven De Lathauwer et al. in [61, 60]. However, this Tucker algorithm from multilinear algebra requires storage for a full-format tensor, n^d, and exhibits a complexity of the order of O(n^(d+1)) for the HOSVD. The rather low compression rate of the Tucker tensor decomposition in principal component analysis problems could hardly promote this method for accurate calculations in scientific computing.
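The HOSVD-based Tucker decomposition mentioned above can be illustrated as follows (a simplified NumPy sketch with a fixed target rank; the function `hosvd` and the test tensor are illustrative, not the book's implementation):

```python
import numpy as np

def hosvd(T, ranks):
    """Truncated HOSVD: Tucker factors from SVDs of the mode unfoldings."""
    factors = []
    for mode in range(T.ndim):
        # Unfold T along `mode` into an n x n^(d-1) matrix and take its
        # leading left singular vectors as the Tucker basis for that mode.
        unfolding = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(U[:, :ranks[mode]])
    # Core tensor: project T onto the orthogonal Tucker bases (each
    # tensordot contracts the current leading mode, cycling the axes).
    core = T
    for U in factors:
        core = np.tensordot(core, U.conj(), axes=([0], [0]))
    return core, factors

# Quick check on a smooth function-related tensor: small ranks suffice.
x = np.linspace(0.1, 1.0, 30)
T = 1.0 / (x[:, None, None] + x[None, :, None] + x[None, None, :])
core, factors = hosvd(T, (5, 5, 5))
T_approx = np.einsum("abc,ia,jb,kc->ijk", core, *factors)
print(np.linalg.norm(T - T_approx) / np.linalg.norm(T))  # small relative error
```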
The fascinating story of the grid-based tensor numerical methods in scientific computing started in 2006, when it was proven that the error of the Tucker tensor approximation applied to several classes of function-related tensors decays exponentially fast in the Tucker rank [161]. That is, instead of a three-dimensional (3D) tensor having n^3 entries in full format, one obtains its Tucker tensor approximation given by only O(3n + log^3 n) numbers, thus gaining an enormous compression rate. The related analytical results on the rank bounds for canonical tensors based on the sinc approximation method had been proven earlier by Ivan Gavrilyuk, Wolfgang Hackbusch, and Boris Khoromskij in [94, 91, 111].
In numerical tests for several classical multivariate functions discretized on
n × n × n 3D Cartesian grids, it was shown that the Tucker decomposition provides
an easily computable low-rank separable representation in a problem-adapted basis
Such a beneficial separable representation enables the efficient numerical treatment of integral transforms and other computationally intensive operations with multivariate functions. However, the HOSVD in the Tucker decomposition requires full-format tensors, which is often not affordable in numerical modeling in physics and quantum chemistry. Thus, the HOSVD does not break the curse of dimensionality and has, indeed, limited significance in computational practice.
In this regard, an essential advancement was brought forth by the so-called re-
duced higher order singular value decomposition (RHOSVD), introduced by Boris
Khoromskij and Venera Khoromskaia as part of the canonical-to-Tucker (C2T) trans-
form [173, 174]. The latter works efficiently in cases where the standard Tucker decomposition is infeasible. It was demonstrated that for the Tucker decomposition
of function-related tensors given in the canonical form, for example, resulting from
analytic approximation and certain algebraic transforms, there is no need to build
a full-size tensor. It is enough to find the orthogonal Tucker basis only by using the
directional matrices of the canonical tensor, consisting of skeleton vectors in every
single dimension. The C2T decomposition proved to be an efficient tool for reducing the redundant rank parameter of large canonical tensors. Since the RHOSVD does not require the full-size tensor, it promoted the further development of tensor methods in higher dimensions, because it applies to canonical tensors, which are free from the curse of dimensionality.
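The key step can be sketched as follows (a strongly simplified illustration: the canonical weights are folded into the factor columns, and the adaptive rank truncation of the actual RHOSVD algorithm [173, 174] is replaced by a fixed target rank):

```python
import numpy as np

def rhosvd_bases(factors, rank):
    """RHOSVD sketch: orthogonal Tucker bases from the directional matrices
    of a canonical tensor, without ever forming the full n^d array.
    Each SVD costs only O(n R^2) per mode."""
    bases = []
    for U in factors:               # U has shape n x R (skeleton vectors)
        Q, _, _ = np.linalg.svd(U, full_matrices=False)
        bases.append(Q[:, :rank])   # leading left singular vectors
    return bases

rng = np.random.default_rng(1)
n, R, r = 200, 40, 6
# Canonical factors with rapidly decaying column weights, mimicking a
# redundant (overestimated) canonical rank with effective rank ~ r.
weights = 2.0 ** -np.arange(R)
factors = [rng.standard_normal((n, R)) * weights for _ in range(3)]

bases = rhosvd_bases(factors, r)
print([Q.shape for Q in bases])   # three orthonormal n x r Tucker bases
```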
Furthermore, the orthogonal Tucker vectors, being an adaptive basis of the Tucker
tensor representation, exhibited smooth oscillating shapes, which can be viewed as
“fingerprints” for a given multivariate function. This property facilitated the multigrid
Tucker decomposition proposed in [174, 146], which enables fast 3D tensor calculus in
electronic structure calculations using remarkably large grids. Further beneficial properties of the multigrid approach for tensor numerical methods have not yet been comprehensively investigated. Since rank-structured tensor decompositions basically operate on Cartesian grids, the methodology developed for finite difference methods, including Richardson extrapolation techniques yielding O(h^3) accuracy in the mesh size h, can be applied.
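As a generic illustration of Richardson extrapolation on two grids (a textbook 1D example with a second-order central difference, not the book's scheme; here combining the grids h and h/2 raises the order from O(h^2) to O(h^4)):

```python
import numpy as np

def central_diff(f, x, h):
    """Second-order central difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f, x = np.sin, 1.0
exact = np.cos(x)

h = 0.1
d_h = central_diff(f, x, h)
d_h2 = central_diff(f, x, h / 2)
# Richardson extrapolation cancels the leading O(h^2) error term:
d_rich = (4 * d_h2 - d_h) / 3

print(abs(d_h - exact))     # ~ 9e-4 on the coarse grid
print(abs(d_rich - exact))  # several orders of magnitude smaller
```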
The traditional methods for the numerical solution of the Hartree–Fock equation have been developed in computational quantum chemistry. They are based on the analytical computation of the arising two-electron integrals,1 convolution-type integrals in ℝ^3, in problem-adapted, naturally separable Gaussian-type basis sets [3, 277, 128], by using erf-functions. This rigorous approach resulted in a number of efficient program packages, which required years of development by large scientific groups and which are nowadays widely used in the scientific community; see, for example, [299, 292, 88] and other packages listed in Wikipedia. Other models in quantum chemistry, like density functional theory [251, 107, 268], usually apply a combination of rigorously constructed pseudopotentials, grid-based wavefunctions, and experimentally justified coefficients. In general, for the solution of multidimensional problems in physics and chemistry, it is often best to approximate the multivariate functions by sums of separable functions. However, the initial separable representation of functions may deteriorate under integral transforms and other operations, leading to cumbersome computational schemes.
In such a way, the success of the analytical integration methods for ab-initio electronic structure calculations stems from the large amount of precomputed information based on physical insight, including the construction of problem-adapted atomic orbital basis sets and elaborate nonlinear optimization for the calculation of the density-fitting basis. The known limitations of this approach appear due to a strong
1 Also called electron repulsion integrals.



dependence of the numerical efficiency on the size and quality of the chosen Gaussian basis sets. These restrictions might be essential in calculations for larger molecules and heavier atoms. It is now common practice to reduce these difficulties by switching partially or completely to grid-based calculations. The conventional numerical methods quickly encounter tractability limitations even for small molecules and moderate grid sizes. The real-space multiresolution approaches suggest reducing the grid size by local mesh refinements [122, 305], but may encounter problems with the computation of three-dimensional convolution integrals for functions with multiple singularities.
The grid-based tensor-structured numerical methods were first developed for
solving challenging problems in electronic structure calculations. The main ingredi-
ents include the low-rank grid representation of multivariate functions and operators,
and tensor calculation of the multidimensional integral transforms, introduced by the
authors in 2007–2010 [166, 187, 145, 146, 147, 168]. An important issue was the possibility of comparing the results of tensor-based computations with the outputs of benchmark quantum chemical packages, which use analytical methods for calculating the three-dimensional convolution integrals [300]. It was shown that the tensor calculation of multidimensional convolution operators reduces to a sequence of one-dimensional convolutions and one-dimensional Hadamard and scalar products [145, 146]. Such reduction to one-dimensional operations enables computations on exceptionally fine tensor grids. The initial multilevel tensor-structured solver for the Hartree–Fock equation was based on the calculation of the Coulomb and exchange integral operators “on-the-fly”, using a sequence of refined uniform grids, thus avoiding the precomputation and storage of the two-electron integral tensor [146, 187]. The disadvantage of this version is its rather substantial computation time. This solver is discussed in Chapter 8.
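The reduction of a 3D convolution of canonical tensors to one-dimensional convolutions can be sketched for the rank-1 case (illustrative NumPy code; for tensors of canonical ranks R_A and R_B one sums such terms over all R_A·R_B pairs of skeleton vectors):

```python
import numpy as np

def conv_full_3d(A, B):
    """Reference full 3D convolution via zero-padded FFT."""
    shape = [sa + sb - 1 for sa, sb in zip(A.shape, B.shape)]
    return np.real(np.fft.ifftn(np.fft.fftn(A, shape) * np.fft.fftn(B, shape)))

rng = np.random.default_rng(2)
n = 8
# Rank-1 canonical tensors A = a1 (x) a2 (x) a3 and B = b1 (x) b2 (x) b3.
a = [rng.standard_normal(n) for _ in range(3)]
b = [rng.standard_normal(n) for _ in range(3)]
A = np.einsum("i,j,k->ijk", *a)
B = np.einsum("i,j,k->ijk", *b)

# Tensor-structured convolution: three 1D convolutions replace one 3D one.
c = [np.convolve(al, bl) for al, bl in zip(a, b)]
C_tensor = np.einsum("i,j,k->ijk", *c)

print(np.allclose(C_tensor, conv_full_3d(A, B)))  # True
```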
Further progress of tensor methods in electronic structure calculations was promoted by a fast algorithm for the grid-based computation of the two-electron integrals (TEI) [157, 150] with O(N_b^3) storage in the number of basis functions N_b. The fourth-order TEI tensor is calculated in the form of a low-rank Cholesky factorization by using an algebraic black-box-type “1D density fitting” scheme, which applies to the products of discretized basis functions. Using the low-rank tensor representations of the Newton convolving kernel and of the products of basis functions, all represented on n × n × n Cartesian grids, the 3D integral transforms are calculated in O(n log n) complexity. The corresponding algorithms are described in Chapter 10.
The elaborated tensor-based Hartree–Fock solver [147], described in Chapter 11, employs a factorized representation of the two-electron integrals and tensor calculation of the core Hamiltonian, including the three-dimensional Laplace and nuclear potential operators [156]. In the course of the self-consistent iteration for solving the Hartree–Fock eigenvalue problem, owing to the factorized representation of TEI, the update of the Coulomb and exchange parts of the Fock matrix reduces to cheap algebraic operations. Owing to the grid representation of the basis functions, the basis sets are not restricted to Gaussian-type orbitals and may consist of any well-separable functions defined on a grid. High accuracy is attained because calculations on large 3D grids with up to n^3 = 10^18 entries are easy, so that high resolution with a mesh size of the order of atomic radii, h ≈ 10^(-4) Å, is possible.
This Hartree–Fock solver is competitive in computational time and accuracy [147] with the solvers in standard packages based on analytical calculations of the Fock operator. It may also have weaker limitations on the size of a molecular system. It works in MATLAB on a laptop for moderate-size molecules, and its efficiency has not yet been investigated on larger computing facilities. It is a black-box-type scheme; the input requires only the charges and coordinates of the nuclei, the number of electron pairs, and the basis set defined on the grid. The tensor approach shows good potential for ab-initio calculations on finite lattices [154] and may be used in numerical simulations of small nanostructures.
The progress of tensor methods for three-dimensional problems in quantum chemistry motivated, in particular, the development of novel tensor formats. Though the matrix-product states (MPS) format was well known for modeling spin-type systems in many dimensions [302, 294, 293, 264], a considerable impact on further developments of tensor numerical methods in scientific computing was due to the tensor train (TT) format [229, 226], introduced by Ivan Oseledets and Eugene Tyrtyshnikov in 2009. The advanced TT Toolbox developed in the group of Ivan Oseledets [232] provides powerful tools for function-related multilinear algebra in higher dimensions. A closely related hierarchical Tucker tensor representation was introduced in 2009 by Wolfgang Hackbusch and Stefan Kühn [115]. Both the tensor train and hierarchical Tucker tensor formats were established on the basis of the earlier hierarchical dimension-splitting concept in [161].
The quantics tensor train (QTT) approximation, introduced by Boris Khoromskij in 2009,2 reduces the computational work on discretized multivariate functions to logarithmic complexity, O(log n) [165, 167]. It was initiated by the idea to test the TT ranks of long function-related vectors of size 2^d (or q^d), reshaped to multidimensional hypercubes (the quantized image). Then, it was proven that the reshaped n-vectors resulting from the discretization of classical functions like exponentials, plane waves, or polynomials have surprisingly small or even constant QTT ranks [167]. Thus, one comes to paradoxical, almost “mesh-free” grid-based calculations, where the size of the fine grids used in the solution of multidimensional problems remains practically unrestricted. A numerical study of the TT decomposition of 2^d × 2^d matrices was presented in [225]. A short description of the TT and QTT tensor formats is given in Chapter 4.
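The quantization idea can be illustrated on an exponential vector, whose quantized image is exactly rank-1 (a self-contained NumPy sketch; the parameters are arbitrary):

```python
import numpy as np

d, q = 10, 0.97
N = 2 ** d
v = q ** np.arange(N)             # samples of an exponential, length 2^d

# Quantization: view the length-2^d vector as a 2 x 2 x ... x 2 tensor.
# For v[i] = q^i with the binary expansion i = sum_l i_l 2^l, the tensor
# factorizes exactly: v[i_0, ..., i_{d-1}] = prod_l q^(i_l 2^l), QTT rank 1.
factors = [np.array([1.0, q ** (2 ** l)]) for l in range(d)]
v_rank1 = factors[0]
for f in factors[1:]:
    v_rank1 = np.kron(f, v_rank1)  # little-endian Kronecker build-up

print(np.allclose(v, v_rank1))     # True: 2d numbers encode 2^d entries
```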
Notice that the tensor numerical methods are now recognized as a powerful tool
for solving the multidimensional partial differential equations (PDEs) discretized by

2 The paper on QTT approximation was first published in September 2009 as the Preprint 55/2009 of
the Max-Planck Institute for Mathematics in the Sciences in Leipzig.

traditional grid-based schemes. The tensor approach established a new branch of numerical analysis, providing efficient algorithms for solving multidimensional integro-differential equations in ℝ^d with linear or even logarithmic complexity scaling in the dimension [169, 170, 152, 171]. We also refer to literature surveys on tensor
spaces and multilinear algebra [102, 110, 119]. It is still a challenging issue how to
make tensors work for problems on complicated geometries.
The rank-structured Cholesky factorization of the two-electron integrals gave rise to a new approach for calculating the excitation energies of molecules, introduced in [23] in the framework of the Bethe–Salpeter equation (BSE) [252]. The BSE is a widely used model for the ab-initio estimation of the absorption spectra of molecules or surfaces of solids. This eigenvalue problem is a complicated task due to the size of the corresponding matrix, which scales as O(N_b^2), where N_b is the number of basis functions. The new approach, based on the diagonal plus low-rank representation of the generating matrix, combines the iterative solution of a large rank-structured eigenvalue problem with a reduced basis approach for finding a certain number of the eigenvalues smallest in modulus. An interesting solution was found by representing the exchange part of the BSE matrix by a small-size block, which led to a reduction of the overall ranks and improved accuracy of the excitation energy calculations. As a result, the complexity scaling for the numerical solution of the BSE eigenvalue problem is reduced from O(N_b^6) to O(N_b^2).
An efficient interpolation scheme to approximate the spectral density, or density of states (DOS), for the BSE problem was introduced in [27]. It is based on calculating the traces of parametric matrix resolvents at interpolation points by taking advantage of the block-diagonal plus low-rank matrix structure of the BSE Hamiltonian. It is also shown that a regularized, or smoothed, DOS discretized on a fine grid of size N can be accurately interpolated using its low-rank adaptive QTT tensor representation. The QTT tensor format provides good interpolation properties for strongly oscillating functions with multiple gaps like the DOS, and requires asymptotically far fewer (i.e., O(log N)) function calls compared with the full grid size N. This approach eliminates the computational difficulties of the traditional schemes by avoiding both stochastic sampling and interpolation by problem-independent functions such as polynomials.
In summary, the tensor-structured approach leads the way to the numerical solution of the Hartree–Fock equation, and subsequent MP2 corrections, based on the efficient grid-based calculation of the two-electron integrals and the core Hamiltonian. The tensor-based ab-initio calculations also provide good prerequisites for post-Hartree–Fock computations of the excitation energies and of the optical spectra with moderate computing requirements. These new techniques in electronic structure calculations can be considered as a starting point for thorough scientific development and investigation. We notice that the grid-based tensor methodology allows the efficient implementation of most numerical schemes in a black-box way.

Another challenging problem in computational chemistry is the summation of a large number of long-range (electrostatic) potentials distributed on finite 3D lattices with vacancies. The recent tensor-based method for the summation of long-range potentials on a finite L × L × L lattice, introduced by the authors in [149, 148], provides a computational cost of O(L), in contrast to O(L^3 log L) for the traditional Ewald-type methods [79]. It employs the low-rank canonical/Tucker tensor representations of a single Newton potential3 obtained by using the Laplace transform and sinc approximation. The required precision is guaranteed by employing large 3D Cartesian grids for the representation of a single reference potential, 1/‖x‖, x ∈ ℝ^3. The resulting rank of a tensor representing a sum of a large number, say millions, of potentials on a finite three-dimensional lattice remains the same as for the single reference potential. Indeed, the summation in the volume is reduced to a simple addition of entries in the skeleton vectors of the canonical tensor, thus producing the so-called “assembled” vectors of the collective potential on a lattice. The method remains efficient for multidimensional lattices with step-type geometries and in the presence of multiple defects [149]. The interaction energy of the electrostatic potentials on a lattice is then computed at sub-linear cost, O(L^2) [152].
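The assembled summation can be sketched in its simplest rank-1 form (an illustrative NumPy example with a Gaussian stand-in for one canonical term of the interaction kernel; the grid and lattice parameters are arbitrary, and the full tensor P is built only to show the result):

```python
import numpy as np

# A separable (rank-1) stand-in for one canonical term of a potential:
# p(x, y, z) = g(x) g(y) g(z), sampled on a grid covering the whole lattice.
n, L, step = 64, 4, 12              # grid size, lattice size, lattice spacing
t = np.arange(n, dtype=float)
g = np.exp(-((t - n / 2) ** 2) / 50.0)

def shift(v, s):
    """Shift a vector by s entries with zero padding (a lattice translate)."""
    out = np.zeros_like(v)
    if s >= 0:
        out[s:] = v[: len(v) - s]
    else:
        out[: len(v) + s] = v[-s:]
    return out

# Assembled skeleton vector: add the L shifted copies entry-wise, per dimension.
G = sum(shift(g, (j - L // 2) * step) for j in range(L))

# The collective potential of all L^3 lattice translates is again rank-1:
# the lattice sum of g (x) g (x) g translates equals G (x) G (x) G, so the
# rank does not grow and storage stays O(n) per dimension, not O(n^3).
P = np.einsum("i,j,k->ijk", G, G, G)
print(P.shape)
```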
For multiparticle systems of general type, a novel range-separated (RS) tensor format was proposed and analyzed by Peter Benner, Venera Khoromskaia, and Boris Khoromskij in [24]. It provides a rank-structured tensor approximation of highly non-regular functions with multiple singularities in ℝ³, sampled on a fine n × n × n grid. These can be the electrostatic potentials of a large atomic system like a biomolecule, multidimensional scattered data modeled by radial basis functions, etc. The main advantage of the RS tensor format is that the partition into the long- and short-range parts is performed simply by sorting a small number of skeleton vectors in the low-rank canonical tensor representation of the generating kernel. It was proven in [24] that the sum of long-range contributions from all particles can be represented in the form of a low-rank canonical or Tucker tensor at O(n) storage cost, with a rank parameter depending only weakly (logarithmically) on the number of particles N. The basic tool here is again the RHOSVD algorithm. The representation complexity of the short-range part is O(N) with a small prefactor independent of the number of particles.
Note that the RS tensor format differs from the traditional tensor representations in multilinear algebra since it intrinsically applies to function-related tensors. The RS format originates from the short–long range splitting within the low-rank representation of a singular potential, for example, the Newton potential 1/‖x‖ or other radial basis functions in ℝᵈ, d ≥ 3. It essentially extends the applicability of tensor numerical methods in scientific computing.

3 The method also works for other types of multivariate radial basis functions p(‖x‖).

In recent years, owing to progress in computer science, grid-based approaches and real-space numerical methods have attracted more attention in computational quantum chemistry, since they allow, in principle, an efficient approximation of the physical entities of interest with controllable precision [16, 122, 305, 35], and also offer new techniques for the calculation of molecular excitation energies [23, 27]. On the one hand, modern supercomputing facilities enable the use of computational algorithms [86, 76] that in former times would have been considered unfeasible; on the other hand, there are completely new approaches, like the rank-structured tensor numerical methods [187, 157, 147, 152], which suggest a new interpretation of the use of uniform grids in many dimensions. These topics have recently been addressed in a special issue of the PCCP journal devoted to real-space methods in quantum chemistry [85].
Finally, we would like to thank Wolfgang Hackbusch, our former director at the Max Planck Institute (MPI) in Leipzig, for his continuous attention to our research and for interesting discussions. We are grateful to Peter Benner, the director at the MPI in Magdeburg, for the fruitful collaboration in recent years. We thank Felix Otto, the director at the MPI in Leipzig, for his encouraging support of this book project. We thank Heinz-Juergen Flad and Reinhold Schneider for our fruitful collaboration, which was a significant step in the development of tensor methods in computational quantum chemistry. We would like to thank Andreas Savin and Dirk Andrae for valuable collaboration and discussions.
This research monograph on tensor-structured numerical methods is an introduction to a modern field of numerical analysis with applications in computational quantum chemistry. Many of the new topics presented here are based on papers published by the authors in the recent decade during their research work at the Max Planck Institute for Mathematics in the Sciences in Leipzig.
This book may be of interest to a wide audience of students and researchers working in computational chemistry and materials science, as well as in numerical analysis, scientific computing, and multilinear algebra. There are already a number of promising results in tensor numerical methods, and there is even more work to be done. We present some algorithms in MATLAB for a quick start in this new field. Numerous pictures should be helpful in explaining the main topics.

Leipzig, 2017 Venera Khoromskaia


Boris Khoromskij
2 Rank-structured formats for multidimensional
tensors
2.1 Some notions from linear algebra
From the wide-ranging realm of linear algebra, we discuss here only a set of notions
which are essential in describing the main topics of the tensor-structured numerical
methods. We refer to a number of standard textbooks on linear algebra, for example,
[98, 282, 202, 272].

2.1.1 Vectors and matrices

An ordered set of numbers is called a (column) vector,

$$u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \in \mathbb{R}^n.$$

To show that it is a column vector, one can write it explicitly, u ∈ ℝn×1 . Transpose of a
column vector uT is a row vector, uT ∈ ℝ1×n .
Products of column and row vectors give different results depending on the order
of multiplication. Multiplying a row vector with a column vector, we obtain a scalar
product of vectors, and the result is a number. That is, the scalar (or inner) product of
two vectors uT ∈ ℝ1×n and v ∈ ℝn×1 is the real number given by

$$u^T v = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n.$$

The scalar product of vectors is the main ingredient of matrix–matrix multiplication, and it is the prototype of the contraction operation in tensor algebra.
The Euclidean norm of a vector (or its length) is ‖u‖ = √uT u.
Multiplying a column vector with a row vector is called a tensor product (or outer
product), where vectors may be of different size, and the result is a matrix. Thus a ten-
sor product increases the number of dimensions d of the resulting data array: multiply-
ing two vectors (one-dimensional arrays), we obtain a matrix, i. e., a two-dimensional
array corresponding to d = 2. Indeed, the tensor product of a column vector u ∈ ℝm×1

https://doi.org/10.1515/9783110365832-002

and a row vector vT ∈ ℝ1×n is a (rank-1) matrix of size m × n,

$$A = u \otimes v = u v^T = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{bmatrix} \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} = \begin{bmatrix} u_1 v_1 & u_1 v_2 & \cdots & u_1 v_n \\ u_2 v_1 & u_2 v_2 & \cdots & u_2 v_n \\ \vdots & \vdots & \ddots & \vdots \\ u_m v_1 & u_m v_2 & \cdots & u_m v_n \end{bmatrix}. \tag{2.1}$$

A sum of two tensor products of vectors of corresponding lengths is a rank-2 matrix, and so on. A matrix of rank R is a sum of R terms, each being a tensor product of two vectors.
In general, an m × n matrix A is a rectangular array of real numbers arranged into
m rows and n columns,

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix},$$

where the number aij is called the entry of the matrix. A matrix A is an element of the
linear vector space ℝm×n equipped with an Euclidean scalar product
$$\langle A, B \rangle = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} b_{ij} \tag{2.2}$$

and the Euclidean (Frobenius) norm of a matrix

$$\|A\| = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2}. \tag{2.3}$$

Computation of the Frobenius norm of a general matrix needs O(nm) operations. But
for the rank-1 matrices, A = u ⊗ v = uvT , the norm

$$\|A\| = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} (u_i v_j)^2} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} u_i^2 v_j^2} = \sqrt{\sum_{i=1}^{m} u_i^2 \sum_{j=1}^{n} v_j^2} = \|u\| \cdot \|v\|$$

can be computed only in O(m + n) operations.
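The O(m + n) formula is easy to verify numerically. The following NumPy sketch (our illustration; the book's own scripts are in MATLAB) compares the direct Frobenius norm of the rank-1 matrix with the product of the vector norms:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 800
u = rng.standard_normal(m)
v = rng.standard_normal(n)

# O(m*n): assemble the rank-1 matrix u v^T and take its Frobenius norm directly
norm_full = np.linalg.norm(np.outer(u, v))

# O(m+n): use the identity ||u v^T||_F = ||u|| * ||v||
norm_fast = np.linalg.norm(u) * np.linalg.norm(v)
```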


Multiplication of vectors and matrices is based on scalar products of vectors and depends on the relative positions of the factors. A row vector can be multiplied from the left with a matrix of proper size (the size of the row vector should equal the column length of the matrix), and the result is a row vector whose length equals the size of a matrix row.
A matrix can be multiplied by a column vector whose size coincides with the size of the matrix rows, resulting in a column vector of the size corresponding to the matrix columns.

When multiplying the row vector and the matrix, the entries of the resulting vector
are computed by scalar products of the vector and columns vectors of a matrix (their
sizes should coincide),

$$u^T A = \begin{bmatrix} u_1 & u_2 & \cdots & u_m \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} u_1 a_{11} + u_2 a_{21} + \cdots + u_m a_{m1} & \cdots & u_1 a_{1n} + u_2 a_{2n} + \cdots + u_m a_{mn} \end{bmatrix} \in \mathbb{R}^{1 \times n}.$$

Multiplication of a matrix with a column vector is performed by the scalar products of


every row of a matrix with this column vector, thus producing a column vector with
the size equal to the number of matrix rows,

$$Av = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix}, \tag{2.4}$$

where

$$\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix} = \begin{bmatrix} a_{11} v_1 + a_{12} v_2 + \cdots + a_{1n} v_n \\ a_{21} v_1 + a_{22} v_2 + \cdots + a_{2n} v_n \\ \vdots \\ a_{m1} v_1 + a_{m2} v_2 + \cdots + a_{mn} v_n \end{bmatrix} \in \mathbb{R}^m. \tag{2.5}$$

In matrix–matrix multiplications, the above scheme applies column-wise to the


right factor or row-wise to the left one.

2.1.2 Matrix–matrix multiplication. Change of basis

Matrix–matrix multiplication can be explained from the point of view of the matrix–vector multiplication (2.4). For two matrices A ∈ ℝm×n and B ∈ ℝn×p , their product is a matrix

C = AB, C ∈ ℝm×p .

Each entry of the resulting matrix C is obtained as the scalar product of two n-vectors. The complexity of multiplying two square matrices is O(n³). If one of the matrices is given as an R-term sum of tensor products of vectors, then the complexity of the multiplication is O(Rn²).
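To see this saving in practice, write the rank-R matrix as A = UVᵀ with thin factors and evaluate AB as U(VᵀB), so that only matrices with a thin dimension R are ever multiplied. A NumPy sketch (illustration only, with our own variable names):

```python
import numpy as np

rng = np.random.default_rng(1)
n, R = 500, 4
U = rng.standard_normal((n, R))
V = rng.standard_normal((n, R))
B = rng.standard_normal((n, n))

A = U @ V.T             # rank-R matrix, assembled only for the reference check
C_direct = A @ B        # O(n^3)
C_fast = U @ (V.T @ B)  # two thin products: O(R n^2)
```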

A general matrix A in the vector space ℝm×n is implicitly presented in the basis spanned by the unit vectors for rows and columns, ei and ej , in the spaces ℝ1×n and ℝm×1 , respectively. Here ei is a vector with all entries equal to zero except the entry with number i, so that an entry of the matrix is given by aij = ⟨ei , Aej ⟩.
One can change the representation basis of a matrix, that is, a given matrix can be presented in a new basis given by the set of column vectors of transformation matrices. This is done by multiplying (mapping) the target matrix from both sides by the matrices representing the basis sets. If both matrices are invertible (in which case the size of the mapping matrices equals the size of the original matrix), then this change of basis is reversible. If the mapping matrices are not square, then the mapping is not reversible.
A matrix A ∈ ℝn×n in a new basis given by the columns of a matrix U ∈ ℝn×n is represented in the factorized form

$$A_U = U^T A U, \qquad A_U \in \mathbb{R}^{n \times n}. \tag{2.6}$$

The matrix A_U is the Galerkin projection of A onto the subspace spanned by the columns of U; see Figure 2.1. If U is invertible, then one can recover the original matrix A using the matrices U^{−1} and U^{−T}: A = U^{−T} A_U U^{−1}. If U ∈ ℝn×m , m < n, then A_U ∈ ℝm×m , and the operation is not reversible.

Figure 2.1: A matrix A in the basis of column vectors of the matrix U yields the matrix AU .
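A minimal NumPy sketch of the change of basis (2.6): we take an orthogonal (hence invertible) basis matrix U, project A onto it, and recover A from the projection (the names are ours, not part of the book's MATLAB codes):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n))
U, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal, hence invertible, basis

A_U = U.T @ A @ U                                 # Galerkin projection (2.6)
A_back = np.linalg.inv(U).T @ A_U @ np.linalg.inv(U)  # reverse the change of basis
```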

The Kronecker product is an operation on matrices in linear algebra that maps matri-
ces to a matrix. The Kronecker product of matrices A ∈ ℝm×n and B ∈ ℝp×q is defined
by

$$A \otimes B = \begin{bmatrix} a_{11} B & a_{12} B & \cdots & a_{1n} B \\ a_{21} B & a_{22} B & \cdots & a_{2n} B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} B & a_{m2} B & \cdots & a_{mn} B \end{bmatrix} \in \mathbb{R}^{mp \times nq}.$$

In general, this operation is not commutative: A ⊗ B ≠ B ⊗ A.
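For a quick check, NumPy's np.kron implements exactly this block structure:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

K = np.kron(A, B)   # blocks a_ij * B, here of total size (2*2) x (2*2)
```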



2.1.3 Factorization of matrices

Factorized low-rank representation of matrices reduces the cost of linear algebra operations considerably. There are a number of methods for decomposing a matrix into a sum of tensor products of vectors. In the following, we briefly discuss the singular value decomposition (SVD), the QR factorization, and the Cholesky decomposition. A large number of routines on various platforms can be applied to calculate these decompositions. For convenience, we refer to the corresponding commands in Matlab.
(1) We start with the eigenvalue decomposition (EVD), which diagonalizes a matrix, that is, finds a basis in which a symmetric matrix becomes diagonal. The eigenvalue decomposition of a symmetric matrix requires the full set of eigenvectors and eigenvalues of the algebraic problem

Au = λu.

A Matlab command

[V,D] = eig(A)

produces a diagonal matrix D of eigenvalues and a full orthogonal matrix V whose


columns are the corresponding eigenvectors so that A = VDV T or AV = VD.
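The same computation can be sketched in NumPy, where np.linalg.eigh plays the role of eig for symmetric matrices (illustration only; names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = M + M.T                  # symmetric matrix

lam, V = np.linalg.eigh(A)   # eigenvalues (ascending) and orthonormal eigenvectors
D = np.diag(lam)             # diagonal matrix of eigenvalues, A = V D V^T
```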
(2) When A is a rectangular or non-symmetric matrix, the singular value decomposition can be applied. In fact, the SVD amounts to solving the eigenvalue problems for the auxiliary symmetric positive semidefinite matrices AAT ∈ ℝm×m and AT A ∈ ℝn×n .

Theorem 2.1. Let A ∈ ℝm×n , with m ≤ n, for definiteness. Then there exist U ∈ ℝm×m ,
Σ ∈ ℝm×n , and V ∈ ℝn×n such that

A = UΣV T , (2.7)

where Σ is a diagonal m × n matrix whose diagonal entries σi , i = 1, 2, . . . , m, are the


ordered singular values of A, σ1 ≥ σ2 ≥ ⋅ ⋅ ⋅ ≥ σm ≥ 0, and U T U = Im and V T V = In , with
In denoting the n × n identity matrix.

The algebraic complexity of the SVD transform scales as O(mn2 ). We have


– U ∈ ℝm×m is a matrix composed of orthonormal vectors (columns);
– V T ∈ ℝn×n is a matrix composed of orthonormal vectors (rows);
– Σ ∈ ℝm×n is a diagonal matrix of singular values.

Here, matrices U and V include the full set of left and right singular vectors, respec-
tively,

$$U = \begin{bmatrix} u_1 & u_2 & \cdots & u_m \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_m & \cdots & 0 \end{bmatrix}, \qquad V^T = \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{bmatrix}.$$
The rank of A, r = rank(A), does not exceed m. If the singular values decay rapidly, the SVD gives the possibility to approximate a rectangular matrix A in a factorized form. The best approximation of an arbitrary matrix A ∈ ℝm×n by a rank-r matrix Ar (say, in the Frobenius norm, ‖A‖²_F = ∑i,j a²ij ) can be calculated by the truncated SVD as follows.
Let us consider (2.7) and set Σr = diag{σ1 , . . . , σr , 0, . . . , 0}. Then the best rank-r approximation is given by

$$A_r := U \Sigma_r V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T,$$

where ui , vi are the respective left and right singular vectors of A. The approximation
error in the Frobenius norm is bounded by a sum of squares of discarded singular
values:
$$\|A_r - A\|_F \le \sqrt{\sum_{i=r+1}^{n} \sigma_i^2}. \tag{2.8}$$

The SVD Matlab routine

[U,S,V] = svd(A)

produces a diagonal matrix S of singular values and the orthogonal matrices U and V
whose columns are the corresponding singular vectors so that A = USV T .
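A NumPy sketch of the truncated SVD (our illustration, not the book's code): for the rank-r truncation, the Frobenius error in fact equals the square root of the sum of squares of the discarded singular values, in agreement with (2.8).

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, r = 60, 80, 10
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in decreasing order
A_r = (U[:, :r] * s[:r]) @ Vt[:r, :]              # best rank-r approximation

err = np.linalg.norm(A - A_r)          # Frobenius norm by default
bound = np.sqrt(np.sum(s[r:] ** 2))    # discarded singular values
```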
(3) The LU decomposition represents a matrix as the product of a lower and an upper triangular matrix. This decomposition is commonly used in the solution of linear systems of equations. For the LU decomposition

A = LU,

the corresponding Matlab routine reads

[L,U] = lu(A)

so that A = LU; note that the factor L returned by lu is, in general, a permuted lower triangular matrix.
(4) The orthogonal–triangular decomposition is called the QR factorization, that is,

A = QR.

The corresponding Matlab routine for an m-by-n matrix A reads

[Q,R] = qr(A)

and produces an m-by-n upper triangular matrix R and an m-by-m unitary matrix Q so
that A = QR, QT Q = I.
(5) Cholesky decomposition of a symmetric non-negative definite matrix A,

A = RT R

produces an upper triangular matrix R satisfying the equation RT R = A. The chol func-
tion in MATLAB,

R = chol(A)

assumes that A is symmetric and positive definite.
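For reference, NumPy's counterpart np.linalg.cholesky returns the lower triangular factor L with A = LLᵀ, so that R = Lᵀ reproduces the Matlab convention A = RᵀR (sketch ours):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)   # symmetric positive definite matrix

L = np.linalg.cholesky(A)     # NumPy returns the LOWER triangular factor: A = L L^T
R = L.T                       # upper triangular factor as in MATLAB's chol: A = R^T R
```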


In most applications, one deals not with the exact rank of a matrix but with the so-called ε-rank. This concerns the rank optimization procedure based on the truncated SVD, where in (2.8) we estimate the ε-rank from the condition

$$\sum_{i=r+1}^{n} \sigma_i^2 \le \varepsilon^2.$$

In the following section we present some numerical experiments illustrating the


adaptive low-rank approximation of a matrix.

2.1.4 Examples of rank decomposition for function related matrices

In Example 1 below, we present a simple MATLAB script for testing the decay of the singular values of several matrices. First, a two-dimensional Slater function, e^{−α‖x‖}, is discretized in a square box [−b/2, b/2]² using n × n 2D Cartesian grids with n = 65, 257, and 513, and the SVD is computed for the resulting matrices. Figure 2.2 (left) shows exponentially fast decay of the singular values for all three matrices, nearly independently of the matrix size.

Figure 2.2: Decay of singular values for a matrix generated by a Slater function (left) and for a matrix
containing random valued entries (right).

Next, we compose matrices of the same sizes, but using a generator of random numbers in the interval [0, 1]. The singular values of these matrices are shown in Figure 2.2 (right). They do not decay fast, in contrast to the function-related matrices.

%____Example 1____________________
clear; b=10; alp=1;
figure(1);
[Fun,sigmas,x,y] = Gener_Slat(65,b,alp); semilogy(sigmas);
hold on; grid on;
[~,sigmas,~,~] = Gener_Slat(257,b,alp); semilogy(sigmas,'r');
[~,sigmas,~,~] = Gener_Slat(513,b,alp); semilogy(sigmas,'black');
grid on; axis tight; set(gca,'fontsize',16);
hold off;
figure(2); mesh(x,y,Fun);
figure(3);
A1 = rand(65,65);  [~,S1,~] = svd(A1); semilogy(diag(S1));
hold on; grid on;
A = rand(257,257); [~,S1,~] = svd(A); semilogy(diag(S1),'r');
A = rand(513,513); [~,S1,~] = svd(A); semilogy(diag(S1),'black');
grid on; axis tight; set(gca,'fontsize',16);
hold off;
%______________________
function [Fun1,sigmas,x,y] = Gener_Slat(n1,b,alpha1)
  h1=b/(n1-1); x=-b/2:h1:b/2; y=-b/2:h1:b/2;
  Fun1=zeros(n1,n1);
  for i=1:n1
    Fun1(i,:) = exp(-alpha1*sqrt(x(1,i)^2 + y(1,:).^2));  % discretized Slater function
  end
  [~,S1,~]=svd(Fun1); sigmas=diag(S1);
end
%____________end of Example 1____________________________

Note that the slope of the Slater function in Example 1 is controlled by the parameter “alp”. One can generate a Slater function with a sharper or smoother shape by changing this parameter and observe nearly the same behavior of the singular values.
Example 2 demonstrates the error of approximating the discretized Slater function (given by a matrix A) by the sum of tensor products of the singular vectors weighted by the first m = 18 singular values,

$$A \approx \sigma_1 u_1 v_1^T + \cdots + \sigma_m u_m v_m^T = \sum_{i=1}^{m} \sigma_i u_i v_i^T.$$

Figure 2.3: A matrix representing a discretized two-dimensional Slater function (left) and the error of
its rank-18 factorized representation (right).

When running this program, figure (3) works as an “animation”, where one can distinctly observe the diminishing of the approximation error within the loop as more summands with smaller singular values are added to the approximation. Figure 2.3 (left) shows the original discretized function with a cusp at zero, and Figure 2.3 (right) shows the final approximation error for rank r = m = 18.

%____Example 2_________________________________________
b=10; n=412; h1=b/n;
x=-b/2:h1:b/2; [~,n1]=size(x);
y=-b/2:h1:b/2;
A1=zeros(n1,n1); alpha1=1;
for i=1:n1
  A1(i,:) = exp(-alpha1*sqrt(x(1,i)^2 + y(1,:).^2));  % discretized Slater function
end
figure(1); mesh(x,y,A1);
[U1,S1,V1]=svd(A1); sigmas=diag(S1);
figure(5); semilogy(sigmas);

r1=18; Ar1 = zeros(n1,n1);
for i=1:r1
  Ar  = sigmas(i,1)*U1(:,i)*V1(:,i)';   % next rank-1 term
  Ar1 = Ar1 + Ar;                       % current low-rank approximation
  figure(2); mesh(x,y,Ar1); drawnow;
  ER_A = abs(A1 - Ar1);                 % pointwise approximation error
  figure(3); mesh(x,y,ER_A); drawnow;
end
%_______________end of Example 2________________________

2.1.5 Reduced SVD of a rank-R matrix

Let us consider a rank-R matrix M = ABT ∈ ℝn×n with the factor matrices A ∈ ℝn×R and B ∈ ℝn×R , where R ≤ n. We are interested in the best rank-r approximation of M with r < R. It can be computed by the following algorithm, which avoids the singular value decomposition of the target matrix M with possibly large n.
This algorithm includes the following steps:
(1) Perform the QR-decomposition of the side matrices,

A = QA RA , B = QB RB ,

with the unitary matrices QA , QB ∈ ℝn×R and the upper triangular matrices RA , RB ∈
ℝR×R .
(2) Compute the SVD of the core matrix, RA RTB ∈ ℝR×R

RA RTB = UΣV T ,

with the diagonal matrix Σ = diag{σ1 , . . . , σR } and unitary matrices U, V ∈ ℝR×R .


(3) Compute the best rank-r approximation of the core matrix, UΣV T ≈ Ur Σr VrT , by
extracting the submatrix Σr = diag{σ1 , . . . , σr } in Σ, and the first r columns Ur , Vr ∈
ℝR×r in the unitary matrices U and V, respectively.
(4) Finally, set the rank-r approximation Mr = QA Ur Σr VrT QTB , where QA Ur and QB Vr
are n × r unitary matrices.

The approximation error is bounded by $\sqrt{\sum_{i=r+1}^{R} \sigma_i^2}$. The complexity of the above algorithm scales linearly in n, O(nR²) + O(R³). In the case R ≪ n, this dramatically reduces the cost O(n³) of the truncated SVD applied to the full-format n × n matrix M.
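Steps (1)–(4) can be sketched in NumPy as follows (an illustration with our own variable names, not the authors' implementation); the result is compared with the rank-r truncation of a full SVD of M:

```python
import numpy as np

rng = np.random.default_rng(6)
n, R, r = 400, 20, 5
A = rng.standard_normal((n, R))
B = rng.standard_normal((n, R))
M = A @ B.T                      # rank-R matrix, assembled only for the reference check

# (1) QR decompositions of the side matrices, O(n R^2)
QA, RA = np.linalg.qr(A)
QB, RB = np.linalg.qr(B)

# (2) SVD of the small R x R core matrix, O(R^3)
U, s, Vt = np.linalg.svd(RA @ RB.T)

# (3)-(4) keep the r leading terms and map back with the unitary factors
Mr = ((QA @ U[:, :r]) * s[:r]) @ (Vt[:r, :] @ QB.T)

# reference: rank-r truncation of the full SVD of M, O(n^3)
Uf, sf, Vft = np.linalg.svd(M, full_matrices=False)
Mr_direct = (Uf[:, :r] * sf[:r]) @ Vft[:r, :]
```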
Low-rank approximation of matrices by using only partial information can be
computed by heuristic adaptive cross approximation (ACA) methods developed in
[286, 99, 287, 288, 15, 289, 14, 223], see also literature therein. Dynamical low-rank
approximation of matrices has been analyzed in [191].

2.2 Introduction to multilinear algebra


The ideas and algorithms for the low-rank tensor approximation of multi-dimensional
data by using the canonical (CP/CANDECOMP/PARAFAC) and Tucker tensor decom-
positions have been originally developed in chemometrics, psychometrics [49, 198,
50, 197], and then in signal processing and experimental data analysis [310, 192,
270, 59, 254, 62, 140]. The early papers on the polyadic (canonical) decomposition
by F. L. Hitchcock in 1927 [131, 132] and the orthogonal Tucker tensor decomposition,
introduced by L. R. Tucker in 1966 [284], gave rise to multilinear algebra of rank-
structured tensors. Comprehensive surveys on multi-linear algebra with applications
2.2 Introduction to multilinear algebra | 19

in principal component analysis and image, and signal processing, are presented in
[54, 270, 1, 193]. Nowadays, there is an extensive research on tensor decomposition
methods in computer science towards big data analysis, see for example [2, 55].
Notice that the tensor decompositions have been used in computer science mostly
for quantitative analysis of correlations in the multidimensional data arrays obtained
from experiments, without special requirements on the accuracy of decompositions.
Usually these data arrays have been considered for a small number of dimensions
(modes) and moderate mode sizes.
A mathematical justification and analysis of the Tucker tensor decomposition algorithm was presented in 2000 in the seminal works of L. De Lathauwer, B. De Moor, and J. Vandewalle on the higher-order singular value decomposition [61] and on the best rank-(r1 , . . . , rd ) orthogonal Tucker approximation of higher-order tensors [60]. The higher-order singular value decomposition (HOSVD) provides a generalization of the matrix singular value decomposition [98]. The main limitation of the Tucker algorithm from computer science [61, 193, 78] is the requirement to store the full-size tensor with nᵈ entries, as well as the complexity of HOSVD, O(nᵈ⁺¹), which includes the singular value decomposition of the directional unfolding matrices. This makes HOSVD and the corresponding Tucker decomposition algorithm practically unfeasible for problems in electronic structure calculations and for solving multidimensional PDEs.
However, multilinear algebra with the Tucker tensor decomposition via HOSVD
was one of the starting points for the tensor numerical methods. In what follows, we
recall the tensor formats and main algorithms [60, 9] from multilinear algebra, where
the techniques are being developed in view of the arbitrary content of the multidimen-
sional arrays.
In forthcoming chapters, we shall see that the content of a tensor matters and that
for function-related multidimensional arrays, even the standard multilinear algebra
algorithms provide amazing results. One can further enhance the schemes by taking
into account the predictions from approximation theory [111, 161] on the exponentially
fast convergence of the Tucker/CP decompositions in tensor rank applied to the grid-
based representation of the multidimensional functions and operators.
Let us start with the multilinear algebra approach to rank-structured tensor ap-
proximation, taking into account a general tensor content.

2.2.1 Full format dth order tensors

A tensor of order d is a multidimensional array over a d-tuple index set,

A = [ai1 ...id ] ∈ ℝn1 ×n2 ×⋅⋅⋅×nd , (2.9)

where iℓ ∈ Iℓ = {1, . . . , nℓ } is a set of indexes for each mode ℓ, ℓ = 1, . . . , d. A tensor A is


an element of the linear vector space

$$\mathbb{V}_n = \bigotimes_{\ell=1}^{d} \mathbb{R}^{n_\ell},$$

where n = (n1 , . . . , nd ), with the entry-wise addition

(A + B)i = ai + bi

and the multiplication by a constant

(cA)i = cai (c ∈ ℝ).

The linear vector space 𝕍n of tensors is equipped with the Euclidean scalar product
⟨⋅, ⋅⟩ : 𝕍n × 𝕍n → ℝ defined as

$$\langle A, B \rangle := \sum_{(i_1 \ldots i_d) \in \mathcal{I}} a_{i_1 \ldots i_d} b_{i_1 \ldots i_d} \quad \text{for } A, B \in \mathbb{V}_n, \tag{2.10}$$

where i is the d-tuple index set i = (i1 , . . . , id ). The related norm

‖A‖F := √⟨A, A⟩

is called the Frobenius norm, as for matrices.


Notice that a vector is an order-1 tensor, whereas a matrix is an order-2 tensor,
so the Frobenius tensor norm coincides with the Euclidean norm of vectors and the
Frobenius norm of matrices, respectively.
The number of entries in a tensor scales exponentially in the dimension,

$$N = \prod_{\ell=1}^{d} n_\ell, \quad \text{that is, for } n_\ell = n, \quad N = n^d.$$

This phenomenon is often called the “curse of dimensionality”. As a result, any mul-
tilinear operations with tensors given in full format (2.9), for example, computation of
a scalar product, have an exponential complexity scaling O(nd ).
Some multilinear algebraic operations with tensors of order d (d ≥ 3), can be re-
duced to the standard linear algebra by unfolding of a tensor into a matrix.
Unfolding of a tensor A ∈ ℝI1 ×⋅⋅⋅×Id along the ℓ-mode1 arranges the ℓ-mode columns
of a tensor to be the columns of the resulting unfolding matrix. Figure 2.4 shows un-
folding of a 3D tensor. The unfolding of a tensor is a matrix whose columns are the
respective fibers2 along ℓ-mode, ℓ = 1, . . . , d.

1 Note that in multilinear algebra the notion “mode” is often used for designating the particular di-
mension. ℓ-mode means the dimension number ℓ. Also, tensors of order d are called d-dimensional
tensors.
2 Fibers along mode ℓ are generalization of notions of rows and columns for matrices.

Figure 2.4: Unfolding of a 3D tensor for mode ℓ = 1.

Specifically, the unfolding of a tensor along mode ℓ is a matrix of size nℓ ×


(nℓ+1 ⋅ ⋅ ⋅ nd n1 ⋅ ⋅ ⋅ nℓ−1 ), further denoted by

A(ℓ) = [aij ] ∈ ℝnℓ ×(nℓ+1 ⋅⋅⋅nd n1 ⋅⋅⋅nℓ−1 ) , (2.11)

whose columns are the respective fibers [193] of A along the ℓth mode such that the
tensor entry ai1 i2 ...id is mapped into the matrix element aiℓ j where the long index is given
by
$$j = 1 + \sum_{k=1,\, k \neq \ell}^{d} (i_k - 1) J_k, \quad \text{with } J_k = \prod_{m=1,\, m \neq \ell}^{k-1} n_m.$$

In Matlab, the unfolding operation is performed by a simple reshape command.


Then unfolding is done with respect to the given (first) variable. For unfolding along
other variables, it is necessary to make a corresponding permutation (reordering of
the dimensions) by using permute command as shown below.
The following script demonstrates the Matlab implementation for unfolding of a
3D tensor of size 5 × 7 × 10:
_________________________________________________________________
n1=5; n2=7; n3=10;
A=rand(n1,n2,n3) % generate a 3D tensor with random coefficients
% unfolding along mode 1:
A1= reshape(A,n1,n2*n3);
% or for unfolding along mode 3,
B=permute(A,[3,2,1]); A3= reshape(B,n3,n1*n2);
_________________________________________________________________

The size of the unfolding matrix A(1) in the above Matlab example is 5 × 70, whereas
the size of the unfolding matrix A(3) is 10 × 35.
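The same unfoldings can be written in NumPy with moveaxis and reshape. Note that NumPy's row-major (C) ordering enumerates the columns in a different order than MATLAB's column-major reshape, but the columns are still the mode-ℓ fibers (sketch ours):

```python
import numpy as np

rng = np.random.default_rng(7)
n1, n2, n3 = 5, 7, 10
A = rng.standard_normal((n1, n2, n3))

def unfold(T, mode):
    """Mode unfolding: move the chosen axis to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

A1 = unfold(A, 0)   # 5 x 70, columns are mode-1 fibers A[:, i2, i3]
A3 = unfold(A, 2)   # 10 x 35, columns are mode-3 fibers A[i1, i2, :]
```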
Another important tensor operation is the so-called contracted product of two tensors. This operation is similar to matrix–matrix multiplication, with the difference that for matrices the factors must be positioned properly so that the multiplication runs over the compatible size, whereas for tensors one explicitly specifies the contraction mode ℓ. In the following, we frequently use the tensor–matrix multiplication along mode ℓ.

Definition 2.2 ([59]). Contracted product: Given a tensor A ∈ ℝI1 ×⋅⋅⋅×Id and a matrix
M ∈ ℝJℓ ×Iℓ , we define the respective mode-ℓ tensor–matrix product by

B = A ×ℓ M ∈ ℝI1 ×⋅⋅⋅×Iℓ−1 ×Jℓ ×Iℓ+1 ×⋅⋅⋅×Id , (2.12)

where³

$$b_{i_1 \ldots i_{\ell-1} j_\ell i_{\ell+1} \ldots i_d} = \sum_{i_\ell = 1}^{n_\ell} a_{i_1 \ldots i_{\ell-1} i_\ell i_{\ell+1} \ldots i_d} \, m_{j_\ell i_\ell}, \qquad j_\ell \in J_\ell.$$

Contraction can be easily performed by using the following sequence of operations,


– matrix unfolding (reshaping) of the tensor;
– matrix–matrix multiplication over the corresponding dimension;
– reshaping of the resulting matrix back to a tensor.

The examples of contractions of a tensor with a matrix are shown in the subroutine
for the Tucker decomposition algorithm presented in Section 3.
The tensor–matrix contracted product can be applied successively along several
modes, and it can be shown to be commutative:

(A ×ℓ M) ×m P = (A ×m P) ×ℓ M = A ×ℓ M ×m P, ℓ ≠ m.

We notice the convenience of notation of type ×ℓ since it gives explicitly the mode
number, which is subjected to contraction.
Figure 2.5 illustrates a sequence of contracted products of a tensor A ∈ ℝn1 ×n2 ×n3
with matrices M3 ∈ ℝr3 ×n3 , M2 ∈ ℝr2 ×n2 , and M1 ∈ ℝr1 ×n1 as follows:
– Contraction of tensor A in mode ℓ = 3 with the matrix M3 ∈ ℝr3 ×n3 yields a tensor
A3 of size n1 × n2 × r3 ,

A ×3 M3 = A3 ∈ ℝn1 ×n2 ×r3 .

– Contraction of tensor A3 in mode ℓ = 2 with the matrix M2 ∈ ℝr2 ×n2 yields a tensor
A2 of size n1 × r2 × r3 ,

A3 ×2 M2 = A2 ∈ ℝn1 ×r2 ×r3 .

– Contraction in the mode 1 with the matrix M1 ∈ ℝr1 ×n1 yields the tensor A1 ∈
ℝr1 ×r2 ×r3 ,

A2 ×1 M1 = A1 ∈ ℝr1 ×r2 ×r3 .

As a result of all contractions, the original tensor A is represented in the basis given
by matrices M1 , M2 , and M3 .
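The unfold–multiply–fold scheme for the mode-ℓ tensor–matrix product, together with a check of the commutativity of contractions, can be sketched in NumPy as follows (illustration only; names are ours):

```python
import numpy as np

rng = np.random.default_rng(8)
n1, n2, n3 = 6, 7, 8
A = rng.standard_normal((n1, n2, n3))
M1 = rng.standard_normal((2, n1))
M2 = rng.standard_normal((3, n2))
M3 = rng.standard_normal((4, n3))

def mode_mult(T, M, mode):
    """Mode tensor-matrix product: unfold, multiply, fold back."""
    Tm = np.moveaxis(T, mode, 0)                  # put the contracted mode first
    shp = (M.shape[0],) + Tm.shape[1:]
    C = M @ Tm.reshape(Tm.shape[0], -1)           # ordinary matrix-matrix product
    return np.moveaxis(C.reshape(shp), 0, mode)   # fold back into a tensor

A1 = mode_mult(mode_mult(mode_mult(A, M3, 2), M2, 1), M1, 0)
# commutativity: contracting in a different order gives the same result
A1_alt = mode_mult(mode_mult(mode_mult(A, M1, 0), M3, 2), M2, 1)
```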

3 Here the sign “×ℓ ” denotes contraction over the mode number ℓ.

Figure 2.5: A sequence of contracted products in all three modes of a tensor A with the correspond-
ing matrices M3 , M2 , and M1 .

2.2.2 Canonical and Tucker tensor formats

As we mentioned in the previous section, the number of entries in a full format tensor
grows exponentially in dimension d.
To get rid of the exponential scaling in the dimension, we are interested in rank-structured representations of tensors. The simplest rank-structured tensor is constructed as the tensor product of vectors $u^{(\ell)} = \{u^{(\ell)}_{i_\ell}\}_{i_\ell=1}^{n_\ell} \in \mathbb{R}^{n_\ell}$, which forms the canonical rank-1 tensor

$$A \equiv [u_{\mathbf{i}}]_{\mathbf{i} \in \mathcal{I}} = u^{(1)} \otimes \cdots \otimes u^{(d)} \in \mathbb{V}_n,$$

with entries given by $u_{\mathbf{i}} = u^{(1)}_{i_1} \cdots u^{(d)}_{i_d}$. Notice that a rank-1 tensor requires only dn numbers to store it (now linear scaling in the dimension). Moreover, the scalar product of two rank-1 tensors U, V ∈ 𝕍n is a product of d componentwise univariate scalar products,

$$\langle U, V \rangle := \prod_{\ell=1}^{d} \langle u^{(\ell)}, v^{(\ell)} \rangle,$$

which can be calculated in O(dn) operations. Recall that for d = 2, the tensor product
of two vectors, u ∈ ℝI and v ∈ ℝJ , represents a rank-1 matrix (see also equation (2.1)
in Section 2.1),

u ⊗ v = uvT ∈ ℝI×J .

An analogue of a rank-1 tensor is a separable multivariate function f (x1 , x2 , . . . , xd ) on ℝd , which can be represented as a product of univariate functions,

f (x1 , x2 , . . . , xd ) = f1 (x1 )f2 (x2 ) ⋅ ⋅ ⋅ fd (xd ),

where fℓ (xℓ ) are functions over the single variable xℓ , ℓ = 1, 2, . . . , d. A well-known


example is the multivariate Gaussian,
$$f(x_1, x_2, \ldots, x_d) = e^{-(\alpha_1 x_1^2 + \cdots + \alpha_d x_d^2)} = e^{-\alpha_1 x_1^2} \cdots e^{-\alpha_d x_d^2}.$$

In what follows, we consider the rank-structured representation of higher-order tensors based on sums of rank-1 tensors. There are two basic rank-structured tensor formats frequently used in multilinear algebra.

Definition 2.3. The canonical tensor format: Given a rank parameter R ∈ ℕ, we denote
by 𝒞 R ⊂ 𝕍n a set of tensors that can be represented in the canonical format,

U = ∑_{ν=1}^{R} ξν u(1)_ν ⊗ ⋅ ⋅ ⋅ ⊗ u(d)_ν, ξν ∈ ℝ, (2.13)

with normalized vectors u(ℓ)_ν ∈ 𝕍ℓ (ℓ = 1, . . . , d). The minimal parameter R in representation (2.13) is called the rank (or canonical rank) of a tensor.

The storage for a tensor in the canonical format is dRn ≪ n^d. Figure 2.6 visualizes
a canonical tensor in 3D.
Note that an analogue of the canonical tensor is the representation of a multivariate function f(x1, x2, . . . , xd) in ℝ^d by a sum of R separable functions:

f(x1, x2, . . . , xd) = ∑_{k=1}^{R} f1,k(x1) f2,k(x2) ⋅ ⋅ ⋅ fd,k(xd),

where fℓ,k (xℓ ) are functions over the single variable xℓ , ℓ = 1, 2, . . . , d.


Introducing the side matrices corresponding to representation (2.13),

U(ℓ) = [u(ℓ)_1 ⋅ ⋅ ⋅ u(ℓ)_R] ∈ ℝ^{nℓ×R},

Figure 2.6: Visualizing canonical tensor decomposition of a third-order tensor.

and the diagonal tensor ξ := diag{ξ1 , . . . , ξR } such that ξν1 ,...,νd = 0 except when ν1 =
⋅ ⋅ ⋅ = νd with ξν,...,ν = ξν (ν = 1, . . . , R), we obtain the equivalent contracted product
representation of the rank-R canonical tensor,

U = ξ ×1 U (1) ×2 U (2) ⋅ ⋅ ⋅ ×d U (d) . (2.14)

The canonical tensor representation is helpful for the multilinear tensor operations. In
Section 2.2.4 it is shown that the bilinear tensor operations with tensors in the rank-R
canonical format have linear complexity

O(∑_{ℓ=1}^{d} nℓ), or O(dRn) if nℓ = n,

with respect to both the univariate grid size n of a tensor and the dimension parameter d. The disadvantage of this representation is the lack of fast and stable algorithms for the best approximation of arbitrary tensors in the fixed-rank canonical format.
The other commonly used tensor format, introduced by Tucker [284], is the
rank-(r1 , . . . , rd ) Tucker tensor format. It is based on a representation in subspaces

𝕋r := ⨂_{ℓ=1}^{d} 𝕋ℓ of 𝕍n for certain 𝕋ℓ ⊂ 𝕍ℓ

with fixed dimension parameters rℓ := dim 𝕋ℓ ≤ n.

Definition 2.4. The Tucker tensor format: For given rank parameter r = (r1 , . . . , rd ), we
denote by 𝒯 r the subset of tensors in 𝕍n represented in the Tucker format

A = ∑_{ν1=1}^{r1} ⋅ ⋅ ⋅ ∑_{νd=1}^{rd} βν1,...,νd v(1)_{ν1} ⊗ ⋅ ⋅ ⋅ ⊗ v(d)_{νd} ∈ 𝕍n, (2.15)

with some vectors v(ℓ)_{νℓ} ∈ 𝕍ℓ = ℝ^{Iℓ} (1 ≤ νℓ ≤ rℓ), which form an orthonormal basis of the rℓ-dimensional subspaces 𝕋ℓ = span{v(ℓ)_ν}_{ν=1}^{rℓ} (ℓ = 1, . . . , d).


Figure 2.7: Visualizing the Tucker decomposition for a 3D tensor.

The coefficients tensor β = [βν1 ,...,νd ], which is an element of a tensor space

𝔹r = ℝr1 ×⋅⋅⋅×rd , (2.16)

is called the core tensor. We call the parameter r = minℓ {rℓ } the minimal Tucker rank.
Figure 2.7 visualizes a Tucker tensor decomposition of a tensor A in ℝn1 ×n2 ×n3 .
Note that for problems in signal processing or principal component analysis, some
of the mode sizes of the core tensor, i. e., the Tucker rank rℓ , may be close to the original
tensor size nℓ in the corresponding mode.
Introducing the (orthogonal) side matrices V(ℓ) = [v(ℓ)_1 ⋅ ⋅ ⋅ v(ℓ)_{rℓ}] such that V(ℓ)ᵀ V(ℓ) = I_{rℓ×rℓ}, we then use a tensor-by-matrix contracted product notation to
represent the Tucker decomposition of A(r) ∈ 𝒯 r in a compact form,

A(r) = β ×1 V (1) ×2 V (2) ⋅ ⋅ ⋅ ×d V (d) . (2.17)
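The contracted-product form (2.17) translates directly into code; the sketch below (our numpy illustration, with `mode_mult` as an assumed helper name) builds a Tucker tensor from a core and orthogonal side matrices and checks that the multilinear orthogonal map preserves the Frobenius norm:

```python
import numpy as np

def mode_mult(T, M, mode):
    # Contracted product T x_mode M: contract M's columns with the given mode of T
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

rng = np.random.default_rng(1)
n, r = (6, 7, 8), (2, 3, 4)
beta = rng.standard_normal(r)  # core tensor
# Orthogonal side matrices V(l) via QR factorization of random n_l x r_l matrices
V = [np.linalg.qr(rng.standard_normal((n[l], r[l])))[0] for l in range(3)]

A = mode_mult(mode_mult(mode_mult(beta, V[0], 0), V[1], 1), V[2], 2)
assert A.shape == n
# Orthonormal side matrices preserve the Frobenius norm of the core
assert np.isclose(np.linalg.norm(A), np.linalg.norm(beta))
```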

Remark 2.5. Notice that the representation (2.17) is not unique, since the tensor A(r) is
invariant under directional rotations. In fact, for any set of orthogonal rℓ × rℓ matrices
Yℓ (ℓ = 1, . . . , d), we have the equivalent representation

A(r) = β̂ ×1 V̂(1) ×2 V̂(2) ⋅ ⋅ ⋅ ×d V̂(d),

with

β̂ = β ×1 Y1 ×2 Y2 ⋅ ⋅ ⋅ ×d Yd, V̂(ℓ) = V(ℓ) Yℓᵀ, ℓ = 1, . . . , d.


Remark 2.6. If the subspaces 𝕋ℓ = span{v(ℓ)_ν}_{ν=1}^{rℓ} ⊂ 𝕍ℓ are fixed, then the approximation A(r) ∈ 𝒯 r of a given tensor A ∈ 𝕍n is reduced to the orthogonal projection of A onto the particular linear space 𝕋r = ⨂_{ℓ=1}^{d} 𝕋ℓ ⊂ 𝒯 r,n, that is,

A(r) = ∑_{ν1,...,νd=1}^{r} ⟨v(1)_{ν1} ⊗ ⋅ ⋅ ⋅ ⊗ v(d)_{νd}, A⟩ v(1)_{ν1} ⊗ ⋅ ⋅ ⋅ ⊗ v(d)_{νd}
= (A ×1 V(1)ᵀ ×2 ⋅ ⋅ ⋅ ×d V(d)ᵀ) ×1 V(1) ×2 ⋅ ⋅ ⋅ ×d V(d).

This property plays an important role in the computation of the best orthogonal Tucker
approximation, where the “optimal” subspaces 𝕋ℓ are recalculated within a nonlinear
iteration process.
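This projection property admits a quick numerical check (our sketch, with a small `mode_mult` helper for the contracted product): contracting a Tucker tensor with the transposed side matrices recovers its core, and re-expanding reproduces the tensor exactly, since it already lies in the subspace:

```python
import numpy as np

def mode_mult(T, M, mode):
    # Contracted product T x_mode M
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

rng = np.random.default_rng(2)
n, r = (6, 7, 8), (2, 3, 4)
V = [np.linalg.qr(rng.standard_normal((n[l], r[l])))[0] for l in range(3)]
beta = rng.standard_normal(r)
A = mode_mult(mode_mult(mode_mult(beta, V[0], 0), V[1], 1), V[2], 2)

# Project with V(l)^T: this recovers the core exactly ...
core = mode_mult(mode_mult(mode_mult(A, V[0].T, 0), V[1].T, 1), V[2].T, 2)
assert np.allclose(core, beta)
# ... and re-expanding reproduces A, since A lies in the projection subspace
A_proj = mode_mult(mode_mult(mode_mult(core, V[0], 0), V[1], 1), V[2], 2)
assert np.allclose(A_proj, A)
```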

In the following, to simplify the discussion of complexity issues, we assume that rℓ = r (ℓ = 1, . . . , d). The storage requirements for the Tucker decomposition are estimated by r^d + drn, where usually r is noticeably smaller than n. In turn, the maximal canonical rank of the Tucker representation is bounded by r^{d−1} (see Remark 3.17).

2.2.3 Tucker tensor decomposition for full format tensors

The Tucker approximation of dth-order tensors is the higher-order extension of the best rank-r matrix approximation in linear algebra, based on the truncated SVD. Since the subset of Tucker tensors 𝒯 r,n is not a linear space, the best Tucker approximation problem leads to the challenging nonlinear minimization problem

A0 ∈ 𝒮0 ⊂ 𝕍n : f (A) := ‖A0 − A‖2 → min (2.18)

over all tensors A ∈ 𝒮 = {𝒯 r,n }. Here, 𝒮0 might be the set of Tucker or CP tensors with
the rank parameter substantially larger than r.
As the basic nonlinear approximation scheme, we consider the best orthogonal
rank-(r1 , . . . , rd ) Tucker approximation for the full format input, corresponding to the
choice 𝒮0 = 𝒯 r,n. Tensors A ∈ 𝒯 r are parameterized as in (2.17), with the orthogonality constraints

V (ℓ) ∈ 𝒱nℓ ,rℓ (ℓ = 1, . . . , d),

where
𝒱n,r := {Y ∈ ℝ^{n×r} : Yᵀ Y = I_{r×r} ∈ ℝ^{r×r}} (2.19)

is the so-called Stiefel manifold of n × r orthogonal matrices. This minimization problem on the product of Stiefel manifolds was first addressed in [197].
In the following, we denote by 𝒢ℓ the Grassmann manifold, that is, the factor space of the Stiefel manifold 𝒱nℓ,rℓ (ℓ = 1, . . . , d) in (2.19) with respect to all possible rotations; see Remark 2.5.

The key point for the efficient solution of the minimization problem (2.18) over
tensor manifold 𝒮 = 𝒯 r,n is its equivalent reformulation as the dual maximization
problem [60],
[Z(1), . . . , Z(d)] = argmax ‖[⟨v(1)_{ν1} ⊗ ⋅ ⋅ ⋅ ⊗ v(d)_{νd}, A⟩]_{ν=1}^{r}‖_{𝔹r}² (2.20)

over the set of side matrices V(ℓ) = [v(ℓ)_1 ⋅ ⋅ ⋅ v(ℓ)_{rℓ}] in the Stiefel manifold 𝒱nℓ,rℓ, as in (2.19).
(2.19).
The following lemma by De Lathauwer, De Moor, and Vandewalle [60] shows that
the minimization of the original quadratic functional is reduced to the dual maximiza-
tion problem, thus eliminating the core tensor β from the optimization process.

Lemma 2.7 ([60]). For given A0 ∈ ℝI1 ×⋅⋅⋅×Id , the minimization problem (2.18) on 𝒯 r is
equivalent to the dual maximization problem

g(V(1), . . . , V(d)) := ‖A0 ×1 V(1)ᵀ ×2 ⋅ ⋅ ⋅ ×d V(d)ᵀ‖² → max (2.21)

over a set V(ℓ) ∈ ℝ^{nℓ×rℓ} from the Grassmann manifold, i. e., V(ℓ) ∈ 𝒢ℓ (ℓ = 1, . . . , d). For
given maximizing matrices Z (m) (m = 1, . . . , d), the core tensor β minimizing (2.18) is
represented by
β = A0 ×1 Z(1)ᵀ ×2 ⋅ ⋅ ⋅ ×d Z(d)ᵀ ∈ ℝ^{r1×⋅⋅⋅×rd}. (2.22)

In view of Remark 2.5, the rotational non-uniqueness of the maximizer in (2.20) can be avoided if one solves this maximization problem in the Grassmann manifold.
The dual maximization problem (2.21) posed on the compact manifold can be proven
to have at least one global maximum (see [161, 78]). For the size consistency of the
arising tensors, we require the natural compatibility conditions

rℓ ≤ r̄ℓ := r1 ⋅ ⋅ ⋅ rℓ−1 rℓ+1 ⋅ ⋅ ⋅ rd, ℓ = 1, . . . , d. (2.23)

The dual maximization problem (2.20) for the best (nonlinear) Tucker approximation is usually solved numerically by the ALS iteration, combined with the so-called higher-order SVD (HOSVD), introduced by De Lathauwer et al. in [61] and [60], respectively. We recall the theorem from [61].

Theorem 2.8 (dth-order SVD, HOSVD, [61]). Every real (complex) n1 ×n2 ×⋅ ⋅ ⋅×nd -tensor
A can be written as the product

A = 𝒮 ×1 V (1) ×2 V (2) ⋅ ⋅ ⋅ ×d V (d) , (2.24)

in which
(1) V(ℓ) = [V(ℓ)_1 V(ℓ)_2 ⋅ ⋅ ⋅ V(ℓ)_{nℓ}] is a unitary nℓ × nℓ matrix;
(2) 𝒮 is a complex n1 × n2 × ⋅ ⋅ ⋅ × nd -tensor of which the subtensors 𝒮iℓ =α , obtained by
fixing the ℓth index to α, have the following properties:

(i) all-orthogonality: two subtensors 𝒮iℓ =α and 𝒮iℓ =β are orthogonal for all possible
values of ℓ, α, and β subject to α ≠ β:

⟨𝒮iℓ =α , 𝒮iℓ =β ⟩ = 0 when α ≠ β

(ii) ordering: ‖𝒮_{iℓ=1}‖ ≥ ‖𝒮_{iℓ=2}‖ ≥ ⋅ ⋅ ⋅ ≥ ‖𝒮_{iℓ=nℓ}‖ ≥ 0 for all possible values of ℓ.

The Frobenius norms ‖𝒮_{iℓ=i}‖, symbolized by σ(ℓ)_i, are the ℓ-mode singular values of A(ℓ), and the vector V(ℓ)_i is the ith ℓ-mode left singular vector of A(ℓ).

Another theorem from [61] proves the error bound for the truncated HOSVD. It states that, for the HOSVD of A as given in Theorem 2.8 with the ℓ-mode ranks rank(A(ℓ)) = Rℓ (ℓ = 1, . . . , d), the tensor Ã obtained by discarding the smallest ℓ-mode singular values σ(ℓ)_{rℓ+1}, σ(ℓ)_{rℓ+2}, . . . , σ(ℓ)_{Rℓ} for given values of rℓ (ℓ = 1, . . . , d) (i. e., setting the corresponding parts of 𝒮 equal to zero) provides the following approximation error:

‖A − Ã‖² ≤ ∑_{i1=r1+1}^{R1} σ(1)_{i1}² + ∑_{i2=r2+1}^{R2} σ(2)_{i2}² + ⋅ ⋅ ⋅ + ∑_{id=rd+1}^{Rd} σ(d)_{id}².
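The HOSVD and its truncation error bound can be reproduced in a few lines of numpy (our illustrative sketch; the helper names are ours):

```python
import numpy as np

def unfold(T, mode):
    # Matrix unfolding A_(mode): the chosen mode becomes the row index
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_mult(T, M, mode):
    # Contracted product T x_mode M
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

rng = np.random.default_rng(3)
rank = (3, 3, 3)
A = rng.standard_normal((8, 9, 10))

# HOSVD: the SVD of each mode unfolding gives side matrices and mode singular values
V, sigma = [], []
for l in range(3):
    Ul, sl, _ = np.linalg.svd(unfold(A, l), full_matrices=False)
    V.append(Ul[:, :rank[l]])
    sigma.append(sl)

# Truncated HOSVD: project onto the leading singular subspaces and expand back
At = A
for l in range(3):
    At = mode_mult(At, V[l].T, l)
for l in range(3):
    At = mode_mult(At, V[l], l)

# Squared error is bounded by the sum of squared discarded mode singular values
bound = sum((sigma[l][rank[l]:]**2).sum() for l in range(3))
assert np.linalg.norm(A - At)**2 <= bound + 1e-10
```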

We refer to the original papers [60, 61] for a detailed discussion of the above theory, which was an important step toward applying tensor decompositions in scientific computing.
Figure 2.8 illustrates the statements of the above theorems by the example of a cubic third-order tensor A. It shows the core tensor 𝒮 and the matrices V(1), V(2), and V(3) from (2.24). The size of the core tensor 𝒮 is the same as the size of the original tensor A, except that it is now represented in the orthogonal basis given by the matrices V(1), V(2), and V(3). The core tensor of the truncated HOSVD is colored yellow.

Figure 2.8: Illustration to Theorem 2.8.

The orthogonality of the subtensors 𝒮_{iℓ=α} and 𝒮_{iℓ=β} follows from the fact that these matrices originate from reshaping of the orthogonal vectors in the matrix (W(ℓ))ᵀ of the SVD of the respective matrix unfolding of A for modes ℓ = 1, 2, 3,

A(ℓ) = V(ℓ) Σ(ℓ) (W(ℓ))ᵀ.

Note that the matrices V(ℓ), ℓ = 1, 2, 3, obtained as a result of the singular value decomposition of the corresponding matrix unfolding of A for modes ℓ = 1, 2, 3, initially have the full mode size of the original tensor. Based on the truncated HOSVD, their size reduction can be performed by taking into account the decay of the singular values in Σ(ℓ) and then discarding the smallest singular values subject to some threshold ε > 0. This corresponds to the choice of the first r(ℓ) vectors in V(ℓ), as shown in Figure 2.8. The sizes r(ℓ) may be different, depending on the chosen threshold and the structure of the initial tensor A.
Next, we recall the Tucker decomposition algorithm for full format tensors, introduced by De Lathauwer et al. in [60]. It is based on an initial guess computed by the HOSVD and the alternating least squares (ALS) iteration.

Tucker decomposition algorithm for full format tensor (𝕍n → 𝒯 r,n )


Given the input tensor A ∈ 𝕍n, the Tucker rank r, and the maximum number of ALS iterations kmax ≥ 1:
(1) Compute the truncated HOSVD of A to obtain an initial guess V(ℓ)_0 ∈ ℝ^{nℓ×rℓ} for the ℓ-mode side matrices V(ℓ) (ℓ = 1, . . . , d) (“truncated” SVD applied to each matrix unfolding A(ℓ)). Figure 2.9 illustrates this step of the algorithm for a 3D tensor (ℓ = 1, 2, 3).
(2) For k = 1 : kmax perform the ALS iteration:
for each q = 1, . . . , d, with fixed side matrices V(ℓ)_{k−1} ∈ ℝ^{nℓ×rℓ}, ℓ ≠ q, the ALS iteration optimizes the q-mode matrix V(q)_k via computing the dominating rq-dimensional subspace (truncated SVD) for the respective matrix unfolding

B(q) ∈ ℝ^{nq×r̄q}, r̄q = r1 ⋅ ⋅ ⋅ rq−1 rq+1 ⋅ ⋅ ⋅ rd = O(r^{d−1}), (2.25)

corresponding to the tensor obtained by the “single-hole” contracted product in the q-mode:

B = A ×1 V(1)ᵀ_k ×2 ⋅ ⋅ ⋅ ×q−1 V(q−1)ᵀ_k ×q+1 V(q+1)ᵀ_{k−1} ⋅ ⋅ ⋅ ×d V(d)ᵀ_{k−1}. (2.26)

(3) Set V(ℓ) = V(ℓ)_{kmax}, and compute the core β as the representation coefficients of the orthogonal projection of A onto 𝕋n = ⨂_{ℓ=1}^{d} 𝕋ℓ, with 𝕋ℓ = span{v(ℓ)_ν}_{ν=1}^{r} (see Remark 2.6),

β = A ×1 V(1)ᵀ ×2 ⋅ ⋅ ⋅ ×d V(d)ᵀ ∈ 𝔹r.

The computational costs are the following: (1) the HOSVD cost is W = O(dn^{d+1}); (2) the cost of the ALS procedure: each iteration costs O(d r^{d−1} n min{r^{d−1}, n} + d n^d r), which represents the expense of the SVDs and the computation of the matrix unfoldings B(q). The last step, i. e., the computation of the core tensor, has the cost O(r^d n).
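One ALS sweep of the scheme above can be sketched as follows (our numpy illustration; all helper names are ours). Each step forms the “single-hole” tensor (2.26) and updates one side matrix from the dominant subspace of its unfolding; the dual functional (2.21) never decreases:

```python
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_mult(T, M, mode):
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def dual_g(A, V):
    # g in (2.21): the norm of the projected core (to be maximized)
    core = A
    for l in range(A.ndim):
        core = mode_mult(core, V[l].T, l)
    return np.linalg.norm(core)

def als_sweep(A, V, rank):
    for q in range(A.ndim):
        B = A  # "single-hole" contraction: project every mode except q
        for l in range(A.ndim):
            if l != q:
                B = mode_mult(B, V[l].T, l)
        # Dominant r_q-dimensional subspace of the q-mode unfolding B(q)
        V[q] = np.linalg.svd(unfold(B, q), full_matrices=False)[0][:, :rank[q]]
    return V

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 9, 10))
rank = (3, 3, 3)
# HOSVD initial guess
V = [np.linalg.svd(unfold(A, l))[0][:, :rank[l]] for l in range(3)]

g0 = dual_g(A, V)
V = als_sweep(A, V, rank)
assert dual_g(A, V) >= g0 - 1e-12  # each ALS step maximizes g over one mode
```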

Let us comment on the Tucker decomposition algorithm from [61] for a third-order tensor A ∈ ℝ^{n1×n2×n3} by using Figures 2.9, 2.10, and 2.11. Set the Tucker ranks to r1, r2, and r3, respectively.

Figure 2.9: The initial guess for the Tucker decomposition is computed by HOSVD via SVD of the
ℓ-mode unfolding matrices, ℓ = 1, 2, 3.

(1) At the first step, the truncated SVD is computed for the three unfolding matrices A(1) ∈ ℝ^{n1×n2n3}, A(2) ∈ ℝ^{n2×n3n1}, and A(3) ∈ ℝ^{n3×n1n2}, as shown in Figure 2.9. Every SVD needs O(n^4) computer operations if we set nℓ = n. Thus, it is the most storage/time consuming part of the algorithm.

Figure 2.10: Construction of a “single-hole” tensor by contractions.



Figure 2.11: Unfolding of a “single-hole” tensor.

(2) At the ALS iteration step of the scheme, the construction of the “single-hole” tensors given by (2.26) allows one to reduce essentially the cost of computing the best mappings for the Tucker modes. The construction of a single-hole tensor for ℓ = 3 by contractions with the matrices V(ℓ) for all modes but one is shown in Figure 2.10. As illustrated in Figure 2.11, the truncated SVD is performed for a tensor unfolding of much smaller size, since the tensor is already partially mapped into the Tucker projection subspaces 𝕋ℓ, except for the single mode ℓ = 1 from the original tensor space 𝕍n, for which the mapping matrix is being updated. The ALS procedure is repeated kmax times for every mode ℓ = 1, . . . , d of the tensor.
(3) At the last step of the algorithm, the core tensor is computed using contraction of the original tensor with the updated side matrices V(ℓ)_{rℓ}, ℓ = 1, . . . , d.
With fixed kmax, the overall complexity of the algorithm for d = 3, nℓ = n, and rℓ = r, ℓ = 1, 2, 3, is estimated by

W_{F→T} = O(n^4 + n^3 r + n^2 r^2 + n^3 r) = O(n^4),

where the different summands denote the cost of the initial HOSVD of A, the computation of the unfolding matrices B(q), the related SVDs, and the computation of the core tensor, respectively.
Notice that the Tucker model applied to a general fully populated tensor of size n^d requires O(dn^{d+1}) arithmetical operations, due to the presence of the complexity-dominating HOSVD. Hence, in computational practice this algorithm applies only to small d and moderate n.
We conclude that the ALS Tucker tensor decomposition algorithm poses a severe restriction on the size of available tensors. For example, on conventional laptop computers it is restricted to 3D tensors of size less than 200^3, which is not satisfactory for real space calculations in quantum chemistry. This restriction is avoided for function-related tensors by using the multigrid Tucker tensor decomposition discussed in Section 3.1.

2.2.4 Basic bilinear operations with rank-structured tensors

We have observed that the canonical and Tucker tensor formats provide representations by sums of tensor products of vectors. Hence, the standard operations with tensors are reduced to one-dimensional operations in the corresponding dimensions, exactly in the same way as for rank-structured matrices (see Section 2.1).
The main point here is the rank of the tensor, that is, the number of tensor product summands. The separation rank parameter is hard to control for tensors containing unstructured or experimental data. Due to the addition/multiplication of ranks in every rank-structured operation, after several steps we may face a “curse of ranks” instead of the curse of dimensionality. However, it will be shown in Chapter 3 that for function-related tensors the situation is different, due to their intrinsically low ε-ranks. Moreover, for tensors approximating functions and operators, it is possible to provide means for reducing their ranks after a sequence of tensor operations.
For the sake of clarity (and without loss of generality), in this section we assume
that r = rℓ , n = nℓ (ℓ = 1, . . . , d). If there is no confusion, the index n can be skipped. We
denote by W the complexity of various tensor operations (say, W⟨⋅,⋅⟩ ) or the related stor-
age requirements (say, Wst(β) ). We estimate the storage demands Wst and complexity of
the following standard tensor-product operations: the scalar product, the Hadamard
(component-wise) product, and the convolution transform. We consider the multilin-
ear operations in 𝒯 r,n and 𝒞 R,n tensor classes.
The Tucker model requires

Wst,T = drn + r^d (2.27)

storage to represent a tensor. The storage for the rank-R canonical tensor scales lin-
early in d,

Wst,C = dRn. (2.28)

Setting R = αr with α ≥ 1, we can specify the range of parameters where the Tucker model is less storage consuming than the canonical one:

r^{d−1} ≤ d(α − 1)n (for d = 3 : r² ≤ 3(α − 1)n).

In general, the numerical Tucker decomposition leads to a fully populated core tensor, represented by r^d nonzero elements. However, a special data structure of the Tucker core can be imposed, which reduces the complexity of the corresponding tensor operations (cf. [161]). In particular, for the mixed (two-level) Tucker-canonical decomposition, the core tensor is represented in the rank-R CP format (see Definition 3.15), so that the storage demands scale linearly in d,

Wst,TC = dr(n + R).



Bilinear operations in the Tucker format


For given tensors A1 ∈ 𝒯 r1, A2 ∈ 𝒯 r2 represented in the form (2.15), i. e.,

A1 = ∑_{ν1=1}^{r1} ⋅ ⋅ ⋅ ∑_{νd=1}^{rd} βν1,...,νd u(1)_{ν1} ⊗ ⋅ ⋅ ⋅ ⊗ u(d)_{νd} ∈ 𝕍n,
A2 = ∑_{μ1=1}^{r1} ⋅ ⋅ ⋅ ∑_{μd=1}^{rd} ζμ1,...,μd v(1)_{μ1} ⊗ ⋅ ⋅ ⋅ ⊗ v(d)_{μd} ∈ 𝕍n, (2.29)

the scalar product (2.10) is computed by

⟨A1, A2⟩ := ∑_{k=1}^{r1} ∑_{m=1}^{r2} βk1...kd ζm1...md ∏_{ℓ=1}^{d} ⟨u(ℓ)_{kℓ}, v(ℓ)_{mℓ}⟩, (2.30)

where k = (k1, . . . , kd) and m = (m1, . . . , md) are multi-indices.

In fact, applying the definition of the scalar product in (2.10) to rank-1 tensors (with R = r = 1), we have

⟨A1, A2⟩ := ∑_{i∈ℐ} u(1)_{i1} ⋅ ⋅ ⋅ u(d)_{id} v(1)_{i1} ⋅ ⋅ ⋅ v(d)_{id}
= ∑_{i1=1}^{n1} u(1)_{i1} v(1)_{i1} ⋅ ⋅ ⋅ ∑_{id=1}^{nd} u(d)_{id} v(d)_{id} = ∏_{ℓ=1}^{d} ⟨u(ℓ), v(ℓ)⟩. (2.31)

Then, the above representation follows by combining all rank-1 terms in the left-hand
side in (2.30).
We further simplify and suppose that r = r1 = r2 = (r, . . . , r). The calculation in (2.30) then includes dr² scalar products of vectors of size n plus r^{2d} multiplications, leading to the overall complexity

W⟨⋅,⋅⟩ = O(dnr² + r^{2d}),

whereas for the calculation of the respective tensor norm, the second term reduces to O(r^d).
Note that in the case of the mixed Tucker-canonical decomposition (see Definition 3.15), the scalar product can be computed in O(R² + dr²n + dR²r) operations (cf. [161], Lemma 2.8).
For given tensors A, B ∈ ℝℐ , the Hadamard product A ⊙ B ∈ ℝℐ of two tensors of
the same size ℐ is defined by the componentwise product,

(A ⊙ B)i = ai ⋅ bi , i ∈ ℐ.

Hence, for A1, A2 ∈ 𝒯 r, as in (2.29), we tensorize the Hadamard product by

A1 ⊙ A2 := ∑_{k1,m1=1}^{r} ⋅ ⋅ ⋅ ∑_{kd,md=1}^{r} βk1...kd ζm1...md (u(1)_{k1} ⊙ v(1)_{m1}) ⊗ ⋅ ⋅ ⋅ ⊗ (u(d)_{kd} ⊙ v(d)_{md}). (2.32)

Again, applying definition (2.10) to rank-1 tensors (with β = ζ = 1), we obtain

(A1 ⊙ A2)_i = (u(1)_{i1} v(1)_{i1}) ⋅ ⋅ ⋅ (u(d)_{id} v(d)_{id}), i ∈ ℐ,
A1 ⊙ A2 = (u(1) ⊙ v(1)) ⊗ ⋅ ⋅ ⋅ ⊗ (u(d) ⊙ v(d)). (2.33)
Then, (2.32) follows by summation over all rank-1 terms in A1 ⊙ A2. Relation (2.32) leads to the storage requirement

Wst(⊙) = O(dr²n + r^{2d}),

which includes the memory size for the d modes of n × r × r Tucker vectors, and for the new Tucker core of size (r²)^d.
Summation of two tensors is performed by concatenation of the side matrices,
their orthogonalization and recomputation of the Tucker core.

Summary on tensor operations in rank-R canonical format


We consider tensors A1, A2 represented in the rank-R canonical format (2.13):

A1 = ∑_{k=1}^{R1} ck u(1)_k ⊗ ⋅ ⋅ ⋅ ⊗ u(d)_k, A2 = ∑_{m=1}^{R2} bm v(1)_m ⊗ ⋅ ⋅ ⋅ ⊗ v(d)_m, (2.34)

with normalized vectors u(ℓ)_k, v(ℓ)_m ∈ ℝ^{nℓ}. For simplicity of discussion, we assume that
nℓ = n, ℓ = 1, . . . , d. We have
(1) A sum of two canonical tensors, given by (2.34), can be written as
R1 R2
A1 + A2 = ∑ ck u(1)
k
⊗ ⋅ ⋅ ⋅ ⊗ u(d)
k
+ ∑ bm v(1) (d)
m ⊗ ⋅ ⋅ ⋅ ⊗ vm , (2.35)
k=1 m=1

resulting in a canonical tensor with rank at most RS = R1 + R2. This operation has no cost, since it is simply a concatenation of the side matrices.
(2) For given canonical tensors A1, A2, the scalar product (2.10) is computed by (see (2.31))

⟨A1, A2⟩ := ∑_{k=1}^{R1} ∑_{m=1}^{R2} ck bm ∏_{ℓ=1}^{d} ⟨u(ℓ)_k, v(ℓ)_m⟩. (2.36)

Calculation of (2.36) includes R1R2 scalar products of vectors in ℝ^n, leading to the overall complexity

W⟨⋅,⋅⟩ = O(dnR1R2).

(3) For A1, A2 given by (2.34), we tensorize the Hadamard product by (see (2.33))

A1 ⊙ A2 := ∑_{k=1}^{R1} ∑_{m=1}^{R2} ck bm (u(1)_k ⊙ v(1)_m) ⊗ ⋅ ⋅ ⋅ ⊗ (u(d)_k ⊙ v(d)_m). (2.37)

The complexity of this operation is estimated by O(dnR1 R2 ).
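The canonical-format operations (2.35)–(2.37) reduce to cheap manipulations of the side matrices; the numpy sketch below (our illustration, with ad hoc sizes) verifies the scalar product and the Hadamard product against the full tensors:

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, R1, R2 = 3, 6, 4, 5
c, b = rng.standard_normal(R1), rng.standard_normal(R2)
U = [rng.standard_normal((n, R1)) for _ in range(d)]  # side matrices of A1
W = [rng.standard_normal((n, R2)) for _ in range(d)]  # side matrices of A2

def full(weights, F):
    # Assemble the full tensor of a canonical representation (for checking only)
    return np.einsum('r,ir,jr,kr->ijk', weights, *F)

A1, A2 = full(c, U), full(b, W)

# (2.36): R1*R2 products of d univariate scalar products, O(dnR1R2) cost
gram = np.ones((R1, R2))
for l in range(d):
    gram *= U[l].T @ W[l]
assert np.isclose((A1 * A2).sum(), c @ gram @ b)

# (2.37): the Hadamard product has canonical rank at most R1*R2
H = [np.einsum('ik,im->ikm', U[l], W[l]).reshape(n, R1 * R2) for l in range(d)]
assert np.allclose(A1 * A2, full(np.outer(c, b).ravel(), H))
```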


Convolution of tensors will be considered in Section 5.
3 Rank-structured grid-based representations
of functions in ℝd
3.1 Super-compression of function-related tensors
In the numerical solution of multidimensional problems, it is advantageous to have
separable representations of multivariate functions, since then all calculations are re-
duced to operations with univariate functions. A well-known example is a Gaussian
function, which can be represented as a product of one-dimensional Gaussians. How-
ever, the multivariate functions can loose their initial separability after undergoing
nonlinear (integral) transformations involved in multidimensional PDEs, leading to
cumbersome error-prone calculation schemes.
When solving d-dimensional PDEs by the standard finite difference or finite element methods via a grid-based representation of the multidimensional functions and operators, the storage and, consequently, the number of operations scale exponentially with respect to the dimension d. This redundancy of the grid representation in conventional numerical methods can be only mildly alleviated for low-dimensional problems by using mesh refinement in finite element approaches or sparse grid methods.
In the previous section, it was shown that there are algebraically separable representations of multidimensional arrays based on the Tucker or canonical tensor decomposition. The question is how to compute such representations in the numerical analysis of multidimensional PDEs. Another question concerns the numerical efficiency of the tensor approach and the possibilities to reduce the separation rank parameters, adapting them to the approximation threshold.
The sinc-quadrature based canonical approximations to analytic functions and certain operator-valued functions have been analyzed in [94, 91, 111, 112, 162, 161, 166]. The related results in approximation theory can be found in [5, 89, 90, 114, 117, 118]. However, this kind of canonical-type approximation in explicit analytic form is limited to spherically symmetric functions, while algebraic canonical approximation algorithms suffer from slow and unstable convergence. In 2006, it was proven by Boris Khoromskij that for some classes of function-related tensors, the Tucker approximation error via the minimization (2.18) decays exponentially in the Tucker rank [161]:

‖A(r) − A0‖ ≤ C e^{−α r̂} with r̂ = min_ℓ rℓ, (3.1)

where A(r) is a minimizer in (2.18). As a consequence, the approximation error ε > 0 can be achieved with the moderate rank parameter

r̂ = O(|log ε|).
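This fast decay is easy to observe numerically. The sketch below (our illustration, not from the book) samples f(x, y, z) = 1/(x + y + z) on a uniform grid and inspects the singular values of a mode unfolding, which bound the Tucker approximation error; they decay near-exponentially, so a small Tucker rank already gives several correct digits:

```python
import numpy as np

n = 40
x = np.arange(1, n + 1, dtype=float)
# Function-related tensor: samples of f(x, y, z) = 1/(x + y + z)
A = 1.0 / (x[:, None, None] + x[None, :, None] + x[None, None, :])

# Singular values of the mode-1 unfolding control the Tucker approximation error
s = np.linalg.svd(A.reshape(n, -1), compute_uv=False)
assert s[7] / s[0] < 1e-4  # fast (near-exponential) decay of mode singular values
```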

https://doi.org/10.1515/9783110365832-003

In this section, we demonstrate that, for a wide class of function-related tensors, the
Tucker decomposition provides a separable approximation with exponentially fast
decay of the error with respect to the rank parameter.
The particular properties of the Tucker decomposition for function-related tensors led to the invention of the multigrid Tucker decomposition method, which reduces the numerical complexity dramatically. Moreover, for function-related tensors, the novel canonical-to-Tucker (C2T) transform and the reduced higher-order singular value decomposition (RHOSVD) were developed in [174], which made a tremendous impact on the evolution of tensor numerical methods. The C2T transform provides a stable algorithm for reducing the large canonical tensor ranks arising in the course of bilinear matrix–tensor and tensor–tensor operations.
The C2T algorithm has pushed forward the grid-based numerical methods for calculating 3D integral convolution operators for functions with multiple singularities [174], providing an accuracy level comparable with the analytical evaluation of the same integrals. In turn, the RHOSVD applies to a tensor in the canonical form (say, resulting from certain algebraic transforms or analytic approximations), and it does not need building a full tensor for the Tucker decomposition. Indeed, it is enough to find the orthogonal Tucker basis only for the directional matrices of the canonical tensor, which consist of the skeleton vectors in every single dimension [174]. Presumably, the invention of the RHOSVD and the C2T algorithm anticipated the development of the tensor formats avoiding the “curse of dimensionality”.
We conclude that coupling the multilinear algebra of tensors with nonlinear approximation theory resulted in the tensor-structured numerical methods for multidimensional PDEs. The prior results on the theory of tensor-product approximation of multivariate functions and operators [94, 91, 111, 161] were significant prerequisites for understanding and developing tensor numerical methods. First, we sketch some of the basic results in approximation theory.

3.1.1 Prediction of approximation theory: O(log n) ranks

In the following, we choose the set 𝒮 of rank-structured (formatted) tensors within the above-defined tensor classes and call the elements of 𝒮 the 𝒮-tensors.
To perform computations in the low-parametric tensor formats (say, in the course of a rank-truncated iteration), we need to perform a nonlinear “projection” of the current iterand onto 𝒮. This action is fulfilled by using the tensor truncation operator T𝒮 : 𝕍n,d → 𝒮 defined by

A0 ∈ 𝒮0 ⊂ 𝕍n,d : T𝒮 A0 = argmin_{T∈𝒮} ‖A0 − T‖𝕍n, (3.2)

which is a challenging nonlinear approximation problem. In practice, the computation of the minimizer T𝒮 A0 can be performed only approximately. The replacement of A0 by its approximation in the tensor class 𝒮 is called the tensor truncation to 𝒮 and is denoted by T𝒮 A0.
There are analytic and algebraic methods for the approximate solution of problem (3.2) applicable to different classes of rank-structured tensors 𝒮. The target tensor may arise, in particular, as the grid-based representation of regular enough functions, say, solutions of PDEs or some classical Green’s kernels. The storage and numerical complexity for the elements in 𝒮 are strictly determined by the rank parameters involved in the parametrization within the given tensor format. In view of the relation r̂ = O(|log ε|) between the Tucker rank and the corresponding approximation error ‖A0 − T‖𝕍n, which holds for a wide class of function-related tensors [161], one may expect in PDE-related applications the O(log n) rank asymptotics in the univariate mode size n of the n⊗d tensors living on the n × ⋅ ⋅ ⋅ × n tensor grid. Our experience in numerical simulations in electronic structure calculations confirms this hypothesis.
Such optimistic effective rank bounds justify the benefits of tensor numerical
methods in large-scale scientific computing, indicating that these methods are not
just heuristic tools but rigorously approved techniques.

3.1.2 Analytic methods of separable approximation of multivariate functions and operators

In what follows, we discuss the low-rank approximation of a special class of higher-order tensors, also called function-related tensors (FRTs), obtained by sampling a multivariate function over an n × ⋅ ⋅ ⋅ × n tensor grid in ℝ^d. These data directly arise from:
(a) A separable approximation of multi-variate functions;
(b) Nyström/collocation/Galerkin discretization of integral operators with the Green’s
kernels;
(c) The tensor-product approximation of some analytic matrix-valued functions.

The constructive analytic approximation methods are based on sinc-quadrature representations for analytic functions [271, 215]. These techniques apply, in particular, to
the class of Green’s kernels (the Poisson, Yukawa, Helmholtz potentials), cf. [220, 113],
to certain kernel functions arising in the Boltzmann equation [162], in electronic struc-
ture calculations [122, 30, 174, 187], and to correlation functions in the construction of
the Karhünen–Loéve expansion [266, 182, 185], as well as in multidimensional data
analysis [71, 206].

Error estimate for tensor approximation of analytic generating function


In the following, we define FRTs corresponding to collocation-type discretization, see
[113]. The Nyström and Galerkin approximations to function-related tensors have been
discussed in [111, 161].

Given the function g : Ω := Π1 × ⋅ ⋅ ⋅ × Πd → ℝ with Πℓ = Π = [a, b]^p and p = 1, 2, 3, for ℓ = 1, . . . , d, define the univariate grid size n ∈ ℕ and the mesh size h = (b − a)/n. We denote by {x(1)_{i1}, . . . , x(d)_{id}} a set of collocation points located at the midpoints of the tensor grid with mesh size h, where iℓ = (iℓ,1, . . . , iℓ,p) ∈ ℐℓ := Iℓ,1 × ⋅ ⋅ ⋅ × Iℓ,p (ℓ = 1, . . . , d). Here we have iℓ,m ∈ In := {1, . . . , n} (m = 1, . . . , p).
We consider the case d ≥ 2 with some fixed p ∈ {1, 2, 3}. In particular, the case of
functions in ℝd is treated with p = 1, whereas the matrix (operator) decompositions
correspond to the choice p = 2. In the latter case, we introduce the reordered index set
of pairs

ℳℓ := {mℓ : mℓ = (iℓ , jℓ ), iℓ , jℓ ∈ In } (ℓ = 1, . . . , d)

such that ℐ = ℳ1 × ⋅ ⋅ ⋅ × ℳd with ℳℓ = In × In .


Here we follow [113] and focus on the collocation-type schemes that are based on
tensor-product ansatz functions
ψi(y1, . . . , yd) = ∏_{ℓ=1}^{d} ψℓ^{iℓ}(yℓ), i = (i1, . . . , id) ∈ ℐ1 × ⋅ ⋅ ⋅ × ℐd, yℓ ∈ Πℓ. (3.3)

Definition 3.1 (FRT by collocation). Let p = 2. Given the function g : ℝ^{pd} → ℝ and the tensor-product basis set (3.3), we introduce the coupled variable ζ(ℓ)_{iℓ} := (x(ℓ)_{iℓ}, yℓ), including the collocation point x(ℓ)_{iℓ} and yℓ ∈ Π, and the pair mℓ := (iℓ, jℓ) ∈ ℳℓ, and define the collocation-type dth-order FRT by A ≡ A(g) := [am1...md] ∈ ℝ^{ℳ1×⋅⋅⋅×ℳd} with the tensor entries

am1...md := ∫_Ω g(ζ(1)_{i1}, . . . , ζ(d)_{id}) ψj(y1, . . . , yd) dy, mℓ ∈ ℳℓ. (3.4)

In the case p = 1, we simplify to ζ(ℓ)_{iℓ} = yℓ, mℓ := jℓ.

The key observation is that there is a natural duality between the separable approximation of the multivariate generating function g and the tensor-product decomposition of the related multidimensional array A(g). As a result, the canonical decompositions of A(g) can be derived by using a corresponding separable expansion of the
generating function g (see [111, 116] for more details).

Lemma 3.2 ([113]). Suppose that a multivariate function g : Ω ⊂ ℝ^{pd} → ℝ can be accurately approximated by a separable expansion

g_R(ζ) := ∑_{k=1}^{R} μ_k Φ_k^{(1)}(ζ^{(1)}) ⋯ Φ_k^{(d)}(ζ^{(d)}) ≈ g(ζ), ζ = (ζ^{(1)}, . . . , ζ^{(d)}) ∈ ℝ^{pd}, (3.5)

where μ_k ∈ ℝ and Φ_k^{(ℓ)} : Π ⊂ ℝ² → ℝ. Introduce the canonical decomposition of A(g) via
A^{(R)} := A(g_R) (cf. Definition 3.1), where the canonical skeleton vectors are defined by

V_k^{(ℓ)} = {∫ Φ_k^{(ℓ)}(ζ_i^{(ℓ)}) ψ_ℓ^j(y_ℓ) dy_ℓ}_{(i,j)∈ℳ_ℓ} ∈ ℝ^{ℐ_ℓ × 𝒥_ℓ}, ℓ = 1, . . . , d, k = 1, . . . , R. (3.6)
3.1 Super-compression of function-related tensors | 41

Then the FRT A^{(R)} approximates A(g) with the error estimated by

‖A(g) − A^{(R)}(g_R)‖_∞ ≤ C ‖g − g_R‖_{L^∞(Ω)}.

Though, in general, computing a decomposition (3.5) with small separation rank R is a complicated numerical task, in many interesting applications efficient approximation methods are available. In particular, for a class of multivariate functions (say, for radial
basis functions in ℝ^d), it is possible to obtain a dimension-independent bound on the
separation rank, R = 𝒪(log n |log ε|), e. g., based on sinc-quadrature methods or a direct approximation by exponential sums (see examples in [39, 40, 111, 161, 206]).
Next, we discuss the constructive canonical and Tucker tensor decomposition of
FRTs applied to a general class of analytic generating functions represented in terms
of their generalized Laplace transform.

sinc-quadrature approximation in the Hardy space


We use constructive approximation based on the classical sinc-quadrature methods.
For the reader's convenience, we recall the well-known approximation results for the
sinc methods (cf. [271, 92, 96, 95]). Recall that the Hardy space H¹(D_δ) is defined as the
set of all complex-valued functions f that are analytic in the strip

Dδ := {z ∈ ℂ : |ℑm z| < δ}

and such that


N(f, D_δ) := ∫_{𝜕D_δ} |f(z)| |dz| = ∫_ℝ (|f(x + iδ)| + |f(x − iδ)|) dx < ∞.

Given f ∈ H 1 (Dδ ), the step size of the quadrature h > 0, and M ∈ ℕ0 , the corresponding
(2M + 1)-point sinc-quadrature approximating the integral ∫ℝ f (ξ )dξ reads

T_M(f, h) := h ∑_{k=−M}^{M} f(kh) ≈ ∫_ℝ f(ξ) dξ. (3.7)

Proposition 3.3. Let f ∈ H 1 (Dδ ), h > 0, and M ∈ ℕ0 be given. If

|f (ξ )| ≤ C exp(−b|ξ |) for all ξ ∈ ℝ with b, C > 0, (3.8)

then the quadrature error satisfies


|∫_ℝ f(ξ) dξ − T_M(f, h)| ≤ C e^{−√(2πδbM)} with h = √(2πδ/(bM)),

with a positive constant C depending only on f, δ, and b (cf. [271]).



If f possesses the hyper-exponential decay

|f(ξ)| ≤ C exp(−b e^{a|ξ|}) for all ξ ∈ ℝ with a, b, C > 0, (3.9)

then the choice h = log(2πaM/b)/(aM) leads to the improved error bound (cf. [95])

|∫_ℝ f(ξ) dξ − T_M(f, h)| ≤ C N(f, D_δ) e^{−2πδaM/ log(2πaM/b)}.

Note that 2M + 1 is the number of quadrature/interpolation points. If f is an even


function, then this number reduces to M + 1.
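As a quick numerical check of Proposition 3.3, the following Python sketch (not part of the book's MATLAB codes; the strip width δ = 1 and decay rate b = 1 are illustrative choices) applies the rule (3.7) with the step size h = √(2πδ/(bM)) to the Gaussian f(ξ) = e^{−ξ²}, which satisfies the decay condition (3.8) with b = 1, and reproduces ∫_ℝ e^{−ξ²} dξ = √π with an error decaying exponentially in √M:

```python
import math

def sinc_quadrature(f, M, delta, b):
    # (2M+1)-point sinc quadrature T_M(f, h), cf. (3.7), with the step
    # size h = sqrt(2*pi*delta/(b*M)) from Proposition 3.3, case (3.8)
    h = math.sqrt(2.0 * math.pi * delta / (b * M))
    return h * sum(f(k * h) for k in range(-M, M + 1))

f = lambda xi: math.exp(-xi * xi)   # analytic in every strip D_delta
exact = math.sqrt(math.pi)          # int_R exp(-xi^2) d(xi) = sqrt(pi)

for M in (4, 8, 16, 32):
    err = abs(sinc_quadrature(f, M, delta=1.0, b=1.0) - exact)
    print(f"M = {M:2d}, quadrature error = {err:.2e}")
```

The observed errors follow the bound Ce^{−√(2πδbM)}; for integrands with the hyper-exponential decay (3.9), the step-size choice h = log(2πaM/b)/(aM) yields the faster rate stated above.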

Error bounds for canonical and Tucker decomposition


Following [113], we consider a class of multivariate functions g : ℝ^d → ℝ parameterized by g(ζ) = G(ρ(ζ)) ≡ G(ρ) with

ρ ≡ ρ(ζ) = ρ_1(ζ^{(1)}) + ⋯ + ρ_d(ζ^{(d)}) > 0, ρ_ℓ : ℝ² → ℝ_+,

where the univariate function G : ℝ_+ → ℝ is supposed to be represented via the Laplace transform

G(ρ) = ∫_{ℝ_+} 𝒢(τ) e^{−ρτ} dτ. (3.10)

Consider the FRT approximation corresponding to p = 2, ζ^{(ℓ)} = (x_ℓ, y_ℓ) (cf. Definition 3.1). Without loss of generality, we introduce one and the same scaling function

ψ_i(⋅) = ψ(⋅ + (i − 1)h), i ∈ I_n, (3.11)

for all spatial dimensions ℓ = 1, . . . , d, where h > 0 is the mesh parameter of the spatial
grid. For ease of exposition, we simplify further and set ρ ≡ ρ(ζ) = ∑_{ℓ=1}^{d} ρ_0(ζ^{(ℓ)}),
i. e., ρ_ℓ = ρ_0(x_ℓ, y_ℓ) (ℓ = 1, . . . , d) with ρ_0 : [a, b]² → ℝ_+. For i ∈ I_n, let {x̄_i} be the set of
cell-centered collocation points on the univariate grid of step size h in [a, b]. For each
i, j ∈ I_n, we introduce the parameter-dependent integral

Ψ_{i,j}(τ) := ∫_ℝ e^{−ρ_0(x̄_i, y)τ} ψ(y + (j − 1)h) dy, τ ≥ 0, (3.12)

where τ is the integration variable in (3.10).

Theorem 3.4 (FRT approximation [113]). Assume that:


(a) The Laplace transform 𝒢 (τ) in (3.10) has an analytic extension 𝒢 (w), w ∈ Ω𝒢 , into a
certain domain Ω𝒢 ⊂ ℂ that can be transformed by a conformal map onto the strip
Dδ such that w = φ(z), z ∈ Dδ , and φ−1 : Ω𝒢 → Dδ ;

(b) for all (i, j) ∈ ℐ × 𝒥, the transformed integrand

f(z) := φ′(z) 𝒢(φ(z)) ∏_{ℓ=1}^{d} Ψ_{i_ℓ j_ℓ}(φ(z)) (3.13)

belongs to the Hardy space H¹(D_δ) with N(f, D_δ) < ∞ uniformly in (i, j);
(c) the function f (t), t ∈ ℝ, in (3.13) has either exponential (c1) or hyper-exponential
(c2) decay as t → ±∞ (see Proposition 3.3).

Under the assumptions (a)–(c), we have that, for each M ∈ ℕ, the FRT A(g), defined
on [a, b]^d, allows an exponentially convergent symmetric¹ canonical approximation
A^{(R)} ∈ 𝒞_R with V_k^{(ℓ)} as in (3.6), where the expansion (3.5) is obtained by substitution of f
from (3.13) into the sinc-quadrature (3.7), such that we have

‖A(g) − A^{(R)}‖_∞ ≤ C e^{−αM^ν} with R = 2M + 1, (3.14)

where ν = 1/2 and α = √(2πδb) in case (c1), and ν = 1 and α = 2πδb / log(2πaM/b) in case (c2).

Theorem 3.4 proves the existence of the canonical decomposition of the FRT A(g)
with the Kronecker rank r = 𝒪(|log ε| log(1/h)) (in case (c2)) or r = 𝒪(log² ε) (in case
(c1)), providing an approximation of order 𝒪(ε). In our applications, we usually
have 1/h = 𝒪(n), where n is the number of grid points in one spatial direction. Theorem 5.12 applies to translation-invariant or spherically symmetric (radial) functions,
in particular, to the classical Newton, Yukawa, Helmholtz, and Slater-type kernels

1/‖x − y‖, e^{−λ‖x−y‖}/‖x − y‖, cos(λ‖x − y‖)/‖x − y‖, and e^{−λ‖x−y‖},

where x, y ∈ ℝ³ and λ > 0; see [111] for the case of the Newton kernel. We refer to [163, 164],
where the sinc-based CP approximations to the Yukawa and Helmholtz kernels have
been analyzed. In particular, the low-rank Tucker approximations to the Slater and
Yukawa kernels have been proven in [161] and [164].
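To give a concrete flavor of such sinc-based separable expansions, the following Python sketch (illustrative only; the parameters M = 60 and h = 0.25 are ad hoc choices, not taken from the cited works) starts from the integral representation 1/ρ = (2/√π) ∫_ℝ exp(−ρ²e^{2u} + u) du, obtained from the Gaussian transform 1/ρ = (2/√π)∫_0^∞ e^{−ρ²t²} dt by the substitution t = e^u, and discretizes it by the trapezoidal (sinc) rule (3.7). The result is an approximation of 1/ρ by a sum of 2M + 1 Gaussians:

```python
import math

def newton_kernel_sum(rho, M=60, h=0.25):
    # Sinc discretization of 1/rho = (2/sqrt(pi)) * int_R exp(-rho^2*e^(2u) + u) du:
    #   1/rho ~ sum_{k=-M}^{M} c_k * exp(-a_k * rho^2),
    # with nodes a_k = exp(2kh) and weights c_k = (2h/sqrt(pi)) * exp(kh)
    total = 0.0
    for k in range(-M, M + 1):
        a_k = math.exp(2.0 * k * h)
        c_k = (2.0 * h / math.sqrt(math.pi)) * math.exp(k * h)
        total += c_k * math.exp(-a_k * rho * rho)
    return total

for rho in (0.5, 1.0, 3.0, 10.0):
    print(f"rho = {rho:5.1f}, |sum - 1/rho| = {abs(newton_kernel_sum(rho) - 1/rho):.1e}")
```

Setting ρ = ‖x − y‖, each term exp(−a_k‖x − y‖²) = ∏_{ℓ=1}^{3} exp(−a_k(x_ℓ − y_ℓ)²) is separable in the coordinates, so collocation of each Gaussian produces one canonical (Kronecker) summand, in the spirit of Lemma 3.2 and Theorem 3.4.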

3.1.3 Tucker decomposition of function-related tensors

In what follows, we apply the Tucker decomposition algorithm to tensors generated by a number of commonly used radial basis functions, including classical Green kernels in 3D, and study their properties numerically.

1 A dth-order tensor is called symmetric if it is invariant under arbitrary permutations of indices in {1, . . . , d}.

Figure 3.1: Generation of a function-related tensor in 3D on an n1 × n2 × n3 Cartesian grid. The tensor value in a voxel is computed by using samples of the function at the corresponding centers of the grid intervals.

Recall that for a given continuous

function g : Ω → ℝ, Ω := ∏_{ℓ=1}^{d} [−b_ℓ, b_ℓ] ⊂ ℝ^d, 0 < b_ℓ < ∞, the collocation-type
function-related tensor of order d is defined by

A_0 ≡ A_0(g) := [a_{i_1...i_d}] ∈ ℝ^{I_1 × ⋯ × I_d} with a_{i_1...i_d} := g(x_{i_1}^{(1)}, . . . , x_{i_d}^{(d)}), (3.15)

where (x_{i_1}^{(1)}, . . . , x_{i_d}^{(d)}) ∈ ℝ^d are the grid collocation points, indexed by ℐ = I_1 × ⋯ × I_d,

x_{i_ℓ}^{(ℓ)} = −b_ℓ + (i_ℓ − 1)h_ℓ, i_ℓ = 1, 2, . . . , n_ℓ, ℓ = 1, . . . , d, (3.16)

which are the nodes of equally spaced subintervals with mesh size h_ℓ = 2b_ℓ/(n_ℓ − 1); see Figure 3.1. When using an odd discretization parameter, the function is sampled at the nodes of the grid; alternatively, one may sample at the midpoints of the grid cells, for example, for d = 3,

x_{i_ℓ}^{(ℓ)} = −b_ℓ + (i_ℓ − 1/2)(2b_ℓ/n_ℓ), i_ℓ = 1, 2, . . . , n_ℓ, ℓ = 1, 2, 3. (3.17)


For functions in ℝ³, we generate a tensor A ∈ ℝ^{n1 × n2 × n3} with entries a_{ijk} = g(x_i^{(1)}, x_j^{(2)}, x_k^{(3)}). We test the rank-dependence of the Tucker approximation to the function-related tensors A. Using some classical Green's kernels as examples, one can assess whether the Tucker tensor approximation can be relied upon to obtain their low-rank separable tensor representations algebraically. We consider the Slater-type, Newton, and Helmholtz kernels in ℝ³, which have the typical singularity at the origin.
The initial tensor A0 is approximated by a rank r = (r, . . . , r) Tucker representation A(r), where the rank parameter r increases from r = 1, 2, . . . up to some predefined value r_max. Then the orthogonal Tucker vectors and the core tensor of size r × r × r are used to reconstruct the full size tensor corresponding to A(r), in order to estimate the error of the tensor decomposition, ‖A0 − A(r)‖, for the given rank. For every Tucker rank r in the respective range, we compute the relative error in the Frobenius norm as in (2.10),

E_FN = ‖A0 − A(r)‖ / ‖A0‖, (3.18)

and the relative difference of norms (ℓ2-energy norm),

E_FE = (‖A0‖ − ‖A(r)‖) / ‖A0‖. (3.19)

Notice that, by the projection property of the Tucker decomposition, we have ‖A(r)‖ ≤ ‖A0‖.
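The two error measures can be reproduced in a few lines of code. The sketch below (Python/NumPy rather than the MATLAB codes given later in this section; the grid size, interval, and the exponent α = 2 are illustrative assumptions) builds a Slater tensor on a cell-centered grid, cf. (3.17), and computes E_FN and E_FE for a rank-(r, r, r) approximation obtained by plain HOSVD truncation, which is quasi-optimal within a factor √3 of the best Tucker approximation:

```python
import numpy as np

# Slater function exp(-2*||x||) sampled on a 33^3 cell-centered grid, cf. (3.17)
n, b = 33, 5.0
x = -b + (np.arange(1, n + 1) - 0.5) * (2 * b / n)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
A0 = np.exp(-2.0 * np.sqrt(X**2 + Y**2 + Z**2))

def hosvd_truncate(A, r):
    # Rank-(r,r,r) Tucker approximation by truncated HOSVD:
    # SVD of each mode unfolding, then projection onto the dominant subspaces
    Us = []
    for mode in range(3):
        B = np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)
        U, _, _ = np.linalg.svd(B, full_matrices=False)
        Us.append(U[:, :r])
    core = np.einsum("ijk,ia,jb,kc->abc", A, *Us)     # beta = A x1 U1' x2 U2' x3 U3'
    return np.einsum("abc,ia,jb,kc->ijk", core, *Us)  # back to full format

for r in (4, 8, 12):
    Ar = hosvd_truncate(A0, r)
    E_FN = np.linalg.norm(A0 - Ar) / np.linalg.norm(A0)                    # (3.18)
    E_FE = (np.linalg.norm(A0) - np.linalg.norm(Ar)) / np.linalg.norm(A0)  # (3.19)
    print(f"r = {r:2d}, E_FN = {E_FN:.2e}, E_FE = {E_FE:.2e}")
```

Since the HOSVD reconstruction is an orthogonal projection, E_FE is nonnegative and bounded by E_FN², in accordance with the quadratic error bound discussed later in this section.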

(1) Slater function. The Slater-type functions play a significant role in electronic struc-
ture calculations. For example, the Slater function given by

g(x) = exp(−α‖x‖) with x = (x1 , x2 , x3 )T ∈ ℝ3 ,

represents the electron “orbital” (α = 1) and the electron density function (α = 2) corresponding to the hydrogen atom. Here and in the following, ‖x‖ = √(∑_{ℓ=1}^{d} x_ℓ²) denotes
the Euclidean norm of x ∈ ℝ^d.
We compute the rank-(r, r, r) Tucker approximation to the function-related tensor
defined in the nodes of the n1 × n2 × n3 3D Cartesian grid with n1 = 65, n2 = 67, and
n3 = 69 in the interval 2b = 10. The slice of the discretized Slater function at the
middle of the z-axis is shown in Figure 3.2, top-left. Figure 3.2, top-right, shows the fast
exponential convergence of the approximation errors EFN , (3.18), and EFE , (3.19), with
respect to the Tucker rank. Thus, the Slater function can be efficiently approximated
by low-rank Tucker tensors. In fact, the Tucker rank r = 10 provides a maximum absolute
approximation error of the order of 10⁻⁵, and r = 18 provides an approximation with
accuracy ∼10⁻¹⁰. Note that the error of the Tucker tensor approximation only slightly
depends on the discretization parameter n. The corresponding numerical tests will
be demonstrated further in the section on multigrid tensor decomposition, since the
standard Tucker algorithm is practically restricted to univariate grid size of the order
of nℓ ≈ 200.
Figure 3.2, bottom-left, shows the example of the orthogonal vectors of the dom-
inating subspaces of the Tucker tensor decomposition. Note that the vectors corre-
sponding to the largest entries in the Tucker core exhibit essentially smooth shapes.
Figure 3.2, bottom-right, presents the entries of the Tucker core tensor β ∈ ℝ7×7×7 by
displaying its first four matrix slices Mβ,νr ∈ ℝ7×7×1 , νr = 1, . . . , 4. Numbers inside the
figure indicate the maximum values of the core entries at a given slice Mβ,νr ∈ ℝ7×7×1
of β. Figure 3.2 shows that the “energy” of the decomposed function is concentrated in
several upper slices of the core tensor, and the entries of the core also decrease
fast from slice to slice.

(2) Newton kernel. The best rank-r Tucker decomposition algorithm with r = (r, . . . , r)
is applied for approximating the Newton kernel [173]

g(x) = 1/‖x‖, x ∈ ℝ³,

Figure 3.2: Top: discretized Slater function (left) and the error of its Tucker tensor approximation
versus the Tucker rank (right). Bottom: orthogonal vectors of the Tucker decomposition (left) and
entries of the Tucker core.

in the cube [−b, b]3 with b = 5, on the cell-centered uniform grid with discretization
parameter n = 64.
We consider the sampling points x_i^{(ℓ)} = −b + h/2 + (i − 1)h, ℓ = 1, 2, 3, for the three space
variables x^{(ℓ)}. Figure 3.3, top-left, shows the potential on the plane close to the zero point
(at z = h/2), and the top-right figure displays the absolute error of its approximation
with the Tucker rank r = 18, demonstrating an accuracy of about 10⁻¹⁰. Figure 3.3, bottom-left, shows stable exponential convergence of the errors (3.18) and (3.19) with respect to the Tucker rank. In particular, it follows that an accuracy of the order of 10⁻⁵ is
achieved with the Tucker rank r = 10, and for 10⁻³ one can choose the rank r = 7. The
right-hand side of Figure 3.3, bottom, shows the orthogonal vectors v_k^{(1)}, k = 1, . . . , 6,
for the mode ℓ = 1 (the x^{(1)}-axis).

(3) Helmholtz potential. In the next example, we consider a Tucker approximation


of the third-order FRT generated by the Helmholtz functions given by

g1(x) = sin(κ‖x‖)/‖x‖ with x = (x1, x2, x3)^T ∈ ℝ³,

Figure 3.3: Top: the plane of the 3D Newton potential (left) and the error of its Tucker tensor approximation with the rank r = 18. Bottom: decay of the Tucker approximation error versus the Tucker rank (left) and the orthogonal Tucker vectors v_k^{(1)}, k = 1, . . . , 6.

with κ = 1 and κ = 3, and

g2(x) = cos(κ‖x‖)/‖x‖ with x = (x1, x2, x3)^T ∈ ℝ³.
We consider the FRT with the same “voxel-centered” collocation points with respect
to the n × n × n grid over [−b, b]3 , b = 5, as in the previous examples.
Figure 3.4 shows the potential for κ = 1 (top-left) and the error (top-right) of its
Tucker tensor approximation with the rank 7, which is of the order of 10⁻¹⁵. Figure 3.4,
bottom, indicates the exponential convergence of the Tucker tensor approximation
in the rank parameter (left) and shows examples of the orthogonal vectors of the
Tucker tensor decomposition. Figure 3.5 shows a similar decay for the Helmholtz
potential with κ = 3. The approximation with the Tucker rank r = 10 provides an error
of the order of 10⁻¹⁰. Figure 3.6 illustrates the results for the singular kernel cos(‖x‖)/‖x‖.
Other numerical results on the Tucker tensor decomposition can be found in [173, 146]. Recent numerics on the Tucker tensor decomposition for the Matérn radial basis functions are presented in [206].

Figure 3.4: Top: the plane of the 3D Helmholtz potential sin(‖x‖)/‖x‖ over a cross-section (left) and the absolute error of its Tucker approximation with the rank r = 6 (right). Bottom: decay of the Tucker approximation error in the Frobenius norm with respect to the Tucker rank (left) and the orthogonal Tucker vectors v_k^{(1)}, k = 1, . . . , 6, for the Helmholtz potential sin(‖x‖)/‖x‖ (right).

The following lemma proves (see [173], Lemma 2.4, and [146] for more details) that the relative difference of norms of the best rank-(r1, . . . , rd) Tucker approximation A(r) and the target A0 is estimated by the square of the relative Frobenius norm of A(r) − A0, which was confirmed by the numerics above.

Lemma 3.5 (Quadratic convergence in norms). Let A(r) ∈ ℝ^{I1 × ⋯ × Id} solve the minimization problem (2.18) over A ∈ 𝒯_r. Then we have the “quadratic” relative error bound

(‖A0‖ − ‖A(r)‖)/‖A0‖ ≤ ‖A(r) − A0‖²/‖A0‖². (3.20)

Moreover, ‖β‖ = ‖A(r)‖ ≤ ‖A0‖.
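The mechanism behind Lemma 3.5 is the orthogonality of the Tucker projection, which gives ‖A0‖² = ‖A(r)‖² + ‖A(r) − A0‖² and hence (3.20). The effect is already visible in the matrix case d = 2, where the best rank-r approximation is the truncated SVD; a small Python check with a synthetic matrix (chosen here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 50 x 50 matrix with exponentially decaying singular values
U, _ = np.linalg.qr(rng.standard_normal((50, 50)))
V, _ = np.linalg.qr(rng.standard_normal((50, 50)))
s = 2.0 ** -np.arange(50.0)
A0 = (U * s) @ V.T

nA0 = np.linalg.norm(A0)
for r in (2, 5, 10):
    Ar = (U[:, :r] * s[:r]) @ V[:, :r].T   # best rank-r approximation (truncated SVD)
    rel_norm_diff = (nA0 - np.linalg.norm(Ar)) / nA0
    rel_err_sq = (np.linalg.norm(A0 - Ar) / nA0) ** 2
    # "quadratic" bound (3.20): relative norm difference <= squared relative error
    assert 0.0 <= rel_norm_diff <= rel_err_sq + 1e-14
    print(f"r = {r:2d}: norm difference {rel_norm_diff:.3e} <= squared error {rel_err_sq:.3e}")
```

The same orthogonality argument carries over to the rank-(r1, . . . , rd) Tucker projection in any dimension d.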

The presented numerical experiments may be reproduced by the reader using the MATLAB code for the Tucker decomposition algorithm for 3D tensors presented below. Example 3 contains the main program (the reader can name it “Test_Tucker.m”), and subroutines 1, 2, and 3 contain all necessary functions:
Tucker_full_3D_ini(A3,NR,kmax,ir,Ini),
Tuck_2_F(LAM3F,U1,U2,U3),
Tnorm(A).

Figure 3.5: Top: the slice of the 3D Helmholtz potential sin(3‖x‖)/‖x‖ over a cross-section (left) and the absolute error for its Tucker approximation with rank r = 10 (right). Bottom: decay of the Tucker approximation error in the Frobenius norm with respect to the Tucker rank (left) and the orthogonal Tucker vectors v_k^{(1)}, k = 1, . . . , 6, corresponding to sin(3‖x‖)/‖x‖ (right).

Figure 3.6: Decay of the Tucker approximation error in the Frobenius norm with respect to the Tucker rank (left) and the orthogonal Tucker vectors v_k^{(1)}, k = 1, . . . , 6, for the Helmholtz potential cos(‖x‖)/‖x‖ (right).

We recommend first copying and pasting the main program from Example 3 and then
adding all subroutines to the end of this file.
The function Tucker_full_3D_ini in subroutine 1 computes the Tucker decomposition of a 3D tensor A3 for the given Tucker ranks. Note that the initial guess by HOSVD
is computed only in the first call (to avoid repeating this costly procedure at every call of the
function), and it is then stored in the auxiliary structure Ini. The function Tnorm computes the Frobenius norm of a given tensor A. The number of ALS iterations here is
chosen by kmax = 3.

%_______________subroutine 1____________________________________________
function [U1,U2,U3,LAM3F,Ini] = Tucker_full_3D_ini(A3,NR,kmax,ir,Ini)
[n1,n2,n3]=size(A3);
R1=NR(1); R2=NR(2); R3=NR(3); %nd=3;
[~,nstep]=size(Ini.U1);
if ir == 1
%___ Phase I - Initial Guess
D= permute(A3,[1,3,2]); B1= reshape(D,n1,n2*n3);
[Us, ~, ~]= svd(double(B1),0); U1=Us(:,1:R1);
Ini.U1=Us(:,1:nstep);
Ini.B1=B1;
D= permute(A3,[2,1,3]); B2= reshape(D,n2,n1*n3);
[Us, ~, ~]= svd(double(B2),0); U2=Us(:,1:R2);
Ini.U2=Us(:,1:nstep);
Ini.B2=B2;
D= permute(A3,[3,2,1]); B3= reshape(D,n3,n1*n2);
[Us, ~, ~]= svd(double(B3),0); U3=Us(:,1:R3);
Ini.U3=Us(:,1:nstep);
Ini.B3=B3;
end
if ir ~= 1;
U1=Ini.U1(:,1:R1); U2=Ini.U2(:,1:R2); U3=Ini.U3(:,1:R3);
B1=Ini.B1; B2=Ini.B2; B3=Ini.B3;
end
%_______ Phase II - ALS Iteration
for k1=1:kmax
Y1= B1*kron(U2,U3); C1=reshape(Y1,n1,R2*R3);
[W, ~, ~] = svd(double(C1), 0);
U1= W(:,1:R1);
Y2= B2*kron(U3,U1); C2=reshape(Y2,n2,R1*R3);
[W, ~, ~] = svd(double(C2), 0);
U2= W(:,1:R2);
Y3= B3*kron(U1,U2); C3=reshape(Y3,n3,R2*R1);

[W, ~, ~] = svd(double(C3), 0);


U3= W(:,1:R3);
end
Y1= B1*kron(U2,U3); LAM3 = U1'*Y1 ; LLL=reshape(LAM3,R1,R3,R2);
LAM3F=permute(LLL,[1,3,2]);
end

%_________subroutine 2 ______________________________________
function f = Tnorm(A)
NS = size(A); nd =length(NS); nsa =1;
for i = 1:nd
nsa = nsa*NS(i);
end
B = reshape(A,1,nsa); f = norm(B);
end
%_________________________________________________________

%_________subroutine 3___________________________________
function A3F=Tuck_2_F(LAM3F,U1,U2,U3)
[R1,R2,R3]=size(LAM3F);
[n1,~]=size(U1); [n2,~]=size(U2); [n3,~]=size(U3);
LAM31=reshape(LAM3F,R1,R2*R3);
CNT1=LAM31'*U1';
CNT2=reshape(CNT1,R2,R3*n1);
CNT3=CNT2'*U2';
CNT4=reshape(CNT3,R3,n1*n2);
CNT5=CNT4'*U3';
A3F=reshape(CNT5,n1,n2,n3);
end
%_________________________________________________________

In the main program, given in Example 3, the 3D tensor related to a Slater function
is first generated in the rectangular computational box with the grid sizes n1 = 65, n2 = 67,
and n3 = 69 along the x-, y-, and z-axes, respectively.
The Tucker tensor decomposition is performed with equal ranks for all three variables x, y, and z, in a loop starting from rank one up to the maximum rank
given by the parameter “max_Tr”. The number of ALS iterations is given by the parameter
“kmax”.
The main program displays the same data as shown in Figure 3.5: the error of
the Tucker decomposition with respect to the Tucker rank in figure (1), the
generated tensor in figure (2), and examples of the Tucker vectors for one of the modes
in figure (3).

%________________Example 3___ main program ________


% Test_Tucker.m
clear; close all; max_Tr=10; % maximum Tucker rank
b=10.0; % size of the interval
T_error=zeros(1,max_Tr); nd=3; kmax =3;
T_en_error=zeros(1,max_Tr);
coef=1; al=2.0;
n= 64; b2=b/2; h1 = b/n;
x=-b2:h1:b2; [~,n1]=size(x); % interval in x: [-b2,b2]
y=-b2-h1:h1:b2+h1; [~,n2]=size(y); % interval in y: [-b2-h1,b2+h1]
z=-b2-2*h1:h1:b2+2*h1; [~,n3]=size(z); % interval in z: [-b2-2h1,b2+2h1]
A3=zeros(n1,n2,n3); A3F=zeros(n1,n2,n3);
%__________ generate a 3D tensor__________________
for i=1:n1
for j=1:n2
A3(i,j,:)=coef*exp(-al*sqrt(x(1,i)^2 + y(1,j)^2 + z(1,:).^2));
end
end
Ini.U1=zeros(n1,max_Tr); Ini.U2=zeros(n2,max_Tr);
Ini.U3=zeros(n3,max_Tr);
for ir=1:max_Tr
NR=[ir ir ir];
[U1,U2,U3,LAM3F,Ini] = Tucker_full_3D_ini(A3,NR,kmax,ir,Ini);
A3F=Tuck_2_F(LAM3F,U1,U2,U3);
err=Tnorm(A3F - A3)/Tnorm(A3);
T_error(1,ir)=abs(err);
enr=(Tnorm(A3F) -Tnorm(A3))/Tnorm(A3);
T_en_error(1,ir)=abs(enr);
fprintf(1, '\n iter = %d , err_Fro = %5.4e \n', ir, err);
end
figure(1); %____ draw convergence of the error____
semilogy(T_error(1,1:max_Tr),'Linewidth',2,'Marker','square');
hold on;
semilogy(T_en_error(1,1:max_Tr),'r','Linewidth',2,'Marker','square');
set(gca,'fontsize',16);
xlabel('Tucker rank','fontsize',16);
ylabel('error','fontsize',16);
grid on; axis tight;
%_______________draw the function______________________________
A2=A3F(:,:,(n3-1)/2); % take a 2D plane of the 3D tensor
figure(2); mesh(y,x,A2); axis tight;
%_________________draw__Tucker vectors_________________________

figure(3)
plot(x,U1(:,1),'Linewidth',2);
hold on;
for j=2:max_Tr-2
plot(x,U1(:,j),'Linewidth',2);
end
set(gca,'fontsize',16);
str3='Tucker vectors';
str2=[str3,' al= ',num2str(al) ', rank = ',num2str(max_Tr)];
title(str2,'fontsize',16); axis tight; grid on; hold off;
%______________________________________________________________

When changing the grid parameter “n”, please note that, due to the restrictions of the
HOSVD, the size of the tensor should satisfy n1n2n3 ≤ 128³.
The following conclusions are drawn from the above numerics [146].

Remark 3.6. The Tucker approximation error for the considered class of function-
related tensors decays exponentially with respect to the Tucker rank.

Remark 3.7. The shape of the orthogonal vectors in the unitary matrices of the Tucker
decomposition for the class of function-related tensors is almost independent of n.

Remark 3.8. The entries of the core tensor of the Tucker decomposition for the con-
sidered function-related tensors decay fast vs. index kℓ = 1, . . . , r, ℓ = 1, 2, 3.

Properties of the Tucker decomposition for the function-related tensors described


in Remarks 3.7 and 3.8 will be used further in the development of the multigrid Tucker
algorithms.

3.2 Multigrid Tucker tensor decomposition


In the previous section, we discussed the Tucker decomposition algorithm, which
provides a complexity of the Tucker tensor approximation of the order of

wF2T = O(n^{d+1}) (3.21)

for full format target tensors of size n^{⊗d}. This bound restricts the application of this standard Tucker scheme from multilinear algebra to small dimensions d and moderate grid sizes n. Thus, the computational work for the Tucker decomposition of full format tensors in 3D is

wF2T = O(n⁴), (3.22)

which practically restricts the maximum size of the input tensors to about 200³ for conventional computers. Our goal is to reach linear-in-volume complexity O(n³) by avoiding the HOSVD transform, thus allowing the maximum size of the input tensors to be limited only by the available computer storage.
The multigrid Tucker tensor decomposition, which gives a way to avoid the storage
limitations of the standard Tucker algorithm, was introduced by V. Khoromskaia and
B. Khoromskij in 2008 [174, 146]. The idea of the multilevel Tucker approximation originates from investigating the numerical examples of the orthogonal Tucker decomposition for function-related tensors, in particular, the regularity of the orthogonal
Tucker vectors and the weak dependence of their shapes on the univariate grid parameter n.
The nonlinear multigrid Tucker tensor approximation problem for minimizing the
functional
A0,M ∈ 𝒮0 ⊂ 𝕍nM : f (A) := ‖A0,m − Am ‖2 → min (3.23)

is solved over a sequence of nested subspaces

𝕋r,0 ⊂ ⋅ ⋅ ⋅ ⊂ 𝕋r,m ⊂ ⋅ ⋅ ⋅ ⊂ 𝕋r,M ,

using the sequence of dyadically refined grids of size n = n0 2m−1 with m = 1, . . . , M.


Thus, the Tucker decomposition problem for a tensor A0,M ∈ 𝕍nM, obtained as the discretization of a function over the fine grid of size nM, is based on the successive reiteration of the ALS Tucker approximation on a sequence of refined grids. In this case,
the initial guess is computed by HOSVD only at the coarsest grid with n0 ≪ nM, at
the moderate cost O(n_0^{d+1}). Then, on finer grids, for the first run of ALS iterations,
the initial guess for {V_m^{(ℓ)}}_{ℓ=1}^{d} is computed by interpolation of the orthogonal Tucker
vectors {V_{m−1}^{(ℓ)}}_{ℓ=1}^{d} from the previous grid level. At every current ALS iteration, it is updated by contractions with the full tensor A0,m ∈ 𝕍nm at the corresponding grid. Thus,
the “single-hole” tensors obtained by using the interpolated orthogonal matrices appear to be of sufficient “quality”, as if they were obtained by the projections using the
HOSVD for the corresponding grid.
The resulting complexity of the multigrid Tucker decomposition for full format
tensors is estimated by

wF2T = O(n³),

which currently makes possible the application of the multigrid-accelerated full-to-Tucker
(F2T) algorithm to 3D function-related tensors with the maximal grid size
bounded only by the storage required for the input tensor. We refer also to [231],
where the adaptive cross approximation of 3D Tucker tensors has been applied.
The concept of the multigrid Tucker approximation applies to multidimensional data obtained as a discretization of a regular (say, continuous or analytic) multivariate function on a sequence of refined spatial grids. The typical application areas include the tensor approximation of multidimensional operators and functionals,
the solution of integro-differential equations in ℝ^d, and the data-structured representation of
physically relevant quantities; see, for example, [36].

For a fixed grid parameter n, let us introduce the equidistant tensor grid

ωd,n := ω1 × ω2 × ⋅ ⋅ ⋅ × ωd , (3.24)

where ωℓ := {−b + (k − 1)h : k = 1, . . . , n + 1} (ℓ = 1, . . . , d) with mesh-size h = 2b/n.


Define a set of collocation points {x_i} in Ω := [−b, b]^d ⊂ ℝ^d, located at the midpoints of
the grid cells and numbered by i ∈ ℐ := {1, . . . , n}^d (see the explicit definition in (3.16)).
For fixed n, the target tensor

A_m = [a_{n_m,i_m}] ∈ ℝ^{ℐ_m} (3.25)

is defined by sampling the given continuous multivariate function f : Ω → ℝ at the
set of collocation points {x_i} as follows:

a_{n_m,i_m} = f(x_{i_m}), i_m ∈ ℐ_m.

The algorithm for the multigrid Tucker tensor approximation for full size tensors is
described as follows.

Algorithm Multigrid Tucker (𝕍n →𝒯 r ) (Multigrid full-to-Tucker approximation).


(1) Given Am ∈ 𝕍n in the form (3.25), corresponding to a sequence of grid parameters
nm := n0 2m , m = 0, 1, . . . , M. Fix the Tucker rank r, and the iteration number kmax .
(2) For m = 0, solve the approximation problem by Tucker algorithm (𝕍n0 →𝒯 r ) by
using HOSVD and kmax steps of ALS iteration.
(3) For m = 1, . . . , M, perform the cascadic multigrid Tucker approximation:
(3a) Compute the initial guess for the side matrices on level m by interpolation
I_{(m−1,m)} from level m − 1 (using piecewise linear or cubic splines),

V^{(ℓ)} = V_m^{(ℓ)} = I_{(m−1,m)}(V_{m−1}^{(ℓ)}), ℓ = 1, . . . , d.

(3b) Starting with the initial guess V^{(ℓ)} (ℓ = 1, . . . , d), perform kmax steps of the ALS
iteration as in Step (2) of the Basic Tucker algorithm (see Section 2.2.3).
(4) Compute the core β by the orthogonal projection of A onto 𝕋_n = ⨂_{ℓ=1}^{d} 𝕋_ℓ with
𝕋_ℓ = span{v_ν^{(ℓ)}}_{ν=1}^{r_ℓ} (see Remark 2.6),

β = A ×₁ V^{(1)T} ×₂ ⋯ ×_d V^{(d)T} ∈ 𝔹_r,

at the cost O(r^d n).
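The interpolation step (3a) is the only new ingredient compared with the basic ALS scheme. The following Python sketch of a single cascade step (the function, the grid sizes n0 = 32 and n1 = 64, and the rank r = 8 are illustrative assumptions; piecewise linear interpolation is used, with edge values simply clamped) computes the HOSVD only on the coarse grid, interpolates the side matrices to the refined grid, re-orthogonalizes them, and projects the fine-grid tensor onto the resulting subspaces:

```python
import numpy as np

def slater_tensor(n, b=5.0):
    # Cell-centered grid on [-b, b]^3 and the sampled tensor, cf. (3.17)
    x = -b + (np.arange(1, n + 1) - 0.5) * (2 * b / n)
    X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
    return x, np.exp(-np.sqrt(X**2 + Y**2 + Z**2))

def hosvd_factors(A, r):
    # Initial guess by HOSVD -- performed only on the coarse grid (Step 2)
    Us = []
    for mode in range(3):
        B = np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)
        U, _, _ = np.linalg.svd(B, full_matrices=False)
        Us.append(U[:, :r])
    return Us

r, n0, n1 = 8, 32, 64
x0, A0 = slater_tensor(n0)   # coarse level m-1
x1, A1 = slater_tensor(n1)   # fine level m

# Step (3a): interpolate every coarse Tucker vector to the fine grid
U_fine = []
for U in hosvd_factors(A0, r):
    W = np.column_stack([np.interp(x1, x0, U[:, k]) for k in range(r)])
    Q, _ = np.linalg.qr(W)   # re-orthogonalize the interpolated side matrix
    U_fine.append(Q)

# Project the fine-grid tensor onto the interpolated subspaces (no fine-grid HOSVD)
core = np.einsum("ijk,ia,jb,kc->abc", A1, *U_fine)
A1_r = np.einsum("abc,ia,jb,kc->ijk", core, *U_fine)
print("relative error:", np.linalg.norm(A1 - A1_r) / np.linalg.norm(A1))
```

In the full algorithm this initial projection is then improved by the kmax ALS sweeps of Step (3b); the point of the cascade is that the O(n⁴) HOSVD is never applied on the fine grids.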

Figure 3.7, left, shows the numerical example of the multigrid Tucker approximation to fully populated tensors given by the 3D Slater function e^{−‖x‖} (x ∈ [−b, b]³,
b = 5.0), sampled over large n × n × n uniform grids with n = 128, 256, and 512. The
corresponding computation times in MATLAB (in seconds) of the multigrid Tucker decomposition algorithm are shown in Figure 3.7, right.

Figure 3.7: Convergence of the multigrid Tucker approximation with respect to the Tucker rank r (left)
and times for the multigrid algorithm (right).

Figure 3.8: Tucker vectors for the Slater potential on the grid with n = 129 (left) and n = 513 (right).

Figure 3.8 shows the shape of the Tucker vectors for the values of the discretization
parameter n = 129 and n = 513.
For testing the programs for the multigrid Tucker tensor decomposition, first generate
the 3D tensors by the program in Example 4 with n = 32, 64, 128, 256 (if storage allows,
also 512). Before starting the program in Example 5, add the subroutines MG and Interpolation, as well as subroutines 2 and 3 from the previous example.
Then one can start the program, choosing the parameter MG (MG = 3, 4, or 5 corresponds to the largest grid size 128, 256, or 512, respectively).

%________________Example 4__Main program 4_________________________


clear; max_Tr=18; % maximum Tucker rank
b=10.0; % size of the interval
T_error=zeros(1,max_Tr); T_energy=zeros(1,max_Tr);
nd=3; kmax =3;
coef=1; al=2.0;
n= 32; b2=b/2; h1 = b/n;

x=-b2:h1:b2; [~,n1]=size(x); % interval in x: [-b2,b2]
y=-b2-h1:h1:b2+h1; [~,n2]=size(y); % interval in y: [-b2-h1,b2+h1]
z=-b2-2*h1:h1:b2+2*h1; [~,n3]=size(z); % interval in z: [-b2-2h1,b2+2h1]
A3=zeros(n1,n2,n3); A3F=zeros(n1,n2,n3);
for i=1:n1
for j=1:n2
A3(i,j,:) = coef*exp(-al*sqrt(x(1,i)^2 + y(1,j)^2 + z(1,:).^2));
end
end

filename2 = ['A3_' int2str(n1) '_N_Slat.mat']


save(filename2,'A3');
%_______________end of Example 4_______________________________

The complexity of the multigrid Tucker approximation by the ALS algorithm applied
to full format tensors is given in the following lemma.

Lemma 3.9 ([174]). Suppose that r² ≤ n_m for large m. Then the numerical cost of the
multigrid Tucker algorithm is estimated by

W_{F→T} = O(n_M³ r + n_M² r² + n_M r⁴ + n_0⁴) = O(n_M³ r + n_0⁴).

Proof. In Step (2), the HOSVD on the coarsest grid level requires O(n_0⁴) operations
(which for large n = n_m is negligible compared with the other costs in the algorithm). Next,
for fixed n = n_m, the assumption r² ≤ n implies that at every step of the ALS iteration
the cost of the consequent contractions to compute the n × r² unfolding matrix B^{(q)} is
estimated by O(n³r + n²r²), whereas the SVD of B^{(q)} requires O(nr⁴) operations. Summing up over the levels completes the proof, taking into account that the Tucker core
is computed in O(n_M³ r) operations.

%_______________Example 5___Main program_5___________________________


clear; nstep=10; MG=3; b=10.0;
nd=3; kmax =5; ng=32;
T_error=zeros(nstep,MG); T_energy=zeros(nstep,MG);

for nr=1:nstep
NR=[nr nr nr]; disp(nr);
for im=1:MG
n1=ng*2^(im-1)+1; disp(n1);
b2=b/2; Hunif = b/(n1-1); xcol1=-b2:Hunif:b2;
ycol1=-b2-Hunif:Hunif:b2+Hunif;
zcol1=-b2-2*Hunif:Hunif:b2+2*Hunif;
filename1 = ['A3_' int2str(n1) '_N_Slat.mat'];

load(filename1);
[n1,n2,n3]=size(A3);
if im==1; UC1=zeros(n1,nr); UC2=zeros(n2,nr); UC3=zeros(n3,nr);
save('INTER_COMPS_MG.mat','UC1','UC2','UC3');
else
load INTER_COMPS_MG.mat;
end
Kopt = 0; if im >1; Kopt = 1; end %MG
[U1,U2,U3,LAM3F] = TensR_3sub_OPT_MG(A3,NR,kmax,Kopt,UC1,UC2,UC3);
if im < MG
n11=2*n1-1;
Hunif11 = b/(n11-1); xcol11=-b2:Hunif11:b2;
ycol11=-b2-Hunif11:Hunif11:b2+Hunif11;
zcol11=-b2-2*Hunif11:Hunif11:b2+2*Hunif11;
[UC1,UC2,UC3] = Make_Inter_Vect_xyz(xcol1,xcol11,ycol1,ycol11,...
zcol1,zcol11,n11,U1,U2,U3);
end
save INTER_COMPS_MG.mat UC1 UC2 UC3;
A3F=Tuck_2_F(LAM3F,U1,U2,U3);
err=Tnorm(A3F - A3)/Tnorm(A3);
enr=(Tnorm(A3F) -Tnorm(A3))/Tnorm(A3);
T_error(nr,im)=abs(err);
T_energy(nr,im)=abs(enr);
fprintf(1, '\n iter = %d , err_Fro = %5.4e \n', nr, err);
end
end
figure(20);
for i=1:MG
semilogy(T_error(2:nstep,i),'Linewidth',2,'Marker','square');
hold on;
semilogy(T_energy(2:nstep,i),':','Linewidth',2,'Marker','square');
set(gca,'fontsize',16);
xlabel('Tucker rank','fontsize',16);
ylabel('error','fontsize',16);
grid on; axis tight;
end
%___________________end of main program___________________________________

%_______________subroutine_MG__________________________________
function [U1,U2,U3,LAM3F] = TensR_3sub_OPT_MG(A3,NR,kmax,...
Kopt,UC1,UC2,UC3)
[n1,n2,n3]=size(A3);

R1=NR(1); R2=NR(2); R3=NR(3);


if Kopt == 1
U1 = UC1; U2 = UC2; U3 = UC3;
end
D= permute(A3,[1,3,2]); B1= reshape(D,n1,n2*n3);
D= permute(A3,[2,1,3]); B2= reshape(D,n2,n1*n3);
D= permute(A3,[3,2,1]); B3= reshape(D,n3,n1*n2);
if Kopt == 0
%____Phase I - Initial Guess ________
[Us,~,~]= svd(double(B1),0); U1=Us(:,1:R1);
[Us,~,~]= svd(double(B2),0); U2=Us(:,1:R2);
[Us,~,~]= svd(double(B3),0); U3=Us(:,1:R3);
end
%_______ Phase II - ALS Iteration ____
for k1=1:kmax
Y1= B1*kron(U2,U3);
C1=reshape(Y1,n1,R2*R3);
[W,~,~] = svd(double(C1), 0);
U1= W(:,1:R1); %
Y2= B2*kron(U3,U1);
C2=reshape(Y2,n2,R1*R3);
[W,~,~] = svd(double(C2), 0);
U2= W(:,1:R2); %
Y3= B3*kron(U1,U2);
C3=reshape(Y3,n3,R2*R1);
[W,~,~] = svd(double(C3), 0);
U3= W(:,1:R3); %
end
Y1= B1*kron(U2,U3);
LAM3 = U1'*Y1 ; %'
LLL=reshape(LAM3,R1,R3,R2);
LAM3F=permute(LLL,[1,3,2]);
end
%___________________________________________________________________

%______subroutine Interpolation______
function [U10,U20,U30] = Make_Inter_Vect_xyz(xcol,ixcol,ycol,iycol,...
zcol,izcol,n11,UT1,UT2,UT3)
n12=n11+2; n13=n11+4;
[~,R1]=size(UT1); [~,R2]=size(UT2); [~,R3]=size(UT3);
U10=zeros(n11,R1); U20=zeros(n12,R2); U30=zeros(n13,R3);
for i=1:R1

U10(:,i) = interp1(xcol,UT1(:,i),ixcol,'spline');
end
for i=1:R2
U20(:,i) = interp1(ycol,UT2(:,i),iycol,'spline');
end
for i=1:R3
U30(:,i) = interp1(zcol,UT3(:,i),izcol,'spline');
end
end
%--------------------end of example------------------

3.2.1 Examples of potentials on lattices

(5) Periodic structures of Slater functions. Finally, we analyze the “multi-centered
Slater potential” obtained by displacing a single Slater function with respect to the
L × L × L lattice nodes, with a distance H > 0 between the nodes specifying the centers
of the Slater functions,
    g(x) = c ∑_{i=1}^{m} ∑_{j=1}^{m} ∑_{k=1}^{m} e^{−α √((x_1 − iH + ((m+1)/2)H)^2 + (x_2 − jH + ((m+1)/2)H)^2 + (x_3 − kH + ((m+1)/2)H)^2)}.  (3.26)

Figure 3.9 (top-left) recalls a single Slater function. The corresponding convergence of
the multigrid Tucker approximation error in the Frobenius norm for the grids 65^3, 129^3,
and 257^3 is shown in Figure 3.9 (top-right). Figure 3.9 (bottom-left) shows the
cross-section of a multi-centered Slater potential on an 8 × 8 × 8 lattice, and the
corresponding Tucker tensor approximation error for the same grids is shown in
Figure 3.9 (bottom-right).
Inspection of these periodic structures shows that the convergence rate of the
rank-(r, r, r) Tucker approximation practically does not depend on the size of the
lattice-type structure, and the accuracies are nearly the same. For example, for the
Tucker rank r = 10, the error is close to 10^{−5} for all versions of the single- and
multi-centered Slater functions. These properties were first demonstrated on numerical
examples of multi-centered Slater functions in [146] for L × L × L lattices with L = 10
and L = 16. These features can be valuable in the grid-based modeling of periodic (or
nearly periodic) structures in density functional theory. They indicate that the Tucker
decomposition can be helpful in constructing a small number of problem-adapted basis
functions for large lattice-type clusters of atoms.
Figure 3.10 shows the Tucker vectors of the multi-centered Slater function for L = 10
on the grid of size 129^3. The following remark, see V. Khoromskaia [146], became a
prerequisite for the development of powerful methods for the summation of long-range
potentials on large finite 3D lattices [148, 149, 153].

Figure 3.9: Comparison of the decay of the Tucker tensor decomposition error vs. r for a single Slater
function and for Slater functions positioned at 3D lattice nodes.

Figure 3.10: Tucker vectors of a 3D multi-centered Slater potential with 10^3 centers.

Remark 3.10. For a fixed approximation error, the Tucker rank of lattice-type struc-
tures practically does not depend on the number of cells included in the computa-
tional box.
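The rank-independence stated in Remark 3.10 can be checked in a small numerical experiment. The following NumPy sketch (not from the book's MATLAB listings; grid size n = 64, box size, and all parameters are illustrative choices) compares the rank-(8, 8, 8) HOSVD error for a single Slater function and for a 4 × 4 × 4 lattice of Slater centers; both errors turn out to be small and of comparable magnitude.

```python
import numpy as np

def slater_lattice(L, n=64, b=10.0, H=1.0, alpha=1.0):
    """Full-format tensor of the multi-centered Slater potential (3.26),
    sampled on an n x n x n grid over [-b/2, b/2]^3, with L centers per axis."""
    x = np.linspace(-b / 2, b / 2, n)
    X, Y, Z = np.meshgrid(x, x, x, indexing='ij')
    shift = (L + 1) / 2 * H
    A = np.zeros((n, n, n))
    for i in range(1, L + 1):
        for j in range(1, L + 1):
            for k in range(1, L + 1):
                A += np.exp(-alpha * np.sqrt((X - i * H + shift) ** 2
                                             + (Y - j * H + shift) ** 2
                                             + (Z - k * H + shift) ** 2))
    return A

def hosvd_error(A, r):
    """Relative Frobenius error of the rank-(r,r,r) HOSVD projection of A."""
    Ar = A
    for q in range(3):
        Bq = np.moveaxis(A, q, 0).reshape(A.shape[q], -1)
        V = np.linalg.svd(Bq, full_matrices=False)[0][:, :r]
        # apply the mode-q orthogonal projector V V^T
        Ar = np.moveaxis(np.tensordot(V @ V.T, np.moveaxis(Ar, q, 0), axes=1), 0, q)
    return np.linalg.norm(Ar - A) / np.linalg.norm(A)

e_single  = hosvd_error(slater_lattice(L=1), r=8)   # one Slater function
e_lattice = hosvd_error(slater_lattice(L=4), r=8)   # 4 x 4 x 4 = 64 centers
# both errors are small, in line with the rank-independence of Remark 3.10
```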

Figure 3.11: Convergence of the approximation error for the multi-centered unperturbed
(left panel) and randomly perturbed Slater potential (middle and right panels).

3.2.2 Tucker tensor decomposition as a measure of randomness

The Tucker tensor decomposition can be used for measuring the level of noise in a
tensor resulting from finite element calculations [173]. In what follows, we show the
behavior of the approximation error under random perturbation of the function-related
tensor. Figure 3.11 demonstrates such an example for the Slater potential, where the
random perturbation equals 1, 0.1, and 0.01 percent of the maximum amplitude. It can be
seen that the exponential convergence in the Tucker rank is observed only down to the
level of the random perturbation; a further increase of the Tucker rank does not improve
the approximation. In some cases, it is convenient to use the Tucker decomposition to
estimate the accuracy of finite element calculations [36].
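The stagnation effect can be reproduced in a few lines. A minimal NumPy sketch (illustrative grid and noise sizes, not taken from the book): the HOSVD error of a Slater tensor perturbed by 0.01% random noise first decays with the rank and then levels off near the relative noise level.

```python
import numpy as np

n = 33
x = np.linspace(-5.0, 5.0, n)
X, Y, Z = np.meshgrid(x, x, x, indexing='ij')
A = np.exp(-np.sqrt(X**2 + Y**2 + Z**2))                 # Slater function
rng = np.random.default_rng(0)
An = A + 1e-4 * A.max() * rng.standard_normal(A.shape)   # 0.01% perturbation

def hosvd_error(T, r):
    """Relative Frobenius error of the rank-(r,r,r) HOSVD projection of T."""
    Tr = T
    for q in range(3):
        Bq = np.moveaxis(T, q, 0).reshape(T.shape[q], -1)
        V = np.linalg.svd(Bq, full_matrices=False)[0][:, :r]
        Tr = np.moveaxis(np.tensordot(V @ V.T, np.moveaxis(Tr, q, 0), axes=1), 0, q)
    return np.linalg.norm(Tr - T) / np.linalg.norm(T)

errs = [hosvd_error(An, r) for r in (2, 4, 8, 16)]
# the error decreases with r but cannot drop below the noise level
```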

3.3 Reduced higher order SVD and canonical-to-Tucker transform


In applications related to the solution of high-dimensional PDEs, the typical situation
arises when the target tensor is already presented in the rank-R canonical format,
A ∈ 𝒞_{R,n}, but with relatively large R and large mode size n. Moreover, it often
happens that a tensor corresponding to a multidimensional quantity is represented in a
discretized separable form, obtained by polynomial interpolation or by the Laplace
transform, where the accurate sinc quadrature approximation results in a sum of a huge
number of Gaussians. Even in the case of initially moderate ranks, the rank-structured
tensor operations (2.34)–(2.37) lead to an increase of the resultant rank parameters in
the course of multilinear algebra, due to multiplication of the tensor ranks. As we
already mentioned, the standard Tucker decomposition algorithm, based on the HOSVD, is
not computationally feasible for the case of large mode size n, due to its complexity
scaling as O(n^{d+1}).
To that end, an essential advance was brought by the so-called reduced higher-order
singular value decomposition (RHOSVD) as part of the canonical-to-Tucker (C2T)
transform introduced in 2008 [174]. It was demonstrated that for the Tucker
decomposition of function-related tensors given in the canonical form, there is no
need to build a full tensor. Instead, the orthogonal basis is computed using only the
directional (side) matrices of the canonical tensor, which consist of the skeleton
vectors in every single dimension. The RHOSVD can be considered as a generalization of
the reduced SVD for rank-R matrices (see Section 2.1.5) to higher-order canonical
rank-R tensors. In fact, the RHOSVD can be viewed as an SVD in many dimensions that is
performed without tensor–matrix unfolding and is free of the so-called “curse of
dimensionality”.

3.3.1 Reduced higher order SVD for canonical target

We consider the function-related tensor presented in the rank-R canonical format,
A ∈ 𝒞_{R,n} (here we recall (2.13) from Section 2.2.2 for convenience of presentation),

    A = ∑_{k=1}^{R} ξ_k u_k^(1) ⊗ ⋅⋅⋅ ⊗ u_k^(d),  ξ_k ∈ ℝ.

In what follows, we use the equivalent contracted product tensor representation of A
(see Figure 3.12),

    A = ξ ×_1 U^(1) ×_2 U^(2) ×_3 ⋅⋅⋅ ×_d U^(d),  (3.27)

where ξ = diag{ξ_1, …, ξ_R} ∈ ℝ^{R×⋅⋅⋅×R} (d factors) and
U^(ℓ) = [u_1^(ℓ) ⋅⋅⋅ u_R^(ℓ)] ∈ ℝ^{n_ℓ×R}, i.e., the core tensor ξ is the diagonal of
a hypercube.


Canonical tensors with large ranks appear, for example, in electronic structure
calculations. The electron density of a molecule is a Hadamard product of two large
sums of discrete Gaussians representing the Gaussian-type orbitals, and it may be
represented as a rank-R canonical tensor with relatively large rank R ≈ 10^4. At the
same time, the n × n × n 3D Cartesian grid for the representation of the electron
density should be large enough, with n of the order of 10^4, to resolve the multiple
nuclear cusps corresponding to the locations of nuclei in a molecule. Then the size of
the respective tensor in full format is about n^3 = 10^{12}, which is far from being
feasible for the Tucker tensor decomposition algorithms discussed in the previous
sections. The same applies to the multigrid version.

Figure 3.12: Representation of the 3D canonical rank-R tensor as contractions (3.27)
with side matrices.

The canonical-to-Tucker (C2T) transform, introduced in [174], proved to be an efficient
tool for reducing the redundant rank parameter of canonical rank-R tensors over a large
range of grid sizes and ranks. The favorable feature of the C2T algorithm is its
capability to produce the Tucker decomposition of a canonical tensor without the need to
compute its representation in the full tensor format.
It is worth noting that this transform could hardly have appeared in multilinear
algebra, since it is tailored to function-related rank-R canonical tensors exhibiting
exponentially fast decay of the singular values with respect to the Tucker rank. In
fact, this algorithm, as well as the theory of low-rank approximation of
multidimensional functions and operators [161, 111], was the starting point for tensor
numerical methods for PDEs. This is because it provides an efficient tool for reducing
the ranks that increase due to multiplication of tensor ranks in the course of tensor
operations.
The main idea of the C2T transform is as follows:
– The reduced HOSVD, which provides the initial guess for the Tucker tensor
decomposition, is computed via the SVD of the side matrices U^(ℓ). There is no need to
construct the full-size tensor and compute its HOSVD at enormous cost.
– The Tucker core tensor is computed by simple contractions; the ALS iteration for the
nearly best Tucker approximation requires in practice only a few steps.
– The C2T transform complemented with the CP decomposition of the small-size Tucker
core provides the algorithm for rank reduction in the canonical format (the
Tucker-to-canonical (T2C) transform will be discussed in the next section).

Instead of the Tucker decomposition of full-size tensors, which requires the HOSVD via
the SVD of full unfolding matrices of size n × n^{d−1}, it is sufficient to perform the
reduced HOSVD based on the SVD of the small side matrices U^(ℓ) of size n × R,

    U^(ℓ) = V^(ℓ) Σ_ℓ W^(ℓ)T,  ℓ = 1, …, d.  (3.28)

Figure 3.13 shows the SVD step in the RHOSVD for the dimension ℓ = 1.
The RHOSVD transform is defined as follows [174].

Definition 3.11 (RHOSVD). Given A = ξ ×_1 U^(1) ×_2 U^(2) ⋅⋅⋅ ×_d U^(d) ∈ 𝒞_{R,n},
ξ = diag{ξ_1, …, ξ_R}, and the Tucker rank parameter r = (r_1, …, r_d), introduce the
truncated SVD of the side matrices U^(ℓ), V_0^(ℓ) D_{ℓ,0} W_0^(ℓ)T (ℓ = 1, …, d), where
D_{ℓ,0} = diag{σ_{ℓ,1}, σ_{ℓ,2}, …, σ_{ℓ,r_ℓ}}, whereas V_0^(ℓ) ∈ ℝ^{n×r_ℓ} and
W_0^(ℓ) ∈ ℝ^{R×r_ℓ} are the respective submatrices of V^(ℓ) and W^(ℓ) in the SVD of
U^(ℓ) in (3.28). Then the RHOSVD approximation of A is given by

    A^0_(r) = ξ ×_1 [V_0^(1) D_{1,0} W_0^(1)T] ×_2 [V_0^(2) D_{2,0} W_0^(2)T] ⋅⋅⋅ ×_d [V_0^(d) D_{d,0} W_0^(d)T].  (3.29)

Figure 3.13: RHOSVD: truncated SVD of the side matrix U^(1) in the C2T transform.

Figure 3.14: The C2T transform converts a 3D canonical tensor (3.27) into a contracted
product of two orthogonal matrices and a single-hole tensor.

Notice that A^0_(r) in (3.29) is obtained by the projection of the tensor A onto the
matrices of left singular vectors V_0^(ℓ). Using projections of the initial CP tensor A
onto the orthogonal matrices V_0^(ℓ), it is possible to construct the single-hole tensor
for every mode of A. For example, if d = 3, the tensor given in (3.27) converts into a
contraction of two orthogonal matrices and the single-hole tensor, which is actually a
tensor train (TT) representation; see Figure 3.14.
The C2T decomposition with RHOSVD was originally developed for reducing the
ranks of the canonical tensor representation of the electron density. Now it is used in
many other applications, for example, in summation of many-particle potentials and
low-rank representation of the radial basis functions. In fact, the RHOSVD can be used
as a first step in multiplicative tensor formats like TT and HT, when the original tensor
is given in the canonical tensor format.
In what follows, we recall Theorem 2.5 from [174] describing the algorithm of
canonical-to-Tucker approximation and proving the error estimate.

Theorem 3.12 (Canonical-to-Tucker approximation).
(a) Let A ∈ 𝒞_{R,n} be given by (2.13). For the given Tucker rank r = (r_1, …, r_d),
the minimization problem

    A ∈ 𝒞_{R,n} ⊂ 𝕍_n :  A_(r) = argmin_{T ∈ 𝒯_{r,n}} ‖A − T‖_{𝕍_n}  (3.30)

is equivalent to the dual maximization problem

    [V̂^(1), …, V̂^(d)] = argmax_{Y^(ℓ) ∈ 𝒢_ℓ} ‖ ∑_{ν=1}^{R} ξ_ν (Y^(1)T u_ν^(1)) ⊗ ⋅⋅⋅ ⊗ (Y^(d)T u_ν^(d)) ‖_{𝔹_r}^2  (3.31)

over the Grassmann manifolds 𝒢_ℓ, Y^(ℓ) = [y_1^(ℓ) ⋅⋅⋅ y_{r_ℓ}^(ℓ)] ∈ 𝒢_ℓ
(ℓ = 1, …, d), where Y^(ℓ)T u_ν^(ℓ) ∈ ℝ^{r_ℓ}.

(b) The compatibility condition (2.23) is simplified to

    r_ℓ ≤ rank(U^(ℓ))  with  U^(ℓ) = [u_1^(ℓ) ⋅⋅⋅ u_R^(ℓ)] ∈ ℝ^{n×R},

and we have the solvability of (3.31), assuming that the above relation is valid.
The maximizer in (3.31) is given by the orthogonal matrices
V^(ℓ) = [v_1^(ℓ) ⋅⋅⋅ v_{r_ℓ}^(ℓ)] ∈ ℝ^{n×r_ℓ}, which can be computed similarly to the
Tucker decomposition for full-size tensors, where the truncated HOSVD at Step (1) is
now substituted by the RHOSVD; see (3.29).
(c) The minimizer in (3.30) is then calculated by the orthogonal projection

    A_(r) = ∑_{k=1}^{r} μ_k v_{k_1}^(1) ⊗ ⋅⋅⋅ ⊗ v_{k_d}^(d),  μ_k = ⟨v_{k_1}^(1) ⊗ ⋅⋅⋅ ⊗ v_{k_d}^(d), A⟩,

so that the core tensor μ = [μ_k] can be represented in the rank-R canonical format

    μ = ∑_{ν=1}^{R} ξ_ν (V^(1)T u_ν^(1)) ⊗ ⋅⋅⋅ ⊗ (V^(d)T u_ν^(d)) ∈ 𝒞_{R,r}.  (3.32)

(d) Let σ_{ℓ,1} ≥ σ_{ℓ,2} ≥ ⋅⋅⋅ ≥ σ_{ℓ,min(n,R)} be the singular values of the ℓ-mode
side matrix U^(ℓ) ∈ ℝ^{n×R} (ℓ = 1, …, d). Then the RHOSVD approximation A^0_(r), as in
(3.29), exhibits the error estimate

    ‖A − A^0_(r)‖ ≤ ‖ξ‖ ∑_{ℓ=1}^{d} ( ∑_{k=r_ℓ+1}^{min(n,R)} σ_{ℓ,k}^2 )^{1/2},  where  ‖ξ‖ = ( ∑_{ν=1}^{R} ξ_ν^2 )^{1/2}.  (3.33)

The complexity of the C2T transform for the 3D canonical tensor is estimated by

    W_{C→T} = O(nR^2).

We notice that the error estimate (3.33) in Theorem 3.12 actually provides the control
of the RHOSVD approximation error via the computable ℓ-mode error bounds since, by
construction, we have

    ‖U^(ℓ) − V_0^(ℓ) D_{ℓ,0} W_0^(ℓ)T‖_F^2 = ∑_{k=r_ℓ+1}^{min(n,R)} σ_{ℓ,k}^2,  ℓ = 1, …, d.

This result is similar to the well-known error estimate for the HOSVD approximation;
see [61].
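The computable bound (3.33) is easy to check numerically. The following NumPy sketch (random data, d = 3, unit-normalized canonical vectors as assumed in the estimate; all sizes are illustrative) builds the RHOSVD approximation (3.29) and verifies that the right-hand side of (3.33) dominates the true error.

```python
import numpy as np

rng = np.random.default_rng(1)
n, R, r = 40, 15, 5
xi = rng.standard_normal(R)                           # canonical weights
U = [rng.standard_normal((n, R)) for _ in range(3)]   # side matrices U^(l)
U = [Ul / np.linalg.norm(Ul, axis=0) for Ul in U]     # normalize u_nu^(l)

def full_tensor(xi, U):
    return np.einsum('k,ik,jk,lk->ijl', xi, U[0], U[1], U[2])

A = full_tensor(xi, U)

U0, bound = [], 0.0
for Ul in U:
    V, s, Wt = np.linalg.svd(Ul, full_matrices=False)
    U0.append(V[:, :r] * s[:r] @ Wt[:r, :])   # V0 D_{l,0} W0^T as in (3.29)
    bound += np.sqrt(np.sum(s[r:] ** 2))      # l-mode truncation term
bound *= np.linalg.norm(xi)                   # right-hand side of (3.33)

err = np.linalg.norm(A - full_tensor(xi, U0))  # ||A - A0_(r)||
# err is dominated by bound, in agreement with (3.33)
```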

3.3.2 Canonical-to-Tucker transform via RHOSVD

In the following, we specify the details of the C2T computational scheme for the case
d = 3. To define the RHOSVD-type rank-r Tucker approximation to the tensor in (2.13),
we set n_ℓ = n and suppose for definiteness that n ≤ R. Now the SVD of the side matrix
U^(ℓ) is given by

    U^(ℓ) = V^(ℓ) D_ℓ W^(ℓ)T = ∑_{k=1}^{n} σ_{ℓ,k} v_k^(ℓ) w_k^(ℓ)T,  v_k^(ℓ) ∈ ℝ^n,  w_k^(ℓ) ∈ ℝ^R,  (3.34)

with the orthogonal matrices V^(ℓ) = [v_1^(ℓ), …, v_n^(ℓ)] and
W^(ℓ) = [w_1^(ℓ), …, w_n^(ℓ)], ℓ = 1, 2, 3. Given the rank parameter r = (r_1, r_2, r_3)
with r_1, r_2, r_3 < n, we recall the truncated SVD of the side matrix

    U^(ℓ) ↦ U_0^(ℓ) = ∑_{k=1}^{r_ℓ} σ_{ℓ,k} v_k^(ℓ) w_k^(ℓ)T = V_0^(ℓ) D_{ℓ,0} W_0^(ℓ)T,  ℓ = 1, 2, 3,

where D_{ℓ,0} = diag{σ_{ℓ,1}, σ_{ℓ,2}, …, σ_{ℓ,r_ℓ}}, and the matrices
V_0^(ℓ) ∈ ℝ^{n×r_ℓ}, W_0^(ℓ) ∈ ℝ^{R×r_ℓ} represent the orthogonal factors being the
respective submatrices of the SVD factors of U^(ℓ).
Based on Theorem 3.12, the corresponding algorithm C2T for the rank-R input data can be
designed. The Canonical-to-Tucker algorithm (for a 3D tensor) includes the following
steps [174]:
Input data: side matrices U^(ℓ) = [u_1^(ℓ) ⋅⋅⋅ u_R^(ℓ)] ∈ ℝ^{n_ℓ×R}, ℓ = 1, 2, 3,
composed of the vectors u_k^(ℓ) ∈ ℝ^{n_ℓ}, k = 1, …, R, see (2.13); the maximal Tucker
rank parameter r; the maximal number of ALS iterations m_max (usually a small number).


(I) Compute the SVD of the side matrices:

    U^(ℓ) = V^(ℓ) D^(ℓ) W^(ℓ)T,  ℓ = 1, 2, 3.

Discard the singular vectors in V^(ℓ) and the respective singular values up to the given
rank threshold, yielding the small orthogonal matrices V_0^(ℓ) ∈ ℝ^{n_ℓ×r_ℓ},
W_0^(ℓ) ∈ ℝ^{R×r_ℓ}, and the diagonal matrices D_{ℓ,0} ∈ ℝ^{r_ℓ×r_ℓ}, ℓ = 1, 2, 3.
(II) Project the side matrices U^(ℓ) onto the orthogonal basis set defined by V_0^(ℓ),

    U^(ℓ) ↦ Ũ^(ℓ) = (V_0^(ℓ))T U^(ℓ) = D_{ℓ,0} W_0^(ℓ)T,  Ũ^(ℓ) ∈ ℝ^{r_ℓ×R},  ℓ = 1, 2, 3,  (3.35)

and compute A^0_(r) as in (3.29).


(III) (Find dominating subspaces.) Perform the following ALS iteration for ℓ = 1, 2, 3,
at most m_max times, starting from the RHOSVD initial guess A^0_(r):
– For ℓ = 1: construct the partially projected image of the full tensor,

    A ↦ B̃^(1) = ∑_{k=1}^{R} c_k u_k^(1) ⊗ ũ_k^(2) ⊗ ũ_k^(3),  c_k ∈ ℝ.  (3.36)

Figure 3.15 shows that this is exactly the same construction as for the so-called
single-hole tensor B^(q) appearing at the ALS step in the Tucker decomposition
algorithm for full-size tensors.2 Here u_k^(1) ∈ ℝ^{n_1} lives in the physical space
for mode ℓ = 1, whereas ũ_k^(2) ∈ ℝ^{r_2} and ũ_k^(3) ∈ ℝ^{r_3}, the column vectors of
Ũ^(2) and Ũ^(3), respectively, live in the index sets of the V^(ℓ)-projections.

2 But now we are not restricted by the storage for the full size tensor.

Figure 3.15: Building a single-hole tensor in the C2T algorithm.

– Reshape the tensor B̃^(1) ∈ ℝ^{n_1×r_2×r_3} into a matrix M_{A_1} ∈ ℝ^{n_1×(r_2 r_3)},
representing the span of the optimized subset of mode-1 columns of the partially
projected tensor B̃^(1). Compute the SVD of the matrix M_{A_1}:

    M_{A_1} = V^(1) S^(1) W^(1)T,

and truncate the set of singular vectors in V^(1) ↦ V_{r_1}^(1) ∈ ℝ^{n_1×r_1},
according to the restriction on the mode-1 Tucker rank r_1.
– Update the current approximation to the mode-1 dominating subspace,
V_{r_1}^(1) ↦ Ṽ^(1).
– Implement the single step of the ALS iteration for mode ℓ = 2 and ℓ = 3.
– End of the complete ALS iteration sweep.
– Repeat the complete ALS iteration at most m_max times to obtain the optimized Tucker
orthogonal side matrices Ṽ^(1), Ṽ^(2), Ṽ^(3), and the final projected image B̃^(3).

(IV) Project the final iterated tensor B̃^(3) in (3.36) using the resultant basis set in
Ṽ^(3) to obtain the core tensor β ∈ ℝ^{r_1×r_2×r_3}.
Output data: the Tucker core tensor β and the Tucker orthogonal side matrices Ṽ^(ℓ),
ℓ = 1, 2, 3.
In such a way, it is possible to obtain the Tucker decomposition of a canonical tensor
with large mode size and with rather large ranks, as may be the case for electrostatic
potentials of biomolecules or for electron densities in electronic structure
calculations. The Canonical-to-Tucker algorithm can be easily modified to use an
ε-truncation stopping criterion. Notice that the maximal canonical rank3 of the core
tensor β does not exceed min_ℓ (r_1 r_2 r_3)/r_ℓ; see [161].
Our numerical study indicates that in the case of tensors obtained from grid-based
representations of functions describing physical quantities in electronic structure
calculations, the ALS step in the C2T transform is usually not required, that is, the
RHOSVD approximation is sufficient.

3 Further optimization of the canonical rank in the small-size core tensor β can be implemented by
applying the ALS iterative scheme in the canonical format, see e. g. [193].
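Steps (I)-(IV) above can be condensed into a short NumPy sketch of the C2T transform for d = 3 (a simplified illustration on random normalized canonical input; the function name, sizes, and the number of ALS sweeps are our illustrative choices, not the book's):

```python
import numpy as np

def c2t(xi, U, r, sweeps=2):
    """Sketch of canonical-to-Tucker for d = 3: RHOSVD initial bases from SVDs
    of the n x R side matrices, a few ALS sweeps on partially projected
    (single-hole) tensors, and a final core contraction."""
    # (I) orthogonal bases from truncated SVDs of the side matrices
    V = [np.linalg.svd(Ul, full_matrices=False)[0][:, :r] for Ul in U]
    for _ in range(sweeps):                        # (III) ALS iteration
        for q in range(3):
            a, b = [(1, 2), (0, 2), (0, 1)][q]     # the two projected modes
            # partially projected image (3.36): mode q stays in physical space
            P = np.einsum('k,ik,jk->ijk', xi, V[a].T @ U[a], V[b].T @ U[b])
            M = np.einsum('nk,ijk->nij', U[q], P).reshape(U[q].shape[0], -1)
            V[q] = np.linalg.svd(M, full_matrices=False)[0][:, :r]
    # (IV) project onto the final bases to obtain the r x r x r core
    beta = np.einsum('k,ik,jk,lk->ijl', xi,
                     V[0].T @ U[0], V[1].T @ U[1], V[2].T @ U[2])
    return V, beta

# usage on random normalized canonical data
rng = np.random.default_rng(2)
n, R, r = 30, 12, 6
U = [rng.standard_normal((n, R)) for _ in range(3)]
U = [Ul / np.linalg.norm(Ul, axis=0) for Ul in U]
xi = rng.standard_normal(R)
V, beta = c2t(xi, U, r)
A  = np.einsum('k,ik,jk,lk->ijl', xi, U[0], U[1], U[2])
Ar = np.einsum('ijl,ai,bj,cl->abc', beta, V[0], V[1], V[2])
rel = np.linalg.norm(A - Ar) / np.linalg.norm(A)   # Tucker projection error
```

Note that the full tensor A is assembled here only to measure the error; the transform itself touches nothing larger than n × R and n × r^2 matrices.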

The following remark addresses the complexity issues.

Remark 3.13 ([174]). Algorithm C2T (𝒞_{R,n} → 𝒯_{𝒞_R,r}) exhibits polynomial cost in
R, r, n,

    O(dRn min{n, R} + d r^{d−1} n min{r^{d−1}, n}),

with exponential scaling in d. In the absence of Step (2) (i.e., if the RHOSVD provides
a satisfactory approximation), the algorithm does not contain iteration loops, and for
any d ≥ 2, it is a finite SVD-based scheme that is free of the curse of dimensionality.

Numerical tests show that Algorithm C2T (𝒞_{R,n} → 𝒯_{𝒞_R,r}) is efficient for moderate
R and n; in particular, it works well in electronic structure calculations on 3D
Cartesian grids for moderate grid sizes n ≲ 10^3 and for R ≤ 10^3. However, in real-life
applications the computations may require one-dimensional grid sizes in the range
n_ℓ ≲ 3·10^4 (ℓ = 1, 2, 3) with canonical ranks R ≤ 10^4. Therefore, to get rid of the
polynomial scaling in R, n, r for 3D applications, one can apply the best Tucker
approximation methods based on the multigrid acceleration of the nonlinear ALS
iteration, as described in the following section.

3.3.3 Multigrid canonical-to-Tucker algorithm

The concept of multigrid acceleration (MGA) in tensor calculations can be applied to
multidimensional data obtained by the discretization of sufficiently smooth functions on
a sequence of refined spatial grids [174]. Typical application areas are the solution of
integro-differential equations in ℝ^d, the approximation of multidimensional operators
and functionals, and the data-structured representation of physically relevant
quantities, such as molecular or electron densities, the Hartree and exchange
potentials, and electrostatic potentials of proteins. This concept can be applied to the
fully populated and to the canonical rank-R target tensors. In the case of rank-R input
data, it can be understood as an adaptive tensor approximation method running over an
incomplete set of data in the dual space.
We introduce the equidistant tensor grid ω_{d,n} := ω_1 × ω_2 × ⋅⋅⋅ × ω_d, where
ω_ℓ := {−b + (m − 1)h : m = 1, …, n + 1} (ℓ = 1, …, d) with mesh size h = 2b/n. Define a
set of collocation points {x_m} in Ω ∈ ℝ^d, located at the midpoints of the grid cells
numbered by m ∈ ℐ := {1, …, n}^d. For fixed n, the target tensor A_n = [a_{n,m}] ∈ ℝ^ℐ
is defined as the trace of the given continuous multivariate function f : Ω → ℝ on the
set of collocation points {x_m} as follows:

    a_{n,m} = f(x_m),  m ∈ ℐ.

Notice that a projected Galerkin discretization can be applied as well. For further
constructions, we also need an “accurate” 1D interpolation operator ℐ_{m−1→m} from the
coarse to the fine grids, acting in each spatial direction. For example, this might be
the interpolation by piecewise linear or by cubic splines.

The idea of the multigrid accelerated best orthogonal Tucker approximation, see [174],
can be described as follows (for 𝒞_{R,n} initial data):
(1) General multigrid concept. Solving a sequence of nonlinear approximation problems
for A = A_n as in (2.18) with n = n_m := n_0 2^m, m = 0, 1, …, M, corresponding to a
sequence of (d-adic) refined spatial grids ω_{d,n_m}. The sequence of approximation
problems is treated successively in one run from coarse to fine grids (reminiscent of
the cascadic version of the MG method).
(2) Coarse initial approximation to the side matrices U^(q). The initial approximation
of U^(q) on the finer grid ω_{d,n_m} is obtained by the linear interpolation from the
coarser grid ω_{d,n_{m−1}}, up to the interpolation accuracy O(n_m^{−α}), α > 0.
(3) Most important fibers. We employ the idea of the “most important fibers” (MIFs) of
the q-mode unfolding matrices B^(q) ∈ ℝ^{n×r̄_q}, whose positions are extracted from the
coarser grids. To identify the location of the MIFs, the so-called maximum energy
principle is applied as follows:
(3a) On the coarse grid, we calculate a projection of the q-mode unfolding matrix
B^(q) onto the true q-mode orthogonal subspace Im U^(q) = span{u_1^(q), …, u_{r_q}^(q)},
which is computed as the matrix product

    β^(q) = U^(q)T B^(q) ∈ ℝ^{r_q×r̄_q}.  (3.37)

(3b) Now the maximum energy principle specifies the location of the MIFs by finding pr
columns in β^(q) with maximal Euclidean norms (supposing that pr ≪ r̄_q); see
Figures 3.16 and 3.17.4 The positions of the MIFs are numbered by the index set ℐ_{q,p}
with #ℐ_{q,p} = pr, being a subset of the larger index set

    ℐ_{q,p} ⊂ ℐ_{r̄_q} := I_{r_1} × ⋅⋅⋅ × I_{r_{q−1}} × I_{r_{q+1}} × ⋅⋅⋅ × I_{r_d},  #ℐ_{r̄_q} = r̄_q = O(r^{d−1}).

The practical significance of the use of the MIFs is justified by the observation that
the positions of the MIFs5 remain almost independent of the grid parameter n = n_m.
(4) Restricted ALS iteration. The proposed choice of the MIFs allows one to accelerate
the ALS iteration for solving the problem of the best rank-r approximation to the large
unfolding matrix B^(q) ∈ ℝ^{n×r̄_q} with dominating second dimension r̄_q = r^{d−1}
(always the case for large d). This approach reduces the ALS iteration to the
computation of the r-dimensional dominating subspace of small n × pr submatrices
B^(q,p) of B^(q) (q = 1, …, d), where p = O(1) is some fixed small parameter.

4 This strategy allows a “blind search” sampling of a fixed portion of q-mode fibers in
the Tucker core that accumulate the maximum part of the ℓ2-energy. The union of the
selected fibers from every space dimension (specified by the index set ℐ_{q,p},
q = 1, …, d) accumulates the most important information about the structure of the
rank-R tensor in the dual space ℝ^{r_1×⋅⋅⋅×r_d}. This knowledge reduces the amount of
computational work on fine grids (SVD with matrix size n × pr instead of n × r̄_q).
5 It resembles the multidimensional “adaptive cross approximation” (see, e.g., [231]
and [87] related to the 3D case), but now acting on a fixed subset of fibers defined by
the MIFs.
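A minimal NumPy sketch of the maximum energy principle in step (3b) (all sizes are illustrative, and the variable names are ours): the p·r columns of the preliminary core β^(q) = U^(q)T B^(q) with the largest Euclidean norms define the index set ℐ_{q,p}, and restricting B^(q) to these columns yields the small matrix B^(q,p) used by the restricted ALS iteration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, d, p = 50, 6, 3, 2
rbar = r ** (d - 1)                                  # size of the q-mode index set
Bq = rng.standard_normal((n, rbar))                  # q-mode unfolding matrix B^(q)
Uq = np.linalg.qr(rng.standard_normal((n, r)))[0]    # orthogonal q-mode basis U^(q)

beta_q = Uq.T @ Bq                                   # preliminary core (3.37)
energy = np.linalg.norm(beta_q, axis=0)              # Euclidean norm of each column
mif = np.argsort(energy)[::-1][:p * r]               # index set I_{q,p}, #I = p*r
B_qp = Bq[:, mif]                                    # restricted unfolding B^(q,p)
# the restricted ALS now works with the n x (p*r) matrix B_qp instead of n x rbar
```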

Figure 3.16: Illustration for d = 3. Finding the MIFs in the “preliminary” core β^(q)
for q = 1 for the rank-R initial data on the coarse grid n = n_0 = (n_1, n_2, n_3).
B^(q) is presented in tensor form for explanatory reasons.

Figure 3.17: MIFs: selected projections of the fibers of the “preliminary” cores for
computing U^(1) (left), U^(2) (middle), and U^(3) (right). The example is taken from the
multigrid rank compression in the computation of the Hartree potential for the water
molecule with the choice r = 14, p = 4.

The above guidelines lead to a considerable complexity reduction of the standard Tucker
tensor decomposition algorithms (𝕍_n → 𝒯_r) (discussed in Section 3.2) and of C2T
(𝒞_{R,n} → 𝒯_{𝒞_R,r}). In the latter case, this approach leads to an efficient tensor
approximation method with linear scaling in all governing parameters d, n, R, and r, up
to the computational cost on the “very coarse” level.
The algorithm of the MG accelerated (MGA) best Tucker approximation for the canonical
tensor input A ∈ 𝒞_{R,n} can be outlined as follows [174]:

Algorithm MG-C2T (𝒞_{R,n_M} → 𝒯_{𝒞_R,r}) (MGA canonical-to-Tucker approximation).


(1) Given A_m ∈ 𝒞_{R,n_m} in the form (2.13), corresponding to a sequence of grid
parameters n_m := n_0 2^m, m = 0, 1, …, M. Fix a reliability threshold parameter ε > 0,
a structural constant p = O(1), the critical grid level m_0 < M, and the Tucker rank
parameter r.
(2) For m = 0, solve the approximation problem C2T (𝒞_{R,n_0} → 𝒯_{𝒞_R,r}) and compute
the index set ℐ_{q,p}(n_0) ⊂ ℐ_{r̄_q} via identification of the MIFs in the matrix
unfolding B^(q), q = 1, …, d, using the maximum energy principle applied to the
“preliminary core” β^(q) in (3.37).
(3) For m = 1, …, m_0, perform the cascadic MG nonlinear ALS iteration:
(3a) Compute the initial orthogonal basis by interpolation (say, using cubic splines),
{U^(1), …, U^(d)}_m = ℐ_{m−1→m}({U^(1), …, U^(d)}_{m−1}).

Figure 3.18: Linear scaling in R and in n (left). Plot of the SVD for the mode-1 matrix
unfolding B^(1,p), p = 4 (right).

For each q = 1, …, d and with fixed U^(ℓ) (ℓ = 1, …, d, ℓ ≠ q), perform:
(3b) Define the index set ℐ_{q,p}(n_m) = ℐ_{q,p}(n_{m−1}) ⊂ ℐ_{r̄_q} and check the
reliability (approximation) criterion by the SVD analysis of the small-size matrix (see
the illustration in Figure 3.18, right)

    B^(q,p) = B^(q)|_{ℐ_{q,p}(n_m)} ∈ ℝ^{n_m×pr}.

If σ_min(B^(q,p)) ≤ ε, then the index set ℐ_{q,p} is admissible. If for m = m_0 the
approximation criterion above is not satisfied, then choose p = p + 1 and repeat the
steps m = 0, …, m_0.
(3c) Determine the orthogonal matrix U^(q) ∈ ℝ^{n×r} via computing the r-dimensional
dominating subspace for the “restricted” matrix unfolding B^(q,p).
(4) For levels m = m_0 + 1, …, M, perform the MGA Tucker approximation by the ALS
iteration as in Steps (3a) and (3c), but now with fixed positions of the MIFs specified
by the index set ℐ_{q,p}(n_{m_0}), i.e., by discarding all fibers in B^(q) corresponding
to the “less important” index set ℐ_{r̄_q} \ ℐ_{q,p}.
(5) Compute the rank-R core tensor β ∈ 𝒞_{R,r}, as in Step (3) of the basic algorithm
C2T (𝒞_{R,n} → 𝒯_{𝒞_R,r}).

Theorem 3.14 ([174]). Algorithm MG-C2T (𝒞_{R,n_M} → 𝒯_{𝒞_R,r}) amounts to

    O(dRrn_M + dp^2 r^2 n_M)

operations per ALS loop, plus the extra cost W_{n_0} = O(dRn_0^2) of the coarse mesh
solver C2T (𝒞_{R,n_0} → 𝒯_{𝒞_R,r}). It requires O(drn_M + drR) storage to represent
the result.

Proof. Step (3a) requires O(drn_M) operations and memory. Notice that for large M, we
have pr ≤ n_M. Hence, the complexity of Step (3c) is bounded by
O(dRrn_M + prn_M + p^2 r^2 n_M) per iteration loop, and the same holds for Step (3b).
The rank-R representation of β ∈ 𝒞_{R,r} requires O(drRn_M) operations and O(drR)
storage. Summing up these costs over the levels m = 0, …, M proves the result.
Theorem 3.14 shows that Algorithm MG-C2T realizes a fast rank reduction method that
scales linearly in d, n_M, R, and r. Moreover, the complexity and error of the multigrid
Tucker approximation can be effectively controlled by tuning the governing parameters p,
m_0, and n_0.
Figure 3.18 (left) demonstrates the linear complexity scaling of the multigrid Tucker
approximation in the input rank R and in the grid size n (electron density of the CH4
molecule). Figure 3.18 (right) shows the exponentially fast decaying singular values of
the mode-1 matrix unfolding B^(1,p) with the choice p = 4, which demonstrates the
reliability of the maximum energy principle in the error control. A similarly fast decay
of the respective singular values is typical in most of our numerical examples in
electronic structure calculations considered so far.

3.4 Mixed Tucker-canonical transform


Since the Tucker core still presupposes r^d storage, we consider approximation methods
using a mixed (two-level) representation [161, 173] that gainfully combines the
beneficial features of both the Tucker and canonical models. The main idea of the mixed
approximation consists in a rank-structured representation of the Tucker core β in
certain tensor classes 𝒮 ⊂ 𝔹_r. In particular, we consider the class 𝒮 = 𝒞_{R,r} of
rank-R canonical tensors, i.e., β ∈ 𝒞_{R,r}.

Definition 3.15 (The mixed two-level Tucker-canonical format). Given the rank parameters r, R, we denote by 𝒯𝒞_{R,r} the subclass of tensors in 𝒯_{r,n} with the core β represented in the canonical format, β ∈ 𝒞_{R,r} ⊂ 𝔹_r. An explicit representation of A ∈ 𝒯𝒞_{R,r} is given by

A = (∑_{ν=1}^{R} ξ_ν u_ν^{(1)} ⊗ ⋅⋅⋅ ⊗ u_ν^{(d)}) ×₁ V^{(1)} ×₂ V^{(2)} ⋅⋅⋅ ×_d V^{(d)},    (3.38)

with some u_ν^{(ℓ)} ∈ ℝ^{r_ℓ}. Clearly, we have the embedding 𝒯𝒞_{R,r} ⊂ 𝒞_{R,n} with the corresponding (non-orthogonal) side-matrices U^{(ℓ)} = [V^{(ℓ)}u_1^{(ℓ)} ⋅⋅⋅ V^{(ℓ)}u_R^{(ℓ)}] and scaling coefficients ξ_ν (ν = 1, ..., R).
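As a sanity check of representation (3.38) and the embedding 𝒯𝒞_{R,r} ⊂ 𝒞_{R,n}, the following minimal numpy sketch (the small sizes d = 3, n = 8, r = 4, R = 5 are illustrative assumptions) contracts a canonical core with orthogonal Tucker side matrices and compares the result with the canonical tensor built from the factors V^{(ℓ)}U^{(ℓ)}:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, r, R = 3, 8, 4, 5   # assumed toy sizes

# Orthogonal Tucker side matrices V^(l) of size n x r
V = [np.linalg.qr(rng.standard_normal((n, r)))[0] for _ in range(d)]
# Canonical core beta = sum_nu xi_nu u_nu^(1) o ... o u_nu^(d) in C_{R,r}
xi = rng.standard_normal(R)
U = [rng.standard_normal((r, R)) for _ in range(d)]

beta = np.einsum('r,ir,jr,kr->ijk', xi, *U)       # small core tensor
A = np.einsum('abc,ia,jb,kc->ijk', beta, *V)      # (3.38): beta x_1 V^(1) x_2 V^(2) x_3 V^(3)

# Embedding into C_{R,n}: non-orthogonal side matrices V^(l) U^(l)
W = [V[l] @ U[l] for l in range(d)]
A_can = np.einsum('r,ir,jr,kr->ijk', xi, *W)

assert np.allclose(A, A_can)
```

The mixed format stores dnr + drR + R numbers, while the plain canonical representation of the same tensor needs dnR.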

A target tensor A ∈ 𝕍_n can be approximated by a sum of rank-1 tensors as in (2.15), (2.13), or by using the mixed format 𝒯𝒞_{R,r} as in (3.38). In what follows, we discuss fast and efficient methods to compute the corresponding rank-structured approximations in different problem settings.
To reduce the ranks of input tensors, we apply the two-level canonical-to-Tucker (C2T) approximation with the subsequent Tucker-to-canonical (T2C) transform. The corresponding canonical-to-Tucker-to-canonical approximation scheme introduced in [161, 173] can be presented as the following two-level chain:

Figure 3.19: Mixed Tucker-canonical decomposition.

𝒞_{R,n} →^I 𝒯𝒞_{R,r} →^II 𝒯𝒞_{R′,r} ⊂ 𝒞_{R′,n}.    (3.39)

Here, on Level-I, we compute the best orthogonal Tucker approximation applied to the 𝒞_{R,n}-type input, so that the resultant core is represented in the 𝒞_{R,r} format with the same CP rank R as for the target tensor. On Level-II, the small-size Tucker core in 𝒞_{R,r} is approximated by a tensor in 𝒞_{R′,r} with R′ < R. Here we describe the algorithm on Level-I (which is, in fact, the most laborious part of the computational scheme (3.39)); it has a polynomial cost in the size of the input data in 𝒞_{R,n} (see Remark 3.13).
In the case of full format tensors, the two-level version of Algorithm ALS Tucker (𝕍_n → 𝒯_{r,n}) can be described by the following computation chain:

𝕍_n →^I 𝒯_{r,n} →^II 𝒯𝒞_{R,r} ⊂ 𝒞_{R,n},

where Level-I is understood as the application of the Tucker decomposition algorithm to full format tensors (𝕍_n → 𝒯_{r,n}), or of its multigrid version, and Level-II includes the rank-R canonical approximation of the small-size Tucker core β ∈ 𝔹_r. Figure 3.19 illustrates the computational scheme of the two-level Tucker approximation.
In the case of function-related tensors, it is possible to compute the Level-I ap-
proximation with linear cost in the size of the input data (see Section 3.2).
If the input tensor A₀ is already given in the rank-r Tucker format, then one can apply the following Lemma 3.16. This lemma presents a simple but useful characterization of the mixed (two-level) Tucker model (cf. [161, 173]), which allows one to approximate the elements in 𝒯_r via the canonical decomposition applied to the small-size core tensor.

Lemma 3.16 (Mixed Tucker-to-canonical approximation). Let the target tensor A ∈ 𝒯_{r,n} in (2.18) have the form A = β ×₁ V^{(1)} ×₂ ⋅⋅⋅ ×_d V^{(d)} with the orthogonal side-matrices V^{(ℓ)} = [v_1^{(ℓ)} ⋅⋅⋅ v_{r_ℓ}^{(ℓ)}] ∈ ℝ^{n×r_ℓ} and with β ∈ ℝ^{r_1×⋅⋅⋅×r_d}. Then, for a given R ≤ min_{1≤ℓ≤d} r_ℓ,

min_{Z∈𝒞_{R,n}} ‖A − Z‖ = min_{μ∈𝒞_{R,r}} ‖β − μ‖.    (3.40)

Assume that there exists a best rank-R approximation A^{(R)} ∈ 𝒞_{R,n} of A. Then there is a best rank-R approximation β^{(R)} ∈ 𝒞_{R,r} of β such that

A^{(R)} = β^{(R)} ×₁ V^{(1)} ×₂ ⋅⋅⋅ ×_d V^{(d)}.    (3.41)

Proof. We present a more detailed proof compared with the sketch in Lemma 2.5 of [173]. Notice that the canonical vectors y_k^{(ℓ)} of any test element (see (2.13)) in the left-hand side of (3.40),

Z = ∑_{k=1}^{R} λ_k y_k^{(1)} ⊗ ⋅⋅⋅ ⊗ y_k^{(d)} ∈ 𝒞_{R,n},    (3.42)

can be chosen in span{v_1^{(ℓ)}, ..., v_{r_ℓ}^{(ℓ)}}, i.e.,

y_k^{(ℓ)} = ∑_{m=1}^{r_ℓ} μ_{k,m}^{(ℓ)} v_m^{(ℓ)},  k = 1, ..., R,  ℓ = 1, ..., d.    (3.43)

Indeed, assuming

y_k^{(ℓ)} = ∑_{m=1}^{r_ℓ} μ_{k,m}^{(ℓ)} v_m^{(ℓ)} + E_k^{(ℓ)}  with  E_k^{(ℓ)} ⊥ span{v_1^{(ℓ)}, ..., v_{r_ℓ}^{(ℓ)}},

we conclude that E_k^{(ℓ)} does not affect the cost function in (3.40) because of the orthogonality of V^{(ℓ)}. Hence, setting E_k^{(ℓ)} = 0 and substituting (3.43) into (3.42), we arrive at the desired Tucker decomposition of Z,

Z = β_z ×₁ V^{(1)} ×₂ ⋅⋅⋅ ×_d V^{(d)},  β_z ∈ 𝒞_{R,r}.

This implies

‖A − Z‖² = ‖(β_z − β) ×₁ V^{(1)} ×₂ ⋅⋅⋅ ×_d V^{(d)}‖² = ‖β − β_z‖² ≥ min_{μ∈𝒞_{R,r}} ‖β − μ‖².

On the other hand, we have

min_{Z∈𝒞_{R,n}} ‖A − Z‖² ≤ min_{β_z∈𝒞_{R,r}} ‖(β − β_z) ×₁ V^{(1)} ×₂ ⋅⋅⋅ ×_d V^{(d)}‖² = min_{μ∈𝒞_{R,r}} ‖β − μ‖².

Hence, we arrive at (3.40).
Likewise, for any minimizer A^{(R)} ∈ 𝒞_{R,n} in the left-hand side of (3.40), we obtain

A^{(R)} = β^{(R)} ×₁ V^{(1)} ×₂ V^{(2)} ⋅⋅⋅ ×_d V^{(d)}

with the respective rank-R core tensor

β^{(R)} = ∑_{k=1}^{R} λ_k u_k^{(1)} ⊗ ⋅⋅⋅ ⊗ u_k^{(d)} ∈ 𝒞_{R,r},

where u_k^{(ℓ)} = {μ_{k,m_ℓ}^{(ℓ)}}_{m_ℓ=1}^{r_ℓ} ∈ ℝ^{r_ℓ} are calculated by using representation (3.43). Now, changing the order of summation, we have

A^{(R)} = ∑_{k=1}^{R} λ_k y_k^{(1)} ⊗ ⋅⋅⋅ ⊗ y_k^{(d)}
       = ∑_{k=1}^{R} λ_k (∑_{m_1=1}^{r_1} μ_{k,m_1}^{(1)} v_{m_1}^{(1)}) ⊗ ⋅⋅⋅ ⊗ (∑_{m_d=1}^{r_d} μ_{k,m_d}^{(d)} v_{m_d}^{(d)})
       = ∑_{m_1=1}^{r_1} ⋅⋅⋅ ∑_{m_d=1}^{r_d} {∑_{k=1}^{R} λ_k ∏_{ℓ=1}^{d} μ_{k,m_ℓ}^{(ℓ)}} v_{m_1}^{(1)} ⊗ ⋅⋅⋅ ⊗ v_{m_d}^{(d)}.

The relation (3.41) implies that

‖A − A^{(R)}‖ = ‖β − β^{(R)}‖,

since the ℓ-mode multiplication with the orthogonal side matrices V^{(ℓ)} does not change the cost function. Using the already proven relation (3.40), this indicates that β^{(R)} is the minimizer in the right-hand side of (3.40).
Lemma 3.16 means that the corresponding low-rank Tucker-canonical approxima-
tion of A ∈ 𝒯 r,n can be reduced to the canonical approximation of a small size core
tensor.
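The orthogonal invariance used throughout the proof — mode products with orthonormal-column side matrices preserve the Frobenius distance between cores — can be illustrated by a small numpy sketch (the sizes are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, r = 3, 10, 4   # assumed toy sizes

# Side matrices with orthonormal columns
V = [np.linalg.qr(rng.standard_normal((n, r)))[0] for _ in range(d)]
beta = rng.standard_normal((r, r, r))   # target core
mu = rng.standard_normal((r, r, r))     # any test core, e.g. a canonical one

def mode_prod(core, mats):
    # core x_1 mats[0] x_2 mats[1] x_3 mats[2]
    return np.einsum('abc,ia,jb,kc->ijk', core, *mats)

A = mode_prod(beta, V)
Z = mode_prod(mu, V)
# The cost function depends only on the cores: ||A - Z|| = ||beta - mu||
assert np.isclose(np.linalg.norm(A - Z), np.linalg.norm(beta - mu))
```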
Lemma 3.16 suggests a two-level dimensionality reduction approach that leads to a sparser data structure compared with the standard Tucker model. Though A^{(R)} ∈ 𝒞_{R,n} can be represented in the mixed Tucker-canonical format, the efficiency of its storage depends on the further multilinear operations. In fact, if the resultant tensor is subsequently used in scalar, Hadamard, or convolution products with canonical tensors, it is better to store A^{(R)} in the canonical format with complexity O(drn).
The numerics illustrating the performance of the multigrid canonical-to-Tucker algorithm will be presented in Section 8.1, describing the calculation of the Hartree potential and the Coulomb matrix in the Hartree–Fock equation.

3.5 On Tucker-to-canonical transform

In the rank reduction scheme for canonical rank-R tensors, we subsequently use the canonical-to-Tucker (C2T) transform and then the Tucker-to-canonical (T2C) tensor approximation. Next, we give two useful remarks that characterize the canonical representation of full format tensors.
Remark 3.17, applied to the Tucker core tensor of size r × r × r, indicates that the ultimate canonical rank of a large-size tensor in 𝕍_n has the upper bound r², as illustrated by Figure 3.20. According to Remark 3.18, its canonical rank can be reduced to a smaller value by the SVD-based truncation procedure up to a fixed tolerance ε > 0.

Denote by n̄_ℓ the single-hole product of ℓ-mode dimensions,

n̄_ℓ = n_1 ⋅⋅⋅ n_{ℓ−1} n_{ℓ+1} ⋅⋅⋅ n_d.    (3.44)

Remark 3.17. The canonical rank of a tensor A ∈ 𝕍_n has the upper bound

R ≤ min_{1≤ℓ≤d} n̄_ℓ.    (3.45)

Proof. First, consider the case d = 3. Let n_1 = max_{1≤ℓ≤d} n_ℓ for definiteness. We can represent a tensor A as

A = ∑_{k=1}^{n_3} B_k ⊗ Z_k,  B_k ∈ ℝ^{n_1×n_2}, Z_k ∈ ℝ^{n_3},

where B_k = A(:, :, k) (k = 1, ..., n_3) is the n_1 × n_2 matrix slice of A, and Z_k(i) = 0 for i ≠ k, Z_k(k) = 1. Let rank(B_k) = r_k ≤ n_2, k = 1, ..., n_3. Then

rank(B_k ⊗ Z_k) = rank(B_k) ≤ n_2,

and we obtain

rank(A) ≤ ∑_{k=1}^{n_3} rank(B_k) ≤ n_2 n_3 = min_{1≤ℓ≤3} n̄_ℓ.

The general case d > 3 can be proven similarly by an induction argument.
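The slice-wise construction of the proof can be played back numerically. The sketch below (with the illustrative sizes n₁ > n₂, n₃) assembles a canonical decomposition from the SVDs of the slices B_k and counts its terms:

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, n3 = 6, 4, 5   # n1 = max_l n_l, so the minimal single-hole product is n2*n3
A = rng.standard_normal((n1, n2, n3))

# A = sum_k B_k (x) e_k; each slice B_k contributes rank(B_k) <= n2 rank-1 terms
factors = []
for k in range(n3):
    U, s, Vt = np.linalg.svd(A[:, :, k], full_matrices=False)
    e_k = np.zeros(n3); e_k[k] = 1.0
    for j in range(len(s)):
        if s[j] > 1e-12:
            factors.append((s[j] * U[:, j], Vt[j, :], e_k))

R = len(factors)   # canonical rank of the constructed decomposition
A_rec = sum(np.einsum('i,j,k->ijk', u, v, w) for (u, v, w) in factors)

assert R <= n2 * n3          # the bound (3.45)
assert np.allclose(A, A_rec)
```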

Figure 3.20: Tucker-to-canonical decomposition for a small core tensor.

The next remark shows that the maximal canonical rank of the Tucker core of a 3rd-order tensor can be easily reduced to a value ≤ r² by an SVD-based procedure. Though not practically attractive for arbitrary high-order tensors, the simple algorithm described in Remark 3.18 proves to be useful for the treatment of small-size 3rd-order Tucker core tensors in the rank reduction algorithms described in the previous sections.

Remark 3.18. There is a simple SVD-based procedure to reduce the canonical rank of the core tensor β within the accuracy ε > 0. Let d = 3 for the sake of clearness.

Denote by B_m ∈ ℝ^{r×r}, m = 1, ..., r, the matrix slices of β in some fixed mode. Hence, we can represent

β = ∑_{m=1}^{r} B_m ⊗ z_m,  z_m ∈ ℝ^r,    (3.46)

where z_m(m) = 1 and z_m(j) = 0 for j = 1, ..., r, j ≠ m (there are exactly d possible decompositions). Let p_m be the minimal integer such that the singular values of B_m satisfy σ_k^{(m)} ≤ ε/r^{3/2} for k = p_m + 1, ..., r (if σ_r^{(m)} > ε/r^{3/2}, then set p_m = r). Then, denoting by

B_{p_m} = ∑_{k_m=1}^{p_m} σ_{k_m}^{(m)} u_{k_m} ⊗ v_{k_m}

the corresponding rank-p_m approximation to B_m (obtained by truncation of σ_{p_m+1}^{(m)}, ..., σ_r^{(m)}), we arrive at the rank-R canonical approximation to β,

β^{(R)} := ∑_{m=1}^{r} B_{p_m} ⊗ z_m,  z_m ∈ ℝ^r,    (3.47)

providing the error estimate

‖β − β^{(R)}‖ ≤ ∑_{m=1}^{r} ‖B_m − B_{p_m}‖ = ∑_{m=1}^{r} √(∑_{k_m=p_m+1}^{r} (σ_{k_m}^{(m)})²) ≤ ∑_{m=1}^{r} √(r ε²/r³) = ε.

Representation (3.47) is a sum of rank-p_m terms, so that the total rank is bounded by R ≤ p_1 + ⋅⋅⋅ + p_r ≤ r².
This approach can be easily extended to arbitrary d ≥ 3 with the bound R ≤ r^{d−1}.
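A minimal numpy sketch of this truncation procedure, applied to a sample core with decaying slice singular values (the function-generated core is an illustrative assumption), confirms both the rank bound and the error estimate:

```python
import numpy as np

r, eps = 8, 1e-3
# Illustrative core with decaying slice singular values (assumption for the demo)
I, J, K = np.meshgrid(*[np.arange(r)] * 3, indexing='ij')
beta = 1.0 / (1.0 + I + J + K)

tol = eps / r**1.5   # threshold: drop sigma_k <= eps / r^{3/2}
terms = []
for m in range(r):
    U, s, Vt = np.linalg.svd(beta[:, :, m], full_matrices=False)
    p_m = max(1, int(np.sum(s > tol)))   # keep the leading p_m singular triples
    z_m = np.zeros(r); z_m[m] = 1.0
    for k in range(p_m):
        terms.append((s[k] * U[:, k], Vt[k, :], z_m))

R = len(terms)   # total canonical rank, R = p_1 + ... + p_r <= r^2
beta_R = sum(np.einsum('i,j,k->ijk', u, v, w) for (u, v, w) in terms)

assert R <= r * r
assert np.linalg.norm(beta - beta_R) <= eps   # the error estimate of Remark 3.18
```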

Figure 3.21 illustrates the canonical decomposition of the core tensor by using the SVD of the slices B_m of the core tensor β, yielding the matrices U_m = {u_{k_m}}_{k_m=1}^{p_m}, V_m = {v_{k_m}}_{k_m=1}^{p_m}, and a diagonal matrix of small size p_m × p_m containing the truncated singular values. It also shows the vector z_m = [0, ..., 0, 1, 0, ..., 0], with all entries equal to 0 except for a 1 at the mth position.

Figure 3.21: Tucker-to-canonical decomposition for a small core tensor, see Remark 3.18.
4 Multiplicative tensor formats in ℝd
4.1 Tensor train format: linear scaling in d
The product-type representation of dth-order tensors, called the matrix product states (MPS) decomposition in the physics literature, was introduced and successfully applied in DMRG quantum computations [302, 294, 293] and, independently, in quantum molecular dynamics as the multilayer (ML) MCTDH methods [297, 221, 211]. Representations by MPS-type formats in multidimensional problems reduce the storage complexity to O(dr²N), where r is the maximal rank parameter.
In recent years, various versions of the MPS-type tensor format have been discussed and further investigated in the mathematical literature, including the hierarchical dimension splitting [161], the tensor train (TT) [229, 226], the tensor chain and combined Tucker-TT [167], and the QTT-Tucker [66] formats, as well as the hierarchical Tucker (HT) representation [110], which belongs to the class of ML-MCTDH methods [297] or, more generally, of tensor network states models. The MPS-type tensor approximation was proved by extensive numerics to be efficient in high-dimensional electronic/molecular structure calculations, in molecular dynamics, and in quantum information theory (see the survey papers [293, 138, 169, 264]).
Note that although the multiplicative TT and HT parametrizations formally apply
to any full format tensor in higher dimensions, they become computationally feasible
only when using the RHOSVD-like procedures applied either to the canonical format
input or to tensors already given in the TT form. The HOSVD in MPS-type formats was
discussed in [294, 100, 226].
The TT format, which is the particular case of the MPS-type factorization corresponding to open boundary conditions, can be defined as follows: for a given rank parameter r = (r_0, ..., r_d) and the respective index sets J_ℓ = {1, ..., r_ℓ} (ℓ = 0, 1, ..., d) with the constraint J_0 = J_d = {1} (i.e., r_0 = r_d = 1), the rank-r TT format contains all elements A = [a(i_1, ..., i_d)] ∈ ℝ^{n_1×⋅⋅⋅×n_d} that can be represented as the contracted product of 3-tensors over the d-fold product index set 𝒥 := ×_{ℓ=1}^{d} J_ℓ such that

A = ∑_{α∈𝒥} a_{α_1}^{(1)} ⊗ a_{α_1,α_2}^{(2)} ⊗ ⋅⋅⋅ ⊗ a_{α_{d−1}}^{(d)} ≡ A^{(1)} ⋈ A^{(2)} ⋈ ⋅⋅⋅ ⋈ A^{(d)},

where a_{α_{ℓ−1},α_ℓ}^{(ℓ)} ∈ ℝ^{n_ℓ} (ℓ = 1, ..., d), and A^{(ℓ)} = [a_{α_{ℓ−1},α_ℓ}^{(ℓ)}] ∈ ℝ^{n_ℓ×r_{ℓ−1}×r_ℓ} is the vector-valued r_{ℓ−1} × r_ℓ matrix (3-tensor). Here, and in the following (see Definition 4.3), the rank product operation "⋈" is defined as the regular matrix product of two core vector-valued matrices, their fibers (blocks) being multiplied by means of the tensor product [142]. A particular entry of A is represented by

a(i_1, ..., i_d) = ∑_{α_1=1}^{r_1} ⋅⋅⋅ ∑_{α_{d−1}=1}^{r_{d−1}} a_{α_1}^{(1)}(i_1) a_{α_1,α_2}^{(2)}(i_2) ⋅⋅⋅ a_{α_{d−1}}^{(d)}(i_d) ≡ A^{(1)}(i_1) A^{(2)}(i_2) ⋅⋅⋅ A^{(d)}(i_d),

so that the latter is written in the matrix product form (explaining the notion MPS), where A^{(ℓ)}(i_ℓ) is an r_{ℓ−1} × r_ℓ matrix.

Example 4.1. Figure 4.1 illustrates the TT representation of a 5th-order tensor; each particular entry a(i_1, i_2, ..., i_5) is represented as a product of five matrices (and vectors) corresponding to the indices i_ℓ of the 3-tensors, i_ℓ ∈ {1, ..., n_ℓ}, ℓ = 1, 2, ..., 5.

Figure 4.1: Visualizing 5th-order TT tensor.

In the case J_0 = J_d ≠ {1}, we arrive at the more general form of MPS, the so-called tensor chain (TC) format [167]. In some cases, a TC tensor can be represented as a sum of not more than r_∗ TT tensors (r_∗ = min r_ℓ), which can be converted to a TT tensor by multilinear algebra operations like sum-and-compress. The storage cost for both the TC and TT formats is bounded by O(dr²N), r = max r_ℓ.
Clearly, one and the same tensor might have different ranks in different formats (and, hence, a different number of representation parameters). The next example considers the Tucker and TT representations of the function-related canonical tensor F := T(f), obtained by sampling the function f(x) = x_1 + ⋅⋅⋅ + x_d, x ∈ [0, 1]^d, on the Cartesian grid of size N^{⊗d} specified by the N-vectors X_ℓ = {ih}_{i=1}^{N} (h = 1/N, ℓ = 1, ..., d) and the all-ones vector 1 ∈ ℝ^N. The canonical rank of this tensor can be proven to be exactly d [201].

Example 4.2. We have rank_Tuck(F) = 2, with the explicit representation

F = ∑_{k_1,...,k_d=1}^{2} b_k V_{k_1}^{(1)} ⊗ ⋅⋅⋅ ⊗ V_{k_d}^{(d)},  V_1^{(ℓ)} = 1, V_2^{(ℓ)} = X_ℓ,  [b_k] ∈ ⨂_{ℓ=1}^{d} ℝ².

Moreover, rank_TT(F) = 2 in view of the exact decomposition (matrix rows separated by semicolons)

F = [X_1  1] ⋈ [1  0; X_2  1] ⋈ ⋅⋅⋅ ⋈ [1  0; X_{d−1}  1] ⋈ [1; X_d].
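This rank-2 TT decomposition can be verified entry-wise by multiplying the r_{ℓ−1} × r_ℓ core matrices A^{(ℓ)}(i_ℓ); a small numpy sketch (grid size and dimension are arbitrary assumptions):

```python
import numpy as np

d, N = 4, 6          # assumed toy sizes
h = 1.0 / N
X = np.arange(1, N + 1) * h    # X_l = {ih}, identical for every mode
one = np.ones(N)

# TT cores: first [X 1] (1x2), middle [1 0; X 1] (2x2), last [1; X] (2x1)
G_first = np.stack([X, one], axis=1).reshape(N, 1, 2)
G_mid = np.zeros((N, 2, 2))
G_mid[:, 0, 0] = 1.0
G_mid[:, 1, 0] = X
G_mid[:, 1, 1] = 1.0
G_last = np.stack([one, X], axis=1).reshape(N, 2, 1)
cores = [G_first] + [G_mid] * (d - 2) + [G_last]

def tt_entry(cores, idx):
    # product of the r_{l-1} x r_l matrices A^(l)(i_l)
    M = np.eye(1)
    for G, i in zip(cores, idx):
        M = M @ G[i]
    return M[0, 0]

idx = (0, 3, 5, 2)
assert np.isclose(tt_entry(cores, idx), sum(X[i] for i in idx))
```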

The rank-structured tensor formats like the canonical, Tucker, and MPS/TT-type decompositions induce the important concept of canonical, Tucker, or matrix product operators (CO/TO/MPO) acting between two tensor-product Hilbert spaces, each of dimension d,

𝒜 : 𝕏 = ⨂_{ℓ=1}^{d} X^{(ℓ)} → 𝕐 = ⨂_{ℓ=1}^{d} Y^{(ℓ)}.

For example, the R-term canonical operator (matrix) takes the form

𝒜 = ∑_{α=1}^{R} ⨂_{ℓ=1}^{d} 𝒜_α^{(ℓ)},  𝒜_α^{(ℓ)} : X^{(ℓ)} → Y^{(ℓ)}.

The action 𝒜X on a rank-R_X canonical tensor X ∈ 𝕏 is defined as the RR_X-term canonical sum in 𝕐,

𝒜X = ∑_{α=1}^{R} ∑_{β=1}^{R_X} ⨂_{ℓ=1}^{d} 𝒜_α^{(ℓ)} x_β^{(ℓ)} ∈ 𝕐.

The rank-r Tucker matrix can be defined in a similar way.
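The factored action of a canonical operator can be checked against the assembled Kronecker-product matrix; a hedged numpy sketch with arbitrary small sizes:

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(3)
d, n, R, RX = 3, 4, 2, 3   # assumed toy sizes

# R-term canonical operator A = sum_alpha A_alpha^(1) (x) ... (x) A_alpha^(d)
Aop = [[rng.standard_normal((n, n)) for _ in range(d)] for _ in range(R)]
# rank-RX canonical tensor X with factor vectors x_beta^(l)
Xfac = [[rng.standard_normal(n) for _ in range(d)] for _ in range(RX)]

# Factored action: R*RX canonical terms, each built from d small matrix-vector products
Y = np.zeros((n,) * d)
for alpha in range(R):
    for beta in range(RX):
        vecs = [Aop[alpha][l] @ Xfac[beta][l] for l in range(d)]
        Y += np.einsum('i,j,k->ijk', *vecs)

# Reference: assemble the full Kronecker-product matrix and the long vector
A_full = sum(reduce(np.kron, Aop[alpha]) for alpha in range(R))
x_full = sum(reduce(np.kron, Xfac[beta]) for beta in range(RX))
assert np.allclose(Y.reshape(-1), A_full @ x_full)
```

The factored action costs O(RR_X dn²) small matvecs, while the assembled matvec costs O(n^{2d}) — the motivation for working in the factored form.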
In the case of the rank-r TT format, the respective matrices are defined as follows.

Definition 4.3. The rank-r TT-operator (TTO/MPO) decomposition, symbolized by a set of factorized operators 𝒜, is defined by

𝒜 = ∑_{α∈𝒥} A_{α_1}^{(1)} ⊗ A_{α_1α_2}^{(2)} ⊗ ⋅⋅⋅ ⊗ A_{α_{d−1}}^{(d)} ≡ 𝒜^{(1)} ⋈ 𝒜^{(2)} ⋈ ⋅⋅⋅ ⋈ 𝒜^{(d)},

where 𝒜^{(ℓ)} = [A_{α_{ℓ−1}α_ℓ}^{(ℓ)}] denotes the operator-valued r_{ℓ−1} × r_ℓ matrix with A_{α_{ℓ−1}α_ℓ}^{(ℓ)} : X^{(ℓ)} → Y^{(ℓ)} (ℓ = 1, ..., d), or, in index notation,

𝒜(i_1, j_1, ..., i_d, j_d) = ∑_{α_1=1}^{r_1} ⋅⋅⋅ ∑_{α_{d−1}=1}^{r_{d−1}} A_{α_1}^{(1)}(i_1, j_1) A_{α_1α_2}^{(2)}(i_2, j_2) ⋅⋅⋅ A_{α_{d−2}α_{d−1}}^{(d−1)}(i_{d−1}, j_{d−1}) A_{α_{d−1}}^{(d)}(i_d, j_d).    (4.1)

Given a rank-r_X TT tensor X = X^{(1)} ⋈ X^{(2)} ⋈ ⋅⋅⋅ ⋈ X^{(d)} ∈ 𝕏, the action 𝒜X = Y is defined as the TT element Y = Y^{(1)} ⋈ Y^{(2)} ⋈ ⋅⋅⋅ ⋈ Y^{(d)} ∈ 𝕐 with the cores

Y^{(ℓ)} = [𝒜_{α_1α_2}^{(ℓ)} X_{β_1β_2}^{(ℓ)}]_{α_1β_1,α_2β_2},

where, in the brackets, the standard matrix–vector multiplication is used. The TT rank of Y is bounded by r_Y ≤ r ⊙ r_X, where ⊙ denotes the standard Hadamard (entry-wise) product of two vectors.
To describe the index-free operator representation of the TT matrix–vector product, we introduce the tensor operation denoted by ⋈∗, which can be viewed as dual to ⋈; it is defined as the tensor (Kronecker) product of the two corresponding core matrices, their blocks being multiplied by means of the regular matrix product operation. Now, with the substitution Y^{(ℓ)} = 𝒜^{(ℓ)} ⋈∗ X^{(ℓ)}, the matrix–vector product in the TT format takes the operator form

𝒜X = (𝒜^{(1)} ⋈∗ X^{(1)}) ⋈ ⋅⋅⋅ ⋈ (𝒜^{(d)} ⋈∗ X^{(d)}).

As an example, we consider the finite difference negative d-Laplacian on a uniform tensor grid, which is known to have the Kronecker rank-d representation

Δ_d = A ⊗ I_N ⊗ ⋅⋅⋅ ⊗ I_N + I_N ⊗ A ⊗ I_N ⊗ ⋅⋅⋅ ⊗ I_N + ⋅⋅⋅ + I_N ⊗ I_N ⊗ ⋅⋅⋅ ⊗ A ∈ ℝ^{N^d×N^d},    (4.2)

with A = Δ_1 = tridiag{−1, 2, −1} ∈ ℝ^{N×N} and the N × N identity matrix I_N.
For the canonical rank we have rank_Can(Δ_d) = d, whereas the TT rank of Δ_d is equal to 2 for any dimension due to the explicit representation [142]

Δ_d = [Δ_1  I_N] ⋈ [I_N  0; Δ_1  I_N]^{⋈(d−2)} ⋈ [I_N; Δ_1],

where the rank product operation "⋈" in the matrix case is defined as above, and the middle factor is repeated d − 2 times [142]. A similar statement holds for the Tucker rank: rank_Tuck(Δ_d) = 2.
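The claim rank_TT(Δ_d) = 2 can be verified numerically: after grouping the row/column indices mode-wise as pairs (i_ℓ, j_ℓ), the sequential unfolding ranks of the resulting tensor are the TT ranks of the operator. A small numpy sketch (the sizes d = 3, N = 4 are illustrative choices):

```python
import numpy as np
from functools import reduce

d, N = 3, 4
I = np.eye(N)
D1 = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # Delta_1 = tridiag{-1,2,-1}

# Kronecker rank-d representation (4.2): sum of d Kronecker products
Delta = sum(reduce(np.kron, [D1 if l == m else I for l in range(d)]) for m in range(d))

# Regroup row/column indices mode-wise: (i1..id, j1..jd) -> ((i1,j1), ..., (id,jd))
T = Delta.reshape((N,) * d + (N,) * d)
perm = [ax for l in range(d) for ax in (l, d + l)]
T = T.transpose(perm).reshape((N * N,) * d)

# TT ranks of the operator = ranks of the sequential unfoldings; all equal 2
tt_ranks = [int(np.linalg.matrix_rank(T.reshape((N * N) ** (p + 1), -1)))
            for p in range(d - 1)]
assert tt_ranks == [2] * (d - 1)
```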
Applications of tensor methods to multidimensional PDEs are reported in [65, 67, 68], [212, 214, 213, 21, 257], and [182, 188]. The basic mathematical models in quantum molecular dynamics have been described in [210, 211]. Greedy algorithms for high-dimensional non-symmetric linear problems have been considered in [48].
Basic multilinear algebra operations and the solution of linear systems in the TT and HT formats have been addressed in [10, 228, 11, 21, 195]. The corresponding theoretical analysis can be found in [57, 196, 250, 249, 8] and [136, 137, 250]. Some applications of the HT tensor format have been discussed in [261, 262, 242].
Recently, the TT and QTT tensor formats were applied in electronic structure calculations for small molecules [240, 239].

4.2 O(log n)-quantics (QTT) tensor approximation

The quantized (or quantics) tensor train (QTT) approximation was introduced and rigorously analyzed by B. Khoromskij in 2009 [165, 167]. It was initiated by the idea to test the TT ranks of long function-related vectors reshaped to multidimensional hypercubes. For function-generated vectors (tensors), the QTT approximation was proved to provide the logarithmic data compression O(d log N) on a wide class of functions in ℝ^d sampled on a tensor grid of size N^d. The basic approximation theory indicates that for a class of function-generated vectors of size N = q^L, the reshaping into a q × ⋅⋅⋅ × q hypercube allows a small TT rank decomposition of the resultant L-dimensional tensor [165, 167]. The storage of vectors of size q^L is reduced to O(qLr²), where r is the small QTT rank. Thus, for example, if we have a vector of size 2^L = 2^20 obtained by the grid discretization of an exponential function, then its quantized representation needs only 2 · 20 numbers, that is, 2 log₂(2^L), since the exponential function has QTT rank equal to 1. Correspondingly, algebraic operations with the QTT images are performed at logarithmic cost.
The QTT- or QCP-type approximation of an N-vector with N = q^L, L ∈ ℕ, is defined as the tensor decomposition (approximation) in the TT or canonical [189] formats applied to the tensor obtained by the q-adic folding (reshaping) of the target vector into an L-dimensional q × ⋅⋅⋅ × q data array (tensor), which is viewed as an element of the L-dimensional quantized tensor space.
In particular, in the vector case, i.e., for d = 1, a vector x = [x(i)]_{i∈I} ∈ 𝕍_{N,1} is reshaped to its quantized image in ℚ_{q,L} = ⨂_{j=1}^{L} 𝕂^q, 𝕂 ∈ {ℝ, ℂ}, by the q-adic folding

ℱ_{q,L} : x → Y = [Y(j)] ∈ ℚ_{q,L},  j = {j_1, ..., j_L},  with j_ν ∈ {1, 2, ..., q}, ν = 1, ..., L,

where, for fixed i, we have Y(j) := x(i), and j_ν = j_ν(i) is defined via the q-coding j_ν − 1 = C_{ν−1}, where the coefficients C_{ν−1} are found from the q-adic representation of i − 1,

i − 1 = C_0 + C_1 q^1 + ⋅⋅⋅ + C_{L−1} q^{L−1} ≡ ∑_{ν=1}^{L} (j_ν − 1) q^{ν−1}.

For d > 1, the construction is similar [167].
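In numpy terms, the q-adic folding ℱ_{q,L} is simply a reshape with column-major (Fortran) index ordering, since the digit j₁ of i − 1 varies fastest; a minimal sketch:

```python
import numpy as np

q, L = 2, 10
N = q**L
x = np.arange(N, dtype=float)   # any N-vector with N = q^L

# i - 1 = sum_nu (j_nu - 1) q^{nu-1}: the first digit j_1 varies fastest,
# which is exactly a column-major (Fortran-order) reshape
Y = x.reshape((q,) * L, order='F')

assert np.array_equal(Y.reshape(-1, order='F'), x)   # folding loses nothing
# spot-check the q-coding: 0-based index 6 = 0*1 + 1*2 + 1*4 -> digits (0,1,1,0,...)
assert Y[(0, 1, 1) + (0,) * (L - 3)] == x[6]
```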
Suppose that the quantized image of a certain N-d tensor (i.e., an element of the D-dimensional quantized tensor space with D = d log_q N = dL) can be effectively represented (approximated) in the low-rank TT (or CP) format living in the higher-dimensional tensor space ℚ_{q,dL}. In this way, we introduce the QTT approximation of an N-d tensor. For given ranks {r_k} (k = 1, ..., dL), the number of representation parameters for the QTT approximation of an N-d tensor can be estimated by

dqr² log_q N ≪ N^d,  where r_k ≤ r, k = 1, ..., dL,

providing log-volume scaling in the size O(N^d) of the initial tensor. The optimal choice of the base q is shown to be q = 2 or q = 3 [167]. However, the numerical realizations are usually implemented using binary coding, i.e., for q = 2. Figure 4.2 illustrates the QTT tensor approximation in the cases L = 3 and L = 10.
The principal question arises whether there is a rigorous theoretical substantiation of the QTT approximation scheme that establishes it as a new powerful approximation tool applicable to a broad class of data, or whether it is simply a heuristic algebraic procedure that may be efficient in certain numerical examples.
The answer is positive: the power of the QTT approximation method is due to the perfect rank-r decomposition discovered in [165, 167] for a wide-ranging class of function-related tensors obtained by sampling a continuous function over a uniform (or properly refined) grid. In particular, we have
– r = 1 for complex exponentials;

Figure 4.2: Visualizing the QTT tensor approximation in cases L = 3 and L = 10.

– r = 2 for trigonometric functions and for Chebyshev polynomials sampled on the Chebyshev–Gauss–Lobatto grid;
– r ≤ m + 1 for polynomials of degree m;
– r is a small constant for standard wavelet basis functions, etc.

The above rank bounds remain valid independently of the vector size N, and they are applicable in the general case q = 2, 3, ....
The approximation of 2^d × 2^d Laplacian-type matrices using the TT tensor decomposition was introduced in [225].
Notice that the name quantics (or quantized) tensor approximation (with the shorthand QTT), originally introduced in 2009 [165], is reminiscent of the entity "quantum of information", mimicking the minimal possible mode size (q = 2 or q = 3) of the quantized image. Later on, in some publications, the QTT approximation method was renamed "vector tensorization" [101, 110].

4.3 Low-rank representation of functions in quantized tensor spaces
The simple isometric folding of a multi-index data array into the 2 × 2 × ⋅⋅⋅ × 2 format living in the virtual (higher) dimension D = d log N is a conventional reshaping operation in computer data representation. The most gainful features of numerical computations in the quantized tensor space appear via the remarkable rank-approximation properties established for a wide-ranging class of function-related vectors/tensors [167].
The next lemma presents the basic results on the rank-1 (resp. rank-2) q-folding representation of the exponential (resp. trigonometric) vectors.

Lemma 4.4 ([167]). For given N = q^L, with q = 2, 3, ..., L ∈ ℕ, and z ∈ ℂ, the exponential N-vector z := {x_n = z^{n−1}}_{n=1}^{N} can be reshaped by the q-folding to the rank-1 q^{⊗L} tensor

ℱ_{q,L} : z ↦ Z = ⨂_{p=1}^{L} [1  z^{q^{p−1}}  ⋅⋅⋅  z^{(q−1)q^{p−1}}]^T ∈ ℚ_{q,L}.    (4.3)

The number of representation parameters specifying the QTT image is reduced dramatically from N to qL = q log_q N.
The trigonometric N-vector t = Im(z) := {t_n = sin(ω(n − 1))}_{n=1}^{N}, ω ∈ ℝ, can be reshaped by the successive q-adic folding

ℱ_{q,L} : t ↦ T ∈ ℚ_{q,L}

to the q^{⊗L} tensor T that has both the canonical ℂ-rank and the TT rank equal exactly to 2. The number of representation parameters does not exceed 4qL.

Example 4.5. In the case q = 2, the single sin-vector has the explicit rank-2 QTT representation in {0, 1}^{⊗L} (see [69, 227]), with k_p = 2^{p−L} i_p − 1, i_p ∈ {0, 1},

t ↦ T = Im(Z) = [sin ωk_1  cos ωk_1] ⋈_{p=2}^{L−1} [cos ωk_p  −sin ωk_p; sin ωk_p  cos ωk_p] ⋈ [cos ωk_L; sin ωk_L].

Other results on the QTT representation of polynomial, Chebyshev polynomial, and Gaussian-type vectors, multivariate polynomials, and their piecewise continuous versions have been derived in [167] and in the subsequent papers [177, 227, 68], substantiating the capability of numerical calculus in quantized tensor spaces.
In computational practice, the binary coding representation with q = 2 is the most convenient choice, though the Euler number q_∗ = e ≈ 2.7... is shown to be the optimal value [167].
The following example demonstrates that the low-rank QTT approximation can be applied for O(|log ε|)-complexity integration of functions. Given a continuous function f(x) and a weight function w(x), x ∈ [0, A], consider the rectangular N-point quadrature I_N, N = 2^L, ensuring the error bound |I − I_N| = O(2^{−αL}). Assume that the corresponding functional vectors allow a low-rank QTT approximation. Then the rectangular quadrature can be implemented as a scalar product of QTT tensors in O(log N) operations:

∫_0^A w(x)f(x) dx ≈ I_N(f) := h ∑_{i=1}^{N} w(x_i)f(x_i) = ⟨W, F⟩_QTT,  W, F ∈ ⨂_{ℓ=1}^{L} ℝ².

Example 4.6 below illustrates the uniform bound on the QTT rank for nontrivial highly oscillating functions. Here and in the following, the threshold error ϵ_QTT corresponds to the Euclidean norm.

Example 4.6. Highly oscillating and singular functions on [0, A], ω = 100, ϵ_QTT = 10^{−6}:

f_3(x) = x + a_k sin(ωx) for x ∈ 10((k−1)/p, (k−0.5)/p], and f_3(x) = 0 for x ∈ 10((k−0.5)/p, k/p];

f_4(x) = (x + 1) sin(ω(x + 1)²), x ∈ [0, 1] (Fresnel integral),

where the function f_3(x), x ∈ [0, 10], k = 1, ..., p, p = 16, a_k = 0.3 + 0.05(k − 1), is resolved on three different scales.

Notice that in all the numerical results in the following, we use the average QTT rank r defined as

r := √((1/(d − 1)) ∑_{k=1}^{d−1} r_k r_{k+1}).    (4.4)
The average QTT ranks over all directional ranks for the corresponding functional vec-
tors are given in Table 4.1. The maximum rank over all the fibers is nearly the same as
the average one.

Table 4.1: Average QTT ranks of N-vectors generated by f_3 and f_4.

N      r_QTT(f_3)   r_QTT(f_4)
2^14   3.5          6.5
2^15   3.6          7.0
2^16   3.6          7.5
2^17   3.6          7.9

Further examples of the low-rank QTT tensor approximation will be presented in the sections related to the computation of the two-electron integrals and the summation of electrostatic potentials over large lattice-structured systems of particles.
Notice that 1D and 2D numerical quadratures based on interpolation by Chebyshev polynomials have been developed in [120]. Taking into account that a Chebyshev polynomial sampled on the Chebyshev grid has the exact rank-2 QTT representation [167] allows us to perform efficient numerical integration via Chebyshev interpolation by using the QTT approximation.
In application to multidimensional PDEs, the tensor representation of operators in quantized spaces is also important. Several results on the QTT approximation of discretized multidimensional operators (matrices) were presented in [179, 177, 176, 178, 155] and in [142, 66, 67].
Superfast FFT, wavelet, and circulant convolution-type data transforms of logarithmic complexity have been introduced in [69, 143, 175].
Various applications of the QTT format to the solution of PDEs were reported in
[68, 188, 67, 65, 144, 180, 181].
5 Multidimensional tensor-product convolution

The important prerequisites for the grid-based calculation of the convolution integrals in ℝ^d arising in computational quantum chemistry are the multidimensional tensor-product convolution techniques and the efficient canonical tensor representation of the Green's kernels by using the Laplace transform and sinc-quadrature methods.
The tensor-product approximation of the multidimensional convolution transform, discretized via a collocation-projection scheme on uniform or composite refined grids, was introduced in 2007 (see [173, 166]). In what follows, we present some of the
results in [166], where the examples of convolving kernels are given by the classical Newton, Slater (exponential), and Yukawa potentials, 1/‖x‖, e^{−λ‖x‖}, and e^{−λ‖x‖}/‖x‖ with x ∈ ℝ^d. For piecewise constant elements on the uniform grid of size n^d, the quadratic convergence rate O(h²) in the mesh parameter h = 1/n is proved in [166], where it was also shown that the Richardson extrapolation method on a sequence of grids improves the order of approximation up to O(h³). The fast algorithm of complexity O(dR₁R₂n log n) is described for the tensor-product convolution on uniform/composite grids of size n^d, where R₁, R₂ are the tensor ranks of the convolving functions. We also present the tensor-product convolution scheme in the two-level Tucker-canonical format and discuss the subsequent rank reduction strategy. The numerical illustrations confirming the approximation theory for the convolution schemes of order O(h²) and O(h³) can be found in [166]. The linear-logarithmic complexity scaling in n of the 1D discrete convolution on large composite grids, and of the convolution method on n × n × n grids in the range n ≤ 16 384, was also demonstrated there.

5.1 Grid-based discretization of the convolution transform
The multidimensional convolution in L²(ℝ^d) is defined by the integral transform

w(x) := (f ∗ g)(x) := ∫_{ℝ^d} f(y) g(x − y) dy,  f, g ∈ L²(ℝ^d), x ∈ ℝ^d.    (5.1)

We are interested in the approximate computation of f ∗ g in some fixed box Ω = [−A, A]^d, assuming that the convolving function f has support in Ω′ := [−B, B]^d ⊂ Ω (B < A), i.e., supp f ⊂ Ω′. In electronic structure calculations, the convolving function f may represent electron orbitals or electron densities, which normally have an exponential decay and, hence, can be truncated beyond some fixed spatial box.
The common example of a convolving kernel g is given by the restriction of the fundamental solution of an elliptic operator in ℝ^d. For example, in the case of the Laplacian in ℝ^d, d ≥ 3, we have

g(x) = c(d)/‖x‖^{d−2},  x = (x_1, ..., x_d) ∈ ℝ^d,  ‖x‖ = √(x_1² + ⋅⋅⋅ + x_d²),

where c(d) = −Γ(d/2 − 1)/(4π^{d/2}) (so that c(3) = −1/(4π)). This example will be considered in more detail.
There are three commonly used discretization methods for integral operators: the so-called Nyström, collocation, and Galerkin-type schemes. Below, we consider the case of uniform grids, referring to [166] for the complete theory, including the case of composite grids.
Introduce the equidistant tensor-product lattice ω^d := ω_1 × ⋅⋅⋅ × ω_d of mesh size h = 2A/n by setting ω_ℓ := {−A + (k − 1)h : k = 1, ..., n + 1}, where, for the sake of convenience, n = 2p, p ∈ ℕ, and define the tensor-product index set ℐ := {1, ..., n}^d. Hence, Ω = ⋃_{i∈ℐ} Ω_i becomes the union of closed boxes Ω_i = ⨂_{ℓ=1}^{d} Ω_{i_ℓ} specified by the segments

Ω_{i_ℓ} := {x_ℓ : x_ℓ ∈ [−A + (i_ℓ − 1)h, −A + i_ℓ h]} ⊂ ℝ  (ℓ = 1, ..., d).    (5.2)

The Nyström-type scheme leads to the simple discretization

(f ∗ g)(x_j) ≈ h^d ∑_{i∈ℐ} f(y_i) g(x_j − y_i),  j ∈ ℐ,

where, for the ease of presentation, the evaluation points x_j and the collocation points y_i, i, j ∈ ℐ, are assumed to be located on the same cell-centered tensor-product grid corresponding to ω^d. The Nyström-type scheme applies to continuous functions f, g, which leads to certain limitations in the case of singular kernels g.
The collocation-projection discretization can be applied to a much more general
class of integral operators than the Nyström methods, including Green’s kernels with
the diagonal singularity, say to the Newton potential g(x) = 1/‖x‖. We consider the
case of tensor-product piecewise constant basis functions {ϕi } associated with ωd , so
that ϕ_i = χ_{Ω_i} is the characteristic function of Ω_i,

ϕ_i(x) = ∏_{ℓ=1}^{d} ϕ_{iℓ}(xℓ),  where ϕ_{iℓ} = χ_{Ω_{iℓ}}. (5.3)

Let x_m ∈ ω_d be the set of collocation points with m ∈ ℳ_n := {1, . . . , n + 1}^d (we use the notation ℳ_n = ℳ if there is no confusion), and let f_i be the representation coefficients of f in {ϕ_i},

f(y) ≈ f̃(y) := ∑_{i∈ℐ} f_i ϕ_i(y).

In what follows, we specify the coefficients as f_i = f(y_i), where y_i is the midpoint of Ω_i, i ∈ ℐ. We consider the following discrete collocation-projection scheme:

f ∗ g ≈ {w_m},  w_m := ∑_{i∈ℐ} f_i ∫_{ℝ^d} ϕ_i(y) g(x_m − y) dy,  x_m ∈ ω_d, m ∈ ℳ. (5.4)
5.1 Grid-based discretization of the convolution transform | 89

The straightforward pointwise evaluation of this scheme requires O(n^{2d}) operations. In the case of equidistant grids, the computational complexity can be reduced to O(n^d log n) by applying the multidimensional FFT. Our goal is to reduce the numerical complexity to a linear scale in the dimension d.
To transform the collocation scheme (5.4) to the discrete convolution, we precom-
pute the collocation coefficients

g_i = ∫_{ℝ^d} ϕ_i(y) g(−y) dy,  i ∈ ℐ, (5.5)

define the dth-order tensors F = {fi }, G = {gi } ∈ ℝℐ , and introduce the d-dimensional
discrete convolution

F ∗ G := {z_j},  z_j := ∑_i f_i g_{j−i+1},  j ∈ 𝒥 := {1, . . . , 2n − 1}^d, (5.6)

where the sum is taken over all i ∈ ℐ that lead to legal subscripts for g_{j−i+1}, i. e., j − i + 1 ∈ ℐ. Specifically, for jℓ = 1, . . . , 2n − 1,

iℓ ∈ [max(1, jℓ + 1 − n), min(jℓ, n)],  ℓ = 1, . . . , d.

The discrete convolution can be gainfully applied to fast calculation of {wm }m∈ℳ
in the collocation scheme (5.4) as shown in the following statement.

Proposition 5.1 ([166]). The discrete collocation scheme {w_m}, m ∈ ℳ, is obtained by copying the corresponding portion of {z_j} from (5.6), centered at j = n := (n, . . . , n),

{w_m} = {z_j}|_{j=j₀+m},  m ∈ ℳ,  j₀ = n/2.

Proof. In the 1D case, we have

z(1) = f(1) ⋅ g(1),  z(2) = f(1) ⋅ g(2) + f(2) ⋅ g(1),  . . . ,
z(n) = f(1) ⋅ g(n) + f(2) ⋅ g(n − 1) + ⋅ ⋅ ⋅ + f(n) ⋅ g(1),  . . . ,  z(2n − 1) = f(n) ⋅ g(n).

Then we find that the elements {w_m} coincide with {z_j}|_{j=j₀+m}, m ∈ ℳ, j₀ = n/2. The general case d ≥ 1 can be justified by applying the above argument to each spatial variable.
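In 1D, the proposition amounts to slicing the central n + 1 entries out of the full discrete convolution. A minimal NumPy check (grid size and data are our own toy choices):

```python
import numpy as np

n = 8                        # even grid size, as in the text (n = 2p)
rng = np.random.default_rng(0)
f = rng.standard_normal(n)   # collocation coefficients f_i
g = rng.standard_normal(n)   # kernel coefficients g_i

# full discrete convolution z_j = sum_i f_i g_{j-i+1}, j = 1, ..., 2n-1
z = np.convolve(f, g)        # length 2n - 1

# collocation values w_m, m = 1, ..., n+1, by direct summation over legal i
j0 = n // 2
w = np.zeros(n + 1)
for m in range(n + 1):           # 0-based m corresponds to m + 1 in the text
    j = j0 + m                   # 0-based index into z
    for i in range(max(0, j - n + 1), min(j, n - 1) + 1):
        w[m] += f[i] * g[j - i]

# Proposition 5.1: {w_m} is exactly the centered slice of {z_j}
assert np.allclose(w, z[j0 : j0 + n + 1])
```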

Notice that the Galerkin method of discretization reads as follows:

f ∗ g ≈ ∑_{i, j−i+1∈ℐ, j∈j₀+ℳ} f_i g_{j−i+1}  with  g_{j−i+1} := ∫_{ℝ^d} ∫_{ℝ^d} ϕ_j(x) ϕ_i(y) g(x − y) dx dy

with the choice fi = ⟨f , ϕi ⟩L2 . The Galerkin scheme is known as the most convenient
for theoretical error analysis. However, compared with the collocation method, it has
higher implementation cost because of the presence of double integration. Hence,
90 | 5 Multidimensional tensor-product convolution

the classical discretization methods mentioned above may differ from each other in the construction of the tensor-product decompositions. To keep a reasonable compromise between the numerical complexity of the scheme and its generality, in the following we focus on the collocation method with simple low-order finite elements.
Recall that in the case of piecewise constant basis functions, the error bound O(h²) for the collocation scheme is proved in [166], whereas the Richardson extrapolation method on a sequence of grids was shown to provide the improved approximation error O(h³). Such an extrapolation, when available, allows a substantial reduction of the
approximation error without extra cost. It is worth noting that the Richardson extrap-
olation can also be applied to some functionals of the convolution product, say to
eigenvalues of the operator that includes the discrete convolution.

5.2 Tensor approximation to discrete convolution on uniform grids
Recall that in the case of uniform grids, the discrete convolution in ℝ^d can be implemented by the d-dimensional FFT with cost O(n^d log n), linear in the volume size, which still preserves the exponential scaling in d. To avoid the curse of dimensionality, we represent the d-dimensional convolution product approximately in the low-rank tensor
resent the d-dimensional convolution product approximately in the low-rank tensor
product formats. This reduces dramatically the computational cost to O(dn log n). Note
that tensor approximation to discrete convolution on non-uniform grids is considered
in full detail in [166]; see also [109].
We notice that the multidimensional convolution product appears to be one of the
most computationally elaborate multilinear operations. The key idea is to calculate
the d-dimensional convolution approximately using rank-structured tensor approxi-
mations [166]. Recall that for given d-th order tensors F, G ∈ 𝒯 r in the Tucker format,
represented by

F = β ×1 F (1) ×2 F (2) ⋅ ⋅ ⋅ ×d F (d) and G = γ ×1 G(1) ×2 G(2) ⋅ ⋅ ⋅ ×d G(d) ,

the convolution product can be represented in the separable form (cf. [173])
F ∗ G := ∑_{k=1}^{r} ∑_{m=1}^{r} β_{k₁...k_d} γ_{m₁...m_d} (f₁^{k₁} ∗ g₁^{m₁}) ⊗ ⋅ ⋅ ⋅ ⊗ (f_d^{k_d} ∗ g_d^{m_d}). (5.7)

Computing each 1D convolution f_ℓ^{kℓ} ∗ g_ℓ^{mℓ} ∈ ℝ^{2n−1} in O(n log n) operations leads to the overall linear-logarithmic complexity in n,

𝒩_{T∗T} = O(dr² n log n + #β ⋅ #γ).

In general, one might have #β ⋅ #γ = O(r^{2d}), which may be restrictive even for moderate d.
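The factorized convolution (5.7) can be checked directly on small tensors. The NumPy sketch below (sizes, ranks, and variable names are our own) forms two random Tucker tensors in 3D, convolves only their 1D factors, and compares the result with a zero-padded FFT convolution of the assembled full tensors:

```python
import numpy as np

n, r = 6, 3
rng = np.random.default_rng(1)
beta  = rng.standard_normal((r, r, r))                 # Tucker core of F
gamma = rng.standard_normal((r, r, r))                 # Tucker core of G
Fs = [rng.standard_normal((n, r)) for _ in range(3)]   # side matrices F^(l)
Gs = [rng.standard_normal((n, r)) for _ in range(3)]   # side matrices G^(l)

# assemble the full tensors and convolve them via zero-padded 3D FFT (reference)
F = np.einsum('abc,ia,jb,kc->ijk', beta, *Fs)
G = np.einsum('abc,ia,jb,kc->ijk', gamma, *Gs)
s = 2 * n - 1
conv_ref = np.real(np.fft.ifftn(np.fft.fftn(F, (s, s, s)) *
                                np.fft.fftn(G, (s, s, s))))

# (5.7): only 1D convolutions of the Tucker factors are needed
C = [np.array([[np.convolve(Fl[:, k], Gl[:, m]) for m in range(r)]
               for k in range(r)])                     # C[l][k, m, :] = f_l^k * g_l^m
     for Fl, Gl in zip(Fs, Gs)]
conv_sep = np.einsum('abc,def,adi,bej,cfk->ijk',
                     beta, gamma, C[0], C[1], C[2])

assert np.allclose(conv_ref, conv_sep)
```

The separable check rests on the identity (u ⊗ v ⊗ w) ∗ (p ⊗ q ⊗ s) = (u ∗ p) ⊗ (v ∗ q) ⊗ (w ∗ s) for full linear convolutions, which is exactly what (5.7) exploits.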
5.2 Tensor approximation to discrete convolution on uniform grids | 91

A significant complexity reduction is observed if at least one of the convolving tensors can be represented in the canonical format. Letting F ∈ 𝒯_r, G ∈ 𝒞_R, i. e., γ = diag{γ₁, . . . , γ_R}, we tensorize the convolution product as follows:

F ∗ G = ∑_{k=1}^{r} ∑_{m=1}^{R} β_{k₁...k_d} γ_m (f₁^{k₁} ∗ g₁^{m}) ⊗ ⋅ ⋅ ⋅ ⊗ (f_d^{k_d} ∗ g_d^{m}). (5.8)

However, the calculation by (5.8) still scales exponentially in d, which leads to certain
limitations in the case of higher dimensions.
To get rid of this exponential scaling, it is better to perform the convolution transform using the two-level tensor format, i. e., F ∈ 𝒯𝒞_{R₁,r} (see Definition 3.15), in such a way that the result U = F ∗ G with G ∈ 𝒞_{R_G} is represented in the two-level Tucker format 𝒯𝒞_{R₁R_G, rR_G}. Recall that an explicit representation for F ∈ 𝒯𝒞_{R₁,r} is given by

F = ( ∑_{ν=1}^{R₁} β_ν z₁^{ν} ⊗ ⋅ ⋅ ⋅ ⊗ z_d^{ν} ) ×₁ F^{(1)} ×₂ F^{(2)} ⋅ ⋅ ⋅ ×_d F^{(d)}, (5.9)

so that we have the embedding 𝒯𝒞_{R₁,r} ⊂ 𝒞_{R₁,n} with the corresponding (non-orthogonal) side-matrices S^{(ℓ)} = [F^{(ℓ)} z_ℓ^{1} ⋅ ⋅ ⋅ F^{(ℓ)} z_ℓ^{R₁}] ∈ ℝ^{n×R₁} and scaling factors β_ν (ν = 1, . . . , R₁).
Now we represent the tensor-product convolution in the two-level format

F ∗ G = ∑_{m=1}^{R_G} γ_m ( ∑_{ν=1}^{R₁} β_ν z₁^{ν} ⊗ ⋅ ⋅ ⋅ ⊗ z_d^{ν} ) ×₁ (F^{(1)} ∗ g₁^{m}) ×₂ ⋅ ⋅ ⋅ ×_d (F^{(d)} ∗ g_d^{m}), (5.10)

such that the above expansion can be evaluated by the following algorithm.

Algorithm 5.1 (d-dimensional tensor convolution of type 𝒯𝒞_{R₁,r} ∗ 𝒞_{R_G,n} → 𝒯𝒞_{R₁R_G, rR_G}).
(1) Given F ∈ 𝒯𝒞_{R₁,r} with the core β = ∑_{ν=1}^{R₁} β_ν z₁^{ν} ⊗ ⋅ ⋅ ⋅ ⊗ z_d^{ν} ∈ 𝒞_{R₁,r}, and G ∈ 𝒞_{R_G,n}.
(2) For ℓ = 1, . . . , d, compute the set of 1D convolutions u_ℓ^{k,m} = f_ℓ^{k} ∗ g_ℓ^{m} (k = 1, . . . , r, m = 1, . . . , R_G) of size 2n − 1, restrict the results onto the index set Iℓ, and form the n × rR_G side-matrices U^{(ℓ)} = [U₁^{(ℓ)} ⋅ ⋅ ⋅ U_{R_G}^{(ℓ)}], composed of the blocks U_m^{(ℓ)} with columns u_ℓ^{k,m} as U_m^{(ℓ)} = [f_ℓ^{1} ∗ g_ℓ^{m} ⋅ ⋅ ⋅ f_ℓ^{r} ∗ g_ℓ^{m}], all at the cost O(drR_G n log n).
(3) Build the core tensor ω = blockdiag{γ₁β, . . . , γ_{R_G}β} and represent the resultant two-level Tucker tensor in the form (storage demand is R_G + R₁ + drR₁ + drR_G n)

U = ω ×₁ U^{(1)} ×₂ ⋅ ⋅ ⋅ ×_d U^{(d)} ∈ 𝒯𝒞_{R₁R_G, rR_G}.

In some cases, one may require the consequent rank reduction of the target tensor U to the two-level format 𝒯𝒞_{R₀,r₀} with moderate rank parameters R₀ and r₀ = (r₀, . . . , r₀) [166].
92 | 5 Multidimensional tensor-product convolution

If both convolving tensors are given in the canonical format, F ∈ 𝒞_{R_F} with coefficients β_k, k = 1, . . . , R_F, and G ∈ 𝒞_{R_G} with coefficients γ_m, m = 1, . . . , R_G, then

F ∗ G = ∑_{k=1}^{R_F} ∑_{m=1}^{R_G} β_k γ_m (f₁^{k} ∗ g₁^{m}) ⊗ ⋅ ⋅ ⋅ ⊗ (f_d^{k} ∗ g_d^{m}), (5.11)

leading to the reduced cost that scales linearly in the dimensionality parameter d and linear-logarithmically in n,

𝒲_{C∗C→C} = O(dR_F R_G n log n).

Algorithm 5.2 (Multidimensional tensor-product convolution of type “C ∗ C → C”).
(1) Given F ∈ 𝒞_{R_F,n}, G ∈ 𝒞_{R_G,n}.
(2) For ℓ = 1, . . . , d, compute the set of 1D convolutions f_ℓ^{k} ∗ g_ℓ^{m} (k = 1, . . . , R_F, m = 1, . . . , R_G) of size 2n − 1, restrict the results onto the index set Iℓ, and form the n × R_F R_G side-matrix U^{(ℓ)} (cost O(dR_F R_G n log n)).
(3) Compute the set of scaling factors β_k γ_m.

Complexity bound O(dRF RG n log n) is proven in [166].
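Algorithm 5.2 can be prototyped directly. The NumPy sketch below uses our own toy ranks and sizes; for transparency it keeps the full 2n − 1 output length instead of restricting to the index set Iℓ, and uses np.convolve in place of FFT-based 1D convolution:

```python
import numpy as np

n, RF, RG = 8, 4, 3
rng = np.random.default_rng(2)
Fv = [rng.standard_normal((n, RF)) for _ in range(3)]  # canonical factors f_l^k
Gv = [rng.standard_normal((n, RG)) for _ in range(3)]  # canonical factors g_l^m
beta, gamma = rng.standard_normal(RF), rng.standard_normal(RG)

# step (2): per mode, all RF*RG 1D convolutions form one (2n-1) x RF*RG side-matrix
U = [np.column_stack([np.convolve(Fl[:, k], Gl[:, m])
                      for k in range(RF) for m in range(RG)])
     for Fl, Gl in zip(Fv, Gv)]
# step (3): scaling factors beta_k * gamma_m, in the same (k, m) ordering
scal = np.array([beta[k] * gamma[m] for k in range(RF) for m in range(RG)])

# the result is a canonical tensor of rank RF*RG; assemble it for verification
conv_c = np.einsum('q,iq,jq,kq->ijk', scal, U[0], U[1], U[2])

# reference: zero-padded 3D FFT convolution of the assembled tensors
F = np.einsum('q,iq,jq,kq->ijk', beta, Fv[0], Fv[1], Fv[2])
G = np.einsum('q,iq,jq,kq->ijk', gamma, Gv[0], Gv[1], Gv[2])
s = 2 * n - 1
conv_ref = np.real(np.fft.ifftn(np.fft.fftn(F, (s, s, s)) *
                                np.fft.fftn(G, (s, s, s))))
assert np.allclose(conv_c, conv_ref)
```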


The resulting convolution product F ∗ G in (5.11) may be approximated in either
Tucker or canonical formats, depending on further multi-linear operations applied to
this tensor. In the framework of approximate iterations with structured matrices and
vectors, we can fix the 𝒞 R0 -format for the output tensors. Hence, the rank-R0 canonical
approximation (with R0 < RF RG ) would be the proper choice to represent F ∗ G. The
tensor truncation of the rank-(RF RG ) auxiliary result to rank-R0 tensor can be accom-
plished by fast multigrid C2T plus T2C tensor approximation, and then the result can
be stored by O(dR0 n) reals.
Based on our experience with Algorithms 5.1 and 5.2, applied in electronic structure calculations in 3D, we notice that Algorithm 5.2 is preferable in the case of moderate grid size (say, n ≤ 10⁴), while Algorithm 5.1 is faster for larger grids. For example,
both algorithms work perfectly in electronic structure calculations in the framework
of the Hartree–Fock model for d = 3 [174, 186]. A case in point is that the Hartree po-
tential of moderate size molecules can be calculated on the n × n × n 3D Cartesian grids
with n ≤ 1.6 ⋅ 104 in a few minutes providing the relative accuracy about 10−7 already
with n = 8192. Further numerical illustrations will be given in Chapter 8.

5.3 Low-rank approximation of convolving tensors
In applications related to electronic structure calculations, the function-related collo-
cation coefficient tensor F = [fi ]i∈ℐ can be generated by the electron density ρ(x), by

the product of the interaction potential V(x) with the electron orbitals, V(x)ψ(x), or
by some related terms. In this way, we make an a priori assumption on the existence
of low-rank approximation to the corresponding tensors. In general, this assumption
is not easy to justify. However, it works well in practice.

Example 5.1. In the case of the Hydrogen atom, we have

ρ(x) = e^{−2‖x‖}  and  V(x)ψ(x) = e^{−‖x‖}/‖x‖  with  V(x) = 1/‖x‖,  x ∈ ℝ³;

hence, the existence of the corresponding low-rank tensor approximations can be proven along the lines of [161, Lemma 4.3] and [163, Theorem 3].

To construct a low-rank approximation of the convolving tensor G, we consider a class of multivariate spherically symmetric (radial) convolving kernels g : ℝ^d → ℝ parameterized by

g = g(ρ(y))  with  ρ ≡ ρ(y) = y₁² + ⋅ ⋅ ⋅ + y_d²,

where the univariate function g : ℝ₊ → ℝ can be represented via the generalized Laplace transform

g(ρ) = ∫_{ℝ₊} ĝ(τ²) e^{−ρτ²} dτ. (5.12)

Without loss of generality, we introduce one and the same scaling function

ϕi (⋅) = ϕ(⋅ + (i − 1)h), i ∈ In ,

for all spatial dimensions ℓ = 1, . . . , d, where h > 0 is the mesh parameter, so that the
corresponding tensor-product basis function ϕi is defined by (5.3).
Using sinc-quadrature methods [271], we approximate the collocation coefficient tensor G = [g_i]_{i∈ℐ} in (5.5) via the rank-(2M + 1) canonical decomposition

G ≈ ∑_{k=−M}^{M} w_k E(τ_k)  with  E(τ_k) = [e_i(τ_k)], i ∈ ℐ, (5.13)

with suitably chosen coefficients w_k ∈ ℝ and quadrature points τ_k ∈ ℝ₊, where the rank-1 tensor E(τ_k) ∈ ℝ^ℐ is given entrywise by

e_i(τ_k) = ĝ(τ_k²) ∏_{ℓ=1}^{d} ∫_ℝ e^{−y_ℓ²τ_k²} ϕ_{iℓ}(yℓ) dyℓ. (5.14)

For a class of analytic functions, the exponentially fast convergence in M of the above quadrature can be proven (see [111, 163]). Notice that the quadrature points τ_k can be chosen symmetrically, i. e., τ_k = τ_{−k}, hence reducing the number of terms in (5.13) to r = M + 1.
In the particular applications in electronic structure calculations, we are inter-
ested in fast convolution with the Newton or Yukawa kernels. In the case of the New-
ton kernel, g(x) = 1/‖x‖, the approximation theory can be found in [111]. In the case
of the Yukawa potential e−κ‖x‖ /‖x‖ for κ ∈ [0, ∞), we apply the generalized Laplace
transform (cf. (5.12))

g(ρ) = e^{−κ√ρ}/√ρ = (2/√π) ∫_{ℝ₊} exp(−ρτ² − κ²/4τ²) dτ, (5.15)

corresponding to the choice

ĝ(τ²) = (2/√π) e^{−κ²/4τ²}.

Approximation theory in the case of Yukawa potential is presented in [163].
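Before using (5.15) inside a quadrature, the transform itself can be verified numerically. A short NumPy check (test values are our own) compares a dense trapezoidal evaluation of the integral against the closed form e^{−κ√ρ}/√ρ:

```python
import numpy as np

def yukawa_integral(rho, kappa, T=12.0, N=400_001):
    # trapezoidal rule for (2/sqrt(pi)) * int_0^T exp(-rho t^2 - kappa^2/(4 t^2)) dt;
    # the integrand decays at both ends, so the rule is very accurate here
    t = np.linspace(1e-8, T, N)
    h = t[1] - t[0]
    y = np.exp(-rho * t**2 - kappa**2 / (4.0 * t**2))
    return 2.0 / np.sqrt(np.pi) * h * (y.sum() - 0.5 * (y[0] + y[-1]))

for rho, kappa in [(0.5, 0.3), (2.0, 1.0), (4.0, 2.0)]:
    exact = np.exp(-kappa * np.sqrt(rho)) / np.sqrt(rho)
    assert abs(yukawa_integral(rho, kappa) - exact) < 1e-7 * exact
```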


In our numerical experiments, the collocation coefficient tensor G ∈ ℝ^ℐ for the Newton kernel is approximated in the rank-R canonical format with R ∈ [20, 40], providing high accuracies of about 10^{−6}–10^{−8} for grid sizes up to n³ = 131 072³.

5.4 Algebraic recompression of the sinc approximation
In the case of large computational grids, the tensor rank of the (problem independent)
convolving kernel g can be reduced by an algebraic recompression procedure [166].
For ease of presentation let us consider the case d = 3. The idea of our recompres-
sion algorithm is based on the observation that a typical feature of the analytic tensor
approximation by the sinc quadratures as in (5.13)–(5.14) (for symmetric quadrature
points it is agglomerated to the sequence with k = 0, 1, . . . , M) is the presence of many
terms all supported only by a few grid points belonging to the small p × p × p sub-grid
in domain Ω(p) that is a vicinity of the point-type singularity (say, at x = 0). Assume
that this group of rank-1 tensors is numbered by k = 0, . . . , K < M. The sum of these
tensors, further denoted as A_p, effectively belongs to the low-dimensional space of tri-linear p × p × p-tensors. Hence, the maximal tensor rank of A_p does not exceed r = p² ≤ K. Furthermore, we can perform the rank-R₀ canonical approximation of this small tensor with R₀ < K using the ALS or gradient-type optimization. The following algorithm sketches the main steps of the rank-recompression scheme described above.

Algorithm 5.3 (Rank recompression for the canonical sinc-based approximation).
(1) Given the canonical tensor A with rank R = M + 1.
(2) Agglomerate all rank-1 terms supported by only one point, say by Ω^{(1)}, into one rank-1 tensor, further called A₁.
(3) Agglomerate by summation all terms supported by Ω^{(2)} \ Ω^{(1)} in one tensor A₂ (with maximal rank 3), approximate it with tensor rank r₂ ≤ 3, and so on, until we end up with the tensor A_p supported by Ω^{(p)} \ Ω^{(p−1)} \ ⋅ ⋅ ⋅ \ Ω^{(1)}.
(4) Approximate the canonical sum A₁ + ⋅ ⋅ ⋅ + A_p by a low-rank tensor.

Notice that in the sinc-quadrature approximations most of these “local” terms are supported by only one point, say by Ω^{(1)}; hence they are all agglomerated in the rank-1 tensor. In the approximation of classical potentials like 1/‖x‖ or e^{−‖x‖}/‖x‖, the usual choice is p = 1, 2.
The simple rank-recompression procedure described above allows a noticeable reduction of the initial rank R = M + 1 appearing in the (symmetric) sinc quadratures. Numerical examples of the corresponding rank reduction by Algorithm 5.3 are depicted in [163, Figure 2].

Figure 5.1: Tensor rank of the sinc- and recompressed sinc-approximation for 1/‖x‖ (left). Conver-
gence history for the O(h2 ) and O(h3 ) Richardson extrapolated convolution schemes (right).

Figure 5.1 (left) presents the rank parameters obtained from the sinc approximations of g(x) = 1/‖x‖ up to the threshold ε = 0.5 ⋅ 10^{−6} in the max-norm, computed on n × n × n grids with n = 2^{L+3} for the level number L = 1, . . . , 8 (upper curve), and the corresponding values obtained by Algorithm 5.3 with p = 1 (lower curve). One observes a significant reduction of the tensor rank.

5.5 Numerical verification on quantum chemistry data
We test the approximation error of the tensor-product collocation convolution scheme on practically interesting data arising in electronic structure calculations using the Hartree–Fock equation (see [174] for more detail). We consider the pseudo electron density of the CH₄ molecule represented by the exponential sum

f(x) := ∑_{ν=1}^{M} ( ∑_{k=1}^{R₀} c_{ν,k} (x − x_k)^{β_k} e^{−λ_k(x−x_k)²} ),  x ∈ ℝ³,  R₀ = 50, M = 4, (5.16)

with xk corresponding to the locations of the C and H atoms. We extract the “principal
exponential” approximation of the electron density, f0 , obtained by setting βk = 0
(k = 1, . . . , R0 ) in (5.16). Using the fast tensor-product convolution method, the Hartree
potential of f0 ,

V_H(x) = ∫_Ω f₀(y)/‖x − y‖ dy,  x ∈ Ω = [−A, A]³,

is computed with high accuracy on a sequence of uniform n × n × n grids with n = 2^p + 1, p = 5, 6, . . . , 12, and A = 9.6. The initial rank of the input tensor F = [f₀(y_i)]_{i∈ℐ}, presented in the canonical format, is bounded by R ≤ R₀(R₀ + 1)/2 (even for simple molecules it normally amounts to several thousands). The collocation coefficient tensor G in (5.5) for the Newton kernel is approximated by the sinc-method with the algebraic rank-recompression described in Algorithm 5.3.
Note that the Hartree potential has slow polynomial decay, i. e.,

V_H(x) = O(1/‖x‖)  as ‖x‖ → ∞.

However, the molecular orbitals decay exponentially. Hence, the accurate tensor ap-
proximation is computed in some smaller box Ω󸀠 = [−B, B]3 ⊂ Ω, B < A.
In this numerical example, the resultant convolution product with the Newton convolving kernel can be calculated exactly by using the analytic representation for each individual Gaussian,

(e^{−α‖⋅‖²} ∗ 1/‖⋅‖)(x) = (α/π)^{−3/2} (1/‖x‖) erf(√α ‖x‖),

where the erf-function is defined by

erf(t) := (2/√π) ∫₀^t exp(−τ²) dτ,  t ≥ 0.
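This closed-form convolution is easy to cross-check without any 3D quadrature: by spherical symmetry (Gauss' theorem), the Newton potential of the Gaussian equals V(r) = (4π/r) ∫₀^r s² e^{−αs²} ds + 4π ∫_r^∞ s e^{−αs²} ds. The NumPy sketch below (our own test values) compares this radial form with (α/π)^{−3/2} erf(√α r)/r:

```python
import math
import numpy as np

def newton_potential_radial(alpha, r, R=12.0, N=400_001):
    # trapezoidal rule on [0, r] and [r, R] for the two radial integrals
    s1 = np.linspace(0.0, r, N)
    y1 = s1**2 * np.exp(-alpha * s1**2)
    inner = (s1[1] - s1[0]) * (y1.sum() - 0.5 * (y1[0] + y1[-1]))
    s2 = np.linspace(r, R, N)
    y2 = s2 * np.exp(-alpha * s2**2)
    outer = (s2[1] - s2[0]) * (y2.sum() - 0.5 * (y2[0] + y2[-1]))
    return 4.0 * math.pi * (inner / r + outer)

for alpha, r in [(1.0, 0.5), (2.0, 1.0), (0.5, 3.0)]:
    exact = (math.pi / alpha) ** 1.5 * math.erf(math.sqrt(alpha) * r) / r
    assert abs(newton_potential_radial(alpha, r) - exact) < 1e-8 * exact
```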

The Hartree potential V_H = f₀ ∗ 1/‖⋅‖ attains its maximum value at the origin x = 0, that is, V_H(0) = 7.19. Figure 5.1 (right) demonstrates the accuracy O(h²) of our tensor approximation and O(h³) of the corresponding improved values due to the Richardson extrapolation. Here, the grid size is given by n = n_ℓ = 2^{ℓ+4} for the level number ℓ = 1, . . . , 7, with the finest grid size n₇ = 2048. It can be seen that beginning from the level number ℓ = 5 (n₅ = 512), the extrapolated scheme already achieves the saturation

error 10^{−6} of the tensor approximation related to the chosen Tucker rank r = 22. This
example demonstrates high accuracy of the Richardson extrapolation.
The numerical results on tensor product approximation of the convolution oper-
ators in the Hartree–Fock equation compared with the commonly used MOLPRO cal-
culations will be presented in the forthcoming Chapter 11.
6 Tensor decomposition for analytic potentials
Methods of separable approximation of the 3D Newton kernel (electrostatic potential
of the Hydrogen atom) using Gaussian sums have been addressed in the chemical and
mathematical literature since [38] and [39, 40]. However, these methods were based on
non-explicit heuristic approaches, not explaining how to derive such Gaussian sums
in an optimal way and with controllable accuracy. A constructive tensor-product ap-
proximation to the multivariate Newton kernel was first proposed in [96, 111] based
on the sinc approximation [271], and then efficiently implemented and analyzed for a
three-dimensional case in [30]. This tensor decomposition has been already success-
fully applied to assembled tensor-based summation of electrostatic potentials on 3D
rectangular lattices invented in [148, 153], and it was one of the basic tools in the con-
struction of the range-separated tensor format introduced in [24].
An alternative method for computation of the convolution transform with the
Newton kernel is based on the direct solution of the Poisson equation. The data-
sparse elliptic operator inverse based on explicit approximation to the Green function
is presented in [159].

6.1 Grid-based canonical/Tucker representation of the Newton kernel
We discuss the grid-based method for the low-rank canonical and Tucker tensor representations of a spherically symmetric kernel function p(‖x‖), x ∈ ℝ³ (for example, for the 3D Newton kernel, we have p(‖x‖) = 1/‖x‖, x ∈ ℝ³), by its projection onto the set of piecewise constant basis functions; see [30] for more detail.
In the computational domain Ω = [−b/2, b/2]³, let us introduce the uniform n × n × n rectangular Cartesian grid Ω_n with the mesh size h = b/n. Let {ψ_i} be a set of tensor-product piecewise constant basis functions ψ_i(x) = ∏_{ℓ=1}^{3} ψ_{iℓ}^{(ℓ)}(xℓ) for the 3-tuple index i = (i₁, i₂, i₃), iℓ ∈ {1, . . . , n}, ℓ = 1, 2, 3. The kernel p(‖x‖) can be discretized by its projection onto the basis set {ψ_i} in the form of a third-order tensor of size n × n × n defined pointwise as

P := [p_i] ∈ ℝ^{n×n×n},  p_i = ∫_{ℝ³} ψ_i(x) p(‖x‖) dx. (6.1)

The low-rank canonical decomposition of the 3rd-order tensor P can be based on using exponentially convergent sinc-quadratures for approximating the Laplace–Gauss transform of the analytic function p(z), specified by a certain weight a(t) > 0,

p(z) = ∫_{ℝ₊} a(t) e^{−t²z²} dt ≈ ∑_{k=−M}^{M} a_k e^{−t_k²z²}  for |z| > 0, (6.2)

https://doi.org/10.1515/9783110365832-006

where the quadrature points and weights are given by

t_k = k h_M,  a_k = a(t_k) h_M,  h_M = C₀ log(M)/M,  C₀ > 0. (6.3)

Under the assumption 0 < a ≤ |z| < ∞, this quadrature can be proven to provide the exponential convergence rate in M for a class of analytic functions p(z); see [271, 111, 163, 166]. For example, in the particular case p(z) = 1/z, which can be adapted to the Newton kernel by the substitution z = √(x₁² + x₂² + x₃²), we apply the Laplace–Gauss transform

1/z = (2/√π) ∫_{ℝ₊} e^{−t²z²} dt.

Now, for any fixed x = (x₁, x₂, x₃) ∈ ℝ³ such that ‖x‖ > 0, we apply the sinc-quadrature approximation to obtain the separable expansion

p(‖x‖) = ∫_{ℝ₊} a(t) e^{−t²‖x‖²} dt ≈ ∑_{k=−M}^{M} a_k e^{−t_k²‖x‖²} = ∑_{k=−M}^{M} a_k ∏_{ℓ=1}^{3} e^{−t_k²xℓ²},  a_k = a(t_k). (6.4)

Under the assumption 0 < a ≤ ‖x‖ ≤ A < ∞, this approximation can be proven to provide the exponential convergence rate in M:

| p(‖x‖) − ∑_{k=−M}^{M} a_k e^{−t_k²‖x‖²} | ≤ C_a e^{−β√M}  with some C_a, β > 0. (6.5)

Combining (6.1) and (6.4) and taking into account the separability of the Gaussian functions, we arrive at the separable approximation for each entry of the tensor P:

p_i ≈ ∑_{k=−M}^{M} a_k ∫_{ℝ³} ψ_i(x) e^{−t_k²‖x‖²} dx = ∑_{k=−M}^{M} a_k ∏_{ℓ=1}^{3} ∫_ℝ ψ_{iℓ}^{(ℓ)}(xℓ) e^{−t_k²xℓ²} dxℓ.

Define the vector (recall that a_k > 0) p_k^{(ℓ)} = a_k^{1/3} b^{(ℓ)}(t_k) ∈ ℝ^{nℓ}, where

b^{(ℓ)}(t_k) = [b_{iℓ}^{(ℓ)}(t_k)]_{iℓ=1}^{nℓ} ∈ ℝ^{nℓ}  with  b_{iℓ}^{(ℓ)}(t_k) = ∫_ℝ ψ_{iℓ}^{(ℓ)}(xℓ) e^{−t_k²xℓ²} dxℓ.

Then the 3rd-order tensor P can be approximated by the R-term canonical representation

P ≈ P_R = ∑_{k=−M}^{M} a_k ⊗_{ℓ=1}^{3} b^{(ℓ)}(t_k) = ∑_{q=1}^{R} p_q^{(1)} ⊗ p_q^{(2)} ⊗ p_q^{(3)} ∈ ℝ^{n×n×n}, (6.6)

where R = 2M + 1. For the given threshold ε > 0, M is chosen as the minimal number such that, in the max-norm,

‖P − P_R‖ ≤ ε‖P‖.
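The construction (6.2)–(6.6) can be prototyped in a few lines. The NumPy sketch below uses our own grid and quadrature parameters (with C₀ tuned by hand) and samples the factors at grid points instead of integrating over cells, a Nyström-type simplification, to check the pointwise accuracy of the rank-(2M + 1) canonical tensor for p(z) = 1/z away from the singularity:

```python
import numpy as np

n, A = 33, 1.0
x = np.linspace(-A, A, n)                  # node-centered 1D grid (contains 0)
M, C0 = 60, 3.5
hM = C0 * np.log(M) / M                    # quadrature step as in (6.3)
t = hM * np.arange(-M, M + 1)              # quadrature points t_k
a = hM / np.sqrt(np.pi) * np.ones_like(t)  # weights: 1/z = (1/sqrt(pi)) int_R e^{-t^2 z^2} dt

B = np.exp(-np.outer(x**2, t**2))          # B[i, k] = e^{-t_k^2 x_i^2}: canonical factors
P = np.einsum('k,ik,jk,lk->ijl', a, B, B, B)   # rank-(2M+1) canonical tensor

# compare with 1/||x|| at grid points bounded away from the origin
X, Y, Z = np.meshgrid(x, x, x, indexing='ij')
r = np.sqrt(X**2 + Y**2 + Z**2)
mask = r >= 0.3
rel_err = np.max(np.abs(P[mask] - 1.0 / r[mask]) / (1.0 / r[mask]))
assert rel_err < 1e-6
```

The accuracy degrades only near ‖x‖ = 0, in line with the lower bound a in (6.5); in the text this is resolved by the projection onto piecewise constant basis functions, which keeps the entries finite across the singularity.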

Figure 6.1: Examples of vectors of the canonical {p_q^{(1)}}_{q=1}^{R} (left) and Tucker {t_k^{(1)}}_{k=1}^{r₁} (right) tensor representations for the single Newton kernel displayed along the x-axis.

The canonical skeleton vectors are renumbered by k → q = k + M + 1, p_q^{(ℓ)} ← p_k^{(ℓ)} ∈ ℝⁿ, ℓ = 1, 2, 3. The canonical tensor P_R in (6.6) approximates the discretized 3D symmetric kernel function p(‖x‖) (x ∈ Ω) centered at the origin, giving rise to p_q^{(1)} = p_q^{(2)} = p_q^{(3)} (q = 1, . . . , R).
In the following, we also consider the Tucker approximation to the 3rd-order tensor P. Given rank parameters r = (r₁, r₂, r₃), the rank-r Tucker tensor approximating P is defined by the following parameterization: T_r = [t_{i₁i₂i₃}] ∈ ℝ^{n×n×n} (iℓ ∈ {1, . . . , n}),

T_r := ∑_{k=1}^{r} b_k t_{k₁}^{(1)} ⊗ t_{k₂}^{(2)} ⊗ t_{k₃}^{(3)} ≡ B ×₁ T^{(1)} ×₂ T^{(2)} ×₃ T^{(3)}, (6.7)

where the orthogonal side-matrices T^{(ℓ)} = [t₁^{(ℓ)} ⋅ ⋅ ⋅ t_{rℓ}^{(ℓ)}] ∈ ℝ^{n×rℓ}, ℓ = 1, 2, 3, define the set of Tucker vectors, and B ∈ ℝ^{r₁×r₂×r₃} is the Tucker core tensor. Choose the truncation error ε > 0 for the canonical approximation P_R obtained by the quadrature method; then compute the best orthogonal Tucker approximation of P with tolerance O(ε) by applying the canonical-to-Tucker algorithm [174] to the canonical tensor P_R ↦ T_r. The latter algorithm is based on rank optimization via the ALS iteration. The rank parameter r of the resultant Tucker approximand T_r is minimized subject to the ε-error control,

‖P_R − T_r‖ ≤ ε‖P_R‖.

Remark 6.1. Since the maximal Tucker rank does not exceed the canonical one, we apply the approximation results for canonical tensors to derive the exponential convergence in the Tucker rank for a wide class of functions p. This implies the relation max{rℓ} = O(|log ε|²), which can be observed in all numerical tests implemented so far.
102 | 6 Tensor decomposition for analytic potentials

Table 6.1: CPU times (Matlab) to compute with tolerance ε = 10^{−6} the canonical and Tucker vectors of P_R for the single Newton kernel in a box.

grid size n³        4608³     9216³     18 432³       36 864³       73 768³
mesh size h (Å)     0.0019    0.001     4.9 ⋅ 10^{−4}  2.8 ⋅ 10^{−4}  1.2 ⋅ 10^{−4}
Time (Canon.)       2.0       2.7       8.1           38            164
Canonical rank R    34        37        39            41            43
Time (C2T)          17        38        85            200           435
Tucker rank         12        11        10            8             6

Figure 6.1 displays several skeleton vectors of the canonical and Tucker tensor representations for a single Newton kernel along the x-axis from a set {p_q^{(1)}}_{q=1}^{R}. Symmetry of the tensor P_R implies that the canonical vectors p_q^{(2)} and p_q^{(3)}, corresponding to the y- and z-axes, respectively, are of the same shape as p_q^{(1)}. It is clearly seen that there are canonical/Tucker vectors representing the long-, intermediate-, and short-range contributions to the total electrostatic potential. This interesting feature will also be recognized for the low-rank lattice sum of potentials (see Section 14.2).
Table 6.1 presents CPU times (sec) for generating a canonical rank-R tensor approximation of the single Newton kernel over the n × n × n 3D Cartesian grid, corresponding to a Matlab implementation on a terminal of the 8 AMD Opteron Dual-Core processor. The corresponding mesh sizes are given in Angstroms. We observe the logarithmic scaling of the canonical rank R in the grid size n, whereas the maximal Tucker rank has the tendency to decrease for larger n. The compression rates related to the grid 73 768³, given by the ratio n³/(nR) for the canonical format and n³/(r³ + rn) for the Tucker format, are of the order 10⁸ and 10⁷, respectively.
Notice that the low-rank canonical/Tucker approximation of the tensor P is a problem-independent task; hence the respective canonical/Tucker vectors can be precomputed at once on a large enough 3D n × n × n grid and then stored for multiple use. The storage size is bounded by Rn or rn + r³ in the case of the canonical and Tucker formats, respectively.

6.2 Low-rank representation for the general class of kernels
Along with Coulombic systems corresponding to p(‖x‖) = 1/‖x‖, the tensor approximation described above can also be applied to a wide class of commonly used long-range potentials p(‖x‖) in ℝ³, for example, to the Slater, Yukawa, Lennard-Jones or Van der Waals, and dipole–dipole interaction potentials defined as follows:

Slater function: p(‖x‖) = exp(−λ‖x‖), λ > 0;

Yukawa kernel: p(‖x‖) = exp(−λ‖x‖)/‖x‖, λ > 0;

Lennard-Jones potential: p(‖x‖) = 4ϵ[(σ/‖x‖)^{12} − (σ/‖x‖)⁶].

The simplified version of the Lennard-Jones potential is the so-called Buckingham function:

Buckingham potential: p(‖x‖) = 4ϵ[e^{−‖x‖/r₀} − (σ/‖x‖)⁶].

The electrostatic potential energy for the dipole–dipole interaction due to Van der Waals forces is defined by

Dipole–dipole interaction energy: p(‖x‖) = C₀/‖x‖³.

The existence of quasi-optimal low-rank decompositions based on the sinc-quadrature approximation to the Laplace transform of the above-mentioned functions can be rigorously proven for a wide class of generating kernels. In particular, the following Laplace (or Laplace–Gauss) integral transforms [309] with parameter ρ > 0 can be combined with the sinc-quadrature approximation to obtain the low-rank representation of the corresponding function-generated tensor:

e^{−2√(κρ)} = (√κ/√π) ∫_{ℝ₊} t^{−3/2} e^{−κ/t} e^{−ρt} dt, (6.8)

e^{−κ√ρ}/√ρ = (2/√π) ∫_{ℝ₊} e^{−κ²/4t²} e^{−ρt²} dt, (6.9)

1/√ρ = (2/√π) ∫_{ℝ₊} e^{−ρt²} dt, (6.10)

1/ρⁿ = (1/(n − 1)!) ∫_{ℝ₊} t^{n−1} e^{−ρt} dt,  n = 1, 2, . . . . (6.11)

This approach is combined with the subsequent substitution of the parameter ρ by an appropriate function ρ(x) = ρ(x₁, x₂, x₃), usually via the additive representation ρ(x) = c₁x₁² + c₂x₂² + c₃x₃². In cases (6.11) (n = 1) and (6.10), the convergence rate for the sinc-quadrature approximations of type (6.3) has been estimated in [39, 40] and later analyzed in more detail in [95, 111]. The case of the Yukawa and Slater kernels has been investigated in [161, 163]. The exponentially fast error decay for the general transform (6.11) can be derived by a minor modification of the above-mentioned results.

Remark 6.2. The idea behind the low-rank tensor representation for a sum of spherically symmetric potentials on a 3D lattice can already be recognized on the continuous level by introducing the Laplace transform of the generating kernel. For example, in representation (6.9) with the particular choice κ = 0, given by (6.10), we can set up ρ = x₁² + x₂² + x₃², i. e., p(‖x‖) = 1/‖x‖ (1 ≤ xℓ < ∞), and apply the sinc-quadrature approximation as in (6.2)–(6.3):

p(z) = (2/√π) ∫_{ℝ₊} e^{−t²z²} dt ≈ ∑_{k=−M}^{M} a_k e^{−t_k²z²}  for |z| > 0. (6.12)
ℝ+

Now the simple sum

Σ_L(x) = ∑_{i₁,i₂,i₃=1}^{L} 1/√((x₁ + i₁b)² + (x₂ + i₂b)² + (x₃ + i₃b)²)

on a rectangular L × L × L lattice of width b > 0 can be represented by the agglomerated integral transform

Σ_L(x) = (2/√π) ∫_{ℝ₊} [ ∑_{i₁,i₂,i₃=1}^{L} e^{−[(x₁+i₁b)² + (x₂+i₂b)² + (x₃+i₃b)²] t²} ] dt
       = (2/√π) ∫_{ℝ₊} ( ∑_{i₁=1}^{L} e^{−(x₁+i₁b)²t²} ) ( ∑_{i₂=1}^{L} e^{−(x₂+i₂b)²t²} ) ( ∑_{i₃=1}^{L} e^{−(x₃+i₃b)²t²} ) dt, (6.13)

where the integrand is separable. Representation (6.13) indicates that applying the
same quadrature approximation to the lattice sum integral (6.13) as that for the single
kernel (6.12) leads to the decomposition of the total sum of potentials with the same
canonical rank as for the single one.
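This observation is easy to confirm numerically: the same quadrature in t that represents a single kernel also represents the whole lattice sum, with per-dimension factors that are sums of L shifted Gaussians. A NumPy sketch (lattice and quadrature parameters are our own):

```python
import numpy as np

L, b = 3, 1.0
x = np.array([0.5, 0.5, 0.5])            # evaluation point
i = np.arange(1, L + 1)

# direct triple sum of Newton kernels on the L x L x L lattice
I1, I2, I3 = np.meshgrid(i, i, i, indexing='ij')
direct = np.sum(1.0 / np.sqrt((x[0] + I1 * b) ** 2 +
                              (x[1] + I2 * b) ** 2 +
                              (x[2] + I3 * b) ** 2))

# quadrature (6.12) applied to (6.13): one shifted-Gaussian sum per dimension
M, h = 60, 0.04
t = h * np.arange(-M, M + 1)
S = [np.exp(-np.outer((x[l] + i * b) ** 2, t ** 2)).sum(axis=0) for l in range(3)]
quad = h / np.sqrt(np.pi) * np.sum(S[0] * S[1] * S[2])

assert abs(quad - direct) < 1e-8 * direct
```

The quadrature cost is independent of L per mode: each t_k contributes one rank-1 term whose factors are the three Gaussian sums, so the canonical rank stays 2M + 1, exactly as for the single kernel.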

In the following, we construct the low-rank canonical and Tucker decompositions of the lattice sum of long-range interaction potentials discretized on a fine 3D grid, applied to the general class of kernel functions and to more general configurations of a lattice, including the case of lattices with vacancies.
7 The Hartree–Fock equation
The Hartree–Fock (HF) equation, governed by a 3D integro-differential operator, is the basic model in ab initio calculations of the ground state energy and electronic
structure of molecular systems [123, 277, 128]. It is a strongly nonlinear eigenvalue problem in which part of the governing operator depends on the eigenfunctions. This dependence is expressed by the convolution of the electron density, which is a function of the solution (molecular orbitals), with the Newton kernel in ℝ³. Multiple strong singularities, due to nuclear cusps in the electron
Newton kernel in ℝ3 . Multiple strong singularities, due to nuclear cusps in the electron
density of a molecule, impose strong requirements on the accuracy of Hartree–Fock
calculations. Finally, the eigenvalues and the ground state energy should be computed
with high accuracy to be suitable for more precise post-Hartree–Fock computations.

7.1 Electronic Schrödinger equation
The Hartree–Fock equation provides the model reduction of the electronic Schrödinger equation

ℋ_e Ψ = EΨ, (7.1)

with the Hamiltonian

$$
\mathcal{H}_e = -\frac{1}{2}\sum_{i=1}^{N} \Delta_i - \sum_{i=1}^{N}\sum_{A=1}^{M} \frac{Z_A}{\|x_i - a_A\|} + \frac{1}{2}\sum_{\substack{i,j=1 \\ i\neq j}}^{N} \frac{1}{\|x_i - x_j\|}, \qquad a_A,\, x_i,\, x_j \in \mathbb{R}^3, \qquad (7.2)
$$

which describes the energy of an N-electron molecular system in the framework of the so-called Born–Oppenheimer approximation, which assumes clamped nuclei. Here, M is the number of nuclei, and Z_A are the nuclear charges located at the distinct points a_A, A = 1, . . . , M. Since the nuclei are much heavier than the electrons, and their motion is much slower, the nuclear and electronic parts of the energy can be considered separately. Thus, the electronic Schrödinger equation specifies the energy of a molecular system at a fixed nuclear geometry. The Hamiltonian (7.2) includes the kinetic energy of the electrons, the potential energy of the interaction between nuclei and electrons, and the electron correlation energy. The electronic Schrödinger equation is a multidimensional problem in ℝ^{3N}, and it is computationally infeasible except for the simple Hydrogen or Hydrogen-like atoms.
The Hartree–Fock equation is a 3D eigenvalue problem in the space variables, obtained by minimization of the energy functional for the electronic Schrödinger equation [277, 128]. The underlying condition on the wavefunction is that it should be a single Slater determinant containing products of molecular orbitals. For fermions the wavefunction Ψ must be antisymmetric; therefore, it is parameterized by a Slater determinant representation,

$$
\Psi(x_1,\ldots,x_N) = \frac{1}{\sqrt{N!}}
\begin{vmatrix}
\varphi_1(x_1) & \varphi_2(x_1) & \cdots & \varphi_N(x_1) \\
\varphi_1(x_2) & \varphi_2(x_2) & \cdots & \varphi_N(x_2) \\
\vdots & \vdots & \ddots & \vdots \\
\varphi_1(x_N) & \varphi_2(x_N) & \cdots & \varphi_N(x_N)
\end{vmatrix},
$$

where φi (xj ) are the one-electron wavefunctions, i, j = 1, . . . , N. We refer to the literature


on electronic structure calculations for the derivation of the Hartree–Fock equation
[277, 128].
The Hartree–Fock equations are orbital equations obtained within a mean-field
approximation to the many-electron problem [128]. They are derived from application
of the variational principle to the expectation value of the many-electron Hamiltonian
over a configuration state function (CSF) characterizing the desired state of the many-
electron system under study. In simple cases, like the ground state of a closed-shell
system (N even) to which we restrict ourselves here, this CSF reduces to a single Slater
determinant built up from the orbitals.
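A minimal numerical sketch of this parameterization (the three toy orbitals below are invented for illustration and are not a physical basis) confirms the antisymmetry of Ψ under exchange of two electron coordinates:

```python
import numpy as np
from math import factorial, sqrt

# Invented one-electron orbitals phi_i(x), x in R (illustrative only)
orbitals = [lambda x: np.exp(-x**2),
            lambda x: x * np.exp(-x**2),
            lambda x: (2*x**2 - 1) * np.exp(-x**2)]

def slater(xs):
    """Psi(x_1,...,x_N) = det[phi_j(x_i)] / sqrt(N!)."""
    N = len(xs)
    A = np.array([[orbitals[j](xs[i]) for j in range(N)] for i in range(N)])
    return np.linalg.det(A) / sqrt(factorial(N))

x = [0.1, -0.4, 0.7]
x_swapped = [-0.4, 0.1, 0.7]       # electrons 1 and 2 exchanged
# The two values differ only in sign (antisymmetry of the determinant)
print(slater(x), slater(x_swapped))
```

Swapping two arguments swaps two rows of the determinant, which flips its sign, exactly the fermionic antisymmetry the Slater form is built to enforce.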

7.2 The Hartree–Fock eigenvalue problem


Here, we consider the Hartree–Fock problem for closed-shell systems, where the
number of molecular orbitals equals the number of electron pairs, Norb = N/2. The
Hartree–Fock equation is a nonlinear eigenvalue problem,

ℱ φi (x) = λi φi (x), x ∈ ℝ3 , (7.3)

with respect to the (orthogonal) molecular orbitals φi (x),

$$
\int_{\mathbb{R}^3} \varphi_i \varphi_j \, dx = \delta_{ij}, \qquad i, j = 1, \ldots, N_{orb},
$$

and the Fock operator is given by

ℱ = Hc + VH − 𝒦. (7.4)

The core Hamiltonian part Hc of the Fock operator consists of the kinetic energy of the electrons, specified by the Laplace operator, and the nuclear potential energy of the interaction between electrons and nuclei,

$$
H_c(x) = -\frac{1}{2}\Delta - \sum_{A=1}^{M} \frac{Z_A}{\|x - a_A\|}, \qquad Z_A > 0, \quad x, a_A \in \mathbb{R}^3, \qquad (7.5)
$$

where M is the number of nuclei in a molecule, and ZA and aA are their charges and
positions, respectively. Here,

$$
V_c(x) = -\sum_{A=1}^{M} \frac{Z_A}{\|x - a_A\|}
$$

is the nuclear potential operator. The electron correlation parts of the Fock operator
are described by the Hartree potential

ρ(y)
VH (x) := ∫ dy (7.6)
‖x − y‖
ℝ3

with the electron density

$$
\rho(y) = 2 \sum_{i=1}^{N_{orb}} \big(\varphi_i(y)\big)^2, \qquad x, y \in \mathbb{R}^3, \qquad (7.7)
$$

and the exchange operator

$$
(\mathcal{K}\varphi)(x) := \int_{\mathbb{R}^3} \frac{\tau(x,y)}{\|x - y\|}\, \varphi(y)\, dy, \qquad \tau(x,y) = \sum_{i=1}^{N_{orb}} \varphi_i(x)\,\varphi_i(y), \quad x \in \mathbb{R}^3, \qquad (7.8)
$$

where τ(x, y) is the density matrix. Since both operators VH and 𝒦 depend on the solution of the eigenvalue problem (7.3), the nonlinear Hartree–Fock equation is solved iteratively by using the self-consistent field (SCF) iteration [238, 44].
The Hartree–Fock model is often called a mean-field approximation, since the energy of the electrons in a molecule is computed with respect to the mean field created by all electrons in the molecular system, including the target electrons.

7.3 The standard Galerkin scheme for the Hartree–Fock equation


The standard Galerkin approach to the numerical solution of the Hartree–Fock problem [277, 128] is based on the expansion of the molecular orbitals in a separable Gaussian-type basis {gμ }1≤μ≤Nb ,

$$
\varphi_i(x) = \sum_{\mu=1}^{N_b} c_{i\mu}\, g_\mu(x), \qquad i = 1, \ldots, N_{orb}, \quad x \in \mathbb{R}^3, \qquad (7.9)
$$

which yields the system of nonlinear equations for the coefficient matrix C = {ciμ } ∈ ℝ^{Norb×Nb} (and the density matrix D = 2C^T C ∈ ℝ^{Nb×Nb}),

$$
F(C)\,C = S C \Lambda, \qquad \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_{N_b}), \qquad C^T S C = I_{N_b}, \qquad (7.10)
$$



where S = {sμν } is the overlap matrix for the chosen Galerkin basis, with sμν = ∫ℝ³ gμ gν dx. The Galerkin counterpart of the Fock operator

F(C) = H + J(C) + K(C) (7.11)

includes the core Hamiltonian H, discretizing the Laplacian and the nuclear potential operators (7.5), and the matrices J(C) and K(C), corresponding to the Galerkin projections of the operators VH and 𝒦, respectively.
In this way, one can precompute the one-electron integrals in the core Hamiltonian H = {h_{μν}}_{μ,ν=1}^{N_b},

$$
h_{\mu\nu} = \frac{1}{2}\int_{\mathbb{R}^3} \nabla g_\mu \cdot \nabla g_\nu \, dx + \int_{\mathbb{R}^3} V_c(x)\, g_\mu\, g_\nu \, dx, \qquad 1 \le \mu, \nu \le N_b, \qquad (7.12)
$$

and the so-called two-electron integrals (TEI) tensor, also known as electron repulsion
integrals,

$$
b_{\mu\nu\kappa\lambda} = \int_{\mathbb{R}^3}\int_{\mathbb{R}^3} \frac{g_\mu(x)\, g_\nu(x)\, g_\kappa(y)\, g_\lambda(y)}{\|x - y\|}\, dx\, dy, \qquad 1 \le \mu, \nu, \kappa, \lambda \le N_b, \qquad (7.13)
$$

since they depend only on the choice of the basis functions in (7.9).
Then, the solution is sought by the self-consistent field (SCF) iteration using the core Hamiltonian H as the initial guess, and by updating the Coulomb matrix

$$
J(C)_{\mu\nu} = \sum_{\kappa,\lambda=1}^{N_b} b_{\mu\nu,\kappa\lambda}\, D_{\kappa\lambda}, \qquad (7.14)
$$

and the exchange Galerkin matrix

$$
K(C)_{\mu\nu} = -\frac{1}{2} \sum_{\kappa,\lambda=1}^{N_b} b_{\mu\lambda,\nu\kappa}\, D_{\kappa\lambda}, \qquad (7.15)
$$

at every iteration step. The direct inversion in the iterative subspace (DIIS) method, introduced in 1982 by Pulay [238], provides stable convergence of the iteration. The DIIS method is based on defining the weights of the previous solutions to be used as the initial guess for the current step of the iteration.
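The SCF loop sketched above can be illustrated with a toy model. All input data below (H, S, and the synthetic TEI tensor b) are invented random matrices with the required symmetries, not a real molecule; Löwdin orthogonalization and simple density damping stand in for the generalized eigensolver and for DIIS:

```python
import numpy as np

rng = np.random.default_rng(0)
Nb, Norb = 4, 1                        # toy basis size, one occupied orbital

# Invented data: symmetric H, well-conditioned SPD overlap S, and a TEI
# tensor with correct permutational symmetries (built from symmetric factors)
H = rng.standard_normal((Nb, Nb)); H = (H + H.T) / 2 - 2.0 * np.eye(Nb)
B = rng.standard_normal((Nb, Nb)); S = np.eye(Nb) + 0.1 * (B + B.T) / 2
Lt = 0.1 * rng.standard_normal((3, Nb, Nb)); Lt = (Lt + Lt.transpose(0, 2, 1)) / 2
b = np.einsum('tmn,tkl->mnkl', Lt, Lt)           # b_{mu nu, kappa lambda}

# Loewdin orthogonalization X = S^{-1/2} reduces (7.10) to a standard problem
w, U = np.linalg.eigh(S)
X = U @ np.diag(w ** -0.5) @ U.T

D = np.zeros((Nb, Nb))
for it in range(200):
    J = np.einsum('mnkl,kl->mn', b, D)           # Coulomb matrix, eq. (7.14)
    K = -0.5 * np.einsum('mlnk,kl->mn', b, D)    # exchange matrix, eq. (7.15)
    F = H + J + K
    _, V = np.linalg.eigh(X @ F @ X)             # Fock matrix in orthogonal basis
    C = X @ V[:, :Norb]                          # occupied orbital coefficients
    D_new = 2.0 * C @ C.T                        # updated density matrix
    if np.linalg.norm(D_new - D) < 1e-11:
        D = D_new
        break
    D = 0.5 * D + 0.5 * D_new                    # damped update in place of DIIS

E_HF = 0.5 * np.sum(D * (H + F))                 # electronic energy 0.5*Tr[D(H+F)]
print(E_HF)
```

The two `einsum` contractions are exactly the index patterns of (7.14) and (7.15); everything else (data, damping, energy shift) is illustrative scaffolding.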
Finally, the Hartree–Fock energy (or electronic energy, [277]) is computed as

$$
E_{HF} = 2\sum_{i=1}^{N_{orb}} \lambda_i - \sum_{i=1}^{N_{orb}} (\tilde{J}_i - \tilde{K}_i),
$$

where

$$
\tilde{J}_i = (\varphi_i, V_H \varphi_i)_{L^2} = \langle C_i, J C_i \rangle \quad \text{and} \quad \tilde{K}_i = (\varphi_i, \mathcal{K} \varphi_i)_{L^2} = \langle C_i, K C_i \rangle, \qquad i = 1, \ldots, N_{orb},
$$

are the Coulomb and exchange integrals in the basis of the Hartree–Fock orbitals φi.
Given the geometry of nuclei, the resulting ground state energy E0 of the molecule
is defined by

E0 = EHF + Enuc , (7.16)

where the so-called nuclear shift

$$
E_{nuc} = \sum_{k=1}^{M} \sum_{m<k} \frac{Z_k Z_m}{\|a_k - a_m\|} \qquad (7.17)
$$

describes the repulsion energy of the nuclei in a molecule.
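For instance, (7.17) is a plain double loop over nuclear pairs. In the sketch below, the H₂O charges and positions (in bohr) are approximate, illustrative values:

```python
import numpy as np

def nuclear_repulsion(Z, R):
    """E_nuc = sum_k sum_{m<k} Z_k * Z_m / ||a_k - a_m||, as in eq. (7.17)."""
    E = 0.0
    for k in range(len(Z)):
        for m in range(k):
            E += Z[k] * Z[m] / np.linalg.norm(np.array(R[k]) - np.array(R[m]))
    return E

# Approximate H2O geometry in bohr (O at the origin; illustrative numbers)
Z = [8, 1, 1]
R = [(0.0, 0.0, 0.0), (1.43, 1.11, 0.0), (-1.43, 1.11, 0.0)]
print(nuclear_repulsion(Z, R))   # ~9.19 hartree for this geometry
```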


Commonly used numerical methods for solving the Hartree–Fock equation are based on the analytical computation of the two-electron integrals (7.13) in problem-adapted, naturally separable Gaussian-type bases [3] by using erf-function expansions. This rigorous approach is a well-established standard for ab initio Hartree–Fock calculations, and there are a number of efficient program packages whose development required years of work by large scientific groups.
The success of the analytical integration methods stems from the large amount of precomputed information based on physical insight, including the construction of problem-adapted atomic orbital basis sets and elaborate nonlinear optimization for the calculation of density-fitting bases. The known limitations of this approach are due to a strong dependence of the numerical efficiency on the size and quality of the chosen Gaussian basis sets, which may become crucial for larger molecular clusters and heavier atoms.

7.4 Rank-structured grid-based approximation of the Hartree–Fock problem
The tensor-structured numerical methods, both the name and the concept, first appeared as efficient grid-based rank-structured algorithms for the calculation of the multidimensional convolution operators in the Hartree–Fock equation [174, 145], supplemented by the approximation theory on the low-rank approximation of multivariate functions and operators [166]. These numerical studies were particularly convenient for testing novel tensor approaches, since the results of the calculations could easily be checked by comparison with standard software packages in computational chemistry like MOLPRO [299], which provide accurate results from the analytical evaluation of the same multidimensional operators.

In the next chapters, we describe the two ab initio Hartree–Fock solvers using the
tensor-structured grid-based calculation of all quantities involved, including the rank-
structured calculation of the core Hamiltonian introduced in [156]. We briefly summarize the basic approaches as follows:
– In Section 8, we describe the multilevel Hartree–Fock solver using a nontraditional concept for the numerical solution of the eigenvalue problem, which avoids the computation of TEI. Instead, it employs the grid-based rank-structured computation of the Coulomb and exchange matrices on the fly in the course of the SCF iterations. This solver was introduced in [174, 145, 146, 187]. Though this approach eliminates the need for the challenging computation of the two-electron integrals, it exhibits time limitations in the loops for the computation of the exchange operator 𝒦. Therefore, its MATLAB implementation is not competitive with the standard packages based on analytical calculations. However, it may be well suited for parallel implementations.
– In Section 11, we present the fast TESC¹ Hartree–Fock solver² introduced in [157, 147], which is comparable in time and accuracy (in its MATLAB implementation) with the benchmark packages. It is based on the efficient rank-structured calculation of the TEI tensor in a factorized form by using an algebraic “1D density fitting” scheme and the truncated Cholesky decomposition algorithm. Due to the rank-structured representation of the two-electron integrals, this tensor-based solver proved to be attractive as a starting point for the computation of the excitation energies of molecules.

Note that the grid-based approaches are not restricted to Gaussian-type basis functions and may be applied for the construction of new well-separable grid-based basis functions, for example, combinations of Gaussians with plane waves and/or Slater-type orbitals.

1 TESC is the abbreviation for the tensor-based electronic structure calculations.


2 It was first called the “black-box” Hartree–Fock solver.
8 Multilevel grid-based tensor-structured HF solver
In this chapter, we describe the nontraditional concept for the numerical solution
of the Hartree–Fock (HF) equation, which avoids computation of the two-electron
integrals. Alternatively, it employs the grid-based rank-structured computation of
the Coulomb and exchange matrices on the fly in the framework of SCF iterations
[174, 145]. This solver was introduced in [146, 187] and called the “multilevel HF solver”, since both the three-dimensional integrals in the Hartree potential and the nonlocal six-dimensional integrals of the exchange potential are calculated on a sequence of
dyadically refined n × n × n Cartesian grids in ℝ3 .
The important ingredients of the grid-based approach to the 3D Hartree–Fock equation include many classical numerical techniques, in particular, the discretization of differential and integral operators, error analysis, fast data transforms, algebraic solvers for linear systems, and eigenvalue problems, which are widely presented in the literature. We refer to some of the many existing monographs addressing these issues, [273, 246, 104, 29, 282, 98, 281, 275, 53, 184, 160].

8.1 Calculation of the Hartree and exchange operators


The rank-structured grid-based representation of multivariate functions and operators was first applied to the evaluation of the convolution operators in the Hartree–Fock equation, where both accuracy and timing were essential, and calculations had to be performed on large 3D grids to be competitive with analytically based methods [174, 145]. The efficient low-rank canonical tensor representation of the Newton kernel was also an essential ingredient [161, 30]. To reduce the initial rank of the electron density, which is quadratically proportional to the number of Gaussian-type basis functions, the canonical-to-Tucker tensor transform was applied [174, 173], which made computations tractable for large tensor grids and for extended molecules even when using MATLAB. In particular, it was shown that the rank-structured tensor approach allows reducing the calculation of the three-dimensional (3D) convolution integrals to sequences of one-dimensional (1D) convolution transforms via 1D FFT, 1D Hadamard products, and scalar products.
The idea of replacing or assisting the analytical computations for the Hartree–Fock problem by a data-sparse grid-based approach is not new. In particular, the wavelet multiresolution schemes [122, 305, 83, 80, 267, 35], as well as the sparse-grids approach [43, 308, 106], have been proposed, though the entirely wavelet-based method is successful only for small atomic systems with one or two electrons [105, 7]. The multiresolution approach in density functional computations of the electronic structure is considered in [97, 263]. The grid-based numerical method in Hartree–Fock calculations for diatomic molecules proposed in [276, 200] is hardly extendable to compact (3D) molecules. The domain decomposition approach to problems with “linear geometry” of molecules is discussed in [12]. Recently it was shown that the QTT tensor approximation can be applied to Hartree–Fock calculations for small molecules [239, 240]. Rank decompositions of the Schrödinger operator and the corresponding wavefunctions were first discussed in [31, 32, 33]. The results on the regularity properties of the wavefunctions in the Schrödinger equation, which motivate the data-sparse approaches in electronic structure calculations, are presented in [307, 308, 82].

8.1.1 Agglomerated representation of the Galerkin matrices

For the multilevel Hartree–Fock solver [146, 187], fast and accurate evaluation of the
Galerkin matrices J(D) and K(D) is based on a certain reorganization of the standard
iterative computational scheme for the eigenvalue problem (7.10) given in Section 7.3.
Specifically, instead of precomputing the full set of two-electron integrals bμν,κλ and computing the matrices in (7.14) and (7.15) using the updated elements of the density matrix D, we use the explicit integral representations for J(D) and K(D). In
particular, the Galerkin representation of the Hartree operator (the Coulomb matrix)
is now calculated by the grid-based quadrature integration of

$$
J(D)_{\mu\nu} = \int_{\mathbb{R}^3} g_\mu(x)\, V_H(x)\, g_\nu(x)\, dx, \qquad 1 \le \mu, \nu \le N_b, \qquad (8.1)
$$

including a single convolution transform in ℝ³ to compute the Hartree potential in (8.1),

$$
V_H = \rho \ast \frac{1}{\|\cdot\|},
$$

where the electron density is given by

$$
\rho(y) = 2 \sum_{a=1}^{N_{orb}} \Big( \sum_{\kappa,\lambda=1}^{N_b} C_{\kappa a}\, C_{\lambda a}\, g_\kappa(y)\, g_\lambda(y) \Big). \qquad (8.2)
$$

In turn, as proposed in [145], we represent the matrix entries of the exchange operator K(D) by the following loops. For a = 1, . . . , Norb, compute the convolution integrals

$$
W_{a\nu}(x) = \int_{\mathbb{R}^3} \frac{g_\nu(y) \sum_{\kappa=1}^{N_b} C_{\kappa a}\, g_\kappa(y)}{\|x - y\|}\, dy, \qquad \nu = 1, \ldots, N_b, \qquad (8.3)
$$

and then the scalar products

$$
K_{\mu\nu,a} = \int_{\mathbb{R}^3} \Big[ \sum_{\kappa=1}^{N_b} C_{\kappa a}\, g_\kappa(x) \Big] g_\mu(x)\, W_{a\nu}(x)\, dx, \qquad \mu, \nu = 1, \ldots, N_b. \qquad (8.4)
$$

Finally, the entries of the exchange matrix are given by sums over all orbitals,

$$
K(C)_{\mu\nu} = \sum_{a=1}^{N_{orb}} K_{\mu\nu,a}, \qquad \mu, \nu = 1, \ldots, N_b. \qquad (8.5)
$$

The advantage of the above representations is the minimization of the number of convolution products that have to be computed by numerical quadratures. Even more important is the possibility of an efficient low-rank separable approximation of the discretized density ρ(x) and of the auxiliary potentials Waν(x) at step (8.3).
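To make the loop structure concrete, here is a 1D toy analogue of (8.3)–(8.5). The basis, coefficients, and the smooth surrogate kernel (replacing the singular Newton kernel) are all invented illustrative data:

```python
import numpy as np

n, Nb, Norb = 257, 3, 2
x = np.linspace(-10.0, 10.0, n)
h = x[1] - x[0]
g = np.array([np.exp(-(x - c)**2) for c in (-1.0, 0.0, 1.0)])   # basis on grid
C = np.array([[0.7, 0.7, 0.0], [0.0, 0.5, -0.5]]).T             # C[kappa, a]

ker = 1.0 / (np.abs(x) + 0.5)    # smooth stand-in for the singular kernel

def conv(f):
    # discrete convolution on the grid, scaled by the mesh size h
    return h * np.convolve(f, ker, mode="same")

K = np.zeros((Nb, Nb))
for a in range(Norb):                        # loop over occupied orbitals
    phi_a = g.T @ C[:, a]                    # sum_kappa C_{kappa,a} g_kappa
    for nu in range(Nb):
        W = conv(g[nu] * phi_a)              # W_{a,nu}, analogue of (8.3)
        for mu in range(Nb):
            K[mu, nu] += h * np.sum(phi_a * g[mu] * W)   # (8.4) and (8.5)

print(np.round(K, 6))
```

Only Norb · Nb convolutions are performed, matching the operation count the text emphasizes; the resulting matrix is symmetric because the (even) kernel defines a symmetric convolution operator.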
Effective realization of this concept is based on certain separability assumptions
on the Galerkin basis functions gμ . First, we suppose that the initial problem is posed
in a finite volume box Ω = [−b, b]3 ∈ ℝ3 subject to the homogeneous Dirichlet bound-
ary conditions on 𝜕Ω (due to the exponential decay of the orbitals ψi (x) as ‖x‖ → ∞).
For a given discretization parameter n ∈ ℕ, introduce the equidistant tensor grid

ω3,n := ω1 × ω2 × ω3 , ωℓ := {−b + (m − 1)h : m = 1, . . . , n + 1}, ℓ = 1, . . . , 3, (8.6)

with the mesh-size h = 2b/n. Define the set of piecewise constant basis functions {ϕi}, i ∈ ℐ := {1, . . . , n}³, associated with the respective grid cells in ω3,n (indicator functions), and the corresponding set {χj}, j ∈ 𝒥 := {1, . . . , n − 1}³, of tensor-product continuous piecewise linear polynomials in each spatial variable. We denote the corresponding finite element spaces as

𝒱n = span{ϕi } and 𝒲n = span{χj } ⊂ H01 (Ω). (8.7)

Now the basis set {gμ } is supposed to satisfy the following properties:
– The Galerkin approximation error over the reduced basis set {gμ} is physically admissible, presupposing sufficient approximation quality.
– Each basis function gμ(x) ∈ H¹₀(Ω) can be represented (approximated) by an RG-term separable expansion in x = (x1, x2, x3) with a moderate number of terms RG,

$$
g_\mu(x) = \sum_{k=1}^{R_G} g^{(1)}_{\mu,k}(x_1)\, g^{(2)}_{\mu,k}(x_2)\, g^{(3)}_{\mu,k}(x_3), \qquad \mu = 1, \ldots, N_b. \qquad (8.8)
$$

– The property of discrete separability: the functions gμ(x) (and g^{(ℓ)}_{μ,k}) allow an approximate representation in either of the basis sets {ϕi} and {χj} by the rank-RG coefficient tensors Gμ = [Gμ,i] ∈ ℝ^ℐ and Xμ = [Xμ,j] ∈ ℝ^𝒥, respectively.
– We suppose that the Galerkin integrals for J(D) and K(D), given by (8.1)–(8.5), can be accurately represented by well-separable numerical quadratures in the discretized basis sets {Gμ} and {Xμ}, providing asymptotic convergence as h → 0. The representation with respect to both gμ(x) and its piecewise linear version provides satisfactory accuracy, which may require a large grid parameter n.
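The discrete separability assumption can be illustrated for a single Gaussian (an RG = 1 example with hypothetical parameters): its grid coefficient tensor is exactly the outer product of three univariate samples, so only 3n numbers need to be stored instead of n³:

```python
import numpy as np

n, b = 64, 5.0
grid = np.linspace(-b, b, n)                   # univariate grid points
A = np.array([0.5, -1.0, 0.0])                 # center (illustrative)
alpha = 0.8                                    # exponent (illustrative)

# Univariate factors g^{(l)}(x_l) of a rank-1 Gaussian basis function
f1 = np.exp(-alpha * (grid - A[0])**2)
f2 = np.exp(-alpha * (grid - A[1])**2)
f3 = np.exp(-alpha * (grid - A[2])**2)
G = np.einsum('i,j,k->ijk', f1, f2, f3)        # coefficient tensor, n x n x n

# Full 3D sampling of g(x) = exp(-alpha * ||x - A||^2) for comparison
X, Y, Z = np.meshgrid(grid, grid, grid, indexing='ij')
G_full = np.exp(-alpha * ((X - A[0])**2 + (Y - A[1])**2 + (Z - A[2])**2))

# Zero up to rounding: the sampled Gaussian is a rank-1 tensor exactly
print(np.max(np.abs(G - G_full)))
```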

8.1.2 On the choice of the Galerkin basis functions

The examples of problem-independent grid-oriented basis sets are given by plane


waves, wavelets, and by the piecewise polynomial finite element (FE) basis functions.
The practically tractable grids of size n × n × n for representing problem-independent
basis sets are presently limited by the value of about n ≈ 500.
Several efficient “meshless” basis sets {gμ} are known in the literature on computational quantum chemistry. In particular, we mention the linear combination of atomic orbitals (LCAO) and their successors, Slater-type orbitals (STOs). The most popular basis sets for the representation of molecular orbitals are the so-called Gaussian-type orbitals (GTOs),

$$
\varphi(x) = \sum_{k=1}^{N_b} c_k\, g_k(x), \qquad x \in \mathbb{R}^3, \qquad (8.9)
$$

defined by Gaussians scaled by polynomials,

$$
g_k(x) = \prod_{\ell=1}^{3} \big(x^{(\ell)} - A_k^{\ell}\big)^{\beta_k^{\ell}}\, e^{-\alpha_k \|x - A_k\|^2}, \qquad x = \{x^{(\ell)}\}_{\ell=1}^{3}, \qquad (8.10)
$$

where A_k = (A_k¹, A_k², A_k³) ∈ ℝ³ (k = 1, . . . , Nb) correspond to the locations of the atoms in a molecule, β_k^ℓ ∈ ℕ₀, and Nb is the number of GTO basis functions. In the case of larger molecules, the so-called contracted Gaussian functions are widely used, which probably constitute the best compromise between STOs and GTOs (cf. [203] for a detailed discussion). The construction of such problem-dependent basis sets is distinctively based on the precomputed electronic orbitals for single atoms.
based on the precomputed electronic orbitals for single atoms.
An alternative to the analytically given GTO-type basis functions is the so-called fully numerical atomic orbitals [203], which are specified solely by their numerical values on a grid. Such a choice of basis functions fits well the nature of our tensor-structured numerical method. This allows utilizing already existing problem-adapted basis sets, taking advantage of the important physical information that is well known for the individual atoms. The (low-rank) separable representation of functions and operators reduces the 3D calculations to fast numerical operations implemented only on univariate grids (1D calculations) [173, 145]. In this way, the computation of volume integrals, convolution transforms, scalar products, and function-function multiplications can be simplified dramatically.
The particular requirements on the approximating basis set to be fulfilled in the framework of our tensor-structured numerical scheme were formulated at the end of the previous Subsection 8.1.1. The systematic construction of a high-quality, low-tensor-rank approximating basis can be established based on the following techniques:
– Algebraic optimization of the conventional “meshless” GTO-type basis sets (RG = 1);
– Rank reduction of the Slater-type orbitals (RG = O(log ε⁻¹) up to the tolerance ε > 0), cf. [161];

– Using the orthogonal Tucker vectors computed for simplified problems, whose
Tucker rank is supposed to be weakly dependent on the particular molecule and
the grid parameters [173, 174, 186].

All these concepts still require further theoretical and numerical analysis.
The main advantage of low-tensor-rank approximating basis sets is the linear scaling of the resultant algorithms in the univariate grid size n, which already allows employing huge n × n × n grids in ℝ³ (specifically, n ≤ 2·10⁴ for the current computations in the framework of the multilevel Hartree–Fock solver). This could be beneficial in FEM-DFT computations applied to large molecular clusters.

8.1.3 Tensor computation of the Galerkin integrals in matrices J(D) and K(D)

The beneficial feature of our method is that the functions and operators involved in the computational scheme for the Coulomb and exchange matrices (8.1)–(8.5) are efficiently evaluated using (approximate) low-rank tensor-product representations in the discretized basis sets {Gμ} and {Xμ}, at a cost that scales linear-logarithmically in n, O(n log n).
To that end, we introduce some interpolation/prolongation operators interconnecting the continuous functions on Ω and their discrete representation on the grid via the coefficient tensors in ℝℐ (or in ℝ𝒥 ). Note that the coefficient space of tri-tensors in

𝕍n = ℝℐ := V1 ⊗ V2 ⊗ V3

is the tensor-product space with Vℓ = ℝn (ℓ = 1, 2, 3). Conventionally, we use the


canonical isomorphism between 𝒱n and 𝕍n ,

$$
\mathcal{V}_n \ni f(x) = \sum_{\mathbf{i}} f_{\mathbf{i}}\, \phi_{\mathbf{i}}(x) \;\Longleftrightarrow\; F := [f_{\mathbf{i}}]_{\mathbf{i}\in\mathcal{I}} \in \mathbb{V}_n.
$$

We make use of similar entities for the pair 𝒲n and 𝕎n = ℝ^𝒥 := W₁ ⊗ W₂ ⊗ W₃ with W_ℓ = ℝ^{n−1} (ℓ = 1, 2, 3).
Now we define the collocation and L²-projection mappings onto 𝕍n. For a continuous function f, we introduce the collocation “projection” operator by

$$
\mathcal{P}_C : f \mapsto \sum_{\mathbf{i}} f(y_{\mathbf{i}})\, \phi_{\mathbf{i}}(x) \;\Longleftrightarrow\; F := [f(y_{\mathbf{i}})]_{\mathbf{i}\in\mathcal{I}} \in \mathbb{V}_n,
$$

where {y_i} is the set of cell-centered points with respect to the grid ω3,n. Furthermore, for functions f ∈ L²(Ω), we define the L²-projection by

$$
\mathcal{P}_0 : f \mapsto \sum_{\mathbf{i}} \langle f, \phi_{\mathbf{i}} \rangle\, \phi_{\mathbf{i}}(x) \;\Longleftrightarrow\; F := [\langle f, \phi_{\mathbf{i}} \rangle]_{\mathbf{i}\in\mathcal{I}} \in \mathbb{V}_n.
$$

Likewise, we denote by 𝒬₀ the L²-projection onto 𝕎n.
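A 1D analogue (with illustrative parameters of our own choosing) shows how the two mappings relate: the collocation values f(y_i) and the scaled projection coefficients ⟨f, ϕ_i⟩/h agree up to O(h²):

```python
import numpy as np

b, n = 1.0, 200
h = 2.0 * b / n
centers = -b + (np.arange(n) + 0.5) * h          # cell-centered points y_i
f = lambda z: np.exp(-3.0 * z**2)

# Collocation "projection" P_C: sample f at the cell centers
F_coll = f(centers)

# L2 projection P_0: <f, phi_i> via a 3-point Gauss rule on each cell
gp = np.array([-np.sqrt(3.0/5.0), 0.0, np.sqrt(3.0/5.0)]) * h / 2
gw = np.array([5.0/9.0, 8.0/9.0, 5.0/9.0]) * h / 2
F_proj = np.array([np.sum(gw * f(c + gp)) for c in centers])

# Coefficients of the L2 projection are <f, phi_i>/h; both discretizations
# agree to second order in the mesh size h
print(np.max(np.abs(F_coll - F_proj / h)))
```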



Using the discrete representations above, we are able to rewrite all functional and integral transforms in (8.1)–(8.5) in terms of tensor operations in 𝕍n. In particular, for continuous targets, the function-times-function product and the L²-scalar product can be discretized by the tensor operations

$$
f \cdot g \mapsto F \odot G \in \mathbb{V}_n \qquad \text{and} \qquad \langle f, g \rangle \mapsto h^3 \langle F, G \rangle,
$$

with F = 𝒫C(f), G = 𝒫C(g), where ⊙ denotes the Hadamard (entrywise) product of tensors.


The convolution product is represented by

$$
f \ast g \mapsto F \ast_T G \in \mathbb{V}_n, \qquad \text{with } F = \mathcal{P}_C(f) \in \mathbb{V}_n, \; G = \mathcal{P}_0(g) \in \mathbb{V}_n,
$$

where the tensor operation ∗T stands for the tensor-structured convolution transform in 𝕍n described in [166] (see also [186, 174] for applications of the fast ∗T transform in electronic structure calculations). We notice that, under certain assumptions on the regularity of the input functions (see Section 5), the tensor product convolution ∗T can be proven to provide an approximation error of order O(h²), whereas the two-grid version via the Richardson extrapolation leads to the improved error bound O(h³) (cf. [166]). Tensor-structured calculations of the multidimensional convolution integral operators with the Newton kernel have been introduced and implemented in [174, 187, 145]; see also [108].
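The ∗T transform can be illustrated for rank-1 canonical tensors, for which the full (linear) 3D convolution factorizes exactly into three 1D convolutions; the data below are random toy vectors, and a zero-padded 3D FFT convolution serves as the reference:

```python
import numpy as np

n = 16
rng = np.random.default_rng(1)
f = [rng.standard_normal(n) for _ in range(3)]    # F = f1 x f2 x f3 (rank-1)
g = [rng.standard_normal(n) for _ in range(3)]    # G = g1 x g2 x g3 (rank-1)
F = np.einsum('i,j,k->ijk', *f)
G = np.einsum('i,j,k->ijk', *g)

# Tensor-structured convolution: three 1D convolutions plus an outer product,
# O(n log n) work on the factors instead of O(n^3 log n) on the full arrays
c = [np.convolve(f[l], g[l], mode='full') for l in range(3)]   # length 2n-1
conv_T = np.einsum('i,j,k->ijk', *c)

# Reference: full 3D linear convolution via zero-padded FFT
m = 2 * n - 1
conv_fft = np.fft.irfftn(
    np.fft.rfftn(F, s=(m, m, m)) * np.fft.rfftn(G, s=(m, m, m)), s=(m, m, m))

print(np.max(np.abs(conv_T - conv_fft)))
```

For canonical tensors of higher rank, the same identity is applied term by term, which is exactly the structure exploited in the ∗T transform.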
Representations (8.1)–(8.2) for the Coulomb operator can now be rewritten (approximately) in terms of the discretized basis functions by using tensor operations:

$$
\rho \approx \Theta := 2\sum_{a=1}^{N_{orb}} \Big( \sum_{\kappa,\lambda=1}^{N_b} C_{\kappa a}\, C_{\lambda a}\, G_\kappa \odot G_\lambda \Big), \qquad \text{where } G_\kappa = \mathcal{P}_C(g_\kappa),
$$

implying

$$
V_H = \rho \ast g \approx \Theta \ast_T P_N, \qquad \text{where } P_N = \mathcal{P}_0(g), \quad g = \frac{1}{\|\cdot\|}, \qquad (8.11)
$$

with P_N ∈ 𝕍n being the discretization tensor for the Coulomb potential. This implies the tensor representation of the Coulomb matrix,

$$
J(D)_{\mu\nu} \approx \langle G_\mu \odot G_\nu,\; \Theta \ast_T P_N \rangle, \qquad 1 \le \mu, \nu \le N_b. \qquad (8.12)
$$

The separability property of basis functions ensures that rank(Gμ ) ≤ RG , whereas ten-
sors Θ and PN are to be approximated by low-rank tensors. Hence, in our method,
the corresponding tensor operations are implemented using fast multilinear algebra
equipped with the corresponding rank optimization (tensor truncation) [173, 174, 186].

Numerical examples of other rank decompositions of the electron density (not including the calculation of the three-dimensional convolution operator) have been presented in [52, 81]. The tensor product convolution was introduced in [173, 174] and also discussed in [108, 109, 166].
Likewise, tensor representations (8.3)–(8.5) for the exchange operator, realized in [145], now look as follows:

$$
W_{a\nu} \approx \Upsilon_{a\nu} := \Big[ G_\nu \odot \sum_{\kappa=1}^{N_b} C_{\kappa a}\, G_\kappa \Big] \ast_T P_N, \qquad \nu = 1, \ldots, N_b, \qquad (8.13)
$$

with the tensor P_N ∈ 𝕍n defined by (8.11). Now we proceed with

$$
K_{\mu\nu,a} \approx \chi_{\mu\nu,a} := \Big\langle \Big[ \sum_{\kappa=1}^{N_b} C_{\kappa a}\, G_\kappa \Big] \odot G_\mu,\; \Upsilon_{a\nu} \Big\rangle, \qquad \mu, \nu = 1, \ldots, N_b, \qquad (8.14)
$$

finally providing the entries of the exchange matrix by summation over all orbitals,

$$
K(D)_{\mu\nu} = \sum_{a=1}^{N_{orb}} \chi_{\mu\nu,a}, \qquad \mu, \nu = 1, \ldots, N_b. \qquad (8.15)
$$

Again, the auxiliary tensors and respective algebraic operations have to be imple-
mented with the truncation to low-rank tensor formats.

8.2 Numerics on three-dimensional convolution operators


Here we discuss the algorithms for grid-based calculation of the Coulomb and ex-
change operators by using the tensor-structured numerical method introduced in [174,
145], where it was demonstrated that calculation of the three- and six-dimensional
convolution integrals with the Newton kernel can be reduced to a combination of one-
dimensional Hadamard and scalar products and one-dimensional convolutions.
In the following, for numerical illustrations, we use the Gaussian basis sets, which are convenient for verification of the computational results (the corresponding Galerkin Fock matrix) against the standard MOLPRO output [299]. The univariate Gaussians g_k^{(ℓ)}(x_ℓ) = g_{k,1}^{(ℓ)}(x_ℓ), ℓ = 1, 2, 3, are functions with infinite support given by

$$
g_k^{(\ell)}(x_\ell) = (x_\ell - A_{\ell,k})^{p_{\ell,k}} \exp\big(-\alpha_k (x_\ell - A_{\ell,k})^2\big), \qquad x_\ell \in \mathbb{R}, \quad \alpha_k > 0,
$$

where p_{ℓ,k} = 0, 1, . . . is the polynomial degree, and the points (A_{1,k}, A_{2,k}, A_{3,k}) ∈ ℝ³ specify the positions of the nuclei in a molecule.
The molecule is embedded in a certain fixed computational box Ω = [−b, b]³ ⊂ ℝ³, as in Figure 11.1.¹ For a given discretization parameter n ∈ ℕ, we use the equidistant n × n × n tensor grid ω3,n = {xi}, i ∈ ℐ := {1, . . . , n}³, with the mesh-size h = 2b/(n + 1).

1 In the case of small to moderate size molecules, we usually use a computational box of size 40³ bohr³.

Figure 8.1: Approximation of the Gaussian-type basis function by a piecewise constant function.

The Gaussian-type basis functions are used for the representation of the orbitals (8.9). In the calculation of the integral terms, the separable basis functions g_k(x), x ∈ ℝ³, are approximated by sampling their values at the centers of the discretization intervals, as in Figure 8.1, using products of univariate piecewise constant basis functions, g_k(x) ≈ ḡ_k(x) = ∏_{ℓ=1}^{3} ḡ_k^{(ℓ)}(x^{(ℓ)}), yielding their rank-1 tensor representation,

$$
g_k \mapsto \mathbf{G}_k = \mathbf{g}_k^{(1)} \otimes \mathbf{g}_k^{(2)} \otimes \mathbf{g}_k^{(3)} \in \mathbb{R}^{n\times n\times n}, \qquad k = 1, \ldots, N_b. \qquad (8.16)
$$

For the tensor-based calculation of the Hartree potential

$$
V_H(x) := \int_{\mathbb{R}^3} \frac{\rho(y)}{\|x - y\|}\, dy
$$

and of the corresponding Coulomb matrix

$$
J_{km} := \int_{\mathbb{R}^3} g_k(x)\, g_m(x)\, V_H(x)\, dx, \qquad k, m = 1, \ldots, N_b,
$$

we use the discrete tensor representation (8.16) of the basis functions. Then the electron density is approximated by using 1D Hadamard products of the skeleton vectors of the rank-1 tensors (instead of products of Gaussians),

$$
\rho \approx \Theta = 2 \sum_{a=1}^{N_{orb}} \sum_{k=1}^{N_b} \sum_{m=1}^{N_b} c_{a,m}\, c_{a,k}\, \big(\mathbf{g}_k^{(1)} \odot \mathbf{g}_m^{(1)}\big) \otimes \big(\mathbf{g}_k^{(2)} \odot \mathbf{g}_m^{(2)}\big) \otimes \big(\mathbf{g}_k^{(3)} \odot \mathbf{g}_m^{(3)}\big) \in \mathbb{R}^{n\times n\times n}.
$$

Further, the representation of the Newton convolving kernel $\frac{1}{\|x-y\|}$ by a canonical rank-R_N tensor [30] is used (see Section 6.1 for details):

$$
P_N \mapsto P_R = \sum_{q=1}^{R_N} \mathbf{p}_q^{(1)} \otimes \mathbf{p}_q^{(2)} \otimes \mathbf{p}_q^{(3)} \in \mathbb{R}^{n\times n\times n}. \qquad (8.17)
$$

Since large ranks make tensor operations inefficient, the multigrid canonical-to-Tucker and Tucker-to-canonical algorithms (see Sections 3.3.3 and 3.5) should be applied to reduce the initial rank of Θ ↦ Θ′ by several orders of magnitude, from N_b²/2 to an essentially smaller value R_ρ ≪ N_b²/2. For sufficient accuracy, the ε-threshold is chosen of the order of 10⁻⁷.
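The effect of such a rank reduction can be imitated by a plain HOSVD (a truncated SVD of each unfolding). This generic sketch with synthetic redundant data is not the multigrid canonical-to-Tucker algorithm of Sections 3.3.3 and 3.5, only an illustration of the principle:

```python
import numpy as np

n, R = 20, 40
rng = np.random.default_rng(2)

# Canonical tensor with R = 40 terms but only 4 independent directions per mode
base = rng.standard_normal((3, 4, n))
U = [base[l][rng.integers(0, 4, R)] + 1e-9 * rng.standard_normal((R, n))
     for l in range(3)]
T = np.einsum('ri,rj,rk->ijk', U[0], U[1], U[2])

def tucker_factor(T, mode, eps=1e-7):
    """Truncated left singular vectors of the mode unfolding of T."""
    M = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
    Q, s, _ = np.linalg.svd(M, full_matrices=False)
    r = int(np.sum(s > eps * s[0]))              # epsilon-rank truncation
    return Q[:, :r]

Qs = [tucker_factor(T, l) for l in range(3)]
core = np.einsum('ijk,ia,jb,kc->abc', T, *Qs)    # Tucker core tensor
T_approx = np.einsum('abc,ia,jb,kc->ijk', core, *Qs)

ranks = [Q.shape[1] for Q in Qs]
rel_err = np.linalg.norm(T - T_approx) / np.linalg.norm(T)
print(ranks, rel_err)     # small Tucker ranks, tiny relative error
```

The ε-threshold plays the same role as in the text: singular values below ε·s₁ are discarded, collapsing the redundant canonical rank to the small effective Tucker rank.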
The tensor approximation to the Hartree potential is calculated by using the 3D tensor product convolution, which is a sum of tensor products of 1D convolutions:

$$
V_H \approx \mathbf{V}_H = \Theta' \ast P_R = \sum_{j=1}^{R_\rho} \sum_{q=1}^{R_N} c_j\, \big(\mathbf{u}_j^{(1)} \ast \mathbf{p}_q^{(1)}\big) \otimes \big(\mathbf{u}_j^{(2)} \ast \mathbf{p}_q^{(2)}\big) \otimes \big(\mathbf{u}_j^{(3)} \ast \mathbf{p}_q^{(3)}\big).
$$

Finally, the entries of the Coulomb matrix J_{km} are computed by 1D scalar products of the canonical vectors of $\mathbf{V}_H$ with the Hadamard products of the rank-1 tensors representing the Galerkin basis:

$$
J_{km} \approx \langle \mathbf{G}_k \odot \mathbf{G}_m,\; \mathbf{V}_H \rangle, \qquad k, m = 1, \ldots, N_b.
$$

The cost of the 3D tensor product convolution is O(n log n) instead of O(n³ log n) for the standard benchmark 3D convolution using the 3D FFT. Table 8.1 shows the CPU times (sec) for the MATLAB computation of V_H for the H₂O molecule [174] on a SUN station using a cluster with 4 Intel Xeon E7-8837/32 cores/2.67 GHz and 1024 GB storage (times for the 3D FFT for n ≥ 4096 are obtained by extrapolation). One can easily notice the cubic scaling of the 3D FFT time under dyadic increase of the grid size n, and the approximately linear-logarithmic scaling of the 3D tensor convolution on the same grids (see the C ∗ C row). The C2T row shows the time for the canonical-to-Tucker rank reduction.
Following [166], we apply the Richardson extrapolation technique (see [218]) to obtain higher-accuracy approximations of order O(h³) without extra computational cost. The numerical gain of using an extrapolated solution is due to the fact that reaching the approximation error O(h³) on a single grid would require the univariate grid size n₁ = n^{3/2} ≫ n. The corresponding Richardson extrapolant V_{H,Rich}^{(n)}, approximating V_H(x) over a pair of nested grids ω3,n and ω3,2n and defined on the “coarse” n^{⊗3}-grid, is given by

$$
V_{H,Rich}^{(n)} = \big(4 \cdot V_H^{(2n)} - V_H^{(n)}\big)/3 \qquad \text{in the grid points of } \omega_{3,n}.
$$
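The same two-grid rule can be demonstrated on any second-order approximation. In the generic sketch below (illustrative integrand, not from the text), the composite trapezoidal rule plays the role of the O(h²) grid approximation, and the extrapolant (4·V^{(2n)} − V^{(n)})/3 gains two orders of accuracy:

```python
import math
import numpy as np

f = lambda z: np.exp(-z**2)

def trap(n):
    # composite trapezoidal rule on [0, 2] with n subintervals: error O(h^2)
    z = np.linspace(0.0, 2.0, n + 1)
    y = f(z)
    return (z[1] - z[0]) * (0.5 * y[0] + np.sum(y[1:-1]) + 0.5 * y[-1])

exact = 0.5 * math.sqrt(math.pi) * math.erf(2.0)   # int_0^2 exp(-z^2) dz

n = 32
Vn, V2n = trap(n), trap(2 * n)
V_rich = (4.0 * V2n - Vn) / 3.0       # two-grid Richardson extrapolant

print(abs(Vn - exact), abs(V2n - exact), abs(V_rich - exact))
```

As in the text, the extrapolant is evaluated for free from the two existing grid solutions, avoiding the much finer single grid that the higher order would otherwise require.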

The next numerical results show the accuracy of the tensor-based calculations using
n×n×n 3D Cartesian grids with respect to the corresponding output from the MOLPRO
package [299].

Table 8.1: Times (sec) for the 3D tensor product convolution vs. convolution by 3D FFT in the computation of VH for the H2 O molecule.

n³       1024³    2048³    4096³    8192³    16 384³

FFT3     10       81       640      5120     ∼11 hours
C ∗ C    8.8      20.0     61.0     157.5    299.2
C2T      6.9      10.9     20.0     37.9     86.0
120 | 8 Multilevel grid-based tensor-structured HF solver

Figure 8.2: Left: Absolute error in tensor computation of the Coulomb matrix for CH4 and C2H6
molecules.

Figure 8.3: Left: Absolute approximation error (blue line: ≈10⁻⁶ au) in the tensor-product computation
of the Hartree potential of C2H6, measured in the grid line Ω = [−5, 7] × {0} × {0}. Right: Times
versus n in MATLAB for computation of VH for the C2H6 molecule.

Figure 8.2 demonstrates the accuracy (∼10⁻⁵) of the calculation of the Coulomb matrix
for CH4 and C2H6 molecules using the Richardson extrapolation on a sequence of grids
ω_{3,n} with n = 4096 and n = 8192.
Figure 8.3 (left) shows the accuracy in the calculation of the Hartree potential (in comparison
with the benchmark calculations from MOLPRO) for the C2H6 molecule, computed on
n × n × n grids of size n = 4096 and n = 8192 (dashed lines). The solid line in
Figure 8.3 shows the accuracy of the Richardson extrapolation of the results from the two
grids of size n = 4096 and n = 8192. One can observe a substantial improvement in accuracy
for the Richardson extrapolation. Figure 8.3 (right) shows the CPU times versus n
in MATLAB, indicating linear complexity scaling in the univariate grid size n. See
also Figure 8.4, illustrating the accuracy for the exchange matrix K = Kex.
In a similar way, the algorithm for 3D grid-based tensor-structured calculation
of the 6D integrals in the exchange potential operator was introduced in [145]:
Kkm = ∑_{a=1}^{Norb} K_{km,a} with

K_{km,a} := ∫_{ℝ³} ∫_{ℝ³} g_k(x) (φ_a(x) φ_a(y) / |x − y|) g_m(y) dx dy,   k, m = 1, . . . , Nb.

Figure 8.4: L∞-error in Kex = K for the density of H2O and pseudodensity of CH3OH.

The contribution from the a-th orbital is approximated by the tensor ansatz

K_{km,a} ≈ ⟨Gk ⊙ [∑_{μ=1}^{Nb} c_{μa} Gμ], [Gm ⊙ ∑_{ν=1}^{Nb} c_{νa} Gν] ∗ PR⟩.

Here, the tensor product convolution is first calculated for each a-th orbital, and then
the scalar products in the canonical format yield the contributions of the a-th orbital to the
entries of the exchange Galerkin matrix. The algorithm for tensor calculation of the
exchange matrix is described in detail in [145].
These algorithms were introduced in the first tensor-structured Hartree–Fock
solver using 3D grid-based evaluation of the Coulomb and exchange matrices in 1D
complexity at every step of self-consistent field (SCF) iteration [146, 187].
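One exchange entry K_{km,a} under these conventions can be sketched schematically in numpy (random rank-1 basis factors G, random kernel factors p, and toy orbital coefficients c_a; all names, sizes, and the use of `np.convolve` as a stand-in for the 1D convolutions are our own assumptions):

```python
import numpy as np

n, Nb, R = 24, 3, 2
rng = np.random.default_rng(1)
G = [rng.standard_normal((Nb, n)) for _ in range(3)]   # rank-1 basis factors per mode
p = [rng.standard_normal((R, n)) for _ in range(3)]    # Newton kernel factors
c_a = rng.standard_normal(Nb)                          # coefficients of the a-th orbital

def K_entry(k, m):
    # left factor G_k ⊙ Σ_μ c_μa G_μ: canonical rank Nb
    left = [G[l][k] * (c_a[:, None] * G[l]) for l in range(3)]
    # right factor (G_m ⊙ Σ_ν c_νa G_ν) ∗ P_R: canonical rank Nb * R,
    # built from 1D convolutions only
    right = [np.array([np.convolve(c_a[nu] * G[l][m] * G[l][nu], p[l][q], "same")
                       for nu in range(Nb) for q in range(R)])
             for l in range(3)]
    # scalar product of two canonical tensors: sums of products of 1D dots
    return sum(np.prod([left[l][i] @ right[l][j] for l in range(3)])
               for i in range(Nb) for j in range(Nb * R))
```

As with the Coulomb part, no 3D array is ever formed; the per-entry work stays linear in the univariate grid size n for fixed ranks.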

8.3 Multilevel rank-truncated self-consistent field iteration

In the following sections we discuss the first grid-based Hartree–Fock solver, which
was developed in 2009 and published in [146] and [187].
The standard self-consistent field (SCF) iteration algorithm can be formulated as
the following “fixed-point” iteration [203, 44]: starting from an initial guess C0, perform
iterations of the form

F̃k C_{k+1} = S C_{k+1} Λ_{k+1},   Λ_{k+1} = diag(λ_1^{k+1}, . . . , λ_{Norb}^{k+1}),   (8.18)
C_{k+1}^T S C_{k+1} = I_{Norb},

where the current Fock matrix F̃k = Φ(Ck, C_{k−1}, . . . , C0), k = 0, 1, . . ., is specified by
the particular relaxation scheme. For example, for the simplest approach, called the
Roothaan algorithm, one has F̃k = F(Ck). In practically interesting situations, this
algorithm usually leads to “flip-flop” stagnation [203].
Recall that λ_1^{k+1} ≤ λ_2^{k+1} ≤ ⋅ ⋅ ⋅ ≤ λ_{Norb}^{k+1} are the Norb negative eigenvalues of the linear
generalized eigenvalue problem

F̃k U = λSU,   (8.19)

and the Nb × Norb matrices C_{k+1} contain the respective Norb orthonormal eigenvectors
u_1, . . . , u_{Norb}. We denote by C̃_{k+1} ∈ ℝ^{Nb×Nb} the matrix representing the full set of orthogonal
eigenvectors in (8.19).
We use the particular choice of F̃k, k = 0, 1, . . ., via the DIIS algorithm (cf. [238]),
with the starting value F̃0 = F(C0) = H, where the matrix H corresponds to the core
Hamiltonian.
In [146, 187], a modification of the standard DIIS iteration was proposed by carrying
out the iteration on a sequence of successively refined grids with grid-dependent
stopping criteria. The multilevel implementation provides robust convergence from
the zero initial guess for the Hartree and exchange operators. The coarse-to-fine grid
iteration, in turn, accelerates the solution process dramatically due to the low cost of the
coarse grid calculations.
The principal feature of the tensor-truncated iteration lies in the fast update
of the Fock matrix F(C) by using tensor-product multilinear algebra of 3-tensors
combined with rank truncation. Moreover, the multilevel implementation provides
a simple scheme for constructing a good initial guess on the fine grid levels.
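The skeleton of such a fixed-point SCF loop, with the generalized eigenvalue problem (8.19) reduced to an ordinary symmetric one via a Cholesky factor of S, can be sketched as follows. This is a toy Nb = 6 model with a synthetic density-dependent term in place of J + K; every matrix and coupling constant here is an illustrative assumption, not the solver's actual operators:

```python
import numpy as np

rng = np.random.default_rng(2)
Nb, Norb = 6, 2
A = rng.standard_normal((Nb, Nb))
S = A @ A.T / Nb + np.eye(Nb)                      # SPD overlap matrix
H = rng.standard_normal((Nb, Nb)); H = (H + H.T) / 2.0
V = rng.standard_normal((Nb, Nb)); V = (V + V.T) / 2.0
Li = np.linalg.inv(np.linalg.cholesky(S))          # S = L L^T, Li = L^{-1}

def gen_eigh(F):
    # solve F U = S U Λ with U^T S U = I via the Cholesky transform
    lam, Y = np.linalg.eigh(Li @ F @ Li.T)
    return lam, Li.T @ Y

def fock(C):
    D = C @ C.T                                    # density matrix
    return H + 0.1 * V @ D @ V                     # toy stand-in for H + J(C) + K(C)

C = np.zeros((Nb, Norb))                           # zero initial guess, so F_0 = H
for _ in range(50):
    F = fock(C)
    lam, U = gen_eigh(F)
    C_new = U[:, :Norb]                            # Norb lowest eigenvectors
    done = np.linalg.norm(C_new @ C_new.T - C @ C.T) < 1e-12
    C = C_new
    if done:
        break
```

The convergence test compares densities C C^T rather than C itself, which avoids the eigenvector sign ambiguity; the plain Roothaan form of this loop is exactly the scheme that may flip-flop, motivating the DIIS relaxation described next.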

8.3.1 SCF iteration by using modified DIIS scheme

For each fixed discretization, we use the original version of the DIIS scheme (cf. [128]),
defined by the following choice of the residual error vectors (matrices):

E_i := [C̃_{i+1}^T F(C_i) C̃_{i+1}]|_{1≤μ≤Norb; Norb+1≤ν≤Nb} ∈ ℝ^{Norb×(Nb−Norb)}   (8.20)

for the iteration number i = 0, 1, . . . , k, which should vanish on the exact solutions of the
Hartree–Fock Galerkin equation due to the orthogonality property. Hence, a stopping
criterion applies to the residual error vectors E_i for i = 0, 1, 2, . . .. Here, the subindexes μ
and ν specify the relevant range of entries in the coefficients for the molecular orbitals C̃_{i+1}.
The minimizing coefficient vector c̃ := (c_0, . . . , c_k)^T ∈ ℝ^{k+1} is computed by solving
the constrained quadratic minimization problem for the respective cost functional
(the averaged residual error vector over the previous iterands):

f(c̃) := (1/2) ‖∑_{i=0}^{k} c_i E_i‖_F² ≡ (1/2) ⟨Bc̃, c̃⟩ → min,   provided that ∑_{i=0}^{k} c_i = 1,

where

B = {B_ij}_{i,j=0}^{k}   with   B_ij = ⟨E_i, E_j⟩,

with Ei defined by (8.20). Introducing the Lagrange multiplier ξ ∈ ℝ, the problem is


reduced to minimization of the Lagrangian functional

L(c̃, ξ ) = f (c̃) − ξ (⟨1, c̃⟩ − 1),

where 1 = (1, . . . , 1)T ∈ ℝk+1 , which leads to the linear augmented system of equations

Bc̃ − ξ 1 = 0, (8.21)
⟨1, c̃⟩ = 1.

Finally, the updated Fock operator F̃k is built up by

F̃k = ∑_{i=0}^{k−1} c_i^{opt} F̃i + c_k^{opt} F(Ck),   k = 0, 1, 2, . . . ,   (8.22)

where the minimizing coefficients c_i^{opt} = c̃_i (i = 0, 1, . . . , k) solve the linear system
(8.21). For k = 0, the first sum in (8.22) is assumed to be zero, hence providing c_0^{opt} = 1
and F̃0 = F(C0).
Recall that if the stopping criterion on Ck , k = 1, . . ., is not satisfied, then one
updates F̃k by (8.22) and solves the eigenvalue problem (8.18) for Ck+1 .
Note that in practice one can use the averaged residual vector only on a reduced
subsequence of iterands, Ek , Ek−1 , . . . , Ek−k0 , k − k0 > 0. In our numerical examples
below, we usually set k0 = 4.
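The augmented system (8.21) is a small (k + 2) × (k + 2) linear solve; a direct numpy sketch with random stand-in residual matrices E_i (their shapes here are arbitrary toy values):

```python
import numpy as np

rng = np.random.default_rng(3)
k = 4
E = [rng.standard_normal((2, 5)) for _ in range(k + 1)]          # residuals E_0..E_k
B = np.array([[np.sum(Ei * Ej) for Ej in E] for Ei in E])        # B_ij = <E_i, E_j>

# augmented system (8.21):  [B, -1; 1^T, 0] [c; xi] = [0; 1]
M = np.zeros((k + 2, k + 2))
M[:k + 1, :k + 1] = B
M[:k + 1, k + 1] = -1.0
M[k + 1, :k + 1] = 1.0
rhs = np.zeros(k + 2); rhs[-1] = 1.0
sol = np.linalg.solve(M, rhs)
c_opt, xi = sol[:k + 1], sol[k + 1]
```

By construction c_opt sums to one and minimizes f(c) = ½⟨Bc, c⟩ over the constraint, so its functional value is no worse than that of any trivial feasible choice such as c = e_i.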

8.3.2 Unigrid and multilevel tensor-truncated DIIS iteration

In this section, we describe the resultant numerical algorithm. Recall that the discrete
nonlinear Fock operator is specified by a matrix

F(C) = H + J(C) + K(C), (8.23)

where H corresponds to the core Hamiltonian (fixed in our scheme), and the discrete
Hartree and exchange operators are given by tensor representations (8.12) and (8.4),
respectively.
First, we describe the unigrid tensor-truncated DIIS scheme [146, 187].

I. Algorithm U_DIIS (unigrid tensor-truncated DIIS iteration).


(1) Given the core Hamiltonian matrix H, the grid parameter n, and the termination
parameter ε > 0.

(2) Set C0 = 0 (i. e., J(C0 ) = 0, K(C0 ) = 0) and F̃0 = H.


(3) For k = 0, 1, . . ., perform
(a) Solve the full linear eigenvalue problem of size Nb × Nb , given by (8.19), and
define Ck+1 as the matrix containing the Norb eigenvectors corresponding to
Norb minimal eigenvalues.
(b) Terminate the iteration by checking the stopping criterion

‖Ck+1 − Ck ‖F ≤ ε.

(c) If ‖Ck+1 − Ck ‖F > ε, then compute the Fock matrix

F(Ck+1 ) = H + J(Ck+1 ) + K(Ck+1 )

by the tensor-structured calculations of J(C_{k+1}) and K(C_{k+1}) using grid-based
basis functions with expansion coefficients C_{k+1} (see Section 8.1). Update the
Fock matrix F̃_{k+1} by (8.22) and switch to Step (a).
(4) Returns: eigenvalues λ_1, . . . , λ_{Norb} and eigenvectors C ∈ ℝ^{Nb×Norb}.

Numerical illustration of the convergence of Algorithm U_DIIS for solving the
Hartree–Fock equation in the pseudopotential case of CH4 has been presented in
[187]. It demonstrates that the convergence history is almost independent of the grid
size on the examples with n = 64 and n = 256.
To enhance the unigrid DIIS iteration, we apply the multilevel version of Algorithm
U_DIIS defined on a sequence of discrete Hartree–Fock equations specified by a
sequence of grid parameters n_p = n0, 2n0, . . . , 2^M n0, with p = 0, . . . , M, corresponding
to the succession of dyadically refined spatial grids. To that end, for ease of exposition,
we also introduce the incomplete version of Algorithm U_DIIS, further called
Algorithm U_DIIS(k̃), where the DIIS correction starts only after the iteration number
k = k̃ ≥ 1. The input data for Algorithm U_DIIS(k̃) include the current approximation
C_k̃ and a sequence of all already precomputed Fock matrices, F̃0, F̃1, . . . , F̃_{k̃−1}.

II. Algorithm U_DIIS(k̃) (incomplete unigrid tensor-truncated DIIS iteration).

(1) Given the core Hamiltonian matrix H, the grid parameter n, the termination
parameter ε > 0, C_k̃, and a sequence of Fock matrices F̃0, F̃1, . . . , F̃_{k̃−1}.
(2) Compute J(C_k̃), K(C_k̃), F(C_k̃) = H + J(C_k̃) + K(C_k̃), and F̃_k̃ by (8.22).
(3) For k = k̃, k̃ + 1, . . ., perform steps (a)–(c) of Algorithm U_DIIS.

Next, we consider the multilevel tensor-truncated DIIS scheme [146, 187].

III. Algorithm M_DIIS (multilevel tensor-truncated DIIS scheme).

(1) Given the core Hamiltonian matrix H, the coarsest grid parameter n0, the termination
parameter ε0 > 0, and the number of grid refinements M.
(2) For p = 0, apply the unigrid Algorithm U_DIIS with n = n0, ε_p = ε0, and return
the number of iterations k0, the matrix C_{k0+1}, and a sequence of Fock matrices
F̃0, F̃1, . . . , F̃_{k0}.
(3) For p = 1, . . . , M, apply successively Algorithm U_DIIS(k_{p−1} + 1) with the input
parameters n_p := 2^p n0, ε_p := ε0 2^{−2p}, C_{k_{p−1}+1}. Keep continuous numbering of the
DIIS iterations through all levels, such that the maximal iteration number at level
p is given by

k_p = ∑_{q=0}^{p} m_q,

with m_q being the number of iterative steps at level q.
(4) Returns: k_M, C_{k_M+1}, and a sequence of Fock matrices F̃0, F̃1, . . . , F̃_{k_M}.

In numerical practice, we usually start the calculations on a small n0 × n0 × n0 3D
Cartesian grid with n0 = 64 and end up with a maximum of n_M = 8192 for all-electron
computations, or n_M = 1024 for the pseudopotential case. Further, in Section 11.6,
we show by numerical examples that in large-scale computations the multilevel
Algorithm M_DIIS allows us to perform most of the iterative steps on coarse grids, thus
reducing the computational cost dramatically and, at the same time, providing a good
initial guess for the DIIS iteration on the nonlinearity at each subsequent approximation
level.
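The coarse-to-fine schedule and the claim that coarse levels are nearly free can be sketched abstractly (level grids n_p = 2^p n0, tolerances ε_p = ε0 2^{−2p}; the linear-in-n_p work model per iteration follows the complexity estimate below, and all numerical values are toy assumptions):

```python
n0, M, eps0 = 64, 5, 1e-3

# level schedule: grid sizes double, tolerances tighten by a factor of 4
levels = [(n0 * 2 ** p, eps0 * 4.0 ** (-p)) for p in range(M + 1)]

# work model: one DIIS step at level p costs O(n_p) (linear scaling in the
# univariate grid size); summed over all levels this is below twice the
# finest-level cost, since n0 + 2 n0 + ... + 2^M n0 < 2^{M+1} n0
work_per_level = [n for n, _ in levels]
total_work = sum(work_per_level)
finest_work = work_per_level[-1]
```

This geometric-sum argument is the reason the multilevel iteration costs at most a small constant times a single solve on the finest grid.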
The rest of this section addresses the complexity estimate of the multilevel tensor-truncated
iteration in terms of RN, Nb, n, and other governing parameters of the algorithm.
For ease of discussion, we suppose that rank(Gμ) = 1, μ = 1, . . . , Nb (see [187]
for a more detailed discussion of the general case rank(Gμ) ≥ 1).

Lemma 8.1 ([187]). Let rank(Gμ) = 1, μ = 1, . . . , Nb, and rank(PN) = RN ≤ C Norb. Suppose
that the rank reduction procedure applied to the convolution products ϒ_{aν} in (8.3)
provides the rank estimate rank(ϒ_{aν}) ≤ r0. Then the numerical cost of one iterative step
in Algorithm M_DIIS at level p can be bounded by

W_p = O(Nb RN n_p log n_p + Nb³ r0 Norb n_p).

Assume that the number of multigrid DIIS iterations at each level is bounded by the constant
I0. Then the total cost of Algorithm M_DIIS does not exceed the double cost at the
finest level n = n_M, i. e., 2W_M = O(I0 Nb³ r0 Norb n).

Proof. The rank bound rank(Gk) = 1 implies rank(∑_{m=1}^{Nb} c_{ma} Gm) ≤ Nb. Hence, the
numerical cost to compute the tensor-product convolution ϒ_{aν} in (8.3) amounts to

W(ϒ_{aν}) = O(Nb RN n_p log n_p).

Since the initial canonical rank of ϒ_{aν} is estimated by rank(ϒ_{aν}) ≤ Nb RN, the multigrid
rank reduction algorithm, having linear scaling in rank(ϒ_{aν}) (see Section 3), provides

the complexity bound O(r0 Nb RN n_p). Hence, the total cost to compute the scalar products
in χ_{μν,a} (see (8.4)) can be estimated by

W(χ_{μν,a}) = O(Nb³ r0 Norb n_p),

which completes the first part of the proof. The second assertion follows due to the linear
scaling in n_p of the unigrid algorithm, which implies the following bound:

n0 + 2n0 + ⋅ ⋅ ⋅ + 2^p n0 ≤ 2^{p+1} n0 = 2 n_M,

hence completing the proof.

Remark 8.2. In the case of large molecules and RG = rank(Gμ) ≥ 1, further optimization
of the algorithm up to O(RN Nb² n_p)-complexity may be possible on the basis of rank
reduction applied to the rank-RG Nb orbitals and by using an iterative eigenvalue solver
instead of the currently employed direct solver via matrix diagonalization, or by using
direct minimization schemes [263].

Our algorithm for the ab initio solution of the Hartree–Fock equation in tensor-structured
format was examined numerically on some moderate-size molecules
[146, 187]. In particular, we consider the all-electron case of H2O and the pseudopotential
cases of the CH4 and CH3OH molecules. In the presented numerical examples, we
use the discretized GTO basis functions for convenient comparison of the
results with the output from the standard MOLPRO package based on the analytical
evaluation of the integral operators in the GTO basis.
The size of the computational box [−b, b]3 introduced in Section 8.1.1 varies from
2b = 11.2 Å for H2 O up to 2b = 16 Å for small organic molecules. The smallest step-
size of the grid h = 0.0013 Å is reached in the SCF iterations for the H2 O molecule,
using the finest level grid with n = 8192, whereas the average step size for the compu-
tations using the pseudopotentials for small organic molecules is about h = 0.015 Å,
corresponding to the grid size n = 1024.
We solve numerically the ab initio Hartree–Fock equation by using Algorithms
U_DIIS and M_DIIS presented in Section 8.3.2. Starting with the zero initial guess for
matrices J(C) = 0 and K(C) = 0 in the Galerkin Fock matrix, the eigenvalue problem
at the first iterative step (p = 0) is solved by using only the H part of the Fock ma-
trix in (8.23), which does not depend on the solution and hence can be precomputed
beforehand.
Thus, the SCF iteration starts with the expansion coefficients cμi for orbitals in the
GTO basis, computed using only the core Hamiltonian H. At every iteration step, the
Hartree and exchange potentials and the corresponding Galerkin matrices are com-
puted using the updated coefficients cμi . The renewed Coulomb and exchange matri-
ces generate the updated Fock matrix to be used for the solution of the eigenvalue

Figure 8.5: Multilevel convergence of the DIIS iteration applied to the all-electron case of H2O (left),
and convergence in the energy in n (right).

problem. The minimization of the Frobenius norm of the virtual block of the Fock
operator, evaluated on the eigenvectors of the consequent iterations, C̃_k, C̃_{k−1}, . . ., is utilized
for the DIIS scheme.
The multilevel solution of the nonlinear eigenvalue problem (8.18) is realized via
the SCF iteration on a sequence of uniformly refined grids, beginning from the initial
coarse grid, say, with n0 = 64, and proceeding on the dyadically refined grids np =
n0 2p , p = 1, . . . , M. We use the grid-dependent termination criterion εnp := ε0 2−2p ,
keeping a continuous numbering of the iterations.
Figure 8.5 (left) shows the convergence of the iterative scheme in the case of the H2O
molecule. Figure 8.5 (right) illustrates the convergence in the total Hartree–Fock energy,
reaching an absolute error of about 10⁻⁴, which implies the relative error 9 ⋅ 10⁻⁶ in
the case of grid size n = 1024. The total energy is calculated by

E_HF = 2 ∑_{a=1}^{Norb} λ_a − ∑_{a=1}^{Norb} (J̃_a − K̃_a)

with J̃_a = ⟨ψ_a, VH ψ_a⟩_{L²} and K̃_a = ⟨ψ_a, 𝒱_ex ψ_a⟩_{L²} being the so-called Coulomb and
exchange integrals, respectively, computed in the molecular orbital basis ψ_a (a =
1, . . . , Norb).
The detailed discussion of the multilevel DIIS iteration, including various numer-
ical tests, can be found in [187, 146].
9 Grid-based core Hamiltonian
In this section, following [156], we discuss the grid-based method for calculating the
core Hamiltonian part in the Fock operator (7.4),

ℋ = −(1/2) Δ + V_c,

with respect to the Galerkin basis {g_m(x)}_{1≤m≤Nb}, x ∈ ℝ³, where V_c(x) is given by (7.4),
and Δ represents the 3D Laplacian subject to Dirichlet boundary conditions.

9.1 Tensor approach for multivariate Laplace operator

The initial eigenvalue problem is posed in the finite volume box Ω = [−b, b]³ ⊂ ℝ³, subject
to the homogeneous Dirichlet boundary conditions on 𝜕Ω. For a given discretization
parameter N ∈ ℕ, we use the equidistant N × N × N tensor grid ω_{3,N} = {x_i},
i ∈ ℐ := {1, . . . , N}³, with the mesh size h = 2b/(N + 1), which may be different from the
grid ω_{3,n} introduced in Section 8.1.1 (usually, n ≤ N).
Now, similar to Section 8.1.1, define a set of piecewise linear basis functions ḡ_k :=
I₁ g_k, k = 1, . . . , Nb, by linear tensor-product interpolation via the set of product hat
functions {ξ_i} = ξ_{i₁}(x₁) ξ_{i₂}(x₂) ξ_{i₃}(x₃), i ∈ ℐ, associated with the respective grid cells in
ω_{3,N}. Here, the linear interpolant I₁ = I₁ × I₁ × I₁ is a product of 1D interpolation operators,
ḡ_k^(ℓ) = I₁ g_k^(ℓ), ℓ = 1, 2, 3, where I₁ : C⁰([−b, b]) → W_h := span{ξ_i}_{i=1}^{N} is defined over the
set of piecewise linear basis functions by

(I₁ w)(x_ℓ) := ∑_{i_ℓ=1}^{N} w(x_{i_ℓ}) ξ_{i_ℓ}(x_ℓ),   x_i ∈ ω_{3,N},   ℓ = 1, 2, 3.

This leads to the separable grid-based approximation of the initial Gaussian-type basis
functions g_k(x),

g_k(x) ≈ ḡ_k(x) = ∏_{ℓ=1}^{3} ḡ_k^(ℓ)(x_ℓ) = ∏_{ℓ=1}^{3} ∑_{i=1}^{N} g_k^(ℓ)(x_i) ξ_i(x_ℓ),   (9.1)

where the rank-1 coefficients tensor G_k is given by G_k = g_k^(1) ⊗ g_k^(2) ⊗ g_k^(3) with the canonical
vectors g_k^(ℓ) = {g_k^(ℓ)(x_{i_ℓ})} (see Figure 9.1 illustrating the construction of ḡ_k(x₁)).

We approximate the exact Galerkin matrix A_g ∈ ℝ^{Nb×Nb},

A_g = {a_km} := {⟨−Δ g_k, g_m⟩} ≡ {⟨∇g_k, ∇g_m⟩},   k, m = 1, . . . , Nb,

by using the piecewise linear representation of the basis functions ḡ_k(x), x ∈ ℝ³ (see (9.1)),
constructed on the N × N × N Cartesian grid (see [41] for the general theory of finite element
approximation).

https://doi.org/10.1515/9783110365832-009

Figure 9.1: Using hat functions ξ_i(x₁) for a single-mode basis function g_k(x₁), yielding the piecewise
linear representation ḡ_k(x₁) of the continuous function g_k(x₁).

Here, ∇ denotes the 3D gradient operator. The approximating matrix A_G is now defined by

A_g ≈ A_G = {ā_km} := {⟨−Δ ḡ_k, ḡ_m⟩} ≡ {⟨∇ḡ_k, ∇ḡ_m⟩},   A_G ∈ ℝ^{Nb×Nb}.   (9.2)

The accuracy of this approximation is of order |a_km − ā_km| = O(h²), where h is the mesh
size (see [156], Theorem A.4, and the numerics in Section 9.3).
Recall that the Laplace operator applies to a separable function η(x), x =
(x₁, x₂, x₃) ∈ ℝ³, having a representation η(x) = η₁(x₁) η₂(x₂) η₃(x₃), as follows:

Δη(x) = (d²η₁(x₁)/dx₁²) η₂(x₂) η₃(x₃) + η₁(x₁) (d²η₂(x₂)/dx₂²) η₃(x₃) + η₁(x₁) η₂(x₂) (d²η₃(x₃)/dx₃²),   (9.3)

which ensures the standard Kronecker rank-3 tensor representation of the respective
Galerkin FEM stiffness matrix A_Δ in the tensor basis {ξ_i(x₁) ξ_j(x₂) ξ_k(x₃)}, i, j, k = 1, . . . , N:

A_Δ := A^(1) ⊗ S^(2) ⊗ S^(3) + S^(1) ⊗ A^(2) ⊗ S^(3) + S^(1) ⊗ S^(2) ⊗ A^(3) ∈ ℝ^{N^{⊗3}×N^{⊗3}}.

Here, the 1D stiffness and mass matrices A^(ℓ), S^(ℓ) ∈ ℝ^{N×N}, ℓ = 1, 2, 3, are given by

A^(ℓ) := {⟨∇^(ℓ) ξ_i(x_ℓ), ∇^(ℓ) ξ_j(x_ℓ)⟩}_{i,j=1}^{N} = (1/h) tridiag{−1, 2, −1},
S^(ℓ) := {⟨ξ_i, ξ_j⟩}_{i,j=1}^{N} = (h/6) tridiag{1, 4, 1},

respectively, with ∇^(ℓ) = d/dx_ℓ. Since {ξ_i}_{i=1}^{N} are the same for all modes ℓ = 1, 2, 3, we
further denote (for simplicity of notation) A^(ℓ) = A₁ and S^(ℓ) = S₁.
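These tridiagonal stencils and the Kronecker-sum assembly of A_Δ are straightforward to reproduce (a small numpy sketch; N and h are arbitrary toy values, and the dense Kronecker form is built only for checking purposes):

```python
import numpy as np

N, h = 8, 0.125
A1 = (1.0 / h) * (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))   # 1D stiffness
S1 = (h / 6.0) * (4 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))   # 1D mass

kron3 = lambda X, Y, Z: np.kron(np.kron(X, Y), Z)
A_Delta = kron3(A1, S1, S1) + kron3(S1, A1, S1) + kron3(S1, S1, A1)
```

In the actual tensor-structured computation A_Δ is never formed explicitly — only its 1D factors A₁ and S₁ are used — but the small dense version makes the structure easy to verify.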

Lemma 9.1 (Galerkin matrix A_G, [156]). Assume that the basis functions {ḡ_k(x)}, x ∈ ℝ³,
k = 1, . . . , Nb, are rank-1 separable, i. e., ḡ_k(x) = ḡ_k^(1)(x₁) ḡ_k^(2)(x₂) ḡ_k^(3)(x₃). Then the matrix
entries of the Laplace operator A_G can be represented by

ā_km = ⟨A₁ g_k^(1), g_m^(1)⟩ ⟨S₁ g_k^(2), g_m^(2)⟩ ⟨S₁ g_k^(3), g_m^(3)⟩
     + ⟨S₁ g_k^(1), g_m^(1)⟩ ⟨A₁ g_k^(2), g_m^(2)⟩ ⟨S₁ g_k^(3), g_m^(3)⟩
     + ⟨S₁ g_k^(1), g_m^(1)⟩ ⟨S₁ g_k^(2), g_m^(2)⟩ ⟨A₁ g_k^(3), g_m^(3)⟩
     = ⟨A_Δ G_k, G_m⟩,   (9.4)

where g_k^(ℓ), g_m^(ℓ) ∈ ℝ^N (k, m = 1, . . . , Nb) are the vectors of collocation coefficients of
{ḡ_k^(ℓ)(x_ℓ)}, ℓ = 1, 2, 3, and G_k = g_k^(1) ⊗ g_k^(2) ⊗ g_k^(3) are the corresponding rank-1 3-tensors.

Proof. By definition, we have

ā_km = ⟨∇ḡ_k, ∇ḡ_m⟩ = ⟨∇(ḡ_k^(1) ḡ_k^(2) ḡ_k^(3)), ∇(ḡ_m^(1) ḡ_m^(2) ḡ_m^(3))⟩.

Taking into account the representation (9.3), this implies

ā_km = ⟨∇^(1) ḡ_k^(1), ∇^(1) ḡ_m^(1)⟩ ⟨ḡ_k^(2), ḡ_m^(2)⟩ ⟨ḡ_k^(3), ḡ_m^(3)⟩
     + ⟨ḡ_k^(1), ḡ_m^(1)⟩ ⟨∇^(2) ḡ_k^(2), ∇^(2) ḡ_m^(2)⟩ ⟨ḡ_k^(3), ḡ_m^(3)⟩
     + ⟨ḡ_k^(1), ḡ_m^(1)⟩ ⟨ḡ_k^(2), ḡ_m^(2)⟩ ⟨∇^(3) ḡ_k^(3), ∇^(3) ḡ_m^(3)⟩.   (9.5)

Simple calculations show that for ℓ = 1,

⟨−Δ^(1) ḡ_k^(1), ḡ_m^(1)⟩ = ⟨∇^(1) ∑_{i=1}^{N} g_{k,i} ξ_i(x₁), ∇^(1) ∑_{j=1}^{N} g_{m,j} ξ_j(x₁)⟩
  = ∑_{i=1}^{N} ∑_{j=1}^{N} g_{k,i} g_{m,j} ⟨∇^(1) ξ_i(x₁), ∇^(1) ξ_j(x₁)⟩ = ⟨A₁ g_k^(1), g_m^(1)⟩,

and

⟨ḡ_k^(1), ḡ_m^(1)⟩ = ⟨S₁ g_k^(1), g_m^(1)⟩,

and similarly for the remaining modes ℓ = 2, 3. These representations imply

ā_km = ⟨A_Δ G_k, G_m⟩,

which completes the proof.
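The identity (9.4) can also be checked numerically against the explicit Kronecker matrix A_Δ (a small numpy verification with random collocation vectors; the kron ordering below matches the ⊗ ordering of the modes in the text):

```python
import numpy as np

rng = np.random.default_rng(4)
N, h = 6, 0.2
A1 = (1.0 / h) * (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))
S1 = (h / 6.0) * (4 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))

gk = [rng.standard_normal(N) for _ in range(3)]       # collocation vectors g_k^(l)
gm = [rng.standard_normal(N) for _ in range(3)]

# left-hand side of (9.4): three products of 1D quadratic forms
a_km = ((gm[0] @ A1 @ gk[0]) * (gm[1] @ S1 @ gk[1]) * (gm[2] @ S1 @ gk[2])
      + (gm[0] @ S1 @ gk[0]) * (gm[1] @ A1 @ gk[1]) * (gm[2] @ S1 @ gk[2])
      + (gm[0] @ S1 @ gk[0]) * (gm[1] @ S1 @ gk[1]) * (gm[2] @ A1 @ gk[2]))

# right-hand side: <A_Δ G_k, G_m> with vectorized rank-1 tensors
kron3 = lambda X, Y, Z: np.kron(np.kron(X, Y), Z)
A_Delta = kron3(A1, S1, S1) + kron3(S1, A1, S1) + kron3(S1, S1, A1)
Gk = kron3(gk[0], gk[1], gk[2])
Gm = kron3(gm[0], gm[1], gm[2])
```

The agreement is exact up to floating-point round-off, since (X ⊗ Y ⊗ Z)(a ⊗ b ⊗ c) = (Xa) ⊗ (Yb) ⊗ (Zc).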

Remark 9.2. Agglomerating the vectorized rank-1 tensors G_k ∈ ℝ^{N^{⊗3}} (k = 1, . . . , Nb) into a matrix
G ∈ ℝ^{N^{⊗3}×Nb}, the entrywise representation (9.4) can be written in the matrix form

A_G = G^T A_Δ G ∈ ℝ^{Nb×Nb},

corresponding to the standard matrix–matrix transform under a change of basis.
132 | 9 Grid-based core Hamiltonian

Lemma 9.1 implies that in the case of basis functions having ranks larger than one,

ḡ_m(x) = ∑_{p=1}^{R_m} η_p(x),   R_m ≥ 1,   (9.6)

where η_p(x) are rank-1 separable functions, representation (9.4) takes the following
form:

ā_km = ∑_{p=1}^{R_k} ∑_{q=1}^{R_m} [⟨A₁ g_{k,p}^(1), g_{m,q}^(1)⟩ ⟨S₁ g_{k,p}^(2), g_{m,q}^(2)⟩ ⟨S₁ g_{k,p}^(3), g_{m,q}^(3)⟩
  + ⟨S₁ g_{k,p}^(1), g_{m,q}^(1)⟩ ⟨A₁ g_{k,p}^(2), g_{m,q}^(2)⟩ ⟨S₁ g_{k,p}^(3), g_{m,q}^(3)⟩
  + ⟨S₁ g_{k,p}^(1), g_{m,q}^(1)⟩ ⟨S₁ g_{k,p}^(2), g_{m,q}^(2)⟩ ⟨A₁ g_{k,p}^(3), g_{m,q}^(3)⟩],   (9.7)

where R_m, m = 1, . . . , Nb, denote the rank parameters of the Galerkin basis functions ḡ_m.
Representation (9.4) can be simplified by the standard lumping procedure, preserving
the same approximation error O(h²):

ā_km ↦ â_km = ⟨A₁ g_k^(1), g_m^(1)⟩ ⟨g_k^(2), g_m^(2)⟩ ⟨g_k^(3), g_m^(3)⟩
  + ⟨g_k^(1), g_m^(1)⟩ ⟨A₁ g_k^(2), g_m^(2)⟩ ⟨g_k^(3), g_m^(3)⟩
  + ⟨g_k^(1), g_m^(1)⟩ ⟨g_k^(2), g_m^(2)⟩ ⟨A₁ g_k^(3), g_m^(3)⟩
  = ⟨A_{Δ,FD} G_k, G_m⟩,

where A_{Δ,FD} denotes the finite difference (FD) discrete Laplacian

A_{Δ,FD} := (1/h) [A^(1) ⊗ I^(2) ⊗ I^(3) + I^(1) ⊗ A^(2) ⊗ I^(3) + I^(1) ⊗ I^(2) ⊗ A^(3)],

with I^(ℓ) being the N × N identity matrix.
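One can check that the lumped operator A_{Δ,FD}, assembled exactly as in the displayed formula, reproduces the classical 7-point finite difference stencil (a numpy sketch with toy N and h of our own choosing):

```python
import numpy as np

N, h = 5, 0.5
A1 = (1.0 / h) * (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))
I = np.eye(N)

kron3 = lambda X, Y, Z: np.kron(np.kron(X, Y), Z)
A_fd = (1.0 / h) * (kron3(A1, I, I) + kron3(I, A1, I) + kron3(I, I, A1))
```

Each diagonal entry equals 6/h² and each of the six nearest-neighbor couplings equals −1/h², i. e., the standard second-order 7-point discretization of −Δ with Dirichlet boundary conditions.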


It is worth noting that the extension of Lemma 9.1 to the case of d-dimensional
Laplacian

akm = ⟨AΔ,d Gk , Gm ⟩

leads to a similar d-term sum representation.

9.2 Nuclear potential operator by direct tensor summation

The method of direct tensor summation of long-range electrostatic potentials [156, 147]
described below is based on the use of the low-rank canonical representation of the single
Newton kernel PR in the bounding box, translated and restricted according to the
coordinates of the nuclei in the box. The approach is applicable, for example, in the tensor-based
calculation of the nuclear potential operator describing the Coulombic interaction of
the electrons with the nuclei in a molecular system in a box or in a (cubic) unit cell. It is
defined by the function V_c(x) in the scaled unit cell Ω = [−b/2, b/2]³,

V_c(x) = ∑_{ν=1}^{M0} Z_ν / ‖x − a_ν‖,   Z_ν > 0,   x, a_ν ∈ Ω ⊂ ℝ³,   (9.8)

where M0 is the number of nuclei in Ω, and a_ν and Z_ν represent their coordinates and
charges, respectively.
1
We start with approximating the non-shifted 3D Newton kernel ‖x‖ on the auxiliary
extended box Ω̃ = [−b, b]3 by its projection onto the basis set {ψ } of piecewise constant
i
functions defined on the uniform 2n × 2n × 2n tensor grid Ω2n with the mesh size h
described in Section 6.1. This defines the “reference” rank-R canonical tensor as above:

R
̃ R = ∑ p(1) ⊗ p(2) ⊗ p(3) ∈ ℝ2n×2n×2n .
P (9.9)
q q q
q=1

Here, we recall the grid-based approximate summation of the nuclear potentials in
(9.8) by using the shifted reference canonical tensor in (9.9), once precomputed on a fine
3D Cartesian grid. For ease of exposition, we make the technical assumption that each
nucleus coordinate a_ν is located exactly at a grid point, a_ν = (i_ν h − b/2, j_ν h − b/2, k_ν h − b/2)
with some 1 ≤ i_ν, j_ν, k_ν ≤ n. Our approximate numerical scheme is designed for nuclei
positioned arbitrarily in the computational box, where the approximation error of order
O(h) is controlled by choosing a large enough grid size n. Indeed, the 1D computational cost
O(n) enables the usage of fine grids of size n³ ≈ 10¹⁵, yielding the mesh size h ≈ 10⁻⁴–10⁻⁵ Å
in our MATLAB calculations (h is of the order of the atomic radii). This grid-based
tensor calculation scheme for the nuclear potential operator was tested numerically in
Hartree–Fock calculations [156], where it was compared with the analytical evaluation
of the same operator by benchmark packages.
Let us introduce the rank-1 windowing operator

𝒲_ν = 𝒲_ν^(1) ⊗ 𝒲_ν^(2) ⊗ 𝒲_ν^(3)

for ν = 1, . . . , M0 by

𝒲_ν P̃_R := P̃_R(i_ν + n/2 : i_ν + 3n/2; j_ν + n/2 : j_ν + 3n/2; k_ν + n/2 : k_ν + 3n/2) ∈ ℝ^{n×n×n}.   (9.10)

With this notation, the total electrostatic potential V_c(x) in the computational box
Ω is approximately represented by the direct canonical tensor sum

P_c = ∑_{ν=1}^{M0} Z_ν 𝒲_ν P̃_R = ∑_{ν=1}^{M0} Z_ν ∑_{q=1}^{R} 𝒲_ν^(1) p_q^(1) ⊗ 𝒲_ν^(2) p_q^(2) ⊗ 𝒲_ν^(3) p_q^(3) ∈ ℝ^{n×n×n},   (9.11)

with the canonical rank bound

rank(P_c) ≤ M0 R,   (9.12)

where every rank-R canonical tensor 𝒲_ν P̃_R ∈ ℝ^{n×n×n} is thought of as a sub-tensor of
the reference tensor P̃_R ∈ ℝ^{2n×2n×2n} obtained by its shifting and restriction (windowing)
onto the n × n × n grid in the box Ω_n ⊂ Ω_{2n}. Here, the shift from the origin is specified
according to the coordinates of the corresponding nuclei a_ν counted in h-units.
For example, the electrostatic potential centered at the origin, i. e., with a_ν = 0,
corresponds to the restriction of P̃_R onto the initial computational box Ω_n, i. e., restricted
to the index set (assuming that n is even)

{[n/2 + i] × [n/2 + j] × [n/2 + k]},   i, j, k ∈ {1, . . . , n}.
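The windowing construction (9.10)–(9.11) only ever touches 1D canonical vectors; a toy numpy sketch of the assembly of P_c (random factors in place of the precomputed Newton-kernel factors, and made-up nuclei shifts and charges — none of these values come from the text):

```python
import numpy as np

rng = np.random.default_rng(5)
n, R = 16, 3
p_ref = [rng.standard_normal((R, 2 * n)) for _ in range(3)]   # reference factors on 2n grid
nuclei = [((2, 3, 4), 1.0), ((7, 1, 5), 6.0)]                 # (grid shifts, charges Z_nu)

def window(vec, shift):
    # 1D restriction of a 2n-vector onto the n-window determined by the shift
    start = shift + n // 2
    return vec[start:start + n]

# P_c as a canonical tensor of rank <= M0 * R: windowed, charge-weighted factors
factors, weights = [[], [], []], []
for (i, j, k), Z in nuclei:
    for q in range(R):
        factors[0].append(window(p_ref[0][q], i))
        factors[1].append(window(p_ref[1][q], j))
        factors[2].append(window(p_ref[2][q], k))
        weights.append(Z)
```

Because windowing acts mode by mode, each shifted nuclear potential costs only O(Rn) storage, and the full 3D tensor is never materialized.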

Remark 9.3. The rank estimate (9.12) for the sum of arbitrarily positioned electrostatic
potentials in a box (unit cell) Rc = rank(Pc ) ≤ M0 R is usually too pessimistic. Our nu-
merical tests for moderate size molecules indicate that the rank of the (M0 R)-term
canonical sum in (9.11) can be reduced considerably. This rank optimization can
be implemented by the multigrid version of the canonical rank-reduction algorithm,
canonical-Tucker-canonical [174] (see also Section 3.3). The resultant canonical tensor
will be denoted by P̂c.

The described grid-based representation of the exact sum of electrostatic potentials
V_c(x) in the form of a tensor in the canonical format enables its easy projection onto
some separable basis set, like the GTO-type atomic orbital bases often used in quantum
chemical computations. The following example illustrates the calculation of the nuclear
potential operator matrix in tensor format for molecules [156, 147]. We show that the
projection of a sum of electrostatic potentials of atoms onto a given set of basis functions
is reduced to a combination of 1D Hadamard and scalar products.
Let us consider the tensor-structured calculation of the nuclear potential operator
(9.8) in a molecule [156, 147]. Given the set of continuous basis functions
Let us consider tensor-structured calculation of the nuclear potential operator
(9.8) in a molecule [156, 147]. Given the set of continuous basis functions

{gμ (x)}, μ = 1, . . . , Nb , (9.13)

each of them can be discretized by a third-order tensor

G_μ = [g_μ(x₁(i), x₂(j), x₃(k))]_{i,j,k=1}^{n} ∈ ℝ^{n×n×n}

obtained by sampling g_μ(x) at the midpoints (x₁(i), x₂(j), x₃(k)) of the grid cells indexed
by (i, j, k). Suppose, for simplicity, that it is a rank-1 canonical tensor,
rank(G_μ) = 1, i. e.,

G_μ = g_μ^(1) ⊗ g_μ^(2) ⊗ g_μ^(3) ∈ ℝ^{n×n×n},

with the canonical vectors g_μ^(ℓ) ∈ ℝ^n associated with the modes ℓ = 1, 2, 3.
The sum of potentials in a box, V_c(x) (9.8), is represented in the given basis set (9.13)
by a matrix V_g = [v_km] ∈ ℝ^{Nb×Nb}. The entries of the nuclear potential operator matrix
are calculated (approximated) by the simple tensor operation (see [156, 147])

v_km = ∫_{ℝ³} V_c(x) g_k(x) g_m(x) dx ≈ v̄_km := ⟨G_k ⊙ G_m, P_c⟩,   1 ≤ k, m ≤ Nb.   (9.14)

We further denote V_G = [v̄_km]. Here, P_c is the sum of shifted/windowed canonical tensors
(9.11) representing the total electrostatic potential of the atoms in a molecule. Recall
that

G_k ⊙ G_m := (g_k^(1) ⊙ g_m^(1)) ⊗ (g_k^(2) ⊙ g_m^(2)) ⊗ (g_k^(3) ⊙ g_m^(3))

denotes the Hadamard (entrywise) product of the tensors representing the basis functions
(9.13), which is reduced to 1D products. The scalar product ⟨⋅, ⋅⟩ in (9.14) is also reduced
to 1D scalar products due to the separation of variables.
We notice that the approximation error ε > 0 caused by the separable representation
of the nuclear potential is controlled by the rank parameter R_c = rank(P_c) ≈ CR,
where C weakly depends on the number of nuclei M0. Now letting rank(G_m) = 1 implies
that each matrix element is computed with linear complexity in n, O(Rn).
The exponential convergence of the canonical approximation in the rank parameter
R allows the optimal choice R = O(|log ε|), adjusting the overall complexity bound to
O(|log ε| n), almost independent of M0.

Remark 9.4. It should be noted that since we remain in the concept of global basis
functions for the Galerkin approximation to the HF eigenvalue problem, the sizes of
the grids used in discretized representation of these basis functions can be different
in the calculation of the kinetic and potential parts in the Fock operator. The corre-
sponding choice is only controlled by the respective approximation error and by the
numerical efficiency.

Finally, we note that the Galerkin tensor representation of the identity operator
leads to the following mass matrix: S = {s_km}, where

s_km = ∫_{ℝ³} ḡ_k(x) ḡ_m(x) dx ≈ ⟨G_k, G_m⟩,   1 ≤ k, m ≤ Nb.

To conclude this section, we note that the error bound ‖V_g − V_G‖ ≤ Ch² can be
proven along the lines of the discussion in [166].

9.3 Numerical verification for the core Hamiltonian

First, following [156], we consider the evaluation of a Galerkin matrix entry for the identity
and Laplace operators, that is,

⟨g, g⟩ = ∫_{ℝ³} g(x)² dx   and   ⟨−Δg, g⟩ = ∫_{ℝ³} ∇g(x) ⋅ ∇g(x) dx,   g(x) = e^{−α‖x‖²},   x ∈ ℝ³,

for a single Gaussian with sufficiently large α > 0 and using large N × N × N Cartesian
grids. The functions are discretized with respect to the basis set (9.1) in the computational
box [−b, b]³ with b = 14.6 au ≈ 8 Å.
For a single Gaussian, we compare 𝒥_h, computed as in Lemma 9.1, with the exact
expression

𝒥 = ∫_{ℝ³} ∇g(x) ⋅ ∇g(x) dx = 3 J₁ J₀₁²,

where

J₁ = 4α² ∫_{−∞}^{∞} x² e^{−2αx²} dx = √(π/2) √α,   J₀₁ = ∫_{−∞}^{∞} e^{−2αx²} dx = √π/(√2 √α).

Table 9.1 shows the approximation error |𝒥 − 𝒥h| versus the grid size, where 𝒥h corresponds to the grid-based evaluation of the matrix element on the corresponding grid for α = 2500, 4⋅10⁴, and 1.2⋅10⁵, which exceed the largest exponents α in the conventional Gaussian sets for the hydrogen (α = 1777), carbon (α = 6665), oxygen (α = 11 720), and mercury (α = 10⁵) atoms.
Computations confirm the results of Theorem A4 in [156] on the error bound O(h²). It can be seen that the errors reduce by a distinct factor of 4 on the dyadically refined grids. Therefore, in spite of the sharp “needles” of Gaussians due to large α, the Richardson extrapolation [218] (RE columns) on a sequence of large grids provides a higher accuracy, of order O(h³)–O(h⁴).

Table 9.1: Approximation error |𝒥 − 𝒥h| for the grid-based evaluation of the Laplacian Galerkin matrix entry for a Gaussian g(x) = e^{−α‖x‖²}, x ∈ ℝ³, N = 2^p − 1.

p    N³           α = 2.5⋅10³              α = 4⋅10⁴                α = 1.2⋅10⁵
                  |𝒥 − 𝒥h|     RE          |𝒥 − 𝒥h|     RE          |𝒥 − 𝒥h|     RE
12   4095³        0.0037       –           0.0058       –           0.025        –
13   8191³        9.3⋅10⁻⁴     1.0⋅10⁻⁵    0.0034       0.0026      2.4⋅10⁻⁵     –
14   16 383³      2.3⋅10⁻⁴     1.2⋅10⁻⁶    9.1⋅10⁻⁴     9.1⋅10⁻⁵    0.0015       –
15   32 767³      5.8⋅10⁻⁵     7.6⋅10⁻⁸    2.3⋅10⁻⁴     4.8⋅10⁻⁶    4.03⋅10⁻⁴    3.8⋅10⁻⁵
16   65 535³      1.4⋅10⁻⁵     4.7⋅10⁻⁹    5.8⋅10⁻⁵     3.0⋅10⁻⁷    1.0⋅10⁻⁴     1.6⋅10⁻⁶
17   131 071³     3.6⋅10⁻⁵     2.4⋅10⁻¹⁰   1.5⋅10⁻⁵     1.9⋅10⁻⁸    5.5⋅10⁻⁵     1.0⋅10⁻⁷
18   262 143³     9.1⋅10⁻⁷     3.1⋅10⁻¹¹   3.6⋅10⁻⁶     1.2⋅10⁻⁹    6.4⋅10⁻⁶     6.5⋅10⁻⁹
19   524 287³     2.2⋅10⁻⁷     5.4⋅10⁻¹³   9.1⋅10⁻⁷     7.3⋅10⁻¹¹   1.6⋅10⁻⁶     4.0⋅10⁻¹⁰
In Table 9.1, the largest grid size N = 2¹⁹ − 1 corresponds to a computational box Ω ⊂ ℝ³ with a huge number of entries, of order 2⁵⁷ ≈ 10¹⁷. The corresponding mesh size is of order h ∼ 10⁻⁵ Å. Computing times in Matlab range from several milliseconds up to 1.2 sec for the largest grid.
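The O(h²) convergence and the Richardson gain are easy to reproduce in a simplified 1D analogue of this computation: integrating the square of the piecewise linear interpolant of a Gaussian produces an O(h²) error whose leading term cancels in the combination (4𝒥h − 𝒥2h)/3. The parameters below (α = 5, b = 8, and the grid sizes) are illustrative choices, not those of Table 9.1.

```python
import numpy as np

def mass_entry_pl(alpha, b, N):
    """Integral of the square of the piecewise linear interpolant of
    g(x) = exp(-alpha*x^2) on a uniform grid with N cells over [-b, b].
    On a cell of width h with endpoint values a and c, the exact integral
    of the squared linear interpolant is h*(a^2 + a*c + c^2)/3."""
    x = np.linspace(-b, b, N + 1)
    g = np.exp(-alpha * x**2)
    h = x[1] - x[0]
    a, c = g[:-1], g[1:]
    return h / 3 * np.sum(a * a + a * c + c * c)

alpha, b = 5.0, 8.0
exact = np.sqrt(np.pi / (2 * alpha))           # integral of exp(-2*alpha*x^2) over R
I_2h = mass_entry_pl(alpha, b, 800)
I_h = mass_entry_pl(alpha, b, 1600)
ratio = abs(I_2h - exact) / abs(I_h - exact)   # ~4 for an O(h^2) discretization
I_re = (4 * I_h - I_2h) / 3                    # Richardson extrapolation
```

On dyadically refined grids the error ratio stays close to 4, and the extrapolated value I_re is orders of magnitude more accurate than I_h alone.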
Notice that the integral ⟨g, g⟩ = ∫ℝ³ e^{−2α‖x‖²} dx = J01(α)³, involved in the calculation of the mass matrix Sg, is approximated with the same accuracy.
In the following, we consider an example on the grid-based approximation to the
Schrödinger equation for the hydrogen atom (see [156]), that is, we verify the proposed
algorithms for the Hartree–Fock equation in the simplest case of the hydrogen atom

ℋψ = λψ,   ℋ = −(1/2)Δ − 1/‖x‖,   x ∈ ℝ³,   (9.15)

which has the exact solution ψ = e^{−‖x‖}/√π, λ = −1/2.

Example 9.1. Consider the traditional expansion of the solution using the ten s-type primitive Gaussian functions from the cc-pV6Z basis set [234, 265],

ψ(x) ≈ ∑_{k=1}^{Nb} ck φk(x),   Nb = 10,  x ∈ ℝ³,

which leads to the Galerkin equation corresponding to (9.15) with

F = ⟨ℋgk, gm⟩ := −(1/2)⟨Δgk, gm⟩ − ⟨(1/‖x‖)gk, gm⟩,   k, m = 1, . . . , Nb,

with respect to the Galerkin basis {gk}. We choose the appropriate size of the computational box as b ≈ 8 Å and discretize {gk} using an N × N × N Cartesian grid, obtaining the canonical rank-1 tensor representation Gk of the basis functions. Then the kinetic energy and the nuclear potential parts of the Fock operator are computed by (9.4) and (9.14).

Table 9.2, line (1), presents the numerical errors in energy, |λ − λh|, for the grid-based calculations using the cc-pV6Z basis set of Nb = 10 Gaussians generated by Molpro [299], providing an accuracy of order ∼10⁻⁶. Notice that this accuracy is achieved already at the grid size N = 8192; hence, further grid refinement does not improve the results.

Example 9.2. Here, we study the effect of basis optimization by adding an auxiliary basis function to the Gaussian basis set from the previous example, thus increasing the number of basis functions to Nb = 11. The second line (2) in Table 9.2 shows the improvement of accuracy for the basis augmented by a rank-1 approximation to the Slater function, given by the grid representation of φ0 = e^{−(|x1|+|x2|+|x3|)}. Augmenting by a piecewise linear hat function of the type ξi centered at the origin gives results similar to those for φ0.

Table 9.2: Examples 9.1–9.3 for the hydrogen atom: |λ − λh| vs. grid size N³ for (1) the discretized basis of Nb = 10 Gaussians, (2) 11 basis functions consisting of the Gaussians augmented by a rank-1 function φ0, (3) the discretized single rank-Rb Slater function.

N³              1024³      2048³      4096³      8192³      16 384³    32 768³
(1) |λ − λh|    4.1⋅10⁻⁴   1.0⋅10⁻⁴   2.7⋅10⁻⁵   7.5⋅10⁻⁶   2.4⋅10⁻⁶   1.0⋅10⁻⁶
(2) |λ − λh|    1.5⋅10⁻⁵   7.2⋅10⁻⁶   2.7⋅10⁻⁶   1.1⋅10⁻⁶   8.0⋅10⁻⁷   7.8⋅10⁻⁷
(3) |λ − λh|    1.0⋅10⁻⁴   2.7⋅10⁻⁵   6.8⋅10⁻⁶   1.7⋅10⁻⁶   4.3⋅10⁻⁷   –

Example 9.3. In this example, we present computations with controlled accuracy, using a single rank-Rb basis function generated by the sinc-approximation to the Slater function. Using the Laplace transform

G(ρ) = e^{−2√(αρ)} = (√α/√π) ∫_{0}^{∞} τ^{−3/2} exp(−α/τ − ρτ) dτ,

the Slater function can be represented as a rank-R canonical tensor by computing the sinc-quadrature decomposition [161, 163] and setting ρ = x1² + x2² + x3²:

G(ρ) ≈ (√α/√π) ∑_{k=−L}^{L} wk τk^{−3/2} exp(−α/τk) ∏_{ℓ=1}^{3} exp(−τk xℓ²),

where τk = e^{k hL}, wk = hL τk, and hL = C0 log L/L. The accuracy of the approximation is controlled by choosing the number of quadrature points L. In this example, we have only one basis function in the set, an approximate Slater function, represented by a canonical tensor of rank Rb ≤ 2L + 1. Thus, each of the matrices AG computed by (9.7) and VG is of size 1 × 1. Table 9.2 (3) shows the accuracy of the solution to the Hartree–Fock equation for the hydrogen atom using one approximate Slater basis function.
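The sinc-quadrature sum above is straightforward to check numerically. The sketch below evaluates the rank-(2L + 1) approximation pointwise and compares it with the exact Slater function; the step-size constant C0 = 1 and the parameters α = 1, L = 100 are illustrative choices (the quadrature is designed for ρ bounded away from zero).

```python
import numpy as np

def slater_sinc(alpha, L, rho):
    """Sinc-quadrature approximation of G(rho) = exp(-2*sqrt(alpha*rho))
    as a sum of 2L+1 Gaussians in rho, with tau_k = exp(k*h_L), w_k = h_L*tau_k."""
    hL = np.log(L) / L                    # h_L = C0*log(L)/L with C0 = 1 (assumed)
    k = np.arange(-L, L + 1)
    tau = np.exp(k * hL)                  # quadrature points
    w = hL * tau                          # quadrature weights
    coef = np.sqrt(alpha / np.pi) * w * tau**-1.5 * np.exp(-alpha / tau)
    # exp(-tau_k*rho) factorizes as the product of exp(-tau_k*x_l^2) over l = 1,2,3,
    # which is exactly what makes each quadrature term a rank-1 canonical tensor
    return np.exp(-np.outer(rho, tau)) @ coef

alpha, L = 1.0, 100
x = np.linspace(0.5, 3.0, 26)             # sample points on the diagonal x1=x2=x3
rho = 3 * x**2
err = np.max(np.abs(slater_sinc(alpha, L, rho) - np.exp(-2 * np.sqrt(alpha * rho))))
```

Increasing L decreases err almost exponentially, in line with the convergence of the sinc quadrature in the number of quadrature points.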

Table 6.3 in [156] presents the Richardson extrapolation for Examples 9.1 and 9.3. Due to the noticeable convergence rate of order O(h²), the Richardson extrapolation (RE) gives a further improvement of the accuracy, up to O(h³). It can be seen in Table 6.3 of [156] that the Richardson extrapolation for the results of Example 9.3 gives an accuracy of order 10⁻⁷, beginning from the grid size 4096. Note that with the choice L = 60, the accuracy is improved by one order of magnitude compared with that obtained for the standard Gaussian basis set in Example 9.1.
Table 9.3 presents numerical examples of the grid-based approximation to the Galerkin matrices for the Laplace operator AG and the nuclear potential VG using (9.4) and (9.14) for the C2H5OH molecule. The mesh size of the N × N × N Cartesian grid ranges from h = 0.0036 au (atomic units), corresponding to N = 8192, up to h = 2.2⋅10⁻⁴ au for N = 131 072.
Throughout the tables, we show the relative Frobenius norms of the differences, Er(AG) and Er(VG), in the corresponding Galerkin matrix elements for the Laplace and nuclear potential operators, respectively, where

Er(AG) = ‖Ag − AG‖ / ‖Ag‖,   Er(VG) = ‖Vg − VG‖ / ‖Vg‖.

The quadratic convergence of both quantities along the line of dyadic grid refinement is in good agreement with the theoretical error estimates O(h²). Therefore, the employment of the Richardson approximation, providing the error

ERi,2h,h = Er((4 VG,h − VG,2h)/3),

suggests a further improvement of the accuracy, up to order O(h⁴), for the Laplace operator. The “RE” lines in Table 9.3 demonstrate the results of the Richardson extrapolation applied to the corresponding quantities at the adjacent grids.
Note that for the grid-based representation of the collective nuclear potential Pc, the univariate grid size n can be noticeably smaller than the size of the grid used for the piecewise linear discretization of the Laplace operator.

Table 9.3: Ethanol (C2H5OH): accuracy Er(AG) and Er(VG) of the Galerkin matrices AG and VG corresponding to the Laplace and the nuclear potential operators, respectively, using the discretized basis of 123 primitive Gaussians (from the cc-pVDZ set [75, 265]).

p             13        14         15         16         17
N³ = 2^{3p}   8192³     16 384³    32 768³    65 536³    131 072³
Er(AG)        0.032     0.0083     0.0021     5.2⋅10⁻⁴   1.3⋅10⁻⁴
RE            –         4.0⋅10⁻⁴   3.3⋅10⁻⁵   6.0⋅10⁻⁶   5.0⋅10⁻⁸
Er(VG)        0.024     0.0083     0.0011     3.1⋅10⁻⁴
RE            –         0.0031     0.0013     5.9⋅10⁻⁵

Figure 9.2 displays the nuclear potential for the molecule C2 H5 OH (ethanol) computed
in a box [−b, b]3 with b = 16 au. We show two cross-sections of the 3D function at the
level x = 0.0625 au and of the permuted function at the level y = −0.3125 au. It can be
seen from the left figure that three non-hydrogen atoms with the largest charges (two
Carbon atoms with Z = 6 and one Oxygen atom with Z = 8) are placed on the plane
x = 0. The right figure shows the location close to one of Hydrogen atoms.
Figure 9.2: Nuclear potential Pc for the C2H5OH molecule, shown for the cross sections along the x-axis at the level x = 0.0625 au and along the y-axis at the level y = 1.6 au.

The error ε > 0 arising due to the separable approximation of the nuclear potential is controlled by the rank parameter of the nuclear potential, RP = rank(Pc). Now letting rank(Gm) = Rm implies that each matrix element is to be computed with linear complexity in n, O(Rk Rm RP n). The almost exponential convergence of the rank approximation in RP allows the choice RP = O(|log ε|).
The maximum computational time for AG with N³ = 131 072³ is of the order of a hundred seconds in MATLAB. For the coarser grid with N³ = 8192³, CPU times are in the range of several seconds for both AG and VG.
Comprehensive error estimates for the grid-based calculations of the core Hamiltonian are formulated in [156], where a number of numerical experiments for various molecules are presented as well.
10 Tensor factorization of grid-based two-electron integrals
10.1 General introduction
The efficient tensor-structured method for the grid-based calculation of the two-electron integrals (TEI) tensor was introduced by V. Khoromskaia, B. Khoromskij, and R. Schneider in 2012 (see [157]). In this chapter, following [157, 150], we describe the fast algorithm for the grid-based computation of the fourth-order TEI tensor in the form of a Cholesky factorization, using the grid-based algebraic 1D “density fitting” scheme applied to the products of basis functions. It is worth noting that the described approach does not require calculation of the full TEI matrix, but relies only on computation of a few of its selected columns, evaluated by using the 1D density fitting factorizations (see Remark 10.3).
Imposing the low-rank tensor representation of the product basis functions and the Newton convolving kernel, all discretized on a large n × n × n Cartesian grid, the 3D integral transforms are calculated in O(n log n) complexity. This scheme provides a storage cost for TEI of the order of O(Nb³) in the number of basis functions Nb.
The TEI tensor, also known as the Fock integrals or electron repulsion integrals,
is the principal ingredient in electronic and molecular structure calculations. In par-
ticular, the corresponding coefficient tensor arises in ab initio Hartree–Fock (HF) cal-
culations, in post Hartree–Fock models (MP2, CCSD, Jastrow factors, etc.), and in the
core Hamiltonian appearing in FCI-DMRG calculations [6, 298, 241, 128].
Given the finite basis set {gμ}1≤μ≤Nb, gμ ∈ H¹(ℝ³), the associated fourth-order two-electron integrals tensor B = [bμνκλ] ∈ ℝ^{Nb×Nb×Nb×Nb} is defined entrywise by

bμνκλ = ∫ℝ³ ∫ℝ³ gμ(x)gν(x)gκ(y)gλ(y) / ‖x − y‖ dx dy,   μ, ν, κ, λ ∈ {1, . . . , Nb} =: Ib.   (10.1)

The fast and accurate evaluation and effective storage of the fourth-order TEI tensor B of size Nb⁴ is a challenging computational problem, since it includes multiple 3D convolutions of the Newton kernel 1/‖x − y‖, x, y ∈ ℝ³, with strongly varying product-basis functions. Hence, in the limit of large Nb, the efficient numerical treatment and storage of the TEI tensor is considered one of the central tasks in electronic structure calculations [247].
The traditional analytical integration using the representation of electronic orbitals in a Gaussian-type basis is the foundation of most ab initio quantum chemical packages. Hence, the choice of a basis set {gμ}1≤μ≤Nb is essentially restricted by the “analytic” integrability needed for efficient computation of the tensor entries represented by the 6D integrals in (10.1). This approach possesses intrinsic limitations caused by the unavoidable restriction to Gaussian-type basis functions, which may become unstable and redundant for higher accuracy, larger molecules, or when considering heavy nuclei.

https://doi.org/10.1515/9783110365832-010
It is known in quantum chemistry simulations [17, 298, 303] that, in the case of compact molecules, the (pivoted) incomplete Cholesky factorization of the Nb² × Nb² TEI matrix unfolding

B = [bμν;κλ] := mat(B) over (Ib ⊗ Ib) × (Ib ⊗ Ib)   (10.2)

reduces the asymptotic storage of the resultant low-rank approximation to O(Nb³).


It was observed in numerical experiments that the particular rank bound in the Cholesky decomposition scales linearly in Nb, depending mildly (e.g., logarithmically) on the error in the rank truncation. We refer to [130, 99, 121, 15, 286, 255] for more detail on the algebraic aspects of the matrix Cholesky decomposition and the related ACA techniques. The Cholesky decomposition is applicable since the TEI matrix B is, indeed, the symmetric Gram matrix of the product basis set {gμ gν} in the Coulomb metric ⟨⋅, (1/‖x − y‖) ⋅⟩, ensuring its positive semidefiniteness. In some cases it is possible to reduce the storage even to O(Nb² log Nb), taking into account the pointwise sparsity of the matrix B in calculations of rather large extended systems [298].
For the Cholesky decomposition of the matrix B, we constructed in [157] an algebraically optimized redundancy-free factorization of the TEI matrix B, based on the reduced higher-order SVD [174], to obtain the low-rank separable representation of the discretized basis functions {gμ gν}. Numerical experiments show that this minimizes the dimension of the dominating subspace in span{gμ gν} to RG ≤ Nb, which allows one to reduce the number of 3D convolutions (by an order of magnitude) from O(Nb²) to RG. Combined with the quantized-canonical tensor decompositions of long spatial n-vectors, this leads to the logarithmic scaling in n for storage, O(RG log n + Nb² RG). An essential compression rate via the QTT approximation is observed in numerical experiments even for compact molecules, becoming stronger for more stretched compounds.
Computation of the rank-RB Cholesky decomposition employs only RB = O(Nb) selected columns of the TEI matrix B, calculated from precomputed factorizations of this matrix. We show by numerical experiments that each long Nb²-vector of the L-factor in the Cholesky LLT-decomposition can be further compressed using the quantized-TT (QTT) approximation, reducing the total storage from O(Nb³) to O(Nb Norb²), where the number of electron orbitals Norb usually satisfies Nb ∼ 10 Norb.
The presented grid-based approach benefits from the fast O(n log n) tensor-product convolution with the 6D Newton kernel over a large n³ × n³ grid [166], which has already proved its numerical efficiency in the evaluation of the Hartree and exchange integrals [174, 145, 187]. However, in these papers, both the Coulomb and exchange operators are calculated directly on the fly at each DIIS iteration, and thus the use of TEI was avoided at the expense of time loops.
Recall that the beneficial feature of the grid-based tensor-structured methods is that they substitute the 3D numerical integration by multilinear algebraic procedures like the scalar, Hadamard, and convolution products with linear 1D complexity O(n). On the one hand, this weak dependence on the grid size is the ultimate payoff for generality, in the sense that rather general approximating basis sets may be used equally well instead of analytically integrable Gaussians. On the other hand, the approach also serves the structural simplicity of the implementation, since the topology of the molecule is caught without any physical insight, only by the algebraically determined rank parameters of the fully grid-based numerical scheme.
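For instance, the scalar product of two rank-1 tensors factorizes into three 1D scalar products, so the O(n³) “Hadamard product plus summation” collapses to O(n) work per mode; a minimal numpy check with random skeleton vectors (the size n = 64 is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
g = [rng.standard_normal(n) for _ in range(3)]     # skeleton vectors of G
q = [rng.standard_normal(n) for _ in range(3)]     # skeleton vectors of Q
G = np.einsum('i,j,k->ijk', *g)                    # full n x n x n tensors
Q = np.einsum('i,j,k->ijk', *q)

full = np.sum(G * Q)                               # O(n^3): Hadamard product + sum
sep = np.prod([gl @ ql for gl, ql in zip(g, q)])   # O(n): three 1D scalar products
```

Both numbers agree to machine precision, while the separable form never touches an n³ array.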
Due to the O(n log n) complexity of the algorithms, there are rather weak practical restrictions on the grid size n, allowing calculations on really large n × n × n 3D Cartesian grids in the range n ∼ 10³–10⁵, thereby avoiding grid refinement. The corresponding mesh sizes enable high resolution, of the order of the size of atomic nuclei. For storage-consuming operations, the numerical expense can be reduced to the logarithmic level O(log n) by using the QTT representation of the discretized 3D basis functions and their convolutions.
In [157] it is shown that the rank-O(Nb) Cholesky decomposition of the TEI matrix B, combined with the canonical-QTT data compression of long vectors, allows the reduction of the asymptotic complexity of grid-based tensor calculations in the HF and some post-HF models. Alternative approaches to the optimization of the HF, MPx, CCSD, and other post-HF models can be based on using physical insight to sparsify the TEI tensor B by zeroing out all “small” elements [298, 241, 6, 268, 311].

10.2 Grid-based tensor representation of TEI in the full product basis
We assume that all basis functions {gμ}1≤μ≤Nb have support in a finite box Ω = [−b, b]³ ⊂ ℝ³, and, for ease of presentation, we consider the case rank(gμ) = 1 (see Remark 10.2). The size of the computational box is chosen in such a way that the truncated part of the most slowly decaying basis function does not exceed the given tolerance ε > 0. Taking into account the exponential decay of molecular orbitals, the parameter b > 0 is chosen to be only a few times larger than the molecular size.
Introduce the uniform n × n × n rectangular grid in [−b, b]³. Then each basis function gμ(x) can be discretized by a three-dimensional tensor

Gμ = [gμ(x1(i), x2(j), x3(k))]_{i,j,k=1}^{n} ∈ ℝ^{n×n×n},   μ = 1, . . . , Nb,

obtained by sampling gμ(x) over the midpoints (x1(i), x2(j), x3(k)) of the grid cells with index (i, j, k). Given the discretized basis functions Gμ, μ = 1, . . . , Nb, we assume (without loss of generality) that each is a rank-1 tensor, rank(Gμ) = 1, i.e.,

Gμ = g(1)μ ⊗ g(2)μ ⊗ g(3)μ ∈ ℝ^{n×n×n}   (10.3)

with the skeleton vectors g(ℓ)μ ∈ ℝⁿ, ℓ = 1, 2, 3, obtained as projections of the basis functions gμ(x) on the uniform grid. Then the entries of B can be represented by using the tensor scalar product over the “grid” indices,

bμνκλ = ⟨Gμν, Hκλ⟩_{n⊗3},   (10.4)

where

Gμν = Gμ ⊙ Gν ∈ ℝ^{n⊗3},   Hκλ = PN ∗ Gκλ ∈ ℝ^{n⊗3},   (10.5)

μ, ν, κ, λ ∈ {1, . . . , Nb}, with the rank-RN canonical tensor PN ∈ ℝ^{n⊗3} approximating the Newton potential 1/‖x‖ (see Section 6.1). We recall that ∗ stands for the 3D tensor convolution (5.11) and ⊙ denotes the 3D Hadamard product (2.37).
The element-wise accuracy of the tensor representation (10.4) is estimated by O(h²), where h = 2b/n is the step size of the Cartesian grid [166]. The Richardson extrapolation reduces the error to O(h³).
It is worth emphasizing that in our scheme the n⊗3 tensor Cartesian grid does not depend on the positions of the nuclei in a molecule. Consequently, a simultaneous rotation and translation of the nuclei positions still preserves the asymptotic approximation error at the level of O(h²).
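The tensor convolution in (10.5) enjoys the same separability as the scalar product: for rank-1 tensors, the 3D convolution equals the tensor product of three 1D convolutions, which is the mechanism behind the O(n log n) cost per canonical term. A small numpy check comparing the zero-padded FFT-based full 3D convolution with the mode-wise 1D form (n = 16 is an illustrative size):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16
p = [rng.standard_normal(n) for _ in range(3)]   # skeleton vectors of one Newton term
g = [rng.standard_normal(n) for _ in range(3)]   # skeleton vectors of a product basis tensor
P = np.einsum('i,j,k->ijk', *p)
G = np.einsum('i,j,k->ijk', *g)

s = (2 * n - 1,) * 3                             # size of the full (linear) convolution
full3d = np.fft.ifftn(np.fft.fftn(P, s) * np.fft.fftn(G, s)).real   # O(n^3 log n)
sep = np.einsum('i,j,k->ijk',
                *[np.convolve(pl, gl) for pl, gl in zip(p, g)])     # three 1D convolutions
```

The two results coincide up to round-off, and for a rank-RN kernel the separable route needs only 3RN one-dimensional convolutions.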

Remark 10.1. The TEI tensor B has multiple symmetries

bμνκλ = bνμκλ = bμνλκ = bκλμν , μ, ν, κ, λ ∈ {1, . . . , Nb }.

The result is a direct consequence of definition (10.1) and the symmetry of the convolution product. The above symmetry relations allow reducing the number of precomputed entries in the full TEI tensor to Nb⁴/8. This property is also mentioned in [291].
Let us introduce the 5th-order tensors

G = [Gμν] ∈ ℝ^{Nb×Nb×n⊗3} and H = [Hκλ] ∈ ℝ^{Nb×Nb×n⊗3}.

Then (10.4) is equivalent to the contracted product representation over the n⊗3 grid indices,

B = G ×n⊗3 (PN ∗n⊗3 G) = ⟨G, PN ∗n⊗3 G⟩n⊗3 = ⟨G, H⟩n⊗3 , (10.6)

where the right-hand side is recognized as the discrete counterpart of the Galerkin representation (10.1) in the full product basis. When using full-grid calculations, the total storage cost for the n × n × n product-basis tensor G and its convolution H amounts to 3·Nb(Nb + 1)/2·n and 3RN·Nb(Nb + 1)/2·n, respectively. The numerical cost of the Nb² tensor-product convolutions needed to compute H is estimated by O(RN Nb² n log n) [166]. Based on representation (10.6), each entry in the TEI tensor B of size Nb⁴ can be calculated at the cost O(RN n), which might be too expensive for large grid size n. Thus a direct tensor calculation of TEI seems to be infeasible except for small molecules, even when using the QTT tensor representation of the basis functions, as was shown in [157].

Remark 10.2. If the separation rank of a basis set is larger than 1, then the complexity of the scalar products in (10.6) increases quadratically in the rank parameter. However, the use of basis functions with rank parameter greater than one (say, Slater-type functions) can be motivated by the reduction of the basis size Nb, which has a fourth-order effect on the complexity.

10.3 Redundancy-free factorization of the TEI matrix B


The efficient solution of the TEI problem introduced in [157] is based on the construction of the redundancy-free modified product basis by an algebraic “1D density fitting” and the consequent Cholesky factorization of B. This approach minimizes the number of required convolution products in (10.6) by using the reduced HOSVD (RHOSVD), introduced in [174], for tensor-rank optimization in the canonical format. The RHOSVD-type factorization applied to the 3D canonical tensor G allows us to represent it in a “squeezed” form, in an optimized basis, obtained in a “black box” algebraic way.

10.3.1 Grid-based 1D density fitting scheme

For every space variable ℓ = 1, 2, 3, we construct the side matrices corresponding to products of basis functions,

G(ℓ) = [g(ℓ)μ ⊙ g(ℓ)ν]_{1≤μ,ν≤Nb} ∈ ℝ^{n×Nb²},   g(ℓ)μ, g(ℓ)ν ∈ ℝⁿ,   (10.7)

which are associated with the product-basis tensor

G = [Gμν] := [Gμ ⊙ Gν]_{1≤μ,ν≤Nb} ∈ ℝ^{n×n×n×Nb²}.   (10.8)

The matrix G(ℓ) is composed by concatenation of the Hadamard products g(ℓ)μ ⊙ g(ℓ)ν of the skeleton vectors of G in mode ℓ.
This representation serves to minimize the large number, Nb(Nb + 1)/2, of convolution products in (10.1). The approach in [157] is based on the truncated SVD for finding the minimal set of dominating columns in the large side matrix G(ℓ), ℓ = 1, 2, 3, of size n × Nb², representing the full (and highly redundant) set of product basis functions sampled on a grid. Given a tolerance ε > 0, we compute the ε-truncated SVD-based left-orthogonal decomposition of G(ℓ) (1D density fitting),

G(ℓ) ≅ U(ℓ) V(ℓ)T such that ‖G(ℓ) − U(ℓ) V(ℓ)T‖F ≤ ε,   ℓ = 1, 2, 3,   (10.9)

with an orthogonal matrix U(ℓ) ∈ ℝ^{n×Rℓ} and a matrix V(ℓ) ∈ ℝ^{Nb²×Rℓ}, where Rℓ is the corresponding matrix ε-rank. Here, U(ℓ) and V(ℓ) represent the so-called left and right redundancy-free basis sets, where only the grid-dependent part U(ℓ) is to be used in the convolution products.
Since the direct SVD of the large rectangular matrices G(ℓ) ∈ ℝ^{n×Nb²} can be prohibitively expensive even for moderate size molecules (n ≥ 2¹³, Nb ≥ 200), the five-step algorithm was introduced in [157, 150], which reduces the computational and storage costs to compute the low-rank approximation G(ℓ) ≅ U(ℓ) V(ℓ)T with the guaranteed tolerance ε > 0; see Algorithm 1.

Algorithm 1 Fast low-rank ε-approximation of G(ℓ).

Input: rectangular matrices G(ℓ) ∈ ℝ^{n×Nb²}, ℓ = 1, 2, 3; tolerance ε > 0.
(1) Find the factor Ũ(ℓ) ∈ ℝ^{n×R̃ℓ} of the truncated Cholesky decomposition of the Gram matrix G(ℓ) G(ℓ)T ≈ Ũ(ℓ)(Ũ(ℓ))T by ε-thresholding the diagonal elements.
(2) Orthogonalize the column space of Ũ(ℓ) by the QR decomposition Ũ(ℓ) := U(ℓ) RU.
(3) Project the initial matrix onto U(ℓ): Ṽ(ℓ) := G(ℓ)T U(ℓ) (can be executed in data-sparse formats, e.g., in QTT).
(4) QR decomposition Ṽ(ℓ) := V(ℓ) RV to obtain the orthogonal Q-factor V(ℓ).
(5) Rank reduction (R̃ℓ to Rℓ) by the SVD of RV ∈ ℝ^{R̃ℓ×R̃ℓ}; update U(ℓ) and V(ℓ).
Output: rank-Rℓ decomposition G(ℓ) ≈ U(ℓ) V(ℓ)T with the orthogonal matrix U(ℓ).
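The five steps can be sketched in numpy on a synthetic side matrix assembled from products of Nb rank-1 “basis” vectors (the sizes, the simple pivoted-Cholesky routine used for Step (1), and the tolerances below are illustrative, not taken from [157]):

```python
import numpy as np

def trunc_cholesky(A, tol):
    """Step (1): pivoted Cholesky A ~ L @ L.T for a symmetric PSD matrix,
    stopped when the largest residual diagonal entry drops below tol*max(diag)."""
    d = np.diag(A).astype(float).copy()
    thresh = tol * d.max()
    L = np.zeros((A.shape[0], 0))
    while d.max() > thresh:
        i = int(np.argmax(d))                      # pivot: largest residual diagonal
        v = (A[:, i] - L @ L[i, :]) / np.sqrt(d[i])
        L = np.column_stack([L, v])
        d -= v * v
    return L

rng = np.random.default_rng(2)
n, Nb, eps = 300, 6, 1e-10
t = np.linspace(-1, 1, n)
basis = np.exp(-rng.uniform(0.5, 4.0, (Nb, 1)) * (t - rng.uniform(-0.5, 0.5, (Nb, 1)))**2)
# side matrix: all Nb^2 Hadamard products of the basis vectors, as in (10.7)
G = np.stack([basis[m] * basis[k] for m in range(Nb) for k in range(Nb)], axis=1)

Ut = trunc_cholesky(G @ G.T, eps)   # Step (1) on the Gram matrix
U0, _ = np.linalg.qr(Ut)            # Step (2): orthogonalize the column space
Vt = G.T @ U0                       # Step (3): project G onto the basis U0
V0, RV = np.linalg.qr(Vt)           # Step (4)
u, s, wt = np.linalg.svd(RV)        # Step (5): SVD of the small square factor
keep = s > eps * s[0]
U = U0 @ wt.T[:, keep]              # updated orthogonal left factor
V = (V0 @ u[:, keep]) * s[keep]     # right factor carrying the singular values
rel_err = np.linalg.norm(G - U @ V.T) / np.linalg.norm(G)
```

By the symmetry gμ ⊙ gν = gν ⊙ gμ, the detected rank stays below Nb(Nb + 1)/2 = 21, well under the Nb² = 36 columns of G, while rel_err remains at the level of the truncation threshold.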

Numerical experiments show that the Frobenius error of these rank decompositions decays exponentially in the rank parameter Rℓ:

‖G(ℓ) − U(ℓ) V(ℓ)T‖F ≤ C e^{−γℓ Rℓ},   ℓ = 1, 2, 3,  γℓ > 0.

Figure 10.1 illustrates the exponential decay in the singular values of G(ℓ) for several moderate size molecules.

Figure 10.1: Singular values of G(ℓ) for ℓ = 1, 2, 3: NH3 (left), glycine (middle), and alanine (right) molecules, with the numbers Nb and Norb equal to 48, 5; 170, 20; and 211, 24, respectively.

Step (3) in Algorithm 1 requires access to the full matrix G(ℓ). However, when this matrix allows a data-sparse representation, the respective matrix–vector multiplications can be implemented with reduced cost. For example, given the low-rank QTT representation of the column vectors in G(ℓ), the matrix–matrix product at Step (3) can be implemented in O(Nb² Rℓ log n) operations. Notice that the QTT ranks of the column vectors are estimated in numerical experiments by O(1) for all molecular systems considered so far; see also [68] concerning the QTT rank estimate of the Gaussian.
Another advantageous feature is the perfectly parallel structure of the matrix–vector multiplication procedure at Step (3). Here, the algebraically optimized separation ranks Rℓ are mostly determined by the geometry of a molecule, whereas the number Nb² − Rℓ indicates the measure of redundancy in the product basis set. In numerical experiments we observe Rℓ ≤ Nb and Rℓ ≪ n for large n.
Figure 10.2, left, represents the ε-ranks Rℓ, ℓ = 1, 2, 3, and RB, computed for examples of some compact molecules with ε = 10⁻⁶. We observe that the Cholesky rank RB of B (see Section 10.3.2) is a multiple of Nb with a factor ∼6 (see also Figure 10.3). Remarkably, the RHOSVD separation ranks Rℓ ≤ Nb remain very weakly dependent on Nb, but primarily depend on the topology of a molecule.
Figure 10.2: Left: ε-ranks Rℓ and RB for the HF, NH3, H2O2, N2H4, and C2H5OH molecules versus the number of basis functions Nb = 34, 48, 68, 82, and 123, respectively. Right: average QTT ε-ranks of the column vectors in U(1) ∈ ℝ^{n×Rℓ} for the NH3, H2O2, N2H4, and C2H5OH molecules, ε = 10⁻⁶.

Table 10.1: Average QTT ε-ranks of U(1) and V(1) in the G(1)-factorization, ε = 10⁻⁶.

Molecules                        NH3      H2O2     N2H4     C2H5OH
Nb; Norb                         48; 5    68; 9    82; 9    123; 13
Av. QTT rank of U(1)             7.3      7.9      7.5      7.6
Av. QTT rank of V(1)             15       21       24       37
(Av. QTT rank of V(1))/Norb      3        2.3      2.6      2.85

Figure 10.2 (right) provides the average QTT ranks of the column vectors in U(1) ∈ ℝ^{n×R1} for the NH3, H2O2, N2H4, and C2H5OH molecules. Again, surprisingly, the rank portraits appear to be nearly the same for different molecules, and the average rank over all indices m = 1, . . . , R1 is a small constant, about r0 ≃ 7. More detailed results are listed in Table 10.1.

10.3.2 Redundancy-free factorization of the TEI matrix B

Now we are in a position to represent the TEI matrix B in factorized form using a reduced set of convolving functions. First, we recall that, using the scalar product representation of n × n × n arrays, we can rewrite the discretized integrals (10.1) in terms of tensor operations as in (10.4), (10.5). Then, using representations (10.7) and (10.8), for each fixed multiindex μνκλ we arrive at the following tensor factorization of B [157]:

B = ∑_{k=1}^{RN} ⊙_{ℓ=1}^{3} G(ℓ)T (p(ℓ)k ∗n G(ℓ)),   (10.10)

where p(ℓ)k, ℓ = 1, 2, 3, are the column vectors in the side matrices of the rank-RN canonical tensor representation PN of the Newton kernel 1/‖x‖ [166]. Substitution of the side matrix decomposition (10.9) into (10.10) leads to the redundancy-free factorized ε-approximation of the matrix B [157]:

B = ∑_{k=1}^{RN} ⊙_{ℓ=1}^{3} G(ℓ)T (p(ℓ)k ∗n G(ℓ)) ≅ ∑_{k=1}^{RN} ⊙_{ℓ=1}^{3} V(ℓ) Mk(ℓ) V(ℓ)T =: Bε,   (10.11)

where V(ℓ) represents the corresponding right redundancy-free basis and

Mk(ℓ) = U(ℓ)T (p(ℓ)k ∗n U(ℓ)) ∈ ℝ^{Rℓ×Rℓ},   k = 1, . . . , RN,   (10.12)

stands for the Galerkin convolution matrix in the left redundancy-free basis U(ℓ), ℓ = 1, 2, 3. We notice that equation (10.12) includes only Rℓ ≪ Nb² convolution products. The computational scheme for the convolution matrices Mk(ℓ) is described in Algorithm 2. Inspection of Algorithm 2 shows that the storage demand for representations (10.11)–(10.12) can be estimated by RN ∑_{ℓ=1}^{3} Rℓ² + Nb² ∑_{ℓ=1}^{3} Rℓ and O((RG + RN)n), respectively.

Remark 10.3. The redundancy-free factorization (10.11) is completely parametrized by a set of thin matrices V(ℓ) and small convolution factor matrices Mk(ℓ), ℓ = 1, 2, 3, precomputed by the 1D density fitting scheme. With this parametrization, one can easily compute any set of selected (required) columns of the matrix B by simple matrix–vector multiplications, thus completely avoiding the calculation of 3D convolution products. In this respect, we notice that the standard Cholesky decomposition algorithm for the TEI matrix B would require selected columns of this matrix of size Nb², where each entry needs the calculation of a 3D convolution as in (10.1).
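A toy illustration of this remark, with random stand-ins for the factors V(ℓ) and Mk(ℓ) that Algorithms 1–2 would produce (all sizes are illustrative): by (10.11), the j-th column of Bε equals ∑k ⊙ℓ V(ℓ)(Mk(ℓ) vj(ℓ)), where vj(ℓ) is the j-th row of V(ℓ), so no n-dimensional vectors or convolutions are touched.

```python
import numpy as np

rng = np.random.default_rng(3)
Nb2, R, RN = 49, 12, 5        # Nb^2, ranks R_l, canonical rank of the Newton kernel
V = [rng.standard_normal((Nb2, R)) for _ in range(3)]     # right factors V^(l)
M = [[rng.standard_normal((R, R)) for _ in range(3)]
     for _ in range(RN)]                                  # convolution matrices M_k^(l)

# reference: the full matrix B_eps = sum_k Hadamard_l V^(l) M_k^(l) V^(l)^T
B = sum(np.prod([V[l] @ M[k][l] @ V[l].T for l in range(3)], axis=0)
        for k in range(RN))

def column(j):
    """j-th column of B_eps directly from the factors: O(Nb^2*R) work per (k, l)."""
    return sum(np.prod([V[l] @ (M[k][l] @ V[l][j]) for l in range(3)], axis=0)
               for k in range(RN))
```

Any column produced this way matches the corresponding column of the fully assembled matrix, which is all a pivoted Cholesky routine needs.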

Algorithm 2 Computation of the “convolution matrices” Mk(ℓ).

Input: rank-Rℓ approximate decompositions G(ℓ) ≈ U(ℓ) V(ℓ)T; factor matrices P(ℓ) = [p(ℓ)1, . . . , p(ℓ)RN] ∈ ℝ^{n×RN}, ℓ = 1, 2, 3, of the rank-RN canonical tensor PN ∈ ℝ^{n×n×n}.
(1) For ℓ = 1, 2, 3, compute the convolution products p(ℓ)k ∗n U(ℓ) ∈ ℝ^{n×Rℓ}, k = 1, . . . , RN.
(2) For ℓ = 1, 2, 3, compute and store the Galerkin projections onto the left redundancy-free directional basis: Mk(ℓ) = U(ℓ)T (p(ℓ)k ∗n U(ℓ)) ∈ ℝ^{Rℓ×Rℓ}.
Output: right redundancy-free basis V(ℓ); set of Rℓ × Rℓ matrices Mk(ℓ) for ℓ = 1, 2, 3, k = 1, . . . , RN.

The following lemma proves the complexity and error estimates for the tensor representations (10.11)–(10.12). Given the ε-truncated SVD-based left-orthogonal decomposition of G(ℓ), G(ℓ) ≅ U(ℓ) V(ℓ)T, ℓ = 1, 2, 3, with n × Rℓ and Nb² × Rℓ matrices U(ℓ) (orthogonal) and V(ℓ), respectively, we denote RG = max Rℓ.

Lemma 10.4 ([157, 150]). Given ε > 0, the redundancy-free factorized ε-approximations to the matrix B (10.11) and to the convolution matrix (10.12) exhibit the following properties:
(A) The storage demand for factorizations (10.11) and (10.12) is estimated by

RN ∑_{ℓ=1}^{3} Rℓ² + Nb² ∑_{ℓ=1}^{3} Rℓ and O((RG + RN)n),

respectively. The numerical complexity of the ε-truncated representation (10.12) is bounded by O(RN RG² n + RG RN n log n), where the second term includes the cost of the tensor convolutions in the canonical format.
(B) The ε-rank of the matrix Bε admits the following upper bound:

rank(Bε) ≤ min{Nb², RN ∏_{ℓ=1}^{3} Rℓ}.   (10.13)

(C) Denote Aℓ(k) = G(ℓ)T (p(ℓ)k ∗n G(ℓ)). Then we have the following error estimate in the Frobenius norm:

‖B − Bε‖F ≤ 6ε maxℓ ‖G(ℓ)‖F² ∑_{k=1}^{RN} maxℓ ‖Aℓ(k)‖F ‖p(ℓ)k‖F.   (10.14)

Proof. (A) Using the Galerkin-type representation of the TEI tensor B as in (10.6), we obtain

B = mat(B) = ∑_{k=1}^{RN} ⊙_{ℓ=1}^{3} G(ℓ)T [p(ℓ)k ∗n G(ℓ)].

Plugging the truncated SVD factorization of G^(ℓ) into the right-hand side leads to the desired representation

    Bε = Σ_{k=1}^{RN} ⊙_{ℓ=1}^{3} V^(ℓ) U^(ℓ)T [p_k^(ℓ) ∗n (U^(ℓ) V^(ℓ)T)]
       = Σ_{k=1}^{RN} ⊙_{ℓ=1}^{3} V^(ℓ) [U^(ℓ)T (p_k^(ℓ) ∗n U^(ℓ))] V^(ℓ)T
       = Σ_{k=1}^{RN} ⊙_{ℓ=1}^{3} V^(ℓ) M_k^(ℓ) V^(ℓ)T.   (10.15)

The storage cost of the RHOSVD-type factorization (10.15) of the Nb² × Nb² matrix B is bounded by RN Σ_{ℓ=1}^{3} Rℓ² + Nb² Σ_{ℓ=1}^{3} Rℓ, independently of the grid size n.
The computational complexity at this step is dominated by the cost of the reduced Cholesky algorithm applied to the matrix G^(ℓ)T G^(ℓ), which computes the truncated SVD of the side matrices G^(ℓ) at the cost O(RG(Nb² + n)), and by the total cost of the convolution products in (10.12), O(RN RG n log n).
(B) Using the rank properties of the Hadamard product of matrices, it is easy to see that (10.15) implies the direct ε-rank estimate (10.13) for the matrix Bε, where Rℓ, ℓ = 1, 2, 3, characterize the effective ranks of the "1D density fitting".
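The Hadamard rank bound used in this step, rank(A ⊙ B) ≤ rank(A) · rank(B), is easy to check numerically; a minimal Python sketch with random low-rank matrices (all data illustrative, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
m, r1, r2 = 50, 2, 3

def low_rank(r):
    # random m x m matrix of rank r
    return rng.standard_normal((m, r)) @ rng.standard_normal((r, m))

A, B = low_rank(r1), low_rank(r2)
H = A * B                                # Hadamard (entrywise) product

# rank(A .* B) <= rank(A) * rank(B) = 6
assert np.linalg.matrix_rank(H) <= r1 * r2
```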
(C) The error bound can be derived along the lines of [174], Theorem 2.5(d), related to the RHOSVD error analysis. Indeed, the approximation error can be represented explicitly by

    B − Bε = Σ_{k=1}^{RN} (⊙_{ℓ=1}^{3} G^(ℓ)T (p_k^(ℓ) ∗n G^(ℓ)) − ⊙_{ℓ=1}^{3} V^(ℓ) U^(ℓ)T (p_k^(ℓ) ∗n U^(ℓ) V^(ℓ)T)).

Denote Ãℓ(k) = V^(ℓ) U^(ℓ)T (p_k^(ℓ) ∗n U^(ℓ) V^(ℓ)T). Then for each fixed k = 1, . . . , RN, we have

    ‖Aℓ − Ãℓ‖ ≤ 2ε ‖p_k^(ℓ)‖ ‖G^(ℓ)‖   (10.16)

because of the stability in the Frobenius norm, ‖U^(ℓ) V^(ℓ)T‖ ≤ ‖G^(ℓ)‖. Now, for fixed k, we obtain

    A1 ⊙ A2 ⊙ A3 − Ã1 ⊙ Ã2 ⊙ Ã3 = A1 ⊙ A2 ⊙ A3 − Ã1 ⊙ A2 ⊙ A3
                                  + Ã1 ⊙ A2 ⊙ A3 − Ã1 ⊙ Ã2 ⊙ A3
                                  + Ã1 ⊙ Ã2 ⊙ A3 − Ã1 ⊙ Ã2 ⊙ Ã3.

Summing up this representation over k = 1, . . . , RN and taking into account (10.16), we arrive at the bound

    ‖B − Bε‖F ≤ 6ε max_ℓ ‖G^(ℓ)‖F Σ_{k=1}^{RN} max_ℓ ‖Aℓ(k)‖F² ‖p_k^(ℓ)‖F,   (10.17)

which proves the result.



The proof of Lemma 10.4 is constructive and outlines the way to an efficient implementation of (10.11)–(10.12). Some numerical results on the performance of the corresponding black-box algorithm are shown in Sections 10.3.3 and 11.4.
The RHOSVD factorization (10.11)–(10.12) is reminiscent of the exact Galerkin representation (10.6) in the right redundancy-free basis, whereas the matrices M_k^(ℓ) play the role of "directional" Galerkin projections of the Newton kernel onto the left redundancy-free basis. This factorization can be applied directly to the fast calculation of the reduced Cholesky decomposition of the matrix B considered in the next section.
Finally, we point out that our RHOSVD-type factorization can be viewed as the
algebraic tensor-structured counterpart of the density fitting scheme commonly used in
quantum chemistry [3, 217, 237]. We notice that in our approach the “1D density fitting”
is implemented independently for each space dimension, reducing the ε-ranks of the
dominating directional bases to the lowest possible value. The robust error control
in the proposed basis optimization approach is based on a purely algebraic SVD-like procedure that allows eliminating the redundancy in the product basis set up to a given precision ε > 0.
Further storage reduction can be achieved by the quantized-TT (QTT) approxima-
tion of the column vectors in U (ℓ) and V (ℓ) in (10.12). Specifically, the required storage
amounts to O((RG + RN ) log n) reals.
In some cases, the representation (10.11) may provide a direct low-rank decomposition of the matrix B. In fact, suppose that Rℓ ≤ Cℓ |log ε| Norb with constants Cℓ ≤ 1, ℓ = 1, 2, 3. Then the ε-rank of the matrix B is bounded by

    rank(Bε) ≤ min{Nb², RN |log ε|³ Norb³ ∏_{ℓ=1}^{3} Cℓ}.   (10.18)

Indeed, in accordance with [157], we have the rank estimate rank(Bε) ≤ min{Nb², RN ∏_{ℓ=1}^{3} Rℓ}, which proves the statement.
The rank estimate (10.13) outlines the way to an efficient implementation of (10.11)–(10.12). Here, the algebraically optimized directional separation ranks Rℓ, ℓ = 1, 2, 3, are determined only by the entanglement properties of a molecule, whereas the numbers Nb² − Rℓ indicate the measure of redundancy in the product basis set. Normally, we have Rℓ ≪ n and Rℓ ≤ Nb, ℓ = 1, 2, 3. The asymptotic bound Rℓ ≤ Cℓ |log ε| Norb can be seen in Figure 10.1. One can observe that, in the case of the glycine molecule, the first mode rank is much smaller than the others, indicating the flattened shape of the molecule. However, the a priori rank estimate (10.13) looks too pessimistic compared with the results of numerical experiments, though in the case of flattened or extended molecules (where some of the directional ranks are small), this estimate provides a much lower bound.

10.3.3 Low-rank Cholesky decomposition of the TEI matrix B

The Hartree–Fock calculations for moderate-size molecules are usually based on the incomplete Cholesky decomposition [303, 130, 17] applied to the symmetric and positive definite TEI matrix B,

    B ≈ LL^T,   L ∈ ℝ^{Nb²×RB},   (10.19)

where the separation rank RB ≪ Nb² is of order O(Nb). This decomposition can be efficiently computed by using the precomputed (off-line step) factorization of B as in (10.11), which requires only a small number of adaptively chosen column vectors of B [157]. The detailed computational scheme is presented in Algorithm 3.
In this section, we describe the economical computational scheme introduced in [157, 150], providing the rank-O(Nb) truncated Cholesky factorization of the TEI matrix B with complexity O(Nb³). This approach requires the computation of only selected columns of B, without the need to compute the whole TEI matrix: the Cholesky scheme uses only O(Nb) adaptively chosen columns of B, calculated on-line using the results of the redundancy-free factorization (10.11).
The complexity can be further reduced to O(Norb² Nb) using the quantized representation of the Cholesky vectors.
We denote the long indices in the N × N (N = Nb²) matrix unfolding B by

    i = vec(μ, ν) := (μ − 1)Nb + ν,   j = vec(κ, λ),   i, j ∈ IN := {1, . . . , N}.

Lemma 10.5 ([157]). The unfolding matrix B is symmetric and positive semidefinite.

Proof. The symmetry is enforced by the definition (see Lemma 10.1). The positive semidefiniteness follows from the observation that the matrix B can be viewed as the Galerkin matrix ⟨(−Δ)⁻¹ui, uj⟩, i, j ∈ IN, in the finite product basis set {ui} = {gμ gν}, where (−Δ)⁻¹ is the inverse of the Laplacian operator, which is self-adjoint and positive definite in H¹(ℝ³), subject to the homogeneous Dirichlet boundary conditions as x → ∞.

We consider the ε-truncated Cholesky factorization of B, B ≈ Bε = LL^T, where

    ‖B − LL^T‖ ≤ Cε,   L ∈ ℝ^{N×RB}.

Based on the previous observation, we postulate a rather general ε-rank estimate (in electronic structure calculations, this conventional fact traces back to [17]); see the numerics in Figure 10.3.

Remark 10.6. Given a fixed truncation error ε > 0, for the Gaussian-type AO basis
functions, we have RB = rank(LLT ) ≤ CNb , where the constant C > 0 is independent
of Nb .

Figure 10.3: Singular values of Bε = LL^T for the NH3, H2O2, N2H4, and C2H5OH molecules with the number Nb of basis functions 48, 68, 82, and 123, respectively.

Clearly, the fastest version of the numerical Cholesky decomposition is possible in the case of the precomputed full TEI tensor B. In this case, the CPU time for the Cholesky decomposition becomes negligible compared with that for computing the TEI tensor B. However, the practical use of this algorithm is limited to small basis sets because of the large storage requirement of Nb⁴.
The following approach was introduced in [157] to compute the truncated Cholesky decomposition with reduced storage demands by using the redundancy-free RHOSVD-type factorization of B in the form (10.11); see Remark 10.3. Using this representation, one can calculate the truncated Cholesky decomposition of B, computing on the fly a few columns and the diagonal elements of the TEI matrix B by the following cheap tensor operations:

    B( : , j∗) = Σ_{k=1}^{RN} ⊙_{ℓ=1}^{3} V^(ℓ) M_k^(ℓ) V^(ℓ)T( : , j∗)

and

    B(i, i) = Σ_{k=1}^{RN} ⊙_{ℓ=1}^{3} V^(ℓ)(i, : ) M_k^(ℓ) V^(ℓ)T( : , i),

respectively, as shown in Algorithm 3 [150].


The results of our numerical experiments using the MATLAB implementation of Algorithm 3 indicate that the truncated Cholesky decomposition with separation rank O(Nb) ensures a satisfactory numerical precision ε of order 10⁻⁵–10⁻⁶. The refined rank estimate O(Nb |log ε|) was observed in numerical experiments for every molecular system we have calculated so far.
The factorization (10.11) essentially reduces the amount of work at the "preprocessing" stage in the limit of large Nb (see Lemma 10.4), since the number of convolutions is now estimated by O(Nb) instead of Nb².
Other methods of tensor contraction in TEI calculations have been discussed in [134, 233].

Algorithm 3 Truncated Cholesky factorization of the matrix B ∈ ℝ^{N×N}, N = Nb².
Input: right RF basis V^(ℓ); set of Rℓ × Rℓ matrices M_k^(ℓ) for ℓ = 1, 2, 3, k = 1, . . . , RN; error tolerance ε > 0.
(1) Compute the diagonal b = diag(B): B(i, i) = Σ_{k=1}^{RN} ⊙_{ℓ=1}^{3} V^(ℓ)(i, : ) M_k^(ℓ) V^(ℓ)T( : , i);
(2) Set r = 1, err = ‖b‖1, and initialize π = {1, . . . , N};
While err > ε, perform (3)–(9):
(3) Find m = arg max{b(πj) : j = r, r + 1, . . . , N}; update π by swapping πr and πm;
(4) Set ℓ_{r,πr} = √b(πr);
For r + 1 ≤ m ≤ N, perform (5)–(7):
(5) Compute the entire column of B via B( : , r) = Σ_{k=1}^{RN} ⊙_{ℓ=1}^{3} V^(ℓ) M_k^(ℓ) V^(ℓ)T( : , r);
(6) Compute the L-column entry ℓ_{r,πm} = (B(r, πm) − Σ_{j=1}^{r−1} ℓ_{j,πr} ℓ_{j,πm})/ℓ_{r,πr};
(7) Update the stored diagonal b(πm) = b(πm) − ℓ_{r,πm}²;
(8) Compute err = Σ_{j=r+1}^{N} b(πj);
(9) Increase r = r + 1;
Output: low-rank decomposition of B, Bε = LL^T, such that tr(B − Bε) ≤ ε.
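The loop of Algorithm 3 can be sketched compactly in Python; here a dense column oracle stands in for the factorized on-the-fly evaluation of B( : , j), and all names and sizes are illustrative:

```python
import numpy as np

def truncated_cholesky(column, diag, N, eps):
    """Pivoted Cholesky of a symmetric positive semidefinite N x N matrix,
    given only a column oracle column(j) and the diagonal; stops once the
    residual trace (the error measure of Algorithm 3) drops below eps."""
    d = diag.astype(float).copy()
    rows = []                               # rows of L^T, one per pivot
    while d.sum() > eps and len(rows) < N:
        j = int(np.argmax(d))               # pivot: largest residual diagonal
        if d[j] <= 0.0:
            break
        col = column(j).astype(float)
        for prev in rows:                   # subtract previous rank-1 terms
            col -= prev[j] * prev
        new = col / np.sqrt(d[j])
        rows.append(new)
        d = np.maximum(d - new * new, 0.0)  # update the stored diagonal
    return np.array(rows).T                 # N x R_B factor with B ~ L L^T

# stand-in for the TEI matrix: a random SPSD matrix of low rank
rng = np.random.default_rng(1)
G = rng.standard_normal((40, 5))
B = G @ G.T
L = truncated_cholesky(lambda j: B[:, j], np.diag(B), 40, 1e-10)
assert np.allclose(B, L @ L.T)
```

In the actual scheme, the column oracle would evaluate the factorized sum Σ_k ⊙ V^(ℓ) M_k^(ℓ) V^(ℓ)T( : , j) instead of reading a stored B.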

10.4 On QTT compression to the Cholesky factor L


This section collects important observations obtained in numerical experiments. In the QTT analysis of the TEI matrix B for several moderate-size compact molecules, we observed that, for a fixed approximation error ε > 0, the average QTT ranks of the Cholesky vectors exhibit the behavior rQTT ∼ kchol Norb with kchol ≤ 3. From these numerics, we conclude that the factor kchol = 3, observed for compact compounds, is due to the spatial dimensionality of the considered molecular system (or problem), and it becomes closer to 2 for more stretched molecules; see Table 10.2.

Table 10.2: Average QTT ranks of the Cholesky vectors vs. Norb for some molecules.

Molecule HF H2 O NH3 H2 O2 N2 H4 C2 H5 OH

Norb 5 5 5 9 9 13
rQTT 12 13.6 15 21 24 37
kchol = rQTT /Norb 2.4 2.7 3 2.3 2.6 2.85

Based on these numerical experiments, we formulate our hypothesis [157]:

Hypothesis 10.7. The structural complexity of the Cholesky factor L of the matrix B in
the QTT representation is characterized by the rank parameter

rQTT (L) ≅ 3Norb .



Figure 10.4: (Left): Average QTT ranks of the column vectors in L, rQTT (L), and in the vectorized coeffi-
cient matrix, rQTT (C), for several compact molecules. The “constant” lines at the level 2.35–2.85 indi-
cate the corresponding ratios rQTT (L)/Norb and rQTT (C)/Norb for the respective molecule. (Right): QTT
ranks of skeleton vectors in factorization (10.11)–(10.12) for H2 O, N2 H4 , C2 H5 OH, C2 H5 NO2 (glycine),
C3 H7 NO2 (alanine) calculations, with Norb equal to 5, 9, 13, 20, and 24, respectively.

The effective representation complexity of the Cholesky factor L ∈ ℝ^{N×RB} is estimated by

    9 RB Norb² ≪ RB Nb².

Assuming that the conventional relation Nb ≈ 10 Norb is fulfilled, we conclude that the reduction factor in the storage size with the QTT representation of L is about 10⁻¹ (for the QTT representation, we used the TT-Toolbox 2.2¹).
Similar rank characterizations have been observed in the QTT analysis of the U^(ℓ) and V^(ℓ) factors in the rank factorization of the initial product basis tensors G^(ℓ), ℓ = 1, 2, 3 (see Table 10.1). In particular, the average QTT ranks of the reduced higher-order SVD factors V^(ℓ) ∈ ℝ^{Nb²×Rℓ} exhibit almost the same rank scaling, rQTT(V^(ℓ)) ≤ 3 Norb, as the factor kchol ≈ 3 in the Cholesky decomposition of the matrix B. Hence, the QTT representation complexity for the factor V^(ℓ) in (10.11) can be reduced to

    10 Norb² RG ≈ (1/10) Nb² RG.

Figure 10.4 illustrates the QTT-rank behavior versus Norb for the skeleton vectors in factorization (10.11) for some compact molecules with different numbers of electron orbitals Norb.

1 Free download from http://github.com/oseledets/TT-Toolbox, Skolkovo, Moscow.


11 Fast grid-based Hartree–Fock solver
by factorized TEI
In this section, we describe the fast black-box Hartree–Fock solver¹ based on the rank-structured tensor numerical methods introduced in [147]. It follows the conventional HF computational scheme, which relies on the precomputed grid-based TEI in the form of a low-rank factorization. The DIIS-type iteration [238] for solving the nonlinear eigenvalue problem runs by updating the density matrix, which gainfully employs the factorized representation of the TEI tensor. The computational scheme is performed in a "black-box" way; one has to specify only the size of the computational domain and the related n × n × n 3D Cartesian grid, the skeleton vectors of the Galerkin basis functions discretized on that grid, and the coordinates and charges of the nuclei in a molecule. The iterative solution process is terminated by the chosen ε-threshold defining the accuracy of the rank-truncation operations.
In this approach, the grid-based tensor-structured calculations of the factorized TEI tensor and of the core Hamiltonian are employed [157, 150, 156]. The routine size of the 3D grid for TEI calculations in MATLAB on a terminal is of order n³ ≃ 10¹⁴ (with n = 32 768), yielding a fine mesh resolution of order h ≃ 10⁻⁴ Å.
The performance of this Hartree–Fock solver in the MATLAB implementation [147] is comparable with that of standard quantum chemistry packages in both computation time and accuracy. Ab initio Hartree–Fock iterations for large compact molecules, up to the amino acids glycine (C2H5NO2) and alanine (C3H7NO2), can run on a laptop.
The discretized Gaussians are used as the "global" Galerkin basis due to the simplicity of comparison with the MOLPRO package [299] calculations for the same basis sets. Here, the primitive Gaussians of the cc-pVDZ-type basis sets are used for all considered molecules. Due to the grid representation of the basis set, this Hartree–Fock solver can be considered a "laboratory" for the development and testing of optimized bases of general type. In particular, in this framework, the substitution of the set of steepest core-electron Gaussians by Slater-type functions for every non-hydrogen nucleus may be employed, essentially reducing the number of basis functions.

11.1 Grid representation of the global basis functions


The initial eigenvalue problem is posed in the finite volume box Ω = [−b, b]³ ⊂ ℝ³ subject to the homogeneous Dirichlet boundary conditions on 𝜕Ω. For a given discretization parameter n ∈ ℕ, we use the equidistant n × n × n tensor grid ω3,n = {xi},

1 Also abbreviated as fast TESC Hartree-Fock solver, see Section 7.4.

https://doi.org/10.1515/9783110365832-011

Figure 11.1: The computational box [−b, b]³; the routine size is b = 20 au (∼10.5 Å).

i ∈ ℐ := {1, . . . , n}³, with the mesh size h = 2b/(n + 1); see Figure 11.1. For the set of "global" separable Galerkin basis functions {gk}1≤k≤Nb, we define the approximating functions ḡk := I1 gk, k = 1, . . . , Nb, by linear tensor-product interpolation via the set of product "local" basis functions {ξi} = {ξi1(x1) ξi2(x2) ξi3(x3)}, i ∈ ℐ, associated with the respective grid cells in ω3,n. The local basis functions are chosen as piecewise linear (hat functions) for the tensor calculation of the Laplace operator [156], or piecewise constant for the factorized calculation of the two-electron integrals [157] and the direct tensor calculation of the nuclear potential operator Vc [156]. Recall that the linear interpolant I1 = I1 × I1 × I1 is a product of 1D interpolation operators, ḡ_k^(ℓ) = I1 g_k^(ℓ), ℓ = 1, 2, 3, where I1 : C⁰([−b, b]) → Wh := span{ξi}_{i=1}^{n} is defined over the set of (piecewise linear or piecewise constant) local basis functions, (I1 w)(xℓ) := Σ_{i=1}^{n} w(xℓ,i) ξi(xℓ), xi ∈ ω3,n. This leads to the separable grid-based approximation of the initial basis functions gk(x),

    gk(x) ≈ ḡk(x) = ∏_{ℓ=1}^{3} ḡ_k^(ℓ)(xℓ) = ∏_{ℓ=1}^{3} Σ_{i=1}^{n} g_k^(ℓ)(xℓ,i) ξi(xℓ),   (11.1)

such that the rank-1 coefficient tensor Gk is given by

    Gk = g_k^(1) ⊗ g_k^(2) ⊗ g_k^(3),   k = 1, . . . , Nb,   (11.2)

with the canonical vectors g_k^(ℓ) = {g_k^(ℓ)(x_i^(ℓ))}_{i=1}^{n}. The discretized Galerkin basis is then represented by the set of rank-1 canonical tensors Gk, k = 1, . . . , Nb.
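The separable discretization (11.1)–(11.2) for a single Gaussian basis function can be sketched as follows: the three 1D canonical vectors reproduce the full n × n × n sampled tensor as an outer product (the grid and Gaussian parameters here are made up for illustration):

```python
import numpy as np

b, n = 20.0, 64                              # box [-b, b]^3, univariate grid size
x = np.linspace(-b, b, n)                    # grid points in each direction

alpha, a = 0.5, np.array([0.3, -1.0, 2.0])   # Gaussian exponent and center

# 1D canonical vectors g^(1), g^(2), g^(3) of the rank-1 tensor G_k
g = [np.exp(-alpha * (x - a[ell]) ** 2) for ell in range(3)]

# rank-1 tensor via outer products, G_k = g^(1) x g^(2) x g^(3)
Gk = np.einsum('i,j,k->ijk', g[0], g[1], g[2])

# reference: direct sampling of the 3D Gaussian on the full grid
X, Y, Z = np.meshgrid(x, x, x, indexing='ij')
ref = np.exp(-alpha * ((X - a[0])**2 + (Y - a[1])**2 + (Z - a[2])**2))
assert np.allclose(Gk, ref)
```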
Since the tensor-structured calculation of the operators in the Hartree–Fock equation is reduced to one-dimensional rank-structured algebraic operations, the size n of the tensor-product grid ω3,n can be chosen differently for different parts of the Fock operator. For example, the entries of the matrices Ag and Vg (see Section 9), corresponding to the kinetic and nuclear energy parts of the core Hamiltonian, can be computed using different grid sizes n for discretizing the "global" Gaussian basis functions. The same concerns the grid size n in the rank-structured calculation of

the two-electron integrals using piecewise constant basis functions, which can be much smaller than the grid size n required for the calculation of both Ag and Vg, since J and K are integral operators. Thus, the discretization step size for the grid representation of the Galerkin basis is specified only by the accuracy requirements for the particular part of the Fock operator of interest.

11.2 3D Laplace operator in O(n) and O(log n) complexity


Recall that the grid-based calculation of the core Hamiltonian part (7.5), Hc = −(1/2)Δ + Vc, has been discussed in Section 9. In particular, given the Gaussian-type Galerkin basis {gk(x)}1≤k≤Nb, x ∈ ℝ³, the Laplace operator takes the matrix form Ag = [akm] ∈ ℝ^{Nb×Nb} with the entries

    akm = ⟨−Δgk(x), gm(x)⟩,   k, m = 1, . . . , Nb,

which can be computed by using simple multilinear algebra with the rank-1 tensors Gk. The exact Galerkin matrix Ag is approximated using (11.2) as in [156], Ag ≈ AG = {akm}, k, m = 1, . . . , Nb, with

    akm = −⟨AΔ Gk, Gm⟩,   (11.3)

which should be calculated with a large grid size n that resolves the sharp Gaussian basis functions.
To overcome the limitations caused by the large mode size n of the target tensors, the QTT tensor format [167, 165] can be used for the calculation of the Laplace part of the Fock operator [147]. This allows the calculation of multidimensional functions and operators in logarithmic complexity O(log n). For the Laplace operator

    AΔ = Δ1^(1) ⊗ I^(2) ⊗ I^(3) + I^(1) ⊗ Δ1^(2) ⊗ I^(3) + I^(1) ⊗ I^(2) ⊗ Δ1^(3),   (11.4)

the exact rank-2 tensor train representation was introduced in [142]:

    ΔTT = [Δ1  I] ⊗b [ I   0 ] ⊗b [ I  ] ,   (11.5)
                     [ Δ1  I ]    [ Δ1 ]

where the sign ⊗b (sometimes also denoted by ⋈) means the matrix product of block core matrices with the blocks being multiplied by means of the tensor product. Suppose that n = 2^L. Then the quantized representation of Δ1 takes the form [142, 170]

    Δ1Q = [I  J  J^T] ⊗b [ I  J    J^T ]⊗b(L−2) ⊗b [ 2I − J − J^T ] ,   (11.6)
                         [ 0  J^T  0   ]           [ −J^T         ]
                         [ 0  0    J   ]           [ −J           ]

where L is equal to the number of virtual dimensions in the quantized format, and

    I = ( 1 0 ),   J = ( 0 1 ).
        ( 0 1 )        ( 0 0 )
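The block-core product ⊗b and the Laplacian cores can be verified numerically. The sketch below implements ⊗b for block matrices (with None marking a zero block) and checks both the rank-2 TT form (11.5) and a rank-3 QTT factorization of the 1D Dirichlet Laplacian of the type (11.6); the particular ordering of the core entries is one consistent choice, confirmed by the assertions below:

```python
import numpy as np

def bprod(A, B):
    """Block-core product: block-matrix multiplication in which the scalar
    product of blocks is replaced by the Kronecker product."""
    out = [[None] * len(B[0]) for _ in range(len(A))]
    for i in range(len(A)):
        for j in range(len(B[0])):
            s = None
            for k in range(len(B)):
                if A[i][k] is not None and B[k][j] is not None:
                    t = np.kron(A[i][k], B[k][j])
                    s = t if s is None else s + t
            out[i][j] = s
    return out

I2 = np.eye(2)
J = np.array([[0.0, 1.0], [0.0, 0.0]])
JT = J.T

# rank-3 QTT cores of the 1D Laplacian tridiag(-1, 2, -1), here for L = 3 (n = 8)
first = [[I2, J, JT]]
mid   = [[I2, J, JT], [None, JT, None], [None, None, J]]
last  = [[2 * I2 - J - JT], [-JT], [-J]]
D8 = bprod(bprod(first, mid), last)[0][0]
assert np.allclose(D8, 2 * np.eye(8) - np.eye(8, k=1) - np.eye(8, k=-1))

# rank-2 TT form of the 3D Laplacian, as in (11.5), assembled from D8
In = np.eye(8)
A3d = bprod(bprod([[D8, In]], [[In, None], [D8, In]]), [[In], [D8]])[0][0]
ref = (np.kron(np.kron(D8, In), In) + np.kron(np.kron(In, D8), In)
       + np.kron(np.kron(In, In), D8))
assert np.allclose(A3d, ref)
```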

For the discretized representation (11.2) of the basis functions, the entries of the matrix AG = {akm}, k, m = 1, . . . , Nb, are calculated as

    akm = −⟨AΔ Gk, Gm⟩ ≈ −⟨ΔQTT Q_k^(1) ⊗ Q_k^(2) ⊗ Q_k^(3), Q_m^(1) ⊗ Q_m^(2) ⊗ Q_m^(3)⟩,   (11.7)

where the matrix ΔQTT is obtained by plugging the QTT Laplace representation (11.6) into (11.5), and the tensor Q_k^(ℓ), ℓ = 1, 2, 3, is the quantized representation of the vector g_k^(ℓ) ∈ ℝ^n.

Table 11.1: QTT calculations of the Laplacian matrix for H2 O molecule.

p             15         16         17          18          19          20
n³ = 2^{3p}   32 767³    65 535³    131 071³    262 143³    524 287³    1 048 575³
err(AG)       0.0027     6.8⋅10⁻⁴   1.7⋅10⁻⁴    4.2⋅10⁻⁵    1.0⋅10⁻⁵    2.6⋅10⁻⁶
RE            –          1.0⋅10⁻⁵   8.3⋅10⁻⁸    2.6⋅10⁻⁹    3.3⋅10⁻¹⁰   0
time (sec)    12.8       17.4       25.7        42.6        77          135

Table 11.1 demonstrates the weak dependence of the calculation time on the size of the 3D Cartesian grid. For the water molecule, it shows the approximation error for the Laplacian matrix, err(AG) = ‖AMolpro − AG‖, represented in the discretized basis of Nb = 41 Cartesian Gaussians, where AMolpro is the result of analytical computations with the same Gaussian basis in the MOLPRO program [299]. Times are given for the MATLAB implementation. The line "RE" in Table 11.1 represents the approximation error for the discrete Laplacian AG obtained by the Richardson extrapolation on two adjacent grids, where the grid size is given by n = 2^p, p = 1, . . . , 20. The QTT ranks of the canonical vectors g_k^(ℓ) are bounded by a small constant. The approximation order O(h²) can be observed.
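The Richardson extrapolation used for the "RE" line combines results on two dyadically refined grids to cancel the leading O(h²) error term; a minimal synthetic sketch (the model constants are illustrative):

```python
def richardson(a_h, a_h2):
    # O(h^2)-accurate data on grids h and h/2: a(h) = a* + c h^2 + O(h^3),
    # so a* is approximated to O(h^3) accuracy by (4 a(h/2) - a(h)) / 3
    return (4.0 * a_h2 - a_h) / 3.0

a_exact = 1.0
a = lambda h: a_exact + 0.3 * h**2 + 0.01 * h**3   # synthetic O(h^2) model
h = 0.1
err_h = abs(a(h) - a_exact)
err_re = abs(richardson(a(h), a(h / 2)) - a_exact)
assert err_re < err_h / 100                        # extrapolation wins
```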

11.3 Nuclear potential operator in O(n) complexity


We now briefly recall the computation of the nuclear potential operator by the direct tensor summation of electrostatic potentials described in detail in Chapter 9:

    Vc(x) = Σ_{ν=1}^{M0} Zν / ‖x − aν‖,   Zν > 0,   x, aν ∈ Ω ⊂ ℝ³,   (11.8)

where M0 is the number of nuclei in Ω. Using the canonical tensor representation of the reference 3D Newton kernel 1/‖x‖ described in Section 6.1,

    P̃R = Σ_{q=1}^{R} p_q^(1) ⊗ p_q^(2) ⊗ p_q^(3) ∈ ℝ^{2n×2n×2n},   (11.9)

and the rank-1 shifting-windowing operator (see Section 9.2)

    𝒲ν = 𝒲ν^(1) ⊗ 𝒲ν^(2) ⊗ 𝒲ν^(3)   for ν = 1, . . . , M0,

the total electrostatic potential Vc(x) in the computational box Ω is approximated by the direct canonical tensor sum (see also (9.11))

    Pc = Σ_{ν=1}^{M0} Zν Σ_{q=1}^{R} 𝒲ν^(1) p_q^(1) ⊗ 𝒲ν^(2) p_q^(2) ⊗ 𝒲ν^(3) p_q^(3) ∈ ℝ^{n×n×n}.   (11.10)
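The shifting-windowing construction (11.10) relies on the fact that extracting an n-window from a canonical tensor acts factor-wise on its skeleton vectors; a small sketch where random vectors stand in for the canonical factors p_q^(ℓ) of the Newton kernel:

```python
import numpy as np

rng = np.random.default_rng(2)
n, R = 16, 4
p = rng.standard_normal((3, R, 2 * n))   # reference canonical vectors on the 2n grid

def canonical_full(vecs, R):
    # assemble the full 3D tensor of a rank-R canonical representation
    return sum(np.einsum('i,j,k->ijk', vecs[0][q], vecs[1][q], vecs[2][q])
               for q in range(R))

P_ref = canonical_full(p, R)             # full 2n x 2n x 2n reference tensor

shift = (3, 7, 1)                        # window offsets for one "nucleus"
w = [p[l][:, s:s + n] for l, s in enumerate(shift)]
P_win = canonical_full(w, R)             # canonical sum of the windowed vectors

# windowing the skeleton vectors equals taking the sub-tensor directly
assert np.allclose(P_win, P_ref[3:3 + n, 7:7 + n, 1:1 + n])
```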

Then, for the given representation (11.2) of the basis functions as rank-1 canonical tensors, the sum Vc(x) of potentials in the box (11.8) is represented in a given basis set by a matrix Vg ≈ VG = {vkm} ∈ ℝ^{Nb×Nb} whose entries are calculated by simple tensor operations [156, 147]:

    vkm = ∫_{ℝ³} Vc(x) gk(x) gm(x) dx ≈ ⟨Gk ⊙ Gm, Pc⟩,   1 ≤ k, m ≤ Nb.   (11.11)

Note that for the grid-based representation Pc of the core potential Vc(x), the univariate grid size n can be noticeably smaller than the size of the grid used for the piecewise linear discretization of the Laplace operator.
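Since all factors in (11.11) are kept separable, each entry reduces to 1D scalar products, and no n³ array is ever formed; a sketch with random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(3)
n, R = 32, 6
gk = rng.standard_normal((3, n))        # 1D factors of the rank-1 tensor G_k
gm = rng.standard_normal((3, n))        # 1D factors of the rank-1 tensor G_m
w = rng.standard_normal((3, R, n))      # canonical factors of P_c (rank R)

# separable evaluation: <G_k . G_m, P_c> = sum_q prod_l dot(g_k^l * g_m^l, w_q^l)
v = sum(np.prod([np.dot(gk[l] * gm[l], w[l][q]) for l in range(3)])
        for q in range(R))              # O(R n) work instead of O(n^3)

# reference: assemble the full tensors and take the full inner product
Gk = np.einsum('i,j,k->ijk', *gk)
Gm = np.einsum('i,j,k->ijk', *gm)
Pc = sum(np.einsum('i,j,k->ijk', w[0][q], w[1][q], w[2][q]) for q in range(R))
assert np.isclose(v, np.sum(Gk * Gm * Pc))
```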

11.4 Coulomb and exchange operators by factorized TEI


Here we recall the multilinear algebraic calculation of the Coulomb and exchange matrices in the Fock operator, discussed in full detail in [157, 147]. For precomputed two-electron integrals, in view of (7.14), the Coulomb matrix is given by

    J(D)μν = Σ_{κ,λ=1}^{Nb} bμν,κλ Dκλ.   (11.12)

Vectorizing the matrices, J = vec(J) and D = vec(D), and taking into account the rank structure of the TEI matrix B, we arrive at the simple matrix representation for the Coulomb matrix

    J = BD ≈ L(L^T D).   (11.13)


The straightforward calculation by (11.13) amounts to O(RB Nb²) operations, where RB is the ε-rank of B.
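The bracketing in (11.13) is essential: applying L^T to vec(D) first and then L keeps the cost at O(RB Nb²) and never assembles B; a sketch with random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(4)
Nb, RB = 20, 35
L = rng.standard_normal((Nb * Nb, RB))     # Cholesky factor of the TEI matrix
D = rng.standard_normal((Nb, Nb))
D = D + D.T                                # symmetric density-matrix stand-in
d = D.reshape(-1)                          # vec(D)

J = (L @ (L.T @ d)).reshape(Nb, Nb)        # O(RB Nb^2); B is never assembled

B = L @ L.T                                # reference: full TEI matrix
assert np.allclose(J.reshape(-1), B @ d)
```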
For the exchange operator K, the tensor evaluation is more involved due to the summation over permuted indices,

    K(D)μν = −(1/2) Σ_{κ,λ=1}^{Nb} bμλ,νκ Dκλ,   (11.14)

which diminishes the advantages of the low-rank structure of the matrix B. Introducing the permuted tensor B̃ = permute(B, [2, 3, 1, 4]) and the respective unfolding matrix B̃ = mat(B̃), we then obtain

    vec(K) = K = B̃D.   (11.15)

The direct calculation by (11.15) amounts to O(RB Nb³) operations. However, using the rank-Norb decomposition of the density matrix, D = 2CC^T, reduces the cost to O(RB Norb Nb²) via the representation

    K(D)μν = −Σ_{i=1}^{Norb} (Σ_λ Lμλ Cλi)(Σ_κ Lκν Cκi)^T,

where Lμν = reshape(L, [Nb, Nb, RB]) ∈ ℝ^{Nb×Nb×RB} is the Nb × Nb × RB folding of the Cholesky factor L.
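The reduced-cost exchange evaluation through the Nb × Nb × RB folding of L and the orbital factor C can be checked against the direct contraction (11.14); the identity below uses only B = LL^T, so random stand-in data suffice:

```python
import numpy as np

rng = np.random.default_rng(5)
Nb, Norb, RB = 12, 3, 20
L = rng.standard_normal((Nb * Nb, RB))   # B = L L^T, TEI stand-in
C = rng.standard_normal((Nb, Norb))      # occupied orbital coefficients
D = 2.0 * C @ C.T                        # rank-Norb density matrix

# fast route: fold L and contract with C first, O(RB Norb Nb^2)
Lt = L.reshape(Nb, Nb, RB)               # L_{mu lambda, r}
X = np.einsum('mlr,li->mir', Lt, C)
K = -np.einsum('mir,nir->mn', X, X)

# reference: direct -(1/2) sum_{kappa,lambda} b_{mu lambda, nu kappa} D_{kappa lambda}
B4 = (L @ L.T).reshape(Nb, Nb, Nb, Nb)   # b_{mu nu, kappa lambda}
K_ref = -0.5 * np.einsum('mlnk,kl->mn', B4, D)
assert np.allclose(K, K_ref)
```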

Figure 11.2: Approximation accuracy for the Coulomb matrix of the glycine molecule using TEI computed on the grids with n³ = 32 768³ (left) and n³ = 65 536³ (right).

Figure 11.2 presents the error in the computation of the Coulomb matrix for the glycine amino acid (Nb = 170) using TEI computed on the grids n³ = 32 768³ (left) and n³ = 65 536³ (right). The numerical error scales quadratically in the grid size, O(h²), and can be improved to O(h³) by the Richardson extrapolation. The observed decay ratio 1 : 4 indicates the applicability of the Richardson extrapolation to the results on a pair of

Figure 11.3: Left: the error in the density matrix for the amino acid alanine (Nb = 210) for the TEI computed with n³ = 131 072³. Right: the error in the exchange matrix for H2O2 (Nb = 68) computed by TEI using the grid of size n³ = 131 072³.

dyadically refined grids. Figure 11.3 (left) demonstrates the error in the computation of the density matrix of the alanine molecule (Nb = 210) using TEI computed on the grid with n³ = 131 072³. Figure 11.3 (right) displays the error in the exchange matrix computation for the H2O2 molecule (Nb = 68) using TEI with n³ = 131 072³.

11.5 Algorithm of the black-box HF solver


Our grid-based TESC HF solver operates in a black-box way; the input includes only the (x, y, z)-coordinates and the charges Zν, ν = 1, . . . , M0, of the nuclei in the molecule and the Galerkin basis functions discretized on a tensor grid. For lattice systems, it is necessary to give the coordinates and the Galerkin basis of a "reference atom", the interval between atoms in the lattice, and its length.
Recalling the discussion in Section 7.3, we have to solve the eigenvalue problem for the coefficient matrix C = {ciμ} ∈ ℝ^{Norb×Nb},

    F(C)C = SCΛ,   Λ = diag(λ1, . . . , λNb),   (11.16)

with the overlap matrix S for the chosen Galerkin basis (7.10) and the Fock operator

    F(C) = H + J(C) + K(C),   (11.17)

where the matrices J(C) and K(C) depend on the solution matrix C. To solve the eigenvalue problem (11.16), we start the self-consistent field (SCF) iteration with F(C) = H, that is, with zero matrices for both the Coulomb J and exchange K operators. In the course of the SCF iteration, we control the residual by computing the maximum norm of the difference in the virtual part of the eigenvectors from two consecutive iterations,

    ‖C(1, Norb : Nb)it−1 − C(1, Norb : Nb)it‖∞ ≤ ε.   (11.18)

The iteration may be terminated when this value becomes smaller than a given ε-threshold, or the number of iterations may be predefined. Since the iteration times are negligibly small, we usually use a predefined number of iterations.
The first step is defining the global Galerkin basis. In what follows, for comparison with the MOLPRO output, we discretize the rank-1 basis functions given as products of polynomials with Gaussians. We choose in advance the appropriate grid sizes according to the desired accuracy of the calculations. In general, one can set an nx × ny × nz 3D Cartesian grid, but in our current calculations, we use a cubic box with equal size n in every space variable. As already noted, the univariate grid size n of the n × n × n 3D Cartesian grid can be chosen differently for the calculation of the discretized Laplacian, the nuclear potential operator, and the two-electron integrals tensor. Using finer (larger) grids needs more CPU time; therefore, there is a trade-off between the required accuracy and the computational cost.
Given the coordinates of nuclei and the Galerkin basis, the black-box HF solver
performs the following computation steps.
(1) Choose the grid size n and the ε-threshold for rank truncation. Set up the grid
representation of the basis functions.
(2) Compute the nuclear energy shift Enuc by (7.17).
(3) Compute the core Hamiltonian H by the three-dimensional grid-based calculation
of the Galerkin matrix AG for the Laplacian by (11.3) or (11.7) and for the nuclear
potential operator VG by (11.11).
(4) Using the grid-based "1D density fitting", compute the factorized TEI matrix in the form of the low-rank Cholesky decomposition B = LL^T by (10.11)–(10.12).
(5) Set up the input data for SCF iteration:
– threshold ε for the residual (alternatively, a maximal number of iterations);
– number Mopt specifying the design of DIIS scheme [238];
– define initial Coulomb and exchange matrices as J = 0 and K = 0.
(6) Start the SCF iteration for solving the nonlinear eigenvalue problem:
    – solve the linear spectral problem (11.16) with the current Fock matrix

        F = (1/2)AG − VG + J − K;

    – update the residual (11.18) (the difference in the virtual parts of the eigenvectors);
    – update the matrices J(C) and K(C) by computing (11.12) and (11.14);
    – compute the ground-state energy E0,it at the current iteration.
When the residual reaches the given ε (or when the maximal iteration number is reached), the iteration is terminated.
(7) Compute the ground-state energy E0,n.
(8) Calculate the MP2 corrections by the factorizations introduced in [150]; see Section 11.8.
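Steps (5)–(6) can be sketched as a generic closed-shell SCF loop; the toy version below uses random symmetric stand-ins with S = I, damping instead of DIIS, and a final symmetrization of F (random stand-ins lack the symmetries of physical TEI); it is purely illustrative, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(6)
Nb, Norb, RB = 8, 2, 10
H = rng.standard_normal((Nb, Nb)); H = 0.5 * (H + H.T)   # core Hamiltonian stand-in
L = 0.05 * rng.standard_normal((Nb * Nb, RB))            # TEI factor stand-in, B = L L^T
Lt = L.reshape(Nb, Nb, RB)

def fock(D):
    J = (L @ (L.T @ D.reshape(-1))).reshape(Nb, Nb)      # Coulomb, cf. (11.13)
    K = 0.5 * np.einsum('mlr,kl,nkr->mn', Lt, D, Lt)     # exchange, cf. (11.14)
    F = H + J - K
    return 0.5 * (F + F.T)   # symmetrize: random stand-ins lack TEI symmetries

D = np.zeros((Nb, Nb))
for it in range(50):
    w, U = np.linalg.eigh(fock(D))                       # solve F C = C Lambda (S = I)
    C = U[:, :Norb]                                      # aufbau: lowest Norb orbitals
    D = 0.5 * D + 0.5 * (2.0 * C @ C.T)                  # damped density update

assert np.allclose(D, D.T) and abs(np.trace(D) - 2 * Norb) < 1e-8
```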

Figure 11.4: The largest molecules considered in the numerical examples below: the amino acids glycine C2H5NO2 (left) and alanine C3H7NO2 (right). The ball-and-stick pictures of the molecules are generated by the MOLDEN program [258].

Table 11.2: Times for one SCF iteration in the tensor-based Hartree–Fock solver (step 6) in MATLAB
implementation.

Molecule NH3 H2 O2 N2 H4 C2 H5 OH glycine alanine

Nb 48 68 82 123 170 210


Time (sec) 0.2 0.3 0.4 0.6 1.6 3.3

For small and moderate-size molecules, the solver in MATLAB works in one run from the first step to the end of the SCF iteration, using 3D Cartesian grids for the TEI calculations up to n³ = 131 072³. The total computation time usually does not exceed several minutes; see Table 11.2, which illustrates the times for one SCF iteration of the fast TESC Hartree–Fock solver in the MATLAB implementation.
For larger molecules (amino acids, see Figure 11.4), accurate calculations with grids exceeding n³ = 65 536³ need an off-line precomputation of TEI, which requires less than one hour of MATLAB calculations. The CPU time for the TEI calculations depends mostly on the number of basis functions rather than on the size of the grid. The grid size is mainly limited by the available storage of the computer: the storage demand for the first step of the TEI calculations (factorization of the side matrices G^(ℓ) ∈ ℝ^{n×Nb²}, ℓ = 1, 2, 3) is estimated by O(3nNb²), whereas for the second step (Cholesky decomposition of the TEI matrix B), it is bounded by O(Nb³).

11.6 Ab initio ground-state energy calculations for compact molecules
Numerical simulations are performed in MATLAB on an 8 AMD Opteron Dual-Core/2800 computer cluster. The molecule is considered in a computational box [−b, b]³ with b = 20 au (≈10.6 Å). In the TEI calculations, we use uniform mesh sizes up to the finest level with h = 2.5⋅10⁻⁴ au, corresponding to approximately 1.3⋅10⁻⁴ Å. For

Figure 11.5: SCF iterations history for glycine and H2 O molecules.

the core Hamiltonian calculations, finer grids are required, with a mesh size of about h = 3.5⋅10⁻⁵ au (∼1.8⋅10⁻⁵ Å). These correspond to large 3D Cartesian grids of sizes n³ = 65 535³ and n³ = 1 048 576³ entries, respectively. In the following examples, we present calculations of the ground-state energy for several compact molecules.
Figure 11.5 shows the convergence of the SCF iterations for the glycine amino acid
(Nb = 170, left) and the water molecule (Nb = 41, right) using the factorized representation of
TEI precomputed with n³ = 131 072³. The black line shows the convergence of the residual,
computed as the maximum norm of the difference of the eigenvectors from two
consecutive iterations, ‖C(1, :)_{it−1} − C(1, :)_{it}‖∞. The green line presents the difference
between the lowest eigenvalue computed by the grid-based solver and the respective
eigenvalue from MOLPRO calculations, Δλ_{1,it} = |λ_{1,Molpro} − λ_{1,it}|. The red line is the
difference in the ground-state energy with respect to the MOLPRO results, ΔE_{0,it} = |E_{0,Molpro} − E_{0,it}|.
Figures 11.6–11.8 demonstrate the convergence of the ground-state energy versus the
self-consistent field iteration for the glycine amino acid (Nb = 170), NH3 (Nb = 48), and
water (Nb = 41) molecules. The left panels show the convergence history over 70 iterations;
the right panels zoom in on the last 30 iterations. The black line corresponds to E0,Molpro
computed by MOLPRO in the same Gaussian basis.
Figure 11.9 presents the output of the solver for the alanine molecule. Figure 11.10
presents the last 30 + k iterations of the convergence of the ground-state energy for the H2O2
molecule. The red, green, and blue lines correspond to the grid sizes n³ = 32 768³, 65 536³,
and 131 072³, respectively.

Table 11.3: Glycine, basis of 170 Gaussians (cc-pVDZ): error in the ground-state energy versus the mesh
size h. The MOLPRO result is E0,Molpro = −282.8651.

p             13         15          16          17
n³ = 2^{3p}   8192³      32 767³     65 535³     131 072³
h             0.0039     9.7·10⁻⁴    4.9·10⁻⁴    2.5·10⁻⁴
E0,n          −282.8679  −282.8655   −282.8654   −282.8653
er(E0)        0.0024     3.5·10⁻⁴    2.2·10⁻⁴    2.2·10⁻⁴

Figure 11.6: Convergence of the ground-state energy for the glycine molecule (left), with the grid size
for TEI calculation n^⊗3 = 131 072³; zoom of the last 30 iterations (right).

Figure 11.7: Convergence of the ground-state energy for the NH3 molecule (left), with TEI grid size
n^⊗3 = 131 072³; zoom of the last 30 iterations (right).

Figure 11.8: Convergence of the ground-state energy for the H2O molecule (left), with the TEI grid size
n^⊗3 = 131 072³; zoom of the last 30 iterations (right).

Figure 11.9: Left: SCF iteration for the alanine molecule (Nb = 211) with TEI computed on the grid
n^⊗3 = 32 768³. Right: convergence of E0,it at the last 30 iterations.

Figure 11.10: Molecule H2O2: convergence of E0,n after 30 + k iterations, with TEI calculated on a
sequence of grids.

Table 11.3 presents the error in the ground-state energy for the glycine molecule, er(E0) =
E0,n − E0,Molpro, versus the mesh size of the grid used for calculating the TEI tensor. Notice that
the absolute error of calculations with grid-based TEI changes only mildly for grids
of size n³ ≥ 65 535³, remaining at the level of about 10⁻⁴ hartree, which corresponds
to a relative error of the order of 10⁻⁷. Figure 11.11 demonstrates the absolute
error in the density matrix for some molecules.

11.7 On Hartree–Fock calculations for extended systems


For modeling the extended systems, we construct artificial crystal-like structures by
taking a single Hydrogen atom as the initiating block and translating it repeatedly at equal
intervals d1, d2, d3 in each of the three spatial directions x, y, and z, respectively. Thus,
a 3D lattice cluster of size m1 × m2 × m3 is assembled, where m1, m2, m3 are the numbers
of atoms in the spatial directions x, y, and z; see the example in Figure 11.12.
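The lattice assembly step above can be sketched in a few lines; below is a minimal numpy helper (the function name and the centering convention are ours, chosen only for illustration):

```python
import numpy as np

def lattice_coordinates(m, d):
    """Return the (m1*m2*m3) x 3 array of atomic centers of an m1 x m2 x m3
    lattice with inter-atomic spacings d = (d1, d2, d3), centered at the
    origin of the computational box."""
    axes = [d[k] * (np.arange(m[k]) - (m[k] - 1) / 2.0) for k in range(3)]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([X.ravel(), Y.ravel(), Z.ravel()])

# 4 x 4 x 2 Hydrogen cluster with 1.5 A spacing, as in the model problems below
coords = lattice_coordinates((4, 4, 2), (1.5, 1.5, 1.5))
```

The same routine assembles any m1 × m2 × m3 cluster; the nuclei need not coincide with grid points of the 3D Cartesian grid used by the solver.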

Figure 11.11: Absolute error of the density matrix for NH3 molecule (left) and alanine amino acid
(right) compared with MOLPRO output.

Figure 11.12: The lattice structure of size 4.5 × 4.5 × 1.5 Å³ in the computational box [−b, b]³ with
b = 16 au (∼8.5 Å).

Several basis functions (e.g., Gaussians) taken for a single atom as the “initialization
basis” are duplicated for the lattice atoms, thus creating the basis set for the whole
molecular system. For model problems, we construct artificial structures from Hydrogen
atoms, for example, in the form of a 4 × 4 × 2 lattice, using the Hydrogen molecule
H2 as the “initiating” building block, with a distance of 1.5 Å between atoms. For a
lattice system assembled in this way, one can then apply the fast Hartree–Fock solver.
Figure 11.13 shows a slice of the nuclear potential calculated for the slab of 4 × 4 × 2
Hydrogen atoms. Figure 11.14 shows the output of the Hartree–Fock eigenvalue problem
solver for a cluster of 4 × 4 × 2 Hydrogen atoms: the left panel shows the convergence
of the ground-state energy, and the right one demonstrates the lower part of the
spectrum {λμ}, μ = 1, …, Nb, where every line corresponds to one λμ.

Tensor Hartree–Fock calculations impose no special requirements on the positions
of the nuclei with respect to the 3D grid; the nuclei of the investigated molecular systems
may be placed at arbitrary (x, y, z)-coordinates in the computational box.
Solving the ab initio Hartree–Fock problem for larger clusters of Hydrogen-like
atoms by using block circulant and Toeplitz structures in the framework of the
linearized Fock operator is considered in [151, 154]. The reformulation of the nonlinear
Hartree–Fock equation for periodic molecular systems, based on the Bloch theory [37],
has been addressed in the literature for more than forty years, and nowadays there
are several implementations, mostly relying on the analytic treatment of the arising
integral operators [72, 235, 88]. Mathematical analysis of spectral problems for PDEs

Figure 11.13: Left: cross-section of the nuclear potential for the 8 × 4 × 1 cluster of H atoms. Right:
convergence of the residual in SCF iteration.

Figure 11.14: Convergence of the ground-state energy for the 4 × 4 × 2 cluster of H atoms (left) and
a part of its spectrum (right).

with periodic-type coefficients has been an attractive topic in the recent decade; see
[46, 47, 45, 77] and the references therein.
In [154], a new grid-based tensor approach to the approximate solution of the
elliptic eigenvalue problem for 3D lattice-structured systems is introduced and analyzed:
the linearized Hartree–Fock equation is considered over a spatial L1 × L2 × L3 lattice
for both periodic and non-periodic problem settings and is discretized in a basis of
localized Gaussian-type orbitals. In the periodic case, the Galerkin system matrix obeys
a three-level block-circulant structure that allows FFT-based diagonalization, whereas
for finite extended systems in a box (Dirichlet boundary conditions) this matrix admits
a perturbed block-Toeplitz representation providing fast matrix–vector multiplication
and low storage size.
The above-mentioned grid-based tensor techniques offer twofold benefits:
(a) the entries of the Fock matrix are computed by 1D operations using low-rank
tensors represented on a 3D grid; (b) in the periodic case, the low-rank tensor structure
in the diagonal blocks of the Fock matrix in the Fourier space reduces the conventional
3D FFT to a product of 1D FFTs.

Lattice-type systems in a box with Dirichlet boundary conditions are treated
numerically by the tensor solver in the same way as single molecules, which makes possible
calculations on rather large L1 × L2 × L3 lattices due to the reduced numerical cost for 3D
problems. The numerical simulations for both box-type and periodic L × 1 × 1 lattice chains
in a 3D rectangular “tube” with L up to several hundred confirm the theoretical complexity
bounds for the block-structured eigenvalue solvers in the limit of large L; see [154].

11.8 MP2 calculations by factorized TEI


The Møller–Plesset perturbation theory (MP2) provides an efficient tool for correcting
the Hartree–Fock energy at relatively modest numerical effort [222, 3, 128]. It facilitates
the accurate calculation of the molecular energy gradient and other quantities
[127]. Since the straightforward calculation of the MP2 correction scales as O(Nb⁵) flops
in the number of basis functions, efficient methods are consistently being developed
to make the problem tractable for larger molecular systems. A direct method
for evaluating the MP2 energy correction and the energy gradient, which reduces the
storage needs to O(Nb²) at the expense of calculation time, was reported in [125].
The advantageous technique using the Cholesky factorization of the two-electron
integrals, introduced in [17], was efficiently applied to MP2 calculations [4]. A linear-scaling
MP2 scheme for extended systems is considered in [6]. Recently, the MP2
scheme has attracted much interest owing to efficient algorithms for the multi-electron
integrals [304, 247], to the low-cost density-fitting approach (also for
extended molecular systems [216, 298, 241]), and to the application of tensor
factorization methods [306]. An efficient MP2 algorithm that is applicable to large
extended molecular systems in the framework of the DFT model is based on the Laplace
transform reformulation of the problem and the use of the multipole expansion [311].
Following [150], here we describe an approach to computing the Møller–Plesset
correction to the Hartree–Fock energy with reduced numerical cost, based on the
factorized tensor representation of the TEI matrix. Notice that the auxiliary redundancy-free
factorization of TEI is obtained in a “black-box” way, that is, without physical
insight into the molecular configuration.

The TEI matrix is precomputed in a low-rank format obtained via a truncated
Cholesky factorization (approximation). This induces separability in the molecular-orbital-transformed
TEI matrix and in the doubles amplitude tensor. Such an
approach reduces the asymptotic complexity of the MP2 calculations from O(Nb⁵) to
O(Nb³Norb), where Nb is the total number of basis functions and Norb denotes the
number of occupied orbitals. The rank parameters are estimated for both the orbital-basis-transformed
TEI and the doubles amplitude tensors. Notice that using the
QTT tensor approximation [167] of the long Nb²-vectors in the Cholesky factor allows
reducing the storage consumption and CPU times by a factor of about 10 in both TEI
and MP2 calculations.
The efficiency of the MP2 energy correction algorithm was tested in [150] for some
compact molecules, including the glycine and alanine amino acids. Owing to the factorized
tensor representations of the involved multidimensional data arrays, the MP2 calculation
times turned out to be rather moderate compared with those for the TEI tensor, ranging from
one second for the water molecule to approximately four minutes for the glycine molecule.
The numerical accuracy is controlled by a given threshold ε > 0 via stable tensor-rank
reduction algorithms.

11.8.1 Two-electron integrals in a molecular orbital basis

In what follows, we describe the main ingredients of the computational scheme in-
troduced in [150], which reduces the cost by using low-rank tensor decompositions of
arising multidimensional data arrays.
Let C = {Cμi} ∈ ℝ^{Nb×Nb} be the coefficient matrix representing the Hartree–Fock
molecular orbitals (MO) in the atomic orbitals (AO) basis set {gμ}_{1≤μ≤Nb} (obtained in
the Hartree–Fock calculations). First, one has to transform the TEI tensor B = [bμνλσ]
computed in the initial AO basis set to its representation in the MO basis,

B ↦ V = [v_{iajb}] :  v_{iajb} = ∑_{μ,ν,λ,σ=1}^{Nb} C_{μi} C_{νa} C_{λj} C_{σb} b_{μνλσ},   a, b ∈ Ivir, i, j ∈ Iocc,   (11.19)

where Iocc := {1, …, Norb} and Ivir := {Norb + 1, …, Nb}, with Norb denoting the number of
occupied orbitals. In what follows, we shall use the notation

Nvir = Nb − Norb,   Nov = Norb Nvir.

Hence we have V ∈ ℝ^ℐ, where

ℐ := (Ivir × Iocc) × (Ivir × Iocc) ⊂ Ib^{⊗4}.

Straightforward computation of the tensor V in the above representation makes
the dominating contribution, O(Nb⁵), to the overall numerical cost of the MP2 calculations.
Given the tensor V = [v_{iajb}], the second-order MP2 perturbation to the HF energy
is calculated by

E_MP2 = − ∑_{a,b∈Ivir} ∑_{i,j∈Iocc} v_{iajb}(2v_{iajb} − v_{ibja}) / (ε_a + ε_b − ε_i − ε_j),   (11.20)

where the real numbers ε_k, k = 1, …, Nb, represent the HF eigenvalues. Notice that
the denominator in (11.20) remains strongly positive if ε_a > 0 for a ∈ Ivir and ε_i < 0
for i ∈ Iocc. The latter condition (nonzero homo–lumo gap) will be assumed in the
following.
Introduce the so-called doubles amplitude tensor T,

T = [t_{iajb}] :  t_{iajb} = (2v_{iajb} − v_{ibja}) / (ε_a + ε_b − ε_i − ε_j),   a, b ∈ Ivir; i, j ∈ Iocc;

then the MP2 perturbation takes the form of a scalar product of rank-structured tensors:

E_MP2 = −⟨V, T⟩ = −⟨V ⊙ T, 1⟩,

where the summation is restricted to the subset of indices ℐ, and 1 denotes the rank-1
all-ones tensor. Define the reciprocal “energy” tensor

E = [e_{abij}] := [1 / (ε_a + ε_b − ε_i − ε_j)],   a, b ∈ Ivir; i, j ∈ Iocc,   (11.21)

and the partly transposed tensor (transposition in indices a and b)

V′ = [v′_{iajb}] := [v_{ibja}].

Now the doubles amplitude tensor T will be further decomposed into the sum

T = T⁽¹⁾ + T⁽²⁾ = 2V ⊙ E − V′ ⊙ E,   (11.22)

where each term on the right-hand side will be treated separately.
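The formulas (11.20)–(11.22) can be checked directly on toy data. Below is a minimal numpy sketch in which a random tensor stands in for the MO-transformed TEI (all sizes and names are illustrative, not the book's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
No, Nv = 3, 5                                  # toy numbers of occupied/virtual orbitals
eps_occ = -rng.uniform(1.0, 3.0, No)           # occupied HF levels, eps_i < 0
eps_vir = rng.uniform(1.0, 3.0, Nv)            # virtual HF levels, eps_a > 0
V = rng.standard_normal((No, Nv, No, Nv))      # stand-in for v_{iajb}

# denominator tensor eps_a + eps_b - eps_i - eps_j, aligned as [i, a, j, b]
D = (eps_vir[None, :, None, None] + eps_vir[None, None, None, :]
     - eps_occ[:, None, None, None] - eps_occ[None, None, :, None])

# doubles amplitudes (11.22); V' swaps a and b: v'_{iajb} = v_{ibja}
T = (2.0 * V - V.transpose(0, 3, 2, 1)) / D
E_mp2 = -np.sum(V * T)                         # E_MP2 = -<V, T>

# cross-check against the explicit quadruple sum (11.20)
E_ref = -sum(V[i, a, j, b] * (2.0 * V[i, a, j, b] - V[i, b, j, a]) / D[i, a, j, b]
             for i in range(No) for a in range(Nv)
             for j in range(No) for b in range(Nv))
```

The positive homo–lumo gap guarantees D > 0, so the division is well defined.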

11.8.2 Separation rank estimates and numerical illustrations

In this section, we show that a rank RB = O(Nb) approximation to the symmetric TEI
matrix, B ≈ LLᵀ with the Cholesky factor L ∈ ℝ^{Nb²×RB}, leads to a low-rank representation
of the tensor V and an RB-term decomposition of T. This reduces the asymptotic
complexity of MP2 calculations to O(Nb³Norb) and also provides certain computational
benefits; in particular, it reduces the storage costs.

Lemma 11.1 ([150]). Given the rank-RB Cholesky decomposition of the matrix B, the matrix
unfolding V = [v_{ia;jb}] allows a rank decomposition with rank ≤ RB. Moreover, the
tensor V′ = [v_{ibja}] admits an RB-term decomposition of mixed form.

Proof. Denote by Lk = Lk(μ; ν), k = 1, …, RB, the matrix unfolding of the vector
L(:, k) ∈ ℝ^{Nb²} in the Cholesky factor L ∈ ℝ^{Nb²×RB}; notice that the Cholesky
factorization can be written pointwise as follows:

b_{μν;λσ} ≈ ∑_{k=1}^{RB} Lk(μ; ν) Lk(σ; λ).

Let Cm = C(:, m), m = 1, …, Nb, be the mth column of the coefficient matrix C = {Cμi} ∈
ℝ^{Nb×Nb}. Then the rank-RB representation of the matrix unfolding V = [v_{ia;jb}] ∈ ℝ^{Nov×Nov}
takes the form

V = L_V L_Vᵀ,   L_V ∈ ℝ^{Nov×RB},

where

L_V((i − 1)Nvir + a; k) = C_iᵀ Lk C_a,   k = 1, …, RB,  a = 1, …, Nvir,  i = 1, …, Norb.

This is justified by the following transformations:

v_{iajb} = ∑_{μ,ν,λ,σ=1}^{Nb} C_{μi} C_{νa} C_{λj} C_{σb} b_{μνλσ}
        ≈ ∑_{k=1}^{RB} ∑_{μ,ν,λ,σ=1}^{Nb} C_{μi} C_{νa} C_{λj} C_{σb} Lk(μ; ν) Lk(σ; λ)
        = ∑_{k=1}^{RB} ( ∑_{μ,ν=1}^{Nb} C_{μi} C_{νa} Lk(μ; ν) ) ( ∑_{λ,σ=1}^{Nb} C_{λj} C_{σb} Lk(σ; λ) )
        = ∑_{k=1}^{RB} (C_iᵀ Lk C_a)(C_bᵀ Lkᵀ C_j).   (11.23)

This proves the first statement. Furthermore, the partly transposed tensor V′ := [v_{ibja}]
allows an RB-term decomposition derived similarly to (11.23):

v′_{iajb} = v_{ibja} = ∑_{k=1}^{RB} (C_iᵀ Lk C_b)(C_aᵀ Lkᵀ C_j).   (11.24)

This completes the proof.


It is worth noting that one has to compute and store only the factor L_V in the
above symmetric factorizations of V and V′. Hence, the storage cost of decompositions
(11.23) and (11.24), restricted to the active index set Ivir × Iocc, amounts to
RB Nvir Norb numbers. The complexity of the straightforward computation can be estimated
by O(RB Nb² Norb).
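The symmetric factorization V = L_V L_Vᵀ of Lemma 11.1 is easy to verify numerically. In the sketch below, symmetrized random matrices stand in for the Cholesky unfoldings Lk (for real orbitals the TEI are symmetric in μ ↔ ν); all sizes are toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
Nb, No, RB = 6, 2, 4
Nv = Nb - No
C = rng.standard_normal((Nb, Nb))              # stand-in MO coefficient matrix
Lk = rng.standard_normal((RB, Nb, Nb))
Lk = Lk + Lk.transpose(0, 2, 1)                # symmetric unfoldings L_k(mu; nu)

# TEI tensor from the factorization: b_{mu nu; la si} = sum_k L_k(mu;nu) L_k(si;la)
B4 = np.einsum('kmn,ksl->mnls', Lk, Lk)

Co, Cv = C[:, :No], C[:, No:]
# direct four-index MO transform (11.19), restricted to (occ, vir) pairs
V4 = np.einsum('mi,na,lj,sb,mnls->iajb', Co, Cv, Co, Cv, B4)

# factorized form (11.23): L_V[(i,a), k] = C_i^T L_k C_a
LV = np.einsum('mi,kmn,na->iak', Co, Lk, Cv).reshape(No * Nv, RB)
V_mat = LV @ LV.T
```

The direct transform costs O(Nb⁵) when done naively, whereas the factorized route touches only RB small matrix sandwiches.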
Next, we consider a separable representation of the tensor T in (11.22). To that end,
we first apply a low-rank canonical ε-approximation to the tensor E. The following
lemma describes a canonical approximation to the tensor E that converges exponentially
fast in the rank parameter.

Lemma 11.2 ([150]). Suppose that the so-called homo–lumo gap is estimated by

min_{a∈Ivir, i∈Iocc} |ε_a − ε_i| ≥ δ/2 > 0.
Then the rank-RE canonical approximation to the tensor E ≈ E_{RE}, with RE = 2M + 1,

e_{a,b,i,j} ≈ ∑_{p=−M}^{M} c_p e^{−α_p(ε_a+ε_b−ε_i−ε_j)},   α_p > 0,   (11.25)

with the particular choice

h = π/√M,   α_p = e^{ph},   c_p = h α_p,   and   M = O(|log ε log δ|),

provides the error bound

‖E − E_{RE}‖_F ≤ O(ε).

Proof. Consider the sinc-quadrature approximation of the Laplace transform applied
to the fourth-order Hilbert tensor,

1/(x₁ + x₂ + x₃ + x₄) = ∫₀^∞ e^{−t(x₁+x₂+x₃+x₄)} dt ≈ ∑_{p=−M}^{M} c_p e^{−α_p(x₁+x₂+x₃+x₄)}

for x_i ≥ 0 such that ∑ x_i > δ, which converges exponentially in M (see [111, 93]). This
proves the statement.
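The quadrature of Lemma 11.2 is easy to exercise numerically. A minimal sketch (the function name and the sample values are ours):

```python
import numpy as np

def expsum_inverse(s, M):
    """Exponential-sum (sinc-quadrature) approximation of 1/s for s > 0,
    with h = pi/sqrt(M), alpha_p = e^{p h}, c_p = h*alpha_p as in (11.25)."""
    h = np.pi / np.sqrt(M)
    p = np.arange(-M, M + 1)
    alpha = np.exp(p * h)
    c = h * alpha
    return np.sum(c * np.exp(-alpha * s))

x = np.array([0.9, 1.3, 0.4, 0.7])        # x_i >= 0 with sum(x) well above delta
s = x.sum()
approx = expsum_inverse(s, M=32)          # 2M + 1 = 65 exponential terms
rel_err = abs(approx - 1.0 / s) / (1.0 / s)
```

Doubling M roughly squares the accuracy, reflecting the exponential convergence in the rank parameter RE = 2M + 1.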

Notice that the matrix V exhibits an exponential decay of its singular values (observed
in numerical experiments; see Figure 11.15), which means that an approximation
error ε > 0 can be achieved with separation rank RV = O(|log ε|). Figure 11.15
illustrates the exponential convergence in the rank parameter for the low-rank
approximation of the matrices V and E = [e_{ab;ij}].

Figure 11.15: Singular values of the matrix unfoldings V (left) and E (right) for some compact
molecules, including the amino acids glycine (C2H5NO2) and alanine (C3H7NO2). The numbers in
brackets indicate the size of the matrix, that is, Norb Nvir, for the corresponding molecule.

11.8.3 Complexity bounds, sketch of algorithm, QTT compression

Lemmas 11.1 and 11.2 result in the following complexity bound: the Hadamard product
V ⊙ T and the resulting functional E_MP2 can be evaluated at the expense of

O(RE RB² Nocc Nvir).

Indeed, the first term in the splitting T = T⁽¹⁾ + T⁽²⁾ is represented by rank-structured
tensor operations,

T⁽¹⁾ = 2V ⊙ E = 2[t⁽¹⁾_{iajb}],

where

t⁽¹⁾_{iajb} = ∑_{p=1}^{RE} c_p ∑_{k=1}^{RB} (e^{α_p ε_i} C_iᵀ Lk e^{−α_p ε_a} C_a)(e^{−α_p ε_b} C_bᵀ Lkᵀ e^{α_p ε_j} C_j)   (11.26)

and Lk = Lk(:, :) stands for the Nb × Nb matrix unfolding of the Cholesky vector L(:, k).
Then the numerical complexity of this rank-(RE RB) separable approximation is estimated
via the multiple of RE with the corresponding cost for the treatment of the tensor V,
that is, O(RE RB Nocc Nvir). Furthermore, the RB-term decomposition of V′ := [v_{ibja}]
(see (11.24)) again leads to the summation over an (RE RB)-term representation of the
second term in the splitting of T,

T⁽²⁾ = [t⁽²⁾_{iajb}] = V′ ⊙ E,

where

t⁽²⁾_{iajb} = ∑_{p=1}^{RE} c_p ∑_{k=1}^{RB} (e^{α_p ε_i} C_iᵀ Lk e^{−α_p ε_a} C_b)(e^{−α_p ε_b} C_aᵀ Lkᵀ e^{α_p ε_j} C_j).   (11.27)

This makes the main contribution to the overall cost.


Based on the rank decompositions of the matrix B, the energy tensor E, and the
doubles amplitude tensor T, we utilize the final Algorithm 4 to compute the MP2 en-
ergy correction (see [150]).

Table 11.4: MP2 correction to the ground-state energy (in hartree) for some compact molecules,
including the amino acids glycine (C2H5NO2) and alanine (C3H7NO2).

Molecule    H2O        H2O2       N2H4       C2H5OH     C2H5NO2    C3H7NO2
Nb; Norb    41; 5      68; 9      82; 9      123; 13    170; 20    211; 24
E0          −76.0308   −150.7945  −111.1897  −154.1006  −282.8651  −321.9149
EMP2        −0.2587    −0.4927    −0.4510    −0.6257    −1.0529    −1.24

Algorithm 4 Fast tensor-structured computation of the MP2 energy correction.

Input: Rank-RB factorization LLᵀ of B, coefficient matrix C, Hartree–Fock eigenvalues
ε₁, …, ε_{Nb}, error tolerance ε > 0.
(1) Compute the column vectors in the rank-RB decomposition of the matrix V = [v_{ia;jb}]:
    C_iᵀ Lk C_a, k = 1, …, RB (i, a = 1, …, Nb), as in (11.23).
(2) Precompute the matrix factors in the RB-term decomposition of V′ = [v_{ib;ja}] as in
    (11.24).
(3) Construct the canonical decomposition of the “energy” tensor E = [e_{a,b,i,j}] by the
    sinc quadrature e_{a,b,i,j} ≈ ∑_{p=−M}^{M} c_p e^{−α_p(ε_a+ε_b−ε_i−ε_j)} as in (11.25) (Lemma 11.2).
(4) Compute the tensor T⁽¹⁾ = 2V ⊙ E as in (11.26) using the rank decompositions of V
    and E.
(5) Compute the tensor T⁽²⁾ = V′ ⊙ E as in (11.27) using the RB-term decompositions of
    V′ and E.
(6) Compute the MP2 correction by the “formatted” scalar product operation:

    E_MP2 = −⟨V, T⁽¹⁾ + T⁽²⁾⟩.

Output: MP2 energy correction E_MP2.
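The steps of Algorithm 4 can be exercised end to end on random data. The numpy sketch below (toy sizes, symmetrized random Cholesky unfoldings, a generous quadrature order M; everything here is an illustration, not the book's MATLAB code) checks the factorized evaluation against the direct sum (11.20):

```python
import numpy as np

rng = np.random.default_rng(1)
Nb, No, RB, M = 8, 3, 6, 36
Nv = Nb - No
C = rng.standard_normal((Nb, Nb))                    # stand-in MO coefficients
Lk = rng.standard_normal((RB, Nb, Nb))
Lk = Lk + Lk.transpose(0, 2, 1)                      # symmetric Cholesky unfoldings
eps = np.concatenate([-rng.uniform(1.0, 3.0, No),    # eps_i < 0 (occupied)
                      rng.uniform(1.0, 3.0, Nv)])    # eps_a > 0 (virtual)
Co, Cv = C[:, :No], C[:, No:]

# steps (1)-(2): G[k, i, a] = C_i^T L_k C_a are the entries of L_V, cf. (11.23)
G = np.einsum('mi,kmn,na->kia', Co, Lk, Cv)
V = np.einsum('kia,kjb->iajb', G, G)                 # MO-transformed TEI tensor
Vp = V.transpose(0, 3, 2, 1)                         # v'_{iajb} = v_{ibja}, cf. (11.24)

# step (3): sinc-quadrature exponential sum (11.25) for the energy tensor E
h = np.pi / np.sqrt(M)
al = np.exp(np.arange(-M, M + 1) * h)
c = h * al
ea, ei = eps[No:], eps[:No]
D = (ea[None, :, None, None] + ea[None, None, None, :]
     - ei[:, None, None, None] - ei[None, None, :, None])
E = np.einsum('p,piajb->iajb', c, np.exp(-al[:, None, None, None, None] * D[None]))

# steps (4)-(6): T = 2 V . E - V' . E and the formatted scalar product
E_mp2 = -np.sum(V * ((2.0 * V - Vp) * E))
E_ref = -np.sum(V * (2.0 * V - Vp) / D)              # direct evaluation via (11.20)
```

In a genuinely rank-structured implementation the tensors V, Vp, and E would never be formed entry-wise; here they are materialized only to make the consistency check transparent.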

Table 11.4 presents the effect of the MP2 correction for several compact molecules. In most
cases, this correction amounts to about 0.4 % of the total energy.
The tensor-structured factorization of the TEI matrix B makes it possible to reduce
the overall cost of MP2 calculations to O(Nb² Nvir Norb) by using the QTT approximation
of the long column vectors in the Cholesky factor L. Figure 10.4 (left) indicates that
the average QTT ranks of the column vectors in the Cholesky factor and of the vectorized
density matrix C ∈ ℝ^{Nb×Nb} remain almost the same (they depend only on the
entanglement properties of a molecule), and they can be estimated by

rank_QTT(L(:, k)) ≈ rank_QTT(C_k) ≤ 3Norb,   k = 1, …, RB.

This hidden structural property implies that the computation and storage cost for the
matrix V = L_V L_Vᵀ involved in Algorithm 4 (the most expensive part of the MP2 calculation)
can be reduced to O(Norb²) at the main step in (11.23), that is, computing C_iᵀ Lk C_a,
instead of O(Nb²), thus indicating the reduced redundancy of the AO basis in the case of
compact molecules. Since the QTT rank enters the storage cost for QTT vectors quadratically,
we conclude that

(3Norb)² ≤ C Nb²,

where the constant C is estimated by C ≈ 0.1, taking into account that the typical
relation Nb ≈ 10·Norb holds in the case of Gaussian-type basis sets.

Further reduction of the numerical complexity can be based on taking into account
more specific properties of the matrix unfolding V, using physical insight into
the problem (say, flat or extended molecules, multiple symmetries, lattice-type or
periodic structures, accounting for data sparsity, etc.).

Other methods for high-accuracy energy calculations are based on coupled-cluster
techniques, which require much larger computational resources; see, for example,
[260, 13, 249].
12 Calculation of excitation energies of molecules
12.1 Numerical solution of the Bethe–Salpeter equation
Recently, the computation of excitation energies and absorption spectra for molecules
and surfaces of solids has attracted much interest due to related promising applications,
in particular in the development of sustainable energy technologies. The traditional
methods for computer simulation of excitation energies for molecular systems
require large computational facilities. Therefore, there is a steady need for new
algorithmic approaches for calculating the absorption spectra of molecules with lower
computational cost and with good potential for application to larger systems. The
tensor-based approach seems to present a good alternative to conventional methods.

One of the well-established ab initio methods for the computation of excited states is
based on the solution of the Bethe–Salpeter equation (BSE) [252, 126], which in turn
rests on the Green's function formalism and many-body perturbation theory, providing
calculation of the excitation energies in a self-consistent way [224, 259, 194,
245]. The BSE method leads to the challenging computational task of solving a large
eigenvalue problem for a fully populated (dense) matrix, which, in general, is
nonsymmetric. Another commonly used approach for the computation of excitation energies
is based on time-dependent density functional theory (TDDFT) [251, 107, 51, 274,
56, 248].
The size of the BSE matrix scales quadratically, O(Nb²), in the size Nb of the atomic
orbital basis sets commonly used in ab initio electronic structure calculations. The
direct diagonalization, of O(Nb⁶) complexity, becomes prohibitive even for moderate-size
molecules with atomic orbital basis sets of size Nb ≈ 100. Therefore, an approach
that relies entirely on multiplications of the governing BSE matrix, or of its
approximation, with vectors in the framework of some iterative procedure is the only
feasible strategy. In turn, fast matrix–vector computations can be based on the use of
low-rank matrix representations, since such data structures allow efficient storage and
basic linear algebra operations with linear complexity scaling in the matrix size.
An efficient method for the approximate numerical solution of the BSE eigenvalue
problem was introduced in [23]; it uses low-rank approximation to relax the numerical
costs from O(N⁶) down to O(N²). It is based on the construction of a simplified problem
via a diagonal plus rank-structured representation of the system matrix, so that the
related spectral problem can be solved iteratively. Then a model reduction via projection
onto a reduced basis is constructed by using a representative set of eigenvectors of the
simplified system matrix. A further enhancement, based on a block-diagonal plus low-rank
approximation of the BSE matrix for accuracy improvement, was presented in [25].
The particular construction of the BSE system matrix in [23] is based on the
non-interacting Green’s function in terms of eigenfunctions and eigenvalues of the

https://doi.org/10.1515/9783110365832-012

Hartree–Fock operator introduced in [243, 244], where it was applied to the simple
H2 molecule in the minimal basis of two Slater functions and the system matrix
entries were evaluated analytically. In [23] it was shown that this computational
scheme for solving the BSE becomes practically applicable to moderate-size compact
molecules when using the tensor-structured Hartree–Fock calculations [147, 152],
which yield an efficient representation of the two-electron integrals (TEI) in the
molecular orbital basis in the form of a low-rank Cholesky factorization [157, 150].

The low-rank representation of the TEI tensor stipulates the beneficial structure of
the BSE matrix blocks, thus enabling efficient numerical algorithms for the solution of
large structured eigenvalue problems. The simplified block decomposition of the BSE
system matrix is characterized by a separation rank of order O(Nb), which enables
compact storage and fast matrix–vector multiplications in the framework of subspace
iterations for the computation of a few (lowest/largest) eigenvalues. To reduce the
error of the diagonal plus low-rank approximation, it was proposed in [25] to represent
the static screened interaction part of the BSE matrix by a small fully populated
sub-block with adaptively chosen size.
In [25], efficient iterative schemes are introduced for computing several tens of the
smallest-in-modulus eigenvalues for both the BSE problem and its Tamm–Dancoff
approximation (TDA). The most efficient subspace iteration is based on the application of
the matrix inverse, which for the considered matrix formats can be evaluated in
structured form by using the Sherman–Morrison–Woodbury formula [269]. The numerical
experiments show that the method is economical (at least up to small amino acids),
the numerical cost for computing several hundred eigenvalues decreasing by orders
of magnitude. Usually, the smallest-in-modulus eigenvalues of the BSE problem
are of most interest in applications.
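The Sherman–Morrison–Woodbury step can be illustrated in isolation. A minimal numpy sketch with random diagonal-plus-low-rank data (the scaling of the factors is chosen only to keep the matrix safely invertible):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 300, 6
d = rng.uniform(2.0, 3.0, n)                      # diagonal part D, well separated from 0
U = rng.standard_normal((n, r)) / np.sqrt(n)      # low-rank correction factors, scaled so
W = rng.standard_normal((n, r)) / np.sqrt(n)      # that D + U W^T stays safely invertible
x = rng.standard_normal(n)

# Sherman-Morrison-Woodbury:
#   (D + U W^T)^{-1} x = D^{-1}x - D^{-1}U (I_r + W^T D^{-1} U)^{-1} W^T D^{-1} x
Dx = x / d
DU = U / d[:, None]
core = np.eye(r) + W.T @ DU                       # only an r x r system must be solved
y = Dx - DU @ np.linalg.solve(core, W.T @ Dx)

y_ref = np.linalg.solve(np.diag(d) + U @ W.T, x)  # dense O(n^3) reference
```

Per application the structured inverse costs O(nr² + r³) instead of O(n³), which is what makes inverse-based subspace iterations affordable for the diagonal plus low-rank matrix formats discussed above.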

12.2 Prerequisites from Hartree–Fock calculations


As the prerequisites for constructing the generating matrices for the BSE eigenvalue
problem, we use the results of ab initio tensor-based Hartree–Fock and MP2 calcula-
tions; see Sections 11.5 and 11.8. They provide a full set of necessary quantities for fast
and economical computation of a set of lowest in magnitude part of the BSE spectrum
(here, we follow notations from [23]):
– full set of eigenvalues of the Hartree–Fock EVP
ε1 , . . . , εNb ;

– the full set of Galerkin coefficients of the expansion of molecular orbitals in a


Gaussian basis
C = {cμi } ∈ ℝNb ×Nb ;
2 2
– the two-electron integral matrix B = [bμν,κλ ] ∈ ℝNb ×Nb computed in a form of low-
rank Cholesky factorization

B ≈ LLᵀ,   L ∈ ℝ^{Nb²×RB},   RB = O(Nb),   (12.1)

and presented in the molecular orbital basis,

B ↦ V = [v_{iajb}],   where   v_{iajb} = ∑_{μ,ν,κ,λ=1}^{Nb} C_{μi} C_{νa} C_{κj} C_{λb} b_{μν,κλ}.   (12.2)

The indices i, j ∈ ℐo := {1, …, Norb} correspond to occupied orbitals, and a, b ∈ ℐv to
virtual ones, ℐv := {Norb + 1, …, Nb}. Denote Nv = Nb − Norb and Nov = Norb Nv, and further
use the short notation No = Norb.
The BSE calculations utilize two subtensors of V specified by the index sets ℐo
and ℐv. The first subtensor is defined as in the MP2 calculations [150],

V = [v_{iajb}] :  a, b ∈ ℐv,  i, j ∈ ℐo,   (12.3)

whereas the second one lives on the extended index set,

V̂ = [v̂_{turs}] :  r, s ∈ ℐv,  t, u ∈ ℐo.   (12.4)

In what follows, {Ci} and {Ca} denote the sets of occupied and virtual orbitals,
respectively. Denote the associated matrix by V = [v_{ia,jb}] ∈ ℝ^{Nov×Nov} in case (12.3) and,
similarly, by V̂ = [v̂_{tu,rs}] ∈ ℝ^{No²×Nv²} in case (12.4). The straightforward computation
of the matrix V by the above representations accounts for the dominating impact on the
overall numerical cost, of order O(Nb⁵), in the evaluation of the block entries of the
BSE matrix.
Recall that the rank RB = O(Nb) approximation to the matrix B ≈ LLᵀ with the Nb² × RB
Cholesky factor L allows introducing the low-rank representation of the tensor V and
then reducing the asymptotic complexity of the calculations to O(Nb⁴) [150]; see
Section 11.8, Lemma 11.1. A similar factorization can be derived in the case of (12.4).
The following statement is a slight modification of Lemma 11.1.

Lemma 12.1. Let the rank-RB Cholesky decomposition of the matrix B be given by (12.1).
Then the RB-term representation of the matrix V = [v_{ia;jb}] takes the form

V = L_V L_Vᵀ,   L_V ∈ ℝ^{Nov×RB},   (12.5)

where the columns of L_V are given by

L_V((i − 1)Nvir + a − No; k) = C_iᵀ Lk C_a,   k = 1, …, RB,  a ∈ ℐv,  i ∈ ℐo.

On the index set (12.4), we have

V̂ = U_V̂ W_V̂ᵀ ∈ ℝ^{No²×Nv²}   with   U_V̂ ∈ ℝ^{No²×RB},  W_V̂ ∈ ℝ^{Nv²×RB}.

The numerical cost is determined by the computational complexity and the storage size
for the factors L_V, U_V̂, and W_V̂ in the above rank-structured factorizations.

Lemma 12.1 provides upper bounds on rank(V) in the representation (12.5),
which might be reduced by SVD-based ε-rank truncation. It can be shown that
the ε-rank of the matrix V remains of the same magnitude as that of the TEI matrix B
obtained by its ε-rank truncated Cholesky factorization (see the numerical illustration
in Section 12.4).

Numerical tests in [150] (see also Sections 10 and 11.8) indicate that the singular
values of the TEI matrix B decay exponentially,

σ_k ≤ C e^{−(γ/Nb) k},   (12.6)

where the constant γ > 0 in the exponential depends weakly on the molecule configuration.
If we define RB(ε) as the minimal number satisfying the condition

∑_{k=RB(ε)+1}^{RB} σ_k² ≤ ε²,   (12.7)

then estimate (12.6) leads to the ε-rank bound RB(ε) ≤ C Nb |log ε|, which will be postulated
in the following.
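The interplay of (12.6) and (12.7) is easy to check with a few lines of numpy; the decay model and the constants below are purely illustrative:

```python
import numpy as np

def eps_rank(sigma, eps):
    """Minimal R such that sum_{k>R} sigma_k^2 <= eps^2, cf. (12.7)."""
    # tails[R] = Euclidean norm of (sigma_{R+1}, sigma_{R+2}, ...)
    tails = np.sqrt(np.cumsum(sigma[::-1] ** 2)[::-1])
    hit = np.nonzero(tails <= eps)[0]
    return int(hit[0]) if hit.size else sigma.size

Nb, gamma = 100, 1.0
k = np.arange(1, 20 * Nb + 1)
sigma = np.exp(-(gamma / Nb) * k)   # exponential decay model (12.6), with C = 1
eps = 1e-6
R = eps_rank(sigma, eps)
```

For this model the computed R stays within a modest multiple of Nb|log ε|, in line with the postulated bound RB(ε) ≤ C Nb|log ε|.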
Note that the matrix rank RV(ε) increases only logarithmically in ε, similarly to
the bound for RB(ε). This can be formulated as the following lemma (see [23]).

Lemma 12.2. For given ε > 0, there exist a rank-r approximation Vr of the matrix V with
r = RV(ε) ≤ RB(ε) and a constant C > 0 not depending on ε such that

‖Vr − V‖ ≤ C Nb ε |log ε|.

12.3 Tensor factorization of the BSE matrix blocks


Here, we discuss the main ingredients for the calculation of the blocks in the BSE matrix
and their reduced-rank approximate representation. We compose the 2Nov × 2Nov BSE
matrix following equations (46a) and (46b) in [243], though the construction of the static
screened interaction matrix w(ij, ab) in equation (12.11) below may slightly differ; see
also [23].
The construction of the BSE matrix includes the computation of several auxiliary
quantities. First, introduce a fourth-order diagonal “energy” matrix

Δε = [Δεia,jb ] ∈ ℝNov ×Nov ,  Δεia,jb = (εa − εi )δij δab ,

which can be represented in the Kronecker product form

Δε = Io ⊗ diag{εa : a ∈ ℐv } − diag{εi : i ∈ ℐo } ⊗ Iv ,

where Io and Iv are the identity matrices on the respective index sets. It is worth noting
that if the so-called HOMO-LUMO gap of the system is positive, i. e.,

εa − εi > δ > 0,  a ∈ ℐv , i ∈ ℐo ,

then the matrix Δε is invertible.
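The Kronecker-product form translates directly into code; a small sketch with made-up orbital energies:

```python
import numpy as np

def energy_matrix(eps_occ, eps_virt):
    """Delta_eps = I_o kron diag(eps_virt) - diag(eps_occ) kron I_v:
    a diagonal Nov x Nov matrix with entries eps_a - eps_i."""
    Io, Iv = np.eye(len(eps_occ)), np.eye(len(eps_virt))
    return np.kron(Io, np.diag(eps_virt)) - np.kron(np.diag(eps_occ), Iv)
```

If every gap εa − εi exceeds some δ > 0, the diagonal is strictly positive, so the invertibility of Δε is immediate.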


Using the matrix Δε and the Nov × Nov TEI matrix V = [via,jb ] represented in the MO
basis as in (12.2), the dielectric function (an Nov × Nov matrix) Z = [zpq,rs ] is defined by

zpq,rs := δpr δqs − vpq,rs [χ0 (ω = 0)]rs,rs ,

where χ0 (ω) is the matrix form of the so-called Lehmann representation of the response
function. In turn, the inverse matrix of χ0 (ω) is known to have the form

χ0−1 (ω) = − [Δε 0; 0 Δε] + ω [1 0; 0 −1],

implying

χ0 (0) = − [Δε−1 0; 0 Δε−1].

Define the rank-1 matrix 1 ⋅ dεT , where 1 ∈ ℝNov is the all-ones vector and dε =
diag{Δε−1 } ∈ ℝNov is the diagonal vector of Δε−1 . In this notation, the matrix Z = [zpq,rs ]
takes the compact form

Z = Io ⊗ Iv + V ⊙ (1 ⋅ dεT ).  (12.8)
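Since 1 ⋅ dεT has rank 1, the Hadamard product in (12.8) is just a column scaling of V, which NumPy broadcasting expresses in one line (a sketch; the function name is ours):

```python
import numpy as np

def dielectric_matrix(V, d_eps):
    """Z = I + V Hadamard (1 * d_eps^T): column j of V is scaled by d_eps[j] (eq. (12.8))."""
    return np.eye(V.shape[0]) + V * d_eps[np.newaxis, :]
```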

Introducing the inverse matrix Z −1 , we finally define the so-called static screened
interaction matrix by

W = [wpq,rs ],  wpq,rs := ∑t∈ℐv , u∈ℐo (Z −1 )pq,tu vtu,rs .  (12.9)

In the forthcoming calculations, this equation is considered on the conventional and
extended index sets {p, s ∈ ℐo } × {q, r ∈ ℐv } and {p, q ∈ ℐo } × {r, s ∈ ℐv }, respectively,
such that vtu,rs corresponds to the sub-tensor either in (12.3) or in (12.4).
On the conventional index set, we obtain the following matrix factorization of
W := [wia,jb ],

W = Z −1 V, provided that a, b ∈ ℐv , i, j ∈ ℐo ,

where V is calculated by (12.3). Lemma 12.1 suggests the existence of a low-rank fac-
torization for the matrix W defined above.

Lemma 12.3 ([23]). Let the matrix Z defined by (12.8) over the index set a, b ∈ ℐv , i, j ∈ ℐo
be invertible. Then the rank of the respective matrix W = Z −1 V is bounded by

rank(W) ≤ rank(V) ≤ RB .

Furthermore, equation (46a) in [243] includes matrix entries wij,ab for a, b ∈ ℐv ,
i, j ∈ ℐo . To this end, the modified matrix Ŵ = [wpq,rs ] is computed by (12.9) on the
extended index set {p, q ∈ ℐo } × {r, s ∈ ℐv } by using the entries v̂ij,ab in the matrix
unfolding of the tensor V̂ in (12.4), multiplied from the left with the No² × No²
sub-matrix of Z −1 .
Now the matrix representation of the Bethe–Salpeter equation in the (ov, vo) subspace
reads as the following eigenvalue problem:

F [xn ; yn ] ≡ [A B; B∗ A∗ ] [xn ; yn ] = ωn [I 0; 0 −I] [xn ; yn ],  (12.10)

determining the excitation energies ωn and the respective excited states. Here, the
matrix blocks are defined in the index notation by (see (46a) and (46b) in [243] for more
detail)

aia,jb := Δεia,jb + via,jb − wij,ab ,  (12.11)
bia,jb := via,bj − wib,aj ,  a, b ∈ ℐv , i, j ∈ ℐo .  (12.12)

In the matrix form, we obtain

A = Δε + V − Ŵ,

where the matrix elements of Ŵ = [ŵia,jb ] are defined by ŵia,jb = wij,ab , computed by
(12.9). Here, the diagonal plus low-rank sparsity structure in Δε + V can be recognized
in view of Lemma 12.1. For the matrix block B, we have

B = Ṽ − W̃ = V − W̃,

where the matrix Ṽ, which is an unfolding of the partly transposed tensor, is defined
entrywise by

Ṽ = [ṽia,jb ] := [via,bj ] = [via,jb ],

and hence it coincides with V in (12.3) due to the symmetry properties. Here, W̃ =
[w̃ia,jb ] = [wib,aj ] is defined by permutation. The ε-rank structure in the matrix blocks A
and B, resulting from the corresponding factorizations of V, has been analyzed in [23].
Solutions of equation (12.10) can be grouped in pairs: excitation energies ωn with
eigenvectors (xn , yn ) and de-excitation energies −ωn with eigenvectors (x∗n , y∗n ).
The block structure in the matrices A and B is inherited from the symmetry of the
TEI matrix V, via,jb = v∗ai,bj , and of the matrix W, wia,jb = wbj,ai . In particular, it is known
from the literature that the matrix A is Hermitian and the matrix B is (complex) symmetric
(since via,bj = vjb,ai and wib,aj = wja,bi ), which we presuppose in the matrix construction.
A discussion of the skew-symmetric (Hamiltonian) block structure in the BSE matrix
can be found in [23].
In the following discussion, we confine ourselves to the case of real spin orbitals;
that is, the matrices A and B remain real. The dimension of the matrix in (12.10) is
2No Nv × 2No Nv , where No and Nv denote the numbers of occupied and virtual orbitals,
respectively. In general, No Nv is asymptotically of size O(Nb²), so the spectral problem
(12.10) may be computationally expensive. Indeed, a direct eigenvalue solver for
(12.10) via diagonalization becomes infeasible due to the O(Nb⁶) complexity scaling.
Furthermore, the numerical cost for calculating the matrix elements based on the
precomputed TEI integrals from the Hartree–Fock equation scales as O(Nov²) = O(Nb⁴),
where the low-rank structure in the matrix V can be exploited.
Particularly challenging computational tasks arise in the case of lattice-structured
compounds, where the number of basis functions increases proportionally to the lattice
size L × L × L, that is, Nb ≈ Nb,0 L³, which quickly leads to intractable problems even for
small lattices.

12.4 The reduced basis approach using low-rank approximations


Notice that in realistic quantum chemical simulations of excitation energies, the cal-
culation of several tens of eigenpairs may be sufficient.
As we have already seen, the part Δε + V in the matrix block A allows an accurate
diagonal plus low-rank (DPLR) structured approximation. Moreover, the sub-matrix
Ṽ = V in the block B also inherits the low-rank approximation. Taking these structures
into account, a special solver for the partial eigenvalue problem was proposed in [23],
based on the use of a reduced basis obtained from the eigenvectors of the reduced
matrix that picks up only the essential part of the initial BSE matrix with the DPLR
structure. The iterative solver is based on fast matrix–vector multiplication and efficient
storage of all data involved in the computational scheme. Using the reduced basis
approach, the initial problem is then approximated by its Galerkin projection onto
a reduced basis of moderate size.
We summarize that the low-rank decomposition of the matrix V,

V ≈ LV LVT ,  LV ∈ ℝNov ×RV ,  RV = RV (ε) = O(Nb |log ε|) ≤ RB ,  (12.13)

can be optimized depending on the truncation error ε > 0; see also Section 11.8.
In the construction of the simplified matrix, we represent the matrix blocks A and B
of the BSE matrix by using rank-structured decompositions.
The properties of the Hadamard product imply that the matrix Z admits the representation

Z = Io ⊗ Iv + LV LVT ⊙ (1 ⋅ dεT ) = INov + LV (LV ⊙ dε )T ,

where the rank of the second summand does not exceed RV . Hence, the linear system
solve W = Z −1 V can be implemented by algorithms tailored to the DPLR structure,
adapting the Sherman–Morrison–Woodbury formula.
The computational cost for setting up the full BSE matrix F in (12.10) can be estimated
by O(Nov²), which includes the cost O(Nov RB ) for generating the matrix V and
the dominating cost O(Nov²) for setting up Ŵ.
We further rewrite the spectral problem (12.10) in the equivalent form

F1 [xn ; yn ] ≡ [A B; −B∗ −A∗ ] [xn ; yn ] = ωn [xn ; yn ].  (12.14)

It is worth noting that the so-called Tamm–Dancoff approximation (TDA) simplifies
equation (12.14) to the standard Hermitian eigenvalue problem

A xn = μn xn ,  xn ∈ ℝNov ,  A ∈ ℝNov ×Nov ,  (12.15)

with a matrix size Nov that is smaller by a factor of two.


The main idea of the reduced basis approach presented here is as follows: instead
of solving the partial eigenvalue problem of finding m0 eigenpairs satisfying
equation (12.14), we first solve a simplified auxiliary spectral problem with a
modified matrix F0 . The approximation F0 is obtained from F1 by using low-rank
approximations of the parts Ŵ and W̃ of the matrix blocks A and B, respectively; that is,
A and B are replaced by

A ↦ A0 := Δε + V − Ŵr  and  B ↦ B0 := V − W̃r ,  (12.16)

respectively. Here, we assume that the matrix V is already represented in the low-rank
form (12.13).
The modified auxiliary problem reads

F0 [un ; vn ] ≡ [A0 B0 ; −B0∗ −A0∗ ] [un ; vn ] = λn [un ; vn ].  (12.17)

This structured eigenvalue problem is much simpler than (12.10) since the matrix
blocks A0 and B0 , defined in (12.16), are composed of diagonal and low-rank matrices.
Figures 12.1 and 12.2 illustrate the structure of the A0 and B0 submatrices in the BSE
system matrix.
Given the set of m0 eigenpairs

{(λn , ψn ) = (λn , (un , vn )T )}

computed for the modified (simplified) problem (12.17), we solve the full eigenvalue
problem for the reduced matrix obtained by the Galerkin projection of the initial
equation onto the problem-adapted small basis set {ψn } of size m0 , with ψn ∈ ℝ2Nov ,
n = 1, . . . , m0 . Here, the quantities λn represent the eigenvalues of F0 closest to zero.

Figure 12.1: The diagonal plus low-rank structure of A0 block in the modified BSE system matrix.

Figure 12.2: The low-rank structure of the block B0 in the modified BSE matrix.

Define the matrix

G1 = [ψ1 , ψ2 , . . . , ψm0 ] ∈ ℝ2Nov ×m0 ,

whose columns are the vectors of the reduced basis, and then compute the stiffness
and mass matrices by projection of the initial BSE matrix F1 onto the reduced basis
spanned by the columns of G1 ,

M1 = G1T F1 G1 ,  S1 = G1T G1 ∈ ℝm0 ×m0 .

The projected generalized eigenvalue problem of small size m0 × m0 reads

M1 y = γn S1 y,  y ∈ ℝm0 .  (12.18)
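A sketch of the projection step (the function name is ours; SciPy's dense generalized eigensolver stands in for whatever small-scale solver one prefers):

```python
import numpy as np
from scipy.linalg import eig

def reduced_basis_eigs(F1, G):
    """Galerkin projection of the BSE matrix F1 onto the basis G (eq. (12.18)):
    solve M1 y = gamma S1 y with M1 = G^T F1 G, S1 = G^T G."""
    M1 = G.T @ F1 @ G          # projected (stiffness) matrix, m0 x m0
    S1 = G.T @ G               # Gram (mass) matrix of the basis vectors
    gammas, Y = eig(M1, S1)    # small dense generalized eigenproblem
    return gammas, G @ Y       # Ritz values and Ritz vectors in the full space
```

If the basis G spans an exact invariant subspace, the Ritz values reproduce the corresponding eigenvalues exactly.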

The portion of eigenvalues γn , n = 1, . . . , m0 , computed by the direct diagonalization
is expected to be very close to the corresponding excitation energies ωn , n = 1, . . . , m0 ,
of the initial spectral problem (12.10).
The reduced basis approach via low-rank approximation can be applied directly
to the TDA equation, so that the simplified auxiliary problem reads

A0 u = λn u,  (12.19)

where we are interested in finding the m0 smallest eigenvalues.
Table 12.1 illustrates that, as expected, the larger the size m0 of the reduced basis,
the better the accuracy of the lowest excitation energy γ1 [23].

Table 12.1: The error |γ1 − ω1 | vs. the size of the reduced basis, m0 .

m0       5      10     20     30     40     50
H2 O     0.025  0.025  0.014  0.01   0.01   0.005
N2 H4    0.02   0.02   0.015  0.015  0.015  0.005

Notice that the matrix Ŵ might have a rather large ε-rank for small values of ε, which
increases the cost of high-accuracy solutions. Numerical tests show (see Table 12.2)
that the ε-rank approximation to the matrix Ŵ with a moderate rank parameter leads
to a numerical error in the excitation energies of the order of a few percent. For this
reason, the paper [23] studies another approximation strategy in which the rank
approximation of the matrix Ŵ remains fixed, whereas the matrices V and W̃ are
substituted by their adaptive ε-rank approximations. This approach only slightly
improves the numerical efficiency of the method.

Table 12.2: Accuracy (in eV) for the first eigenvalue, |γ1 − ω1 |, vs. ε-ranks for V , Ŵ, and W̃.

ε                            4 ⋅ 10−1     2 ⋅ 10−1     10−1         10−2
H2 O      |γ1 − ω1 |         0.27         0.27         0.21         2.1 ⋅ 10−4
          ranks V , Ŵ, W̃   6, 9, 6      13, 13, 10   25, 72, 36   60, 180, 92
N2 H4     |γ1 − ω1 |         0.38         0.38         0.27         1.6 ⋅ 10−4
          ranks V , Ŵ, W̃   11, 17, 11   26, 25, 15   49, 144, 54  117, 657, 196
C2 H5 OH  |γ1 − ω1 |         0.81         0.81         0.4          1.6 ⋅ 10−4
          ranks V , Ŵ, W̃   16, 17, 14   39, 29, 20   71, 105, 74  171, 1430, 296

The matrix blocks in the auxiliary equation (12.17) are obtained by a rather rough ε-rank
approximation to the initial system matrix. However, we observe a much smaller
approximation error γn − ωn when solving the projected reduced basis system (12.18)
compared with that for the auxiliary equation (12.17); see Figures 12.3 and 12.4.
Numerical tests indicate that the difference γn − ωn behaves almost quadratically
in the rank truncation parameter ε; see [23] for a more detailed discussion.
In the case of a symmetric matrix, the above-mentioned effect of “quadratic” con-
vergence rate can be justified by a well-known property of the quadratic error behavior
in the approximate eigenvalue, computed by the Rayleigh quotient with respect to the
perturbed eigenvector (vectors of the reduced basis ψn in our construction), compared
with the perturbation error in the eigenvector, which is of order O(ε). This beneficial
property may explain the efficiency of the reduced basis approach in this particular
application.
Figure 12.3: Comparison of m0 = 30 lower eigenvalues for the reduced and exact BSE systems vs. ε
in the case of the Glycine amino acid.

Figure 12.4: Comparison of m0 = 30 lower eigenvalues for the reduced and exact BSE systems for
the H2 O molecule: ε = 0.6, left; ε = 0.1, right.

In the BSE formulation based on the Hartree–Fock molecular orbitals basis, we
may have a slight perturbation of the symmetry in the matrix block Ŵ; that is, the
above argument does not apply directly. However, we observe the same quadratic error
decay in all numerical experiments implemented so far. It is also worth noting that
due to the symmetry features of the eigenproblem, the approximation computed by
the reduced basis approach is always an upper bound of the true excitation energies
obtained from the full BSE model. Again, this is a simple consequence of the varia-
tional properties of the Ritz values being upper bounds on the smaller eigenvalues for
symmetric matrices. The “upper bound” character is also clearly visible in Figures 12.3
and 12.4.
Table 12.2 shows numerics for the molecular systems H2 O (360 × 360), N2 H4 (1314 ×
1314), and C2 H5 OH (2860 × 2860), where the BSE matrix size is given in brackets. It
demonstrates the quadratic decay of the error |γ1 − ω1 | in the lowest excitation energy
with respect to the approximation error |λ1 − ω1 | for the modified auxiliary BSE problem
(12.17). The error is controlled by the tolerance ε > 0 in the rank truncation procedure
applied to the BSE submatrices V, Ŵ, and W̃; see [23] for a detailed discussion.

12.5 Approximating the screened interaction matrix in a reduced-block format
Numerical results in [23] (see Table 12.2) show that, using simple diagonal plus low-rank
structures for an accurate approximation of the BSE system matrix, we arrive at large
ranks for the representation of the screened matrix Ŵ, thus deteriorating the
computational efficiency. A remedy to this problem was found in [25]: it was proposed to
substitute the low-rank representation of this part of the matrix A0 by an active
sub-matrix of smaller size.
This approach was motivated by the numerical observation (made for all molecular
systems considered so far) that the eigenvectors in the central part of the spectrum
have dominating components supported on a rather small part of the full index
set of size 2Nov ; see Figure 12.5, corresponding to m0 = 30. Indeed, their effective
support is compactly located at the first “active” indexes {1, . . . , NW } and in the cluster
{Nov + 1, . . . , Nov + NW } of the respective blocks, where NW ≪ Nov .

Figure 12.5: Visualizing the first m0 BSE eigenvectors for the H32 chain with NW = 554 (left) and
Glycine amino acid molecule with NW = 880 (right).

Following [25], we define the selected sub-matrix Ŵb in Ŵ by keeping the balance
between the storage size for the active sub-block Ŵb and the storage for the matrix V.
Since the storage and numerical complexity of the rank-RV matrix V are bounded by
2RV Nov , we control the size of the restricted NW × NW block Ŵb by the balancing
relation

NW ≈ CW √(2RV Nov ),  (12.20)

where the constant CW is close to 1. The approximation error introduced by the
corresponding matrix truncation can be controlled by the choice of the constant CW .

Keeping the diagonal of the matrix Ŵ unchanged, we define the simplified matrix
by Ŵ ↦ ŴNW ∈ ℝNov ×Nov , where

ŴNW (i, j) = Ŵ(i, j) if i, j ≤ NW or i = j, and ŴNW (i, j) = 0 otherwise.  (12.21)

The simplified matrix Â is then given by

A ↦ Â := Δε + V − ŴNW ,  (12.22)

whereas the modified block B0 remains the same as in (12.16). The corresponding
structure of the simplified matrix Â is illustrated in Figure 12.6.

Figure 12.6: Diagonal plus low-rank plus reduced-block structure of the matrix Â.
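The truncation rule (12.21) keeps only the leading NW × NW block and the full diagonal; a short sketch:

```python
import numpy as np

def reduced_block(W, NW):
    """W_hat_NW of eq. (12.21): keep W(i, j) for i, j <= NW or i = j, zero elsewhere."""
    WN = np.zeros_like(W)
    WN[:NW, :NW] = W[:NW, :NW]       # active sub-block W_b
    diag = np.arange(W.shape[0])
    WN[diag, diag] = W[diag, diag]   # retain the full diagonal
    return WN
```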

This construction guarantees that the storage and matrix–vector multiplication
complexity for the simplified matrix block Â remain of the same order as those for the
matrix V, characterized by a low ε-rank. Table 12.3 demonstrates how the ratio NW /Nov
decreases with increasing problem size.

Table 12.3: The ratio NW /Nov for some molecules.

Molecule  H2 O  H2 O2  N2 H4  C2 H5 OH  H32   C2 H5 NO2  C3 H7 NO2
Nov       180   531    657    1430      1792  3000       4488
NW /Nov   0.63  0.5    0.4    0.3       0.32  0.29       0.25

We modify the simplified matrix F0 ↦ F̂ by replacing A0 ↦ Â in (12.17), which leads
to corrections in the eigenvalues λn ↦ λ̂n and in the eigenvectors G0 ↦ Ĝ =
[ψ1 , . . . , ψm0 ] ∈ ℝ2Nov ×m0 obtained by solving the simplified problem

F̂ ψn = λ̂n ψn  (12.23)

defined by the low-rank plus block-diagonal approximation F̂ to the initial BSE
matrix F. The corresponding eigenvalues γ̂n of the modified reduced system (12.23) are
computed by the direct solution of the small-size reduced eigenvalue problem

M̂ qn = γ̂n Ŝ qn ,  qn ∈ ℝm0 ,  (12.24)

where the Galerkin stiffness and mass matrices are specified by

M̂ = ĜT F Ĝ,  Ŝ = ĜT Ĝ ∈ ℝm0 ×m0 .

Table 12.4 illustrates that the approximation error of the simplified and reduced BSE
problems decreases by an order of magnitude.

Table 12.4: Accuracies (in eV) of eigenvalues for the reduced BSE problem via simple low-rank
approximation, |ω1 − γ1 |, and for the block-diagonal plus low-rank approximation to the BSE
matrices, |ω1 − γ̂1 |, with ϵ = 0.1.

Molecule     H2 O   N2 H4  C2 H5 OH  C2 H5 NO2  C3 H7 NO2
BSE size     360²   1314²  2860²     6000²      8976²
|ω1 − γ1 |   0.2    0.27   0.4       0.38       0.53
|ω1 − γ̂1 |  0.02   0.03   0.08      0.05       0.1

Proposition 12.4 ([25]). The numerical results indicate an important property observed
for all molecular systems tested so far: the close-to-zero eigenvalues λ̂k and γ̂k provide
lower and upper bounds for the exact BSE eigenvalues ωk ; that is,

λ̂k ≤ ωk ≤ γ̂k ,  k = 1, 2, . . . , m0 .

The upper bound via the eigenvalues γ̂k can be explained by the variational form
of the reduced problem setting. However, understanding the lower bound property
obtained when using the output λ̂k from the simplified system remains an interesting
open problem.
Figure 12.7 demonstrates the two-sided error estimates stated in Proposition 12.4.
Here, the black line represents the eigenvalues of the auxiliary problem (12.17), but
with the modified matrix F̂, whereas the blue line represents the eigenvalues of the
reduced equation (12.24) of type (12.18) with the Galerkin matrices M̂ and Ŝ. We
observe a considerable decrease of the approximation error for both the simplified
and the reduced problems with the diagonal plus low-rank plus small-block approach
for the submatrix A, as compared with the error of the straightforward diagonal plus
low-rank approach presented in Figures 12.3 and 12.4.

Figure 12.7: Two-sided bounds for the BSE excitation energies for the H32 chain (left) and C2 H5 NO2
molecule (right).

Figure 12.8: Two-sided error bounds: The errors (in eV) in m0 smallest eigenvalues for simplified and
reduced schemes; N2 H4 molecule (left) and Glycine amino acid C2 H5 NO2 (right).

Figure 12.8 shows examples of the upper and lower bounds, i. e., λ̂k − ωk and ωk − γ̂k ,
for the whole sets of m0 ≤ 250 eigenvalues for larger molecules. We observe that the
lower bound is violated only by a few larger excitation energies, at a level below the
truncation error ϵ.
We conclude that the reduced basis approach, based on the modified auxiliary
matrix M̂ via the reduced-block approximation (12.22), provides considerably better
accuracies ωk − γ̂k than the values γk corresponding to the matrix M0 . Table 12.4
compares the accuracies |ω1 − γ1 | for the first eigenvalues of the reduced BSE problem
based on the straightforward low-rank approximation from equation (12.18) with the
accuracies |ω1 − γ̂1 | resulting from the combined block plus low-rank approximation,
all computed for several molecules.

12.6 Inverse iteration for diagonal plus low-rank matrix


In this section, following [25], we discuss an efficient structured eigenvalue solver
for problem (12.23). Iterative eigenvalue solvers, such as the Lanczos or Jacobi–Davidson
methods, are quite efficient in approximating the largest eigenvalues but may suffer
from slow convergence when applied to the computation of the smallest or intermediate
eigenvalues. We are interested in both of these scenarios: there are both positive and
negative eigenvalues in (12.17), and we need the few of smallest magnitude. In the
TDA model (12.15), we solve a symmetric positive definite problem A0 u = λn u, but
again, the smallest eigenvalues are required.
In both cases, the remedy is to invert the system matrix so that the eigenvalues of
interest become the largest. The MATLAB interface to ARPACK (procedure eigs) assumes
by default that, when the smallest eigenvalues are requested, the user-defined function
solves a linear system with the matrix instead of multiplying by it. In our case, we can
implement this efficiently since the matrix consists of an easily invertible part (diagonal
or block-diagonal) plus a low-rank correction, and hence we can use the Sherman–
Morrison–Woodbury formula [269].
To shorten the notation, we set up the rank-r decompositions Ŵr = LW LWT and
W̃r = YZ T , and we define

A0 = Δε + PQT ,  P = [LV  LW ],  Q = [LV  −LW ],
B0 = ΦΨT ,  Φ = [LV  Y],  Ψ = [LV  −Z],      (12.25)

taking into account (12.5).


First, consider the TDA model (12.15). The Sherman–Morrison–Woodbury formula
for A0 in (12.25) reads

A0−1 = Δε−1 − Δε−1 P(I + QT Δε−1 P)−1 QT Δε−1 .  (12.26)

Here, the 2r × 2r core matrix K = (I + QT Δε−1 P)−1 is small and can be computed explicitly
at the expense of 𝒪(r³ + r²Nov ) operations. Hence, the matrix–vector product A0−1 un
requires a multiplication by the diagonal matrix Δε−1 and by the low-rank matrix in the
second summand. This amounts to the overall cost 𝒪(Nov r). To invert the matrix F0 in the simplified BSE,
we first derive its LU decomposition,

F0 = [A0  B0 ; −B0T  −A0T ] = [A0  0 ; −B0T  I] [I  A0−1 B0 ; 0  S],  S = −A0T + B0T A0−1 B0 .  (12.27)

To solve a system

F0 [z ; y] = [u ; v],

we need one action of A0−1 and one action of the inverse of the Schur complement S.
Indeed,

z̃ = A0−1 u,  ỹ = v + B0T z̃,
y = S −1 ỹ,  z = z̃ − A0−1 B0 y.  (12.28)

Note that A0−1 B0 is a low-rank matrix and can be precomputed in advance. The action
of A0−1 is given by (12.26), so we now address the inversion of the Schur complement.
Plugging (12.26) into S, we obtain

S = −Δε − QP T + ΨΦT A0−1 ΦΨT = −(Δε + QS PST ),

where

QS = [Q  Ψ(ΦT Δε−1 P K QT Δε−1 Φ − ΦT Δε−1 Φ)],  PS = [P  Ψ].  (12.29)

Therefore,

S −1 = −(Δε−1 − Δε−1 QS KS PST Δε−1 ),  KS = (I + PST Δε−1 QS )−1 .  (12.30)

By keeping intermediate results in these calculations, we can trade off memory
against CPU time. The computational cost of (12.29), and then of (12.30), is again
bounded by 𝒪(r²Nov ), whereas the implementation of (12.28) takes 𝒪(rNov ) operations.
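The two inversion layers, that is, the Sherman–Morrison–Woodbury formula (12.26) for A0 and the elimination steps (12.28) for F0 , can be sketched in NumPy as follows (for brevity, a dense solve stands in for the structured Schur-complement inverse (12.29)–(12.30), and all names are our own):

```python
import numpy as np

def smw_core(d, P, Q):
    """Core matrix K = (I + Q^T D^{-1} P)^{-1} of (12.26) for A0 = diag(d) + P Q^T."""
    return np.linalg.inv(np.eye(P.shape[1]) + Q.T @ (P / d[:, None]))

def a0_inv(d, P, Q, K, u):
    """Apply A0^{-1} u = D^{-1}u - D^{-1} P K Q^T D^{-1} u at O(Nov * r) cost."""
    w = u / d
    return w - (P @ (K @ (Q.T @ w))) / d

def f0_solve(d, P, Q, K, Phi, Psi, S, u, v):
    """Solve F0 [z; y] = [u; v] by the elimination steps (12.28);
    here B0 = Phi Psi^T and S is the (dense, for illustration) Schur complement."""
    z_t = a0_inv(d, P, Q, K, u)
    y_t = v + Psi @ (Phi.T @ z_t)                     # y~ = v + B0^T z~
    y = np.linalg.solve(S, y_t)                       # y  = S^{-1} y~
    z = z_t - a0_inv(d, P, Q, K, Phi @ (Psi.T @ y))   # z  = z~ - A0^{-1} B0 y
    return z, y
```

In a full implementation, the dense Schur solve would be replaced by the rank-structured inverse (12.30), keeping the overall linear-in-Nov complexity.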

Table 12.5: Times (s) for eigenvalue problem solvers applied to the simplified TDA matrix A0
(“−” means that iterations did not converge).

Molecular syst.    H2 O  N2 H4  C2 H5 OH  H32    C2 H5 NO2  H48    C3 H7 NO2
TDA matrix size    180²  657²   1430²     1792²  3000²      4032²  4488²
eig(A0 )           0.02  0.5    4.3       9.8    37.6       91     127.4
lobpcg(A0 )        0.22  0.6    5.4       2.77   18.2       5.6    34.2
lobpcg(inv(A0 ))   0.03  0.06   0.15      0.5    0.53       0.5    1.4
eigs(A0 )          0.07  0.29   1.7       0.49   −          −      −
eigs(inv(A0 ))     0.05  0.08   0.17      0.11   0.32       0.34   0.5

The precomputation of intermediate matrices and their use in the structured matrix
inversion are shown in Algorithms 1 and 2 in [25]. Table 12.5 compares CPU times (s)
for the full eig and for the rank-structured iteration applied to the TDA problem (12.15)
in the Matlab implementation of [25]. The rank-truncation threshold is ε = 0.1; the
number of computed eigenvalues is m0 = 30. The bottom line shows the CPU times (s)
of the eigs procedure applied with the inverse matrix–vector product A0−1 u, marked
by “inv”. The other lines show the results of the corresponding algorithms using the
traditional product A0 u (with A0 in the diagonal plus low-rank form); the results for
the Matlab version of LOBPCG [190] are presented for comparison. We see that the
inverse-based method is superior in all tests.
Notice that an initial guess for the subspace iteration applied to the full BSE can
be constructed by replicating the eigenvectors computed in the TDA model. It
provides a rather accurate approximation to the exact eigenvectors of the initial BSE
system (12.14). In [23], it was shown numerically that a TDA approximation error |μn −
ωn | of the order of 10−2 eV is achieved for the compact and extended molecules presented
in Table 12.5.
Table 12.6 compares CPU times (s) for the full eig solver and for the rank-structured
eigs iteration applied to the inverse of the simplified rank-structured BSE system (12.17);
see [25] for more detail.

Table 12.6: Times (s) for the simplified rank-structured BSE matrix F0 .

Molecule          H2 O   N2 H4  C2 H5 OH  H32      C2 H5 NO2  H48      C3 H7 NO2
No , Nb           5, 41  9, 82  13, 123   16, 128  20, 170    24, 192  24, 211
BSE matrix size   360²   1314²  2860²     3584²    6000²      8064²    8976²
eig(F0 )          0.08   4.2    33.7      68.1     274        649      903
eigs(inv(F0 ))    0.13   0.28   0.7       0.77     2.2        2.3      3.9

12.7 Inversion of the block-sparse matrices


If the matrix ŴNW is kept in the block-diagonal form as in (12.21)–(12.22), the inversion
of Â = Δε + V − ŴNW is also easy; the same applies to the case (12.16). The same Sherman–
Morrison–Woodbury scheme can be used as in Algorithms 1 and 2 in [25]. To that end,
we aggregate ΔεW = Δε − ŴNW , whereas in the low-rank factors only P = Q = LV
remains. After that, all calculations of Section 12.6 are retained unchanged, just
replacing all Δε by ΔεW , where the latter is now a block-diagonal matrix.
The particular simple modifications of the enhanced algorithm are as follows (see
[25]). Let us split Δε = blockdiag(Δε1 , Δε2 ), where Δε1 has the size NW , and Δε2 ∈
ℝN′W ×N′W with N′W = Nov − NW represents the remaining values. The same applies to
ŴNW = blockdiag(Wb , diag(w2 )), where w2 contains the elements on the diagonal of
Ŵ that do not belong to Wb . Then the implementation of the matrix inverse

ΔεW−1 = blockdiag((Δε1 − Wb )−1 , (Δε2 − diag(w2 ))−1 )  (12.31)

requires the inversion of an NW × NW dense matrix and of a diagonal matrix of size
N′W = Nov − NW . Since NW is chosen small, the complexity of this operation is moderate.
Now all steps requiring multiplication with Δε−1 in Algorithms 1 and 2 in [25] can be
substituted by (12.31). The numerical complexity of the new inversion scheme is
estimated in the following lemma.
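The action of the inverse (12.31) on a vector splits into two independent pieces; a short sketch:

```python
import numpy as np

def apply_blockdiag_inv(D1inv, d2inv, x):
    """Apply Delta_epsW^{-1} of (12.31): dense inverse on the leading NW entries,
    reciprocal diagonal on the remaining N'_W entries.

    D1inv : precomputed dense NW x NW inverse of (Delta_eps1 - W_b)
    d2inv : reciprocal diagonal of (Delta_eps2 - diag(w2))
    """
    NW = D1inv.shape[0]
    return np.concatenate([D1inv @ x[:NW], d2inv * x[NW:]])
```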

Lemma 12.5 ([25], complexity of the reduced-block algorithm). Suppose that the rank
parameters in the decompositions of V and W̃ do not exceed r and that the block size NW
is chosen from equation (12.20). Then the rank-structured plus reduced-block
representations of the inverse matrices Â−1 and F̂ −1 can be set up with the overall cost
𝒪(Nov^{3/2} r^{3/2} + Nov r²). The complexity of each inversion Â−1 u or F̂ −1 w is
bounded by 𝒪(Nov r).

Proof. The inversion of the NW × NW dense block in (12.31) requires 𝒪(NW³) operations.
Hence, condition (12.20) ensures that the cost of setting up the matrix (12.31) is
bounded by 𝒪(Nov^{3/2} r^{3/2}). After that, a multiplication of (12.31) by an Nov × r matrix
requires 𝒪(NW² r + N′W r) = 𝒪(Nov (r² + r)) operations, and a multiplication of (12.31)
by a vector is performed at the cost 𝒪(NW² + N′W ) = 𝒪(Nov r). The complexity of the
other steps is the same as for the diagonal plus low-rank approach.

Numerical illustrations of the enhanced data sparsity via the block-diagonal plus
low-rank approximation are presented in Table 12.7.

Table 12.7: Block-sparse matrices: times (s) for eigensolvers applied to the TDA and BSE systems.
The bottom line shows the error (eV) for the case of the block-sparse approximation to the diagonal
matrix block Â, ε = 0.1.

Molecular syst.           H2 O  N2 H4  C2 H5 OH  H32    C2 H5 NO2  H48    C3 H7 NO2
BSE matrix size           360²  1314²  2860²     3584²  6000²      8064²  8976²
eigs(inv(Â))              0.07  0.09   0.25      0.77   0.54       3.0    1.0
eigs(inv(F̂ ))            0.21  0.37   1.11      1.10   2.4        2.92   4.6
BSE vs. F̂ : |γ̂1 − ω1 |  0.02  0.03   0.08      0.07   0.05       0.10   0.1

Notice that the performance of the diagonal plus low-rank and of the block-sparse plus
low-rank solvers is comparable, but the second one provides better sparsity and higher
accuracy in the computed eigenvalues (see Section 12.5). It is remarkable that the
approach based on the inverse iteration applied to the low-rank plus reduced-block
approximation outperforms the full eigenvalue solver by several orders of magnitude
(see Tables 12.6 and 12.7).
The data in the previous tables correspond to the choice m0 = 30. Figure 12.9 indicates
a merely linear increase in the computational time of the eigs(inv(F̂ )) solver with
respect to the increasing value of m0 .

Figure 12.9: CPU times vs. m0 for the N2 H4 (dashed line), C2 H5 NO2 (solid line), and C2 H5 OH
(dotted line) molecules.

12.8 Solving BSE spectral problems in the QTT format


Solving the BSE problem in the QTT format [167] was introduced in [25]. In this
approach, the reduction of the numerical cost for large system sizes is achieved
by adapting the ALS-type iteration (in particular, the DMRG iteration) for computing
the eigenvectors in the block-QTT tensor representation [70]. The application of the
QTT approximation is motivated by the observation, known from [150] (see also
Section 11.8), that the generating Cholesky factors of the TEI tensor exhibit average
QTT ranks proportional only to the number of occupied orbitals No in the molecular
system, independent of the total BSE matrix size 𝒪(Nb²).
For eigenvectors in the block-QTT format, the QTT ranks are even smaller; typically
they are proportional to the number of computed eigenvectors, which makes
this approach to solving the BSE eigenvalue problem very competitive. Contrary to the
conventional QTT matrix representations, in [25] only the columns in the Cholesky
factor of the low-rank part of the BSE matrix are approximated in the QTT format, thus
keeping simultaneously the low-rank form V = LLT and the low-rank QTT structure
for the long column vectors in L. This allows avoiding a prohibitive increase of the
QTT matrix ranks; see [25] for additional detail.
Table 12.8 illustrates that for the TDA model applied to single molecules and to molecular chains, the average QTT ranks computed for the columns in the LV factor in (12.5), and for each of the m0 = 30 TDA eigenvectors (corresponding to the smallest eigenvalues), are almost equal to or even smaller than the number of occupied molecular orbitals No in the system under consideration. Notice that these results are obtained by QTT compression of each column of LV , or of each eigenvector, separately.
Figure 12.10 indicates that the behavior of the QTT ranks in the columns of the
LV -factor reproduces the system size Nov in terms of No on the logarithmic scale.
Recall that in the case of single molecules the commonly used number of GTO basis functions satisfies the relation Nb /No ≥ CGTO ≈ 10 (see examples below), which implies the asymptotic behavior Nov ≈ CGTO No². Hence, the QTT rank estimate rQTT ≈ No obtained above leads to the following asymptotic complexity of the QTT-based tensor

Table 12.8: Average QTT ranks of the column vectors in LV and the m0 eigenvectors (corresponding to
the smallest eigenvalues) in the TDA problem.

Molecular syst.            H2O   H16   N2H4   C2H5OH   H32    C2H5NO2   C3H7NO2
No                         5     8     9      13       16     20        24
QTT ranks of LV            5.4   7     9.1    12.7     14     17.5      21
QTT ranks of eigenvect.    5.3   7.6   9.1    12.7     13.6   17.2      20.9
Nov                        180   448   657    1430     1792   3000      4488

Figure 12.10: QTT ranks (left) and Nov on logarithmic scale (right) vs. No .

solver:

𝒲BSE = 𝒪(log(Nov) rQTT²) = 𝒪(log(No) No²),   (12.32)

which is asymptotically on the same scale (but with smaller prefactor) as that for
the data-structured algorithms based on full-vector arithmetics (see Sections 12.6
and 12.7).
The high-precision Hartree–Fock calculations may require much larger GTO basis
sets so that the constant CGTO may increase considerably. In this situation, the QTT-
based tensor approach seems to outperform the algorithms in full-vector arithmetics.
An even more important consequence of (12.32) is that the rank behavior rQTT ≈ No
indicates that the QTT tensor-based algorithm has memory requirements and algebraic complexity of order 𝒪(log(No)No²), depending only on a fundamental physical characteristic of the molecular system, the number of occupied molecular orbitals No
(but not on the system size Nov²). This remarkable property traces back to a similar feature observed in [157, 150]; that is, the QTT ranks of the column vectors in the low-rank Cholesky factors of the TEI matrix are proportional to No (about 3No).
Based on the previous discussion, we introduce the following hypothesis.

Hypothesis 1. Estimate (12.32) determines the irreducible lower bound on the asymp-
totic algebraic complexity of the large-scale BSE eigenvalue problems.

The CPU times for the QTT calculations are comparable to or smaller than those of the best Sherman–Morrison–Woodbury inversion methods from the previous sections, as demonstrated in Table 12.9 (cf. Table 12.7). Recall that the row referred to as "absolute error" in Table 12.9 represents the quantity ‖μqtt − μ⋆‖ = (∑_{m=1}^{m0} (μqtt,m − μ⋆,m)²)^{1/2}, characterizing the total absolute error, in the Euclidean norm, of the first m0 computed eigenvalues. The QTT format also provides a considerable reduction of the memory needed to store the eigenvectors.

Table 12.9: Time (s) and absolute error (eV) for QTT-DMRG eigensolvers for TDA matrix.

Molecular syst.    C2H5OH   H32     C2H5NO2   H48     C3H7NO2
TDA size           1430²    1792²   3000²     4032²   4488²
time QTT eig       0.14     0.23    0.32      0.28    0.63
abs. error (eV)    0.08     0.19    0.17      0.14    0.00034

We now summarize the important result of this section. The lower bound 𝒪(No²) on the asymptotic algebraic complexity, confirmed by extensive numerical experiments, means that solving the BSE system in the QTT tensor format leads to numerical complexity 𝒪(No²), which depends explicitly on the number of electrons in the system. This seems to be the asymptotically optimal cost for solving large-scale BSE eigenvalue problems.
Notice that in recent years the analysis of eigenvalue problem solvers for large
structured matrices has been widely discussed in the linear algebra community [20,
19, 22]. Tensor-structured approximation of elliptic equations with quasi-periodic co-
efficients has been considered in [180, 181].
13 Density of states for a class of rank-structured matrices
In this section, we discuss a new numerical approach to the approximation of the density of states (DOS) of large rank-structured symmetric matrices. This approach was recently introduced in [27] as applied to the estimation of the optical spectra of molecules in the framework of BSE and TDA calculations; see the discussion in Section 12.1. In this application, block-diagonal plus low-rank matrix structures arise in the representation of the symmetric TDA matrix. Here, we sketch the techniques for fast DOS calculation applied to a general class of rank-structured matrices.
Several methods for calculating the density of states were originally developed in condensed matter physics [74, 301, 285, 73, 296], and this topic is now also considered in the numerical linear algebra community [290, 98, 283]. We refer to a recent survey [204] on the commonly used methodology for approximating the DOS of large matrices of general structure. The traditional methods for approximating the DOS are usually based on a polynomial or fractional-polynomial interpolation of the exact DOS function, regularized by Gaussians or Lorentzians, with subsequent computation of the traces of certain matrix-valued functions, for example, matrix resolvents or polynomials, calculated at a large set of interpolation points within the spectral interval of interest. The trace calculations are typically executed using heuristic stochastic sampling over a large number of random vectors [204].
The sizes of matrices arising in quantum chemistry and molecular dynamics com-
putations are usually large, scaling polynomially in the size of a molecular system,
whereas the DOS for these matrices often exhibits very complicated shapes. Hence,
the traditional approaches mentioned above become prohibitively expensive. More-
over, the algorithms based on polynomial type or trigonometric interpolants have poor
approximating properties when the spectrum of a matrix exhibits gaps or highly os-
cillating non-regular shapes, as is often the case in electronic structure calculations.
Furthermore, stochastic sampling relies on Monte Carlo-type error estimates characterized by slow convergence rates and, as a result, low accuracy.
The method presented in [27] to approximate the DOS of the Tamm–Dancoff approximation (TDA) Hamiltonian applies to a class of rank-structured matrices, in particular, to the block-diagonal plus low-rank BSE/TDA matrix structures described in [23, 25]. It is based on the Lorentzian blurring [124], such that the most computationally expensive part of the calculation is reduced to the evaluation of traces of the shifted matrix inverses. A fast method is presented for calculating the traces of parametric matrix resolvents at interpolation points by taking advantage of the block-diagonal plus low-rank matrix structure. This allows us to overcome the computational difficulties of the traditional schemes and to avoid the need for stochastic sampling.

https://doi.org/10.1515/9783110365832-013

Furthermore, it is shown in [27] that a regularized DOS can be accurately approx-


imated by a low rank QTT tensor [167] that can be determined through a least squares
procedure. As a result, an accurate approximation to the regularized DOS living on a
large representation grid on the whole spectral interval is realized by the low-rank QTT
tensor interpolation, calculated by the adaptive cross approximation in the TT format
[230]. In what follows, we show that similar techniques can be applied for the effi-
cient computation of DOS for the general classes of rank-structural matrices arising,
in particular, in the numerical simulations of lattice-type and quasi-periodic systems.

13.1 Regularized density of states for symmetric matrices


Here we consider the class of symmetric matrices. Following [204], we use the standard definition of the DOS function for symmetric matrices

ϕ(t) = (1/n) ∑_{j=1}^{n} δ(t − λj),   t, λj ∈ [0, a],   (13.1)

where δ is the Dirac delta, and the λj are the eigenvalues of the symmetric matrix A = Aᵀ ∈ ℝ^{n×n}, ordered as λ1 ≤ λ2 ≤ ⋅ ⋅ ⋅ ≤ λn.
Several classes of blurring approximations to ϕ(t) have been considered in the literature. One can replace each Dirac δ by a Gaussian function with a small width η > 0, i.e.,

δ(t) ⇝ gη(t) = (1/(√(2π) η)) exp(−t²/(2η²)),

where the choice of the regularization parameter η depends on the particular problem setting. As a result, (13.1) can be approximated by

ϕ(t) ↦ ϕη(t) := (1/n) ∑_{j=1}^{n} gη(t − λj),   (13.2)

on the whole energy interval [0, a]. Another option is the replacement of each Dirac-δ
by a Lorentzian with a small width η > 0, i. e.,

1 η 1 1
δ(t) 󴁄󴀼 Lη (t) := = Im( ) (13.3)
π t 2 + η2 π t − iη

providing an approximate DOS in the form

1 n
ϕ(t) 󳨃→ ϕη (t) := ∑ L (t − λj ). (13.4)
n j=1 η

As η → 0+, both Gaussians and Lorentzians converge to the Dirac distribution:

lim_{η→0+} gη(t) = lim_{η→0+} Lη(t) = δ(t).

Both functions ϕη(t) and Lη(t) are continuous. Hence, they can be discretized by sampling on a fine grid Ωh over [0, a], which is assumed to be the uniform cell-centered N-point grid with mesh size h = a/N.
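The blurred DOS (13.2) and (13.4) can be sampled on Ωh directly from a given spectrum. The following Python sketch evaluates both regularizations on the cell-centered grid; a toy two-band spectrum with a gap stands in for actual TDA eigenvalues, and all names are illustrative:

```python
import numpy as np

def dos_blurred(eigs, a, N, eta, kernel="lorentzian"):
    """Regularized DOS phi_eta(t) sampled on the uniform cell-centered
    N-point grid over [0, a]; cf. (13.2) (Gaussian) and (13.4) (Lorentzian)."""
    h = a / N
    t = h * (np.arange(N) + 0.5)                    # grid Omega_h
    d = t[:, None] - np.asarray(eigs)[None, :]      # t - lambda_j
    if kernel == "gaussian":
        k = np.exp(-d**2 / (2 * eta**2)) / (np.sqrt(2 * np.pi) * eta)
    else:
        k = (eta / np.pi) / (d**2 + eta**2)
    return t, k.mean(axis=1)                        # (1/n) sum_j kernel(t - lambda_j)

# toy spectrum with a gap, as often occurs in electronic structure problems
lam = np.concatenate([np.linspace(1.0, 3.0, 40), np.linspace(7.0, 9.0, 40)])
t, phi = dos_blurred(lam, a=10.0, N=2**12, eta=0.1)
print((10.0 / 2**12) * phi.sum())   # total mass; at most 1, tails cut at the ends
```

Since each blurred delta carries unit mass, the midpoint-rule integral of ϕη over [0, a] is close to 1, up to the (heavier for Lorentzians) tail mass lost outside the interval.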
In what follows, we focus on the case of Lorentzian blurring. First, we consider the class of matrices that can be accurately approximated by a block-diagonal plus low-rank ansatz (see [23, 25]), which allows an efficient explicit representation of the shifted inverse matrix.
The numerical illustrations below represent the DOS for the H2O molecule broadened by Gaussians (13.2). The data correspond to the reduced basis approach via rank-structured approximation applied to the symmetric TDA model [23, 25], described by the symmetric matrix block A of the full BSE system matrix; see Section 12. Figure 13.1 (left) represents the DOS for H2O computed using the exact TDA spectrum (blue) and its approximation based on a simplified model via a low-rank approximation of A (red), whereas the right figure shows the relative error. This suggests that the DOS for the initial matrix A of general structure can be accurately approximated by the DOS calculated for its block-diagonal plus low-rank approximation.

Figure 13.1: DOS for H2 O. Exact TDA vs. simplified TDA (left); zoom of the small spectral interval
(right).

Let us briefly illustrate another example of DOS functions, arising in stochastic homogenization theory. The numerical examples below were implemented in [158].
Spectral properties of the randomly generated elliptic operators play an impor-
tant role in the analysis of average quantities in stochastic homogenization. Here, we
follow [158] and present the average behavior of the density of spectrum for the family
of randomly generated 2D elliptic operators {Am } for the large sequence of stochastic

Figure 13.2: Density of states for a number of stochastic processes M = 1, 2, . . . , 20 with L = 4 (left)
and L = 8 (right) for λ = 0.5, n0 = 8, and α = 0.25.

realizations. The DOS provides important spectral characteristics of the differential operator that accumulate crucial information on the static and dynamical properties of a complex physical or molecular system. In particular, the numerics below demonstrate the convergence of the DOS to the sample average function in the limit of a large number of stochastic realizations with a fixed size of the so-called representative volume element L; see [158] for more detail.
Figure 13.2 represents the DOS for a sequence of M = 1, 2, . . . , 20 stochastic realizations on an L × L lattice with L = 4, 8 (from left to right), corresponding to fixed model parameters. The numerical experiments show that the DOS of the stochastic operator is represented by rather complicated functions whose numerical approximation may be a challenging task.

13.2 General overview of commonly used methods


One of the commonly used approaches to the numerical approximation of the func-
tion Lη (t) is based on the construction of certain polynomial or fractional polynomial
interpolant whose evaluation at each sampling point tk requires solving a large linear
system with the shifted matrix A, that is, it remains computationally expensive. In the
case of Lorentzians broadening (13.4), the regularized DOS takes the form [204]

1 n 1 1
ϕ(t) 󳨃→ ϕη (t) := ∑ Im( )= Im Trace[(tI − A − iηI)−1 ]. (13.5)
nπ j=1 (t − λj ) − iη nπ

To keep real-valued arithmetics, one can use the equivalent form

ϕη(t) := (1/(nπ)) ∑_{j=1}^{n} η/((t − λj)² + η²) = (η/(nπ)) Trace[((tI − A)² + η² I)^{−1}],   (13.6)

which includes only real-valued matrix functions.
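The identity behind (13.6), connecting the Lorentzian sum over eigenvalues to the trace of a shifted inverse, is easy to check numerically. The sketch below uses a small random symmetric matrix as a hypothetical test case (not an actual TDA matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
n, eta = 60, 0.3
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                       # symmetric test matrix
lam = np.linalg.eigvalsh(A)
I = np.eye(n)

def dos_resolvent(t):
    # (eta/(n*pi)) * Trace[((tI - A)^2 + eta^2 I)^{-1}] -- real arithmetic
    B = (t * I - A) @ (t * I - A) + eta**2 * I
    return eta * np.trace(np.linalg.inv(B)) / (n * np.pi)

def dos_spectrum(t):
    # (1/(n*pi)) * sum_j eta / ((t - lam_j)^2 + eta^2)  -- Lorentzian sum (13.4)
    return (eta / ((t - lam)**2 + eta**2)).sum() / (n * np.pi)

for t in np.linspace(lam.min(), lam.max(), 7):
    assert abs(dos_resolvent(t) - dos_spectrum(t)) < 1e-10
```

The trace form needs no eigenvalues at all, which is exactly the point: only shifted linear solves with A enter the computation.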



The advantage of representations (13.5) and (13.6) is that in both cases computing the DOS in the form ϕη(t) allows avoiding explicit information on the matrix spectrum. Indeed, the initial task reduces to approximating the trace of the matrix resolvent

f1(A) = (tI − A − iηI)^{−1}   or   f2(A) = ((tI − A)² + η² I)^{−1}.   (13.7)

The traditional approach [204] to approximately computing the traces of the matrix-valued analytic function f(A) reduces this task to the estimation of the mean of vmᵀ f(A)vm over a sequence of random vectors vm, m = 1, . . . , mr, that satisfy

𝔼[vm] = 0,   𝔼[vm vmᵀ] = I.

That is, Trace[f(A)] is approximated by

Trace[f(A)] ≈ (1/mr) ∑_{m=1}^{mr} vmᵀ f(A) vm.   (13.8)

The calculation of (13.8) for f1(A) and f2(A), given by (13.7), reduces to solving linear systems of the form

(tI − iηI − A)x = vm   for m = 1, . . . , mr   (13.9)

or

(η²I + (tI − A)²)x = vm   for m = 1, . . . , mr.   (13.10)

These linear systems have to be solved for many target points t = tk ∈ [a, b], as prescribed by the chosen interpolation scheme and the subset of the spectrum of interest.
In the case of rank-structured matrices A, the solution of equations (13.9) or (13.10) can be implemented at lower cost. However, even in this favorable situation, a relatively large number mr of stochastic realizations is required to obtain a satisfactory mean-value approximation. Indeed, by the central limit theorem, the convergence rate is expected to be of order O(1/√mr) in the limit of a large number of stochastic realizations. On the other hand, with a limited number of interpolation points, polynomial-type interpolation schemes applied to highly non-regular shapes, as shown, for example, in Figure 13.1 (left), can only provide poor resolution and are unlikely to reveal spectral gaps and the many local peaks of interest.
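For illustration, the stochastic estimator (13.8) with Rademacher probe vectors can be sketched as follows; the generic symmetric test matrix is a stand-in, and the slow O(1/√mr) decay of the error is the reason so many probes are needed:

```python
import numpy as np

rng = np.random.default_rng(1)
n, eta, t = 200, 0.5, 0.0
G = rng.standard_normal((n, n))
A = (G + G.T) / np.sqrt(2 * n)                      # symmetric, spectrum O(1)
B = (t * np.eye(n) - A) @ (t * np.eye(n) - A) + eta**2 * np.eye(n)

exact = np.trace(np.linalg.inv(B))                  # reference value

def hutchinson(mr):
    """Mean of v^T B^{-1} v over Rademacher probes (E[v v^T] = I), cf. (13.8).
    Each probe requires one linear solve of the form (13.10)."""
    acc = 0.0
    for _ in range(mr):
        v = rng.choice([-1.0, 1.0], size=n)
        acc += v @ np.linalg.solve(B, v)
    return acc / mr

for mr in (10, 100, 1000):
    print(mr, abs(hutchinson(mr) - exact) / exact)  # error shrinks slowly, ~1/sqrt(mr)
```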

13.3 Computing trace of a rank-structured matrix inverse


In what follows, we discuss an approach that is based on evaluating the trace term in (13.5) directly (i.e., without stochastic sampling), provided that the target matrix allows a special rank-structured representation (approximation).

Definition 13.1. We consider the class of n × n rank-structured matrices of the form

A = E + PPᵀ  with  P ∈ ℝ^{n×R},   (13.11)

where the symmetric matrix E allows an efficient computation of
– the traces of the inverse matrices, trace[E^{−1}] and trace[E^{−2}], and
– the matrix–vector product with E^{−1},
both at the cost O(n) up to some logarithmic factor. For numerical efficiency, the rank parameter R is supposed to be small compared with the matrix size, that is, R ≪ n.

The rank-structured matrices (13.11) in Definition 13.1 arise in various applications.

Remark 13.2. Definition 13.1 applies, in particular, to the following classes of matrices
E in (13.11):
(A) E = blockdiag{B0 , D0 }, which arises when using the low-rank BSE matrix structure
as in [23, 25] (see Section 12.1).
(B) E is the multilevel block circulant matrix arising in Hartree–Fock calculations for slightly perturbed periodic lattice-structured systems [154].
(C) E represents homogenized matrix for the FEM-Galerkin approximation to ellip-
tic operators with quasi-periodic coefficients arising, for example, in geomet-
ric/stochastic homogenization theory; see [181, 158].

In what follows, we use the notation 1m for a length-m vector of all ones. The following simple result, which generalizes Theorem 3.1 of [27] to a more general class of matrices, describes an efficient numerical scheme for calculating the traces of the rank-structured matrices specified by Definition 13.1 and asserts that the corresponding cost is estimated by O(nR²).

Lemma 13.3. For the matrix A of the form (13.11), the trace of the matrix inverse A^{−1} can be calculated explicitly by

trace[A^{−1}] = trace[E^{−1}] − 1nᵀ(U ⊙ V)1R,

where U = E^{−1}PK^{−1} ∈ ℝ^{n×R}, V = E^{−1}P ∈ ℝ^{n×R}, and the small R × R symmetric core matrix is given by

K = IR + PᵀE^{−1}P.

Let K ≥ 0. Then the "symmetric" representation of the trace reads

trace[A^{−1}] = trace[E^{−1}] − 1nᵀ(U ⊙ U)1R,

where

U = E^{−1}PK^{−1/2} ∈ ℝ^{n×R}.

The numerical cost is estimated by O(nR²) up to a low-order term.



Proof. The proof follows arguments similar to those of Theorem 3.1 in [27]. The analysis relies on the particular favorable structure of the matrix E described in Definition 13.1. Indeed, we use the direct trace representation for both the rank-R and inverse matrices E^{−1}. The argument is based on the simple observation that the trace of a rank-R matrix UVᵀ, where U, V ∈ ℝ^{n×R}, U = [u1, . . . , uR], V = [v1, . . . , vR], uk, vk ∈ ℝⁿ, can be calculated in terms of the skeleton vectors by

trace[UVᵀ] = ∑_{k=1}^{R} ⟨uk, vk⟩ = 1nᵀ(U ⊙ V)1R   (13.12)

at the expense O(Rn). Now define the rank-R matrices by

U = E^{−1}PK^{−1},   V = E^{−1}P.

Then the Sherman–Morrison–Woodbury scheme leads to the representation

A^{−1} = E^{−1} − UVᵀ = E^{−1} − E^{−1}PK^{−1}PᵀE^{−1};

in the symmetric version, we have U = V = E^{−1}PK^{−1/2}. Now we apply formula (13.12), representing the trace of a rank-R matrix, to obtain the desired representation.
The complexity estimate follows by the assumptions on the matrix E.
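Lemma 13.3 is easy to verify numerically. In the sketch below (illustrative sizes), E is taken diagonal so that trace[E^{−1}] and E^{−1}P are indeed O(n)-cost operations, and the structured trace is compared with a direct dense inversion:

```python
import numpy as np

rng = np.random.default_rng(2)
n, R = 500, 6
e = rng.uniform(1.0, 2.0, n)          # E = diag(e): traces/solves with E cost O(n)
P = rng.standard_normal((n, R)) / np.sqrt(n)
A = np.diag(e) + P @ P.T              # rank-structured matrix (13.11)

# Lemma 13.3: trace[A^{-1}] = trace[E^{-1}] - 1_n^T (U . V) 1_R
Einv_P = P / e[:, None]               # E^{-1} P, cost O(nR)
K = np.eye(R) + P.T @ Einv_P          # small R x R core matrix
U = Einv_P @ np.linalg.inv(K)
V = Einv_P
tr_fast = (1.0 / e).sum() - (U * V).sum()

tr_direct = np.trace(np.linalg.inv(A))      # O(n^3) reference, for checking only
assert abs(tr_fast - tr_direct) < 1e-8 * abs(tr_direct)
```

Only the reference value costs O(n³); the structured formula itself touches E only through its inverse action, in agreement with the O(nR²) bound.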
The above representation has to be applied many times for calculating the trace of

B = B(t) = tI − A − iηI   or   B = B(t) = (tI − A)² + η²I

at each interpolating point t = tm, m = 1, . . . , M.
We consider the case of real arithmetics, which corresponds to the choice

B = B(t) = (tI − A)² + η²I.

We notice that the price to pay for the real arithmetics in equation (13.10) is the com-
putation with squared matrices, which, however, does not deteriorate the asymptotic
complexity since there is no increase of the rank parameter in the rank-structured rep-
resentation of the target matrix; see Lemma 13.4, which is the respective modification
of Theorem 3.2 in [27]. In what follows, we denote by [U, V] the concatenation of two
matrices of compatible size.

Lemma 13.4. Given the matrix B(t) = (tI − A)² + η²I, where A is defined by (13.11), the trace of the real-valued matrix resolvent B^{−1}(t) can be calculated explicitly by

trace[B^{−1}] = trace[Ê^{−1}] − 1nᵀ(Û ⊙ V̂)1_{2R}   (13.13)

with

Û = Ê^{−1}P̂K^{−1} ∈ ℝ^{n×2R}   and   V̂ = Ê^{−1}Q̂ ∈ ℝ^{n×2R},

where the real-valued matrix Ê is given by

Ê(t) = (η² + t²)I − 2tE + E²,

and the rank-2R matrices P̂, Q̂ are represented via the concatenations

P̂ = [−2tP + EP + P(PᵀP), P] ∈ ℝ^{n×2R},   Q̂ = [P, EP] ∈ ℝ^{n×2R},

such that the small core matrix K(t) ∈ ℝ^{2R×2R} takes the form K(t) = I_{2R} + Q̂ᵀÊ^{−1}(t)P̂.
The numerical cost is estimated by O(nR²) up to a low-order term.

Proof. Given the rank-structured matrix A in the form (13.11), we obtain

B = (tI − A)² + η²I = Ê + P̂Q̂ᵀ,   (13.14)

where the matrix Ê and the rank-2R correction P̂Q̂ᵀ are defined as above. We apply the Sherman–Morrison–Woodbury scheme to the structured matrix B; then Lemma 13.3 implies the desired representation. Now we take into account that Ê is a matrix polynomial in E of degree 2; then the assumptions on the trace properties of E prove the complexity bound.
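The splitting (13.14) and the resulting trace formula can also be verified numerically. The sketch below assumes the symmetric form A = E + PPᵀ of (13.11) and uses the concatenated factors P̂ = [−2tP + EP + P(PᵀP), P] and Q̂ = [P, EP] (one consistent reading of the lemma); it checks both the rank-2R splitting and the Woodbury trace against dense computations:

```python
import numpy as np

rng = np.random.default_rng(3)
n, R, t, eta = 300, 4, 0.7, 0.2
e = rng.uniform(1.0, 2.0, n)
E = np.diag(e)
P = rng.standard_normal((n, R)) / np.sqrt(n)
A = E + P @ P.T
I = np.eye(n)

# B(t) = (tI - A)^2 + eta^2 I = E_hat + P_hat Q_hat^T (rank-2R correction)
E_hat = (eta**2 + t**2) * I - 2 * t * E + E @ E
P_hat = np.hstack([-2 * t * P + E @ P + P @ (P.T @ P), P])   # n x 2R
Q_hat = np.hstack([P, E @ P])                                # n x 2R
B = (t * I - A) @ (t * I - A) + eta**2 * I
assert np.allclose(B, E_hat + P_hat @ Q_hat.T)

# trace[B^{-1}] by the Sherman-Morrison-Woodbury scheme, as in (13.13)
Ehat_inv = np.linalg.inv(E_hat)          # diagonal here, hence cheap in practice
K = np.eye(2 * R) + Q_hat.T @ Ehat_inv @ P_hat
U = Ehat_inv @ P_hat @ np.linalg.inv(K)
V = Ehat_inv @ Q_hat
tr_fast = np.trace(Ehat_inv) - (U * V).sum()
assert np.allclose(tr_fast, np.trace(np.linalg.inv(B)))
```

Note that squaring doubles the rank of the correction (R to 2R) but leaves the asymptotic cost O(nR²) unchanged, as stated in the lemma.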

Based on Lemmas 13.3 and 13.4, the calculation of the DOS can be implemented efficiently in real arithmetics. Notice that a statement similar to Lemma 13.4 holds in the case of complex arithmetics; see the discussion of Theorem 3.1 in [27].
The following numerics demonstrate the efficiency of the DOS calculations for the rank-structured TDA matrix in the form (13.13), implemented in real arithmetics (MATLAB). In this case, the initial block-diagonal matrix E is given by E = blockdiag{B0, D0}, as described in Section 12.1.
Figure 13.3 illustrates that, using only the structure-based trace representation (13.13) in Lemma 13.4, we obtain an approximation that perfectly resolves the DOS
Figure 13.3: Left: DOS for H2O vs. its recovery using the trace of matrix resolvents; Right: zoom in the small energy interval.

Figure 13.4: The rescaled CPU time vs. n = Nov for the algorithm of Lemma 13.4.

function for the H2O molecule. See [27] for numerical examples for several moderate-size molecules.
Figure 13.4 shows the rescaled CPU time, that is, T/R, where T denotes the total CPU time for computing the DOS by the algorithm implementing (13.13). We applied the algorithm for different system sizes n (i.e., the sizes of the TDA matrices considered in Section 12.1), varying from n = 180 to n = 4488. In all cases, the N-point representation grid with fixed N = 2^14 was used. This indicates that the numerical performance of the algorithm is even better than the theoretical complexity O(nR²); see more numerics in [27].

13.4 QTT approximation of DOS via Lorentzians: rank bounds


In what follows, we outline the perspectives of QTT approximation to DOS along the
line of the discussion in [27].
In the case of large grid size N, the number of representation parameters for
the corresponding high-order QTT tensor can be reduced to the logarithmic scale
𝒪(log N), which allows the QTT tensor interpolation of the target N-vector by using
only 𝒪(log N) ≪ N functional calls. We demonstrate how to apply this approximation
technique to long N-vectors representing the DOS sampled over the fine representa-
tion grid Ωh .
The QTT approximant can be viewed as the rank-structured interpolant to the
highly non-regular function ϕη regularizing the exact DOS. In this case, the appli-
cation of traditional polynomial or trigonometric-type interpolation is inefficient. We
apply the QTT approximation method to the DOS regularized by Lorentzians and sam-
pled on fine representation grid of size N = 2d .
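The QTT ranks of such a sampled DOS vector can be measured directly by a TT-SVD sweep over the quantics reshaping of the length-2^d vector. The minimal Python sketch below is not the TT-toolbox routine used in the cited experiments; it follows the standard TT-SVD truncation with a per-step threshold, and the toy spectrum is illustrative:

```python
import numpy as np

def qtt_ranks(v, eps):
    """epsilon-truncation TT ranks of a length-2^d vector reshaped into a
    2 x 2 x ... x 2 (quantics) tensor, via successive truncated SVDs."""
    d = int(np.log2(v.size))
    assert v.size == 2**d
    delta = eps * np.linalg.norm(v) / np.sqrt(d - 1)   # per-step threshold
    ranks, r, C = [], 1, v.reshape(1, -1)
    for _ in range(d - 1):
        C = C.reshape(2 * r, -1)
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        # smallest r such that the discarded tail of singular values <= delta
        r = max(1, int((np.sqrt(np.cumsum(s[::-1]**2))[::-1] > delta).sum()))
        ranks.append(r)
        C = s[:r, None] * Vt[:r]                       # carry the rest forward
    return ranks

# Lorentzian-blurred DOS of a toy spectrum, sampled on N = 2^14 points
d, a, eta = 14, 10.0, 0.4
t = (np.arange(2**d) + 0.5) * a / 2**d
lam = np.linspace(1.0, 9.0, 180)
phi = ((eta / np.pi) / ((t[:, None] - lam)**2 + eta**2)).mean(axis=1)
print(qtt_ranks(phi, eps=0.04))   # modest ranks despite N = 16384
```

A pure geometric (exponential) vector has exact QTT rank 1, which gives a quick consistency check of the routine.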
In [27] it is shown that the QTT approach provides a good approximation to ϕη on the whole spectral interval and requires only a moderate number of representation parameters rqtt² log N ≪ N, where the average QTT rank rqtt is a small rank parameter depending on the truncation error ϵ > 0.
In the following numerical examples, we use a sampling vector defined on a fine grid of size N = 2^14. We fix the QTT truncation error to ϵQTT = 0.04 (if not explicitly indicated otherwise). For ease of interpretation, we set the prefactor in (13.1) equal to 1. It is worth noting that the QTT approximation scheme is applied to the full TDA spectrum. Our results demonstrate that the QTT approximant renders good resolution in the whole range of energies (in eV), including large "zero gaps".

Figure 13.5: DOS for the H2O molecule via Lorentzians (blue) and its QTT approximation (red) (left). Zoom in the small energy interval (right).

Figure 13.5 (left) represents the TDA DOS (blue line) for the H2 O computed via the
Lorentzian blurring with the parameter η = 0.4 and the corresponding rank-9.4 QTT
tensor approximation (red line) to the discretized function ϕη (t). For this example,
the number of eigenvalues is given by n = NBSE /2 = 180. Figure 13.5 (right) provides
a zoom of the corresponding DOS and its QTT approximant within the small energy
interval [0, 40] eV.
This means that for a fixed η, the QTT rank remains rather modest relative to the molecular size. This observation confirms the QTT rank estimates in Section 13.6. The moderate size of the QTT ranks in Figure 13.5 clearly demonstrates the potential of QTT interpolation for modeling the DOS of large lattice-type clusters.
We observe several gaps in the spectral densities with complicated shapes (see Figures 13.5 and 13.6), indicating that polynomial, rational, or trigonometric interpolation can be applied only on small energy sub-intervals, but not on the whole interval [0, a]. It is remarkable that the QTT approximant resolves the DOS function well in the whole energy interval, including nearly zero values within the spectral gaps (hardly possible for polynomial/rational interpolation).

13.5 Interpolation of the DOS function by using the QTT format


In the previous section, we demonstrated that the QTT tensor approximation provides good resolution of the DOS function; this is also observed for a number of larger molecules, see [27]. In what follows, we sketch the tensor-based heuristic QTT interpolation of the DOS introduced in [27], which uses only an incomplete set of sampling points, that is, the QTT representation by adaptive cross approximation (ACA) [230, 256]. This allows us to recover the spectral density with controllable accuracy by using M ≪ N interpolation points, where asymptotically M scales logarithmically in the grid size N.
This heuristic approach can be viewed as a kind of an “adaptive QTT ε-interpola-
tion”. In particular, we demonstrate by numerical experiments that the low-rank QTT
adaptive cross interpolation provides a good resolution of the target DOS with the
number of functional calls that asymptotically scales logarithmically O(log N) in the
size of representation grid N; see Figure 13.7. In the case of large N, this beneficial
feature allows us to compute the QTT approximation by requiring much less than N
computationally expensive functional evaluations of ϕη (t).
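The full TT/QTT cross algorithm is beyond this sketch, but its core idea, recovering a low-rank object from a few adaptively chosen samples, is already visible in the matrix case. The illustrative Python below runs ACA with partial pivoting on entries of a smooth (hence numerically low-rank) kernel and counts the entry evaluations; the kernel and all names are hypothetical stand-ins:

```python
import numpy as np

n = 1024
x = np.linspace(0.0, 1.0, n)
calls = 0

def entry(i, j):
    """One sample of the target matrix; the counter mimics an expensive call."""
    global calls
    calls += 1
    return np.exp(-(x[i] - x[j]) ** 2)

def aca(n, tol=1e-8, rmax=50):
    """Adaptive cross approximation with partial pivoting: builds M ~ U @ V.T
    from O(rank * n) adaptively chosen entries instead of all n^2."""
    U, V = np.zeros((n, 0)), np.zeros((n, 0))
    i = 0
    for _ in range(rmax):
        row = np.array([entry(i, j) for j in range(n)]) - U[i] @ V.T
        j = int(np.argmax(np.abs(row)))
        if abs(row[j]) < tol:
            break
        col = np.array([entry(k, j) for k in range(n)]) - U @ V[j]
        U = np.hstack([U, (col / row[j])[:, None]])
        V = np.hstack([V, row[:, None]])
        col[i] = 0.0                          # choose the next pivot row
        i = int(np.argmax(np.abs(col)))
    return U, V

U, V = aca(n)
M = np.exp(-(x[:, None] - x[None, :]) ** 2)   # full matrix, for verification only
err = np.linalg.norm(M - U @ V.T) / np.linalg.norm(M)
print(U.shape[1], calls, err)                 # rank and samples used: far below n^2
```

Each ACA step samples one row and one column of the residual, so the total number of functional calls grows like the rank times the mode sizes, which is the same mechanism that keeps the QTT cross interpolation of the DOS at M ≪ N samples.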
The QTT interpolation via ACA tensor approximation serves to recover the representation parameters of the QTT tensor approximant and requires asymptotically about

M = Cs rqtt² log₂ N ≪ N   (13.15)

samples of the target N-vector¹ with a small prefactor Cs, usually satisfying Cs ≤ 10, that is independent of the fine interpolation grid size N = 2^d; see, for example, [183].
This cost estimate seems promising in the perspective of extended or lattice-type molecular systems, which require large spectral intervals and, as a result, a large interpolation grid of size N. Here, the QTT rank parameter rqtt naturally depends on the required truncation threshold ε > 0, characterizing the L2-error between the exact DOS and its QTT interpolant. The QTT tensor interpolation adaptively reduces the number of functional calls, that is, M < N, if the QTT rank parameters (or the threshold ε > 0) are chosen to satisfy condition (13.15). The expression on the right-hand side of (13.15) provides a rather accurate estimate of the number of functional evaluations.
To complete this discussion, we present numerical tests on the low-rank QTT ten-
sor interpolation applied to the long vector discretizing the Lorentzian-DOS on large
representation grid.
Figure 13.6 represents the results of the QTT interpolating approximation to the discretized DOS function for the NH3 molecule. We use the QTT cross approximation algorithm based on [167, 230, 256] and implemented in the MATLAB TT-toolbox [232]. Here, we set ε = 0.08, η = 0.1, and N = 2^14, yielding rQTT = 9.8; see [27] for more numerical examples.

1 In our application, this is the functional N-vector corresponding to representation of DOS via matrix
resolvents in (13.6).

Figure 13.6: QTT ACA interpolation of the DOS for the NH3 molecule (left) and its error on the whole spectrum.

Figure 13.7: DOS for H2O via Lorentzians: the number of functional calls for the QTT cross approximation (blue) vs. the full grid size N (red).

Figure 13.7 (see [27]) illustrates the logarithmic increase in the number of samples required for the QTT interpolation of the DOS (for the H2O molecule) represented on the grid of size N = 2^d with the different quantics dimensions d = 11, 12, . . . , 16. The rank truncation threshold is chosen as ϵ = 0.05, and the regularization parameter is η = 0.2. In this example, the effective prefactor in (13.15) is estimated by Cs ≤ 10. This prefactor characterizes the average number of samples required for the recovery of each of the rqtt² log N representation parameters involved in the QTT tensor ansatz.
We observe that the QTT tensor interpolant recovers the complicated shape of the
exact DOS with a high precision. The logarithmic asymptotic complexity scaling M =
O(log N) (i. e., the number of functional calls required for the QTT tensor interpolation)
vs. the grid size N can be observed in Figure 13.7 (blue line) for large representation
grids.

13.6 Upper bounds on the QTT ranks of DOS function


In this section, we sketch the analysis of the upper bounds on the QTT ranks of the discretized DOS obtained by Gaussian broadening, presented in [27]. The numerical tests indicate that Lorentzian blurring leads to QTT ranks similar to those for Gaussian blurring when both are applied on the same grid and the same truncation threshold ε > 0 is used in the QTT approximation. For technical reasons, we consider the case of a symmetric spectral interval, that is, t, λj ∈ [−a, a].
Assume that the function ϕη(t) = (1/n) ∑_{j=1}^{n} gη(t − λj), t ∈ [−a, a], in equation (13.2) is discretized by sampling over the uniform N-grid Ωh with N = 2^d, where the generating Gaussian is given by gη(t) = (1/(√(2π) η)) exp(−t²/(2η²)). Denote the corresponding N-vector by g = gη ∈ ℝ^N and the resulting discretized density vector by

ϕη(t) ↦ p = pη = (1/n) ∑_{j=1}^{n} gη,j ∈ ℝ^N,

where the shifted Gaussian is assigned the vector gη(t − λj) ↦ gj = gη,j.


Without loss of generality, we suppose that all eigenvalues are situated within the
set of grid points, that is, λj ∈ Ωh . Otherwise, we can slightly relax their positions pro-
vided that the mesh size h is small enough. This is not a severe restriction for the QTT
approximation of functional vectors since storage and complexity requests depend
only logarithmically on N.

Lemma 13.5 ([27]). Assume that the effective support of the shifted Gaussians gη(t − λj), j = 1, . . . , n, is included in the computational interval [−a, a]. Then the QTT ε-rank of the vector pη is bounded by

rankQTT(pη) ≤ C a log^{3/2}(|log ε|),

where the constant C = O(|log η|) > 0 depends only logarithmically on the regularization parameter η.

Proof. The main argument of the proof is similar to that in [148, 68]: the sum of discretized Gaussians, each represented in the Fourier basis, can be expanded with merely the same number m0 of Fourier harmonics as an individual Gaussian function (uniform basis).
Given the exponent parameter η, we first estimate the number of essential Fourier coefficients of the Gaussian vectors gη,j,

m0 = O(a |log η| log^{3/2}(|log ε|)),

taking into account their exponential decay. Notice that m0 depends only logarithmically on η. Since each Fourier harmonic has an exact rank-2 QTT representation (see Section 4.2), we arrive at the desired bound.
214 | 13 Density of states for a class of rank-structured matrices

A similar QTT rank bound can be derived for the case of Lorentzian blurred DOS.
Indeed, we observe that the Fourier transform of the Lorentzian in (13.3) is given
by [27],

ℱ(Lη(t)) = e^{−|k|η}.

This leads to the logarithmic bound in the number m0 of essential Fourier coefficients
in the Lorentzian vectors.
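The stated decay law ℱ(Lη)(k) = e^{−|k|η} is easy to check by direct quadrature. The sketch below is our own illustration (not from [27]); it assumes the unit-normalized Lorentzian Lη(t) = (η/π)/(t² + η²) and the convention ℱf(k) = ∫ f(t) e^{−ikt} dt, which is our reading of (13.3). The grid bounds and test frequencies are arbitrary choices.

```python
import numpy as np

eta = 0.4
h = 0.002
t = np.arange(-400.0, 400.0, h)                 # wide grid: the Lorentzian has heavy tails
lorentz = (eta / np.pi) / (t ** 2 + eta ** 2)   # unit-normalized Lorentzian L_eta

# the transform is real and even, so only the cosine part survives
ft = {k: np.sum(lorentz * np.cos(k * t)) * h for k in (0.5, 1.0, 2.0)}
for k, val in ft.items():
    print(k, val, np.exp(-eta * k))             # quadrature value vs. e^{-eta |k|}
```

The truncation of the integration domain at |t| = 400 contributes an error of order 2η/(400π) ≈ 6·10⁻⁴, which sets the accuracy of the check.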

Table 13.1: QTT ranks of Lorentzians-DOS for TDA matrices of some molecules with parameters
ε = 0.04, η = 0.4, N = 16 384.

Molecule     H2O   NH3   H2O2   N2H4   C2H5OH   C2H5NO2   C3H7NO2
n = Nov      180   215   531    657    1430     3000      4488
QTT ranks    11    11    12     11     15       16        13

Table 13.1 shows that the average QTT tensor rank of Lorentzians-DOS for various TDA
matrices remains almost independent of the molecular size, which confirms previous
observations. The weak dependence of the rank parameter on the molecular geometry
can be observed.
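The rank behavior described by Lemma 13.5 and Table 13.1 can be reproduced on a toy example. The sketch below is our own illustration, not code from the book: `qtt_ranks` is a hypothetical helper implementing the standard TT-SVD with a relative truncation threshold ε, applied to a length-2^d vector folded into a 2 × 2 × ⋯ × 2 tensor; the grid, the width η, and the random placement of the surrogate eigenvalues λj are arbitrary choices.

```python
import numpy as np

def qtt_ranks(v, eps=1e-4):
    """TT ranks of a length-2^d vector folded into a 2 x 2 x ... x 2 tensor (QTT)."""
    v = np.asarray(v, dtype=float)
    d = int(round(np.log2(v.size)))
    assert v.size == 2 ** d
    delta = eps * np.linalg.norm(v) / np.sqrt(max(d - 1, 1))
    ranks, M = [], v.reshape(2, -1)
    for _ in range(d - 1):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]   # tail[r] = ||s[r:]||
        r = max(1, int(np.count_nonzero(tail > delta)))  # discard a tail of norm <= delta
        ranks.append(r)
        M = (s[:r, None] * Vt[:r]).reshape(2 * r, -1)
    return ranks

d = 12
t = np.linspace(-1.0, 1.0, 2 ** d, endpoint=False)
eta = 0.05
gauss = lambda lam: np.exp(-(t - lam) ** 2 / (2 * eta ** 2))

rng = np.random.default_rng(0)
lam = rng.uniform(-0.8, 0.8, size=200)            # surrogate "eigenvalues" lambda_j
dos = sum(gauss(c) for c in lam) / lam.size       # blurred DOS vector p_eta

r_single = max(qtt_ranks(gauss(0.0)))
r_dos = max(qtt_ranks(dos))
print("max QTT rank, single Gaussian:", r_single)
print("max QTT rank, sum of 200 Gaussians:", r_dos)
```

On this example the maximal QTT rank of the sum of 200 Gaussians stays of the same modest order as that of a single Gaussian, in line with the uniform-basis argument in the proof.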
14 Tensor-based summation of long-range potentials on finite 3D lattices
In Chapter 9 we described the method for direct tensor summation of the electrostatic
potentials used for calculation of the nuclear potential operator for molecules [156],
which reduces the volume summation of the potentials to one-dimensional rank-
structured operations. However, the rank of the resulting canonical tensor increases
linearly in the number of potentials, and this growth may become critical for larger
multi-particle systems. Fortunately, the tensor approach often suggests new routes
to the solution of classical problems.
In this chapter, we discuss the assembled tensor method for summation of the
long-range potentials on finite rectangular L × L × L lattices introduced recently by
the authors in [148, 153]. This technique requires only O(L) computational work for
calculation of the collective electrostatic potential of large lattice systems and O(L²)
for computation of their interaction energy, instead of O(L³ log L) when using the tra-
ditional Ewald summation techniques. Surprisingly, the assembled tensor summation
technique does not increase the tensor rank: the rank of the tensor for the collective
potential of large 3D lattice clusters equals the rank of a single 3D reference potential.
The approach was initiated by our former numerical observations in [173, 146] that the
Tucker tensor rank of a sum of Slater potentials placed at nodes of a three-dimensional
finite lattice remains the same as the rank of a single Slater function.
A single three-dimensional potential function (the electrostatic potential 1/‖x‖ or other
types of interaction generated by a radial basis function) sampled on a large N × N × N rep-
resentation grid in a bounding box is approximated with a guaranteed precision by a
low-rank Tucker/canonical reference tensor. This tensor provides the values of the dis-
cretized potential at any point of this fine auxiliary 3D grid, but needs only O(N) stor-
age. Then each 3D singular kernel function involved in the summation is represented
on the same grid by a shift of the reference tensor along lattice vectors. Directional
vectors of the Tucker/canonical tensor defining a full lattice sum are assembled by the
1D summation of the corresponding univariate skeleton vectors specifying the shifted
tensor. The lattice nodes are not required to exactly coincide with the grid points of
the global N × N × N representation grid since the accuracy of the resulting tensor sum
is well controlled due to easy availability of large grid size N (e. g., fine resolution).
The key advantage of the assembled tensor method is that the summation of po-
tentials is implemented within the skeleton vectors of the generating canonical tensor,
thus not affecting the resulting tensor rank; the number of canonical vectors repre-
senting the total tensor sum remains the same as for a single reference kernel. For a
sum of electrostatic potentials over L × L × L lattice embedded in a box, the required
storage scales linearly in the one-dimensional grid-size, that is, as O(N), whereas the
numerical cost is estimated by O(NL). The important benefit of this summation tech-

https://doi.org/10.1515/9783110365832-014

nique is that the resultant low-rank tensor representation of the total sum of potentials
can be evaluated at any grid point at the cost O(1).
In the case of periodic boundary conditions, the tensor approach leads to further
simplifications. Indeed, the respective lattice summation is reduced to 1D operations
on short canonical vectors of size n = N/L, which is the restriction (projection) of the
global N-vectors onto the unit cell. Here, n denotes merely the number of grid points
per unit cell. In this case, storage and computational costs are reduced to O(n) and
O(Ln), respectively, whereas the traditional FFT-based approach scales at least cubi-
cally in L, as O(L³ log L), and similarly in N. Notice that due to the low cost of the tensor
method in the limit of large lattice size L, the conditionally convergent sums in the periodic setting
can be regularized by subtraction of the constant term, which can be evaluated nu-
merically by the Richardson extrapolation on a sequence of lattice parameters L, 2L,
4L, etc. (see Section 14.3). Hence, in the new framework, the analytic treatment of the
conditionally convergent sums is no longer required.
We notice that the numerical treatment of long-range potentials in large lattice-
type systems was always considered as a computational challenge (see [72, 235, 199]
and [295, 207, 47, 208, 253]). Tracing back to Ewald summation techniques [79], the
development of lattice-sum methods has led to a number of established algorithms
for evaluating long-range electrostatic potentials of multiparticle systems; see for ex-
ample [58, 236, 278, 139, 63, 205] and references therein. These methods usually com-
bine the original Ewald summation approach with the fast Fourier transform (FFT).
The commonly used Ewald summation algorithms [79] are based on a certain specific
local-global analytical decomposition of the interaction potential. In the case of elec-
trostatic potentials, the Newton kernel is represented by
1/r = τ(r)/r + (1 − τ(r))/r,
where the traditional choice of the cutoff function τ is the complementary error func-
tion

τ(r) = erfc(r) := (2/√π) ∫_r^∞ exp(−t²) dt.
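A quick numerical illustration of this splitting (our own sketch, standard library only; the test radii are arbitrary): the erfc part is singular at the origin but decays exponentially fast, while the complementary part (1 − erfc(r))/r = erf(r)/r is smooth, with the finite limit 2/√π at r → 0.

```python
import math

def short_range(r):
    """Singular, rapidly decaying part erfc(r)/r of the Ewald splitting."""
    return math.erfc(r) / r

def long_range(r):
    """Smooth part (1 - erfc(r))/r = erf(r)/r, finite at r = 0."""
    return math.erf(r) / r if r > 0 else 2.0 / math.sqrt(math.pi)

r = 0.7
print(short_range(r) + long_range(r), 1.0 / r)   # the two parts recover 1/r exactly
print(short_range(6.0))                          # short-range part is negligible already at r = 6
print(long_range(0.0))                           # smooth part stays finite at the origin
```

This is precisely why the short-range part can be summed directly in real space, while only the smooth long-range part needs the reciprocal-space (FFT) treatment.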

The Ewald summation techniques were shown to be particularly attractive for comput-
ing the potential energies and forces of many-particle systems with long-range inter-
action potential under periodic boundary conditions. They are based on the spatial
separation of a sum of potentials into two parts: the short-range part is treated in
real space, while the long-range part (whose sum converges in the reciprocal space)
requires grid-based FFT calculations with irreducible O(L³ log L) computational work.
It is worth noting that the presented tensor method is applicable to the lattice
sums generated by rather general class of radial basis functions, which allow an ef-
ficient local-plus-separable approximation. In particular, along with Coulombic sys-
tems, it can be applied to a wide class of commonly used interaction potentials, for

example, to the Slater, Yukawa, Stokeslet, Lennard-Jones, or van der Waals interac-
tions. In all these cases, the existence of low-rank grid-based tensor approximation
can be proved and this approximation can be constructed numerically by analytic-
algebraic methods as in the case of the Newton kernel; see the detailed discussion in
[153, 171].
The tensor approach is advantageous in other functional operations with the lat-
tice potential sums represented on a 3D grid such as integration, differentiation, or
force and energy calculations using tensor arithmetics of 1D complexity [174, 146, 152,
240, 24]. Notice that the summation cost in the Tucker/canonical formats O(L N) can
be reduced to the logarithmic scale in the grid size O(L log N) by using the low-rank
quantized tensor approximation (QTT) [167] of long canonical/Tucker vectors as it was
suggested and analyzed in [148].

14.1 Assembled tensor summation of potentials on finite lattices


In this section, following [148], we present the efficient scheme for fast assembled ten-
sor summation of electrostatic potentials for a finite 3D lattice system in a box. Given
the unit reference cell Ω = [−b/2, b/2]d , d = 3, of size b × b × b, we consider an interac-
tion potential in a bounded box

ΩL = B1 × B2 × B3

consisting of a union of L1 × L2 × L3 unit cells Ωk , obtained by a shift of Ω that is a


multiple of b in each variable, and specified by the lattice vector bk, k = (k1 , k2 , k3 ) ∈
ℤd , 0 ≤ kℓ ≤ Lℓ − 1 for Lℓ ∈ ℕ (ℓ = 1, 2, 3). Here, Bℓ = [−b/2, b/2 + (Lℓ − 1)b], so that the
case Lℓ = 1 corresponds to one-layer systems in the variable xℓ . Recall that b = nh by
construction, where h > 0 is the mesh-size (same for all spacial variables).
In the case of an extended system in a box, the summation problem for the total
potential vcL (x) is formulated in the rectangular volume ΩL = ⋃Lk1 ,k2 ,k3 =1 Ωk , where for
ease of exposition, we consider a lattice of equal sizes L1 = L2 = L3 = L. For imple-
mentational reasons, the computational volume box is chosen slightly larger than ΩL
by a size of several Ω (see (14.3) and Figure 14.1). On each Ωk ⊂ ΩL the potential sum
of interest vk (x) = (vcL )|Ωk is obtained by summation over all unit cells in ΩL ,

vk(x) = ∑_{k1,k2,k3=0}^{L−1} ∑_{ν=1}^{M0} Zν / ‖x − aν(k1, k2, k3)‖,   x ∈ Ωk,   (14.1)

where aν (k1 , k2 , k3 ) = aν + bk. This calculation is performed at each of L3 elementary


cells Ωk ⊂ ΩL , which presupposes substantial numerical costs for large L. In the pre-
sented approach, these costs are essentially reduced as described further.
Figure 14.1 shows an example of a computational box with a 3D lattice-type molec-
ular structure of 6 × 4 × 6 atoms.

Figure 14.1: Rectangular 6 × 4 × 6 lattice in a box.

Let ΩNL be the NL × NL × NL uniform grid on ΩL with the same mesh-size h as above,
and introduce the corresponding space of piecewise constant basis functions of the
dimension NL3 . In this construction, we have

NL = n + n(L − 1) = Ln. (14.2)

In practice, the computational box ΩL and the grid size NL can be taken larger than
(14.2) by some “dummy” distance with the grid size N0 , so that

NL = Ln + 2N0 . (14.3)

Similarly to (11.9), we employ the rank-R reference tensor defined on the auxiliary box
Ω̃L, obtained by scaling ΩL with the factor 2,

P̃L,R = ∑_{q=1}^{R} p_q^(1) ⊗ p_q^(2) ⊗ p_q^(3) ∈ ℝ^{2NL×2NL×2NL},   (14.4)

and let 𝒲ν(ki ) , i = 1, 2, 3, be the directional windowing operators associated with the
lattice vector k. The next theorem establishes the storage and numerical costs for the lat-
tice sum of single potentials, each represented by a canonical rank-R tensor, which
corresponds to the choice of M0 = 1 and a1 = 0 in (14.1). The ΩL -windowing operator
𝒲 = 𝒲(k) (tracing onto NL × NL × NL window) is rank-1 separable:

𝒲(k) = 𝒲^(1)(k1) ⊗ 𝒲^(2)(k2) ⊗ 𝒲^(3)(k3),

specifying the shift by the lattice vector bk.

Theorem 14.1 ([148]). Given a canonical rank-R tensor representation of a single long-
range potential (14.4), the projected tensor of the interaction potential vcL (x), x ∈ ΩL ,
representing the collective potential sum over L3 charges of a rectangular lattice, can be

presented by the canonical tensor PcL with the same rank R:

PcL = ∑_{q=1}^{R} ( ∑_{k1=0}^{L−1} 𝒲(k1) p_q^(1) ) ⊗ ( ∑_{k2=0}^{L−1} 𝒲(k2) p_q^(2) ) ⊗ ( ∑_{k3=0}^{L−1} 𝒲(k3) p_q^(3) ).   (14.5)

The numerical cost and storage size are estimated by O(RLNL ) and O(RNL ), respectively,
where NL is the univariate grid size as in (14.2).

Proof. We fix the index ν = 1 in (14.1) and consider only the second sum defined on
the complete domain ΩL ,
vcL(x) = ∑_{k1,k2,k3=0}^{L−1} Z / ‖x − bk‖,   x ∈ ΩL.   (14.6)

Then the projected tensor representation of vcL (x) takes the form (setting Z = 1)

PcL = ∑_{k1,k2,k3=0}^{L−1} 𝒲(k) P̃L,R = ∑_{k1,k2,k3=0}^{L−1} ∑_{q=1}^{R} 𝒲(k) (p_q^(1) ⊗ p_q^(2) ⊗ p_q^(3)) ∈ ℝ^{NL×NL×NL},

where p_q^(ℓ), ℓ = 1, 2, 3, are vectors of the reference tensor (14.4) and the 3D shift vector
is defined by k ∈ ℤ^{L×L×L}. Now, the above summation can be represented by

PcL = ∑_{q=1}^{R} ∑_{k1,k2,k3=0}^{L−1} 𝒲(k1) p_q^(1) ⊗ 𝒲(k2) p_q^(2) ⊗ 𝒲(k3) p_q^(3).   (14.7)

To simplify the large sum over the full 3D lattice, we use the following property of a
sum of canonical tensors with equal ranks R and with two coinciding factor matri-
ces: the concatenation in the third mode ℓ can be reduced to point-wise summation
(“assembling”) of the respective canonical vectors

C^(ℓ) = [a_1^(ℓ) + b_1^(ℓ), . . . , a_R^(ℓ) + b_R^(ℓ)],   (14.8)

thus preserving the same rank parameter R for the resulting sum. Notice that, for each
fixed q, the inner sum in (14.7) satisfies the above property. By repeatedly applying this
property to all canonical tensors for q = 1, . . . , R, the 3D sum (14.7) can be simplified to
a rank-R tensor obtained by 1D summations only:
PcL = ∑_{q=1}^{R} ( ∑_{k1=0}^{L−1} 𝒲(k1) p_q^(1) ) ⊗ ( ∑_{k2,k3=0}^{L−1} 𝒲(k2) p_q^(2) ⊗ 𝒲(k3) p_q^(3) )
    = ∑_{q=1}^{R} ( ∑_{k1=0}^{L−1} 𝒲(k1) p_q^(1) ) ⊗ ( ∑_{k2=0}^{L−1} 𝒲(k2) p_q^(2) ) ⊗ ( ∑_{k3=0}^{L−1} 𝒲(k3) p_q^(3) ).

The cost can be estimated by following the standard properties of canonical tensors.
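The identity behind (14.5) — the L³-term lattice sum of shifted rank-R canonical tensors collapsing into three 1D sums inside the canonical vectors — only uses multilinearity of the outer product, so it can be verified on a toy example. In the sketch below (our own illustration, not the book's code), the Newton-kernel reference tensor is replaced by a generic rank-R separable surrogate built from 1D Gaussians of several widths, and the windowing operator 𝒲(k) is modeled by plain slicing of the double-size reference vectors; all sizes are illustrative.

```python
import numpy as np

n, L, R = 8, 4, 5            # points per unit cell, lattice size, canonical rank
NL = n * L
x = np.arange(2 * NL, dtype=float)
# rank-R separable surrogate for the reference tensor (14.4): 1D Gaussians of various widths
p = [np.exp(-((x - NL) / w) ** 2) for w in np.geomspace(2.0, 40.0, R)]

def window(vec, k):
    """Trace the reference vector, shifted by k cells, onto the N_L-point box."""
    return vec[NL - k * n : 2 * NL - k * n]

# direct summation: accumulate the full 3D tensor over all L^3 lattice shifts
direct = np.zeros((NL, NL, NL))
for k1 in range(L):
    for k2 in range(L):
        for k3 in range(L):
            for q in range(R):
                direct += np.einsum('i,j,k->ijk', window(p[q], k1),
                                    window(p[q], k2), window(p[q], k3))

# assembled summation (14.5): 1D sums inside each canonical vector; the rank stays R
U = [sum(window(p[q], k) for k in range(L)) for q in range(R)]
assembled = sum(np.einsum('i,j,k->ijk', U[q], U[q], U[q]) for q in range(R))

err = np.max(np.abs(direct - assembled)) / np.max(np.abs(direct))
print("relative difference:", err)

# O(1) point evaluation from the R assembled vectors, no 3D tensor needed
i, j, k = 3, 17, 25
val = sum(U[q][i] * U[q][j] * U[q][k] for q in range(R))
print(abs(val - direct[i, j, k]))
```

The two tensors agree to machine precision, and a single entry of the lattice sum is recovered from the R assembled vectors at O(R) cost, as claimed in the introduction.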

Figure 14.2: Assembled canonical vectors for a sum of electrostatic potentials for a cluster of 20 ×
30 × 4 Hydrogen atoms in a rectangular box of size ∼55.4 × 33.6 × 22.4 au³. Top left-right: vectors in x-
and y-axes, respectively; bottom left: vectors along z-axis. Bottom right: the resulting sum of 2400
nuclei potentials at the middle cross-section with z = 11.2 au.

Remark 14.2. For the general case M0 > 1, the weighted summation over M0 charges
leads to the low-rank tensor representation, that is, rank(PcL ) ≤ M0 R, and

PcL = ∑_{ν=1}^{M0} Zν ∑_{q=1}^{R} ( ∑_{k1=0}^{L−1} 𝒲ν(k1) p_q^(1) ) ⊗ ( ∑_{k2=0}^{L−1} 𝒲ν(k2) p_q^(2) ) ⊗ ( ∑_{k3=0}^{L−1} 𝒲ν(k3) p_q^(3) ).   (14.9)

The previous construction applies to the uniformly spaced positions of charges.


However, our tensor summation method remains valid for a non-equidistant L × L × L
tensor lattice.
Here we sketch some numerical examples presented in [148]. Figure 14.2 illus-
trates the shape of assembled canonical vectors for the 32 × 16 × 8 lattice sum in a
box (summation of 4096 potentials). Here, the canonical rank is R = 25, and ε = 10−6 .
It demonstrates how the assembled vectors composing the tensor lattice sum incor-
porate simultaneously the canonical vectors of shifted Newton kernels. It can be seen
that canonical vectors capture the local, intermediate, and long-range contributions
to the total sum. Figure 14.3 represents agglomerated canonical vectors in x-, y-, and

Figure 14.3: Assembled canonical vectors in x-, y-, and z-axes for a sum of 1 572 864 nuclei poten-
tials.

Figure 14.4: CPU times (log scaling) for calculating the sum of Coulomb potentials over a 3D
L × L × L lattice by using direct canonical tensor summation (blue line) and assembled lattice
summation (red line).

z-axes for a sum of 1 572 864 nuclei potentials for a cluster of 192 × 128 × 64 Hydrogen
atoms in a box of size ≈ 19.8 × 13.4 × 7 nm³.
The canonical tensor representation (14.5) reduces dramatically the numerical
costs and storage consumptions. Figure 14.4 compares the direct and assembled tensor
summation methods (grid-size of a unit cell, n = 256). Contrary to the direct canonical
summation of the nuclear potentials on a 3D lattice, which scales at least linearly in
the size of the cubic lattice, as NL L³ (blue line), the CPU time for directionally agglom-
erated canonical summation in a box via (14.5) scales as NL L (red line).
Table 14.1 presents the times for assembled computation of the sum of potentials
positioned in nodes of L × L × L lattice clusters. Approximate sizes of finite clusters are
given in nanometers. This table shows that computation time for the tensor approach
scales logarithmically in the cluster size. We refer to [148] for the more detailed pre-
sentation of numerical experiments.
Figure 14.5 compares the tensor sum obtained by the assembled canonical vec-
tors with the results of direct tensor sum for the same configuration. The absolute dif-
ference of the corresponding sums for a cluster of 16 × 16 × 2 cells (here a cluster of
512 Hydrogen atoms) is close to machine accuracy, ∼10^−14.

Table 14.1: CPU times (sec) vs. the lattice size for the assembled calculation of their sum PcL over the
L × L × L clusters. Approximate sizes of finite clusters are given in nanometers.

L                      32       64        128        256
Total L³               32 768   262 144   2 097 152  16 777 216
Cluster size (nm³)     3.8³     7³        13.4³      26.2³
Summation time (sec)   0.2      0.27      0.83       3.87

Figure 14.5: Left: The electrostatic potential of the cluster of 16 × 16 × 2 Hydrogen atoms in a box
(512 atoms). Right: the absolute error of the assembled tensor sum on this cluster by (14.5) with
respect to the direct tensor summation (11.10).

14.2 Assembled summation of lattice potentials in Tucker tensor format
Similar to (14.4), we introduce the rank-r “reference” Tucker tensor T̃L,r ∈ ℝ^{2NL×2NL×2NL}
defined on the auxiliary domain Ω̃L.
The following theorem provides theoretical background for the fast tensor meth-
ods of grid-based computation of the large sum of long-range potentials on a 3D lattice.
It generalizes [148, Theorem 3.1], which was applied to the Newton kernel, to a rather
general class of functions p(‖x‖) in (14.1) and to the case of Tucker tensor decompo-
sitions. It justifies the low storage and numerical costs for the total potential sum in
terms of lattice size.
Theorem 14.3 ([153]). Given the rank-r “reference” Tucker tensor T̃L,r ∈ ℝ^{2NL×2NL×2NL} ap-
proximating the potential function p(‖x‖), the rank-r Tucker approximation of a lattice-
sum vcL can be computed in the form

TcL = ∑_{m=1}^{r} bm ( ∑_{k1∈𝒦} 𝒲(k1) t̃_m1^(1) ) ⊗ ( ∑_{k2∈𝒦} 𝒲(k2) t̃_m2^(2) ) ⊗ ( ∑_{k3∈𝒦} 𝒲(k3) t̃_m3^(3) ).   (14.10)

The numerical cost and storage are estimated by O(3rLNL ) and O(3rNL ), respectively.

Proof. We apply a similar argument as in Theorem 14.1 to obtain

TcL = ∑_{k1,k2,k3∈𝒦} 𝒲(k) T̃L,r
    = ∑_{m=1}^{r} bm ( ∑_{k1∈𝒦} 𝒲(k1) t̃_m1^(1) ) ⊗ ( ∑_{k2∈𝒦} 𝒲(k2) t̃_m2^(2) ) ⊗ ( ∑_{k3∈𝒦} 𝒲(k3) t̃_m3^(3) ).

Simple complexity estimates complete the proof.
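As with Theorem 14.1, the Tucker version relies only on multilinearity of the Tucker form, so it can be sanity-checked on random data. The sketch below is our own toy verification: core, factor matrices, and sizes are arbitrary, and the windowing operator is again modeled by row slicing of double-size "reference" factors.

```python
import numpy as np

rng = np.random.default_rng(2)
n, L, r = 6, 3, 3            # points per unit cell, lattice size, Tucker rank
NL = n * L
B = rng.standard_normal((r, r, r))                        # Tucker core
T = [rng.standard_normal((2 * NL, r)) for _ in range(3)]  # "reference" factor matrices

def win(mat, k):
    """Window the factor matrix shifted by k cells onto the N_L-point box."""
    return mat[NL - k * n : 2 * NL - k * n]

# direct: sum the full Tucker tensors over all L^3 lattice shifts
direct = np.zeros((NL, NL, NL))
for k1 in range(L):
    for k2 in range(L):
        for k3 in range(L):
            direct += np.einsum('abc,ia,jb,kc->ijk', B,
                                win(T[0], k1), win(T[1], k2), win(T[2], k3))

# assembled (14.10): sum the windowed factors once per mode; core and rank unchanged
A = [sum(win(T[i], k) for k in range(L)) for i in range(3)]
assembled = np.einsum('abc,ia,jb,kc->ijk', B, A[0], A[1], A[2])

diff = np.max(np.abs(direct - assembled))
print("max difference:", diff)
```

Only the three factor matrices are assembled; the core and hence the Tucker rank of the lattice sum stay those of the single reference tensor.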


Figure 14.6 illustrates the shape of several assembled Tucker vectors obtained by
assembling the vectors t̃_m1^(1) along the x1-axis. It can be seen that the assembled Tucker vectors
simultaneously accumulate the contributions of all single potentials involved in the
total sum. Note that the assembled Tucker vectors do not preserve the initial orthogo-
nality of the directional vectors {t̃_mℓ^(ℓ)}. In this case, the simple Gram–Schmidt orthogonal-
ization can be applied. The next remark generalizes Theorems 14.1 and 14.3.

Figure 14.6: Assembled Tucker vectors by using t̃_m1^(1) and t̃_m1^(2) along the x- and y-axes,
respectively, for a sum over a 16 × 8 × 1 lattice.

Remark 14.4. In the general case M0 > 1, the weighted summation over M0 charges
leads to the rank-Rc canonical tensor representation on the “reference” domain Ω̃L,
which can be used to obtain the rank-Rc representation of a sum on the whole L × L × L
lattice (cf. Remark 14.2 and Theorem 14.1):

PcL = ∑_{q=1}^{Rc} ( ∑_{k1∈𝒦} 𝒲(k1) p̃_q^(1) ) ⊗ ( ∑_{k2∈𝒦} 𝒲(k2) p̃_q^(2) ) ⊗ ( ∑_{k3∈𝒦} 𝒲(k3) p̃_q^(3) ).   (14.11)

Likewise, the rank-rc Tucker approximation of a lattice potential sum vcL can be com-
puted in the form [153]
TcL = ∑_{m=1}^{r0} bm ( ∑_{k1∈𝒦} 𝒲(k1) t̃_m1^(1) ) ⊗ ( ∑_{k2∈𝒦} 𝒲(k2) t̃_m2^(2) ) ⊗ ( ∑_{k3∈𝒦} 𝒲(k3) t̃_m3^(3) ).   (14.12)

Table 14.2: MATLAB calculations: time (sec.) vs. the total number of potentials L3 for the assem-
bled Tucker representation of the lattice sum TcL on the fine NL × NL × NL grid with the mesh size
h = 0.0034 Å.

L³     16³      32³      64³       128³      256³      512³
Time   0.33     1.25     5.22      19.56     85.17     439.9
NL³    5632³    9728³    17 920³   34 304³   67 072³   132 608³

Table 14.3: Times in MATLAB for computation of the 3D FFT for a sequence of n3 grids. Times for grids
n ≥ 2048 are estimated by extrapolation.

n³     512³   1024³   2048³    4096³     8192³       16 384³
FFT3   5.4    51.6    ∼500     ∼1 hour   ∼10 hours   ∼100 hours

The previous construction applies to the uniformly spaced positions of charges. How-
ever, the agglomerated tensor summation method in both canonical and Tucker for-
mats applies, with slight modification of the windowing operator, to a non-equidistant
L1 × L2 × L3 tensor lattice. Such lattice sums cannot be treated by the traditional Ewald
summation methods based on the FFT transform.
Both the Tucker and canonical tensor representations (14.10) and (14.5) reduce
dramatically the numerical costs and storage consumptions.1 Table 14.2 illustrates
complexity scaling O(NL L) for computation of L×L×L lattice sum in the Tucker format.
We observe the increase of CPU time in a factor of 4 as the lattice size doubles, con-
firming our theoretical estimates. For comparison, in Table 14.3 we present the CPU
time (sec.) for 3D FFT transform, see [153] where the initial numerical examples have
been presented.
Figure 14.7 shows the sum of Newton kernels on a lattice 8 × 4 × 1 and the respec-
tive Tucker summation error achieved with the rank r = (16, 16, 16) Tucker tensor de-
fined on the large 3D representation grid with the mesh size about 0.002 atomic units
(0.001 Å). Figure 14.8 represents the Tucker vectors obtained from the canonical-to-
Tucker (C2T) approximation of the assembled canonical tensor sum of potentials on
an 8 × 4 × 1 lattice. In this case, the Tucker vectors are orthogonal.

14.3 Assembled tensor sums in a periodic setting


Here we sketch the results in [148, 153]. In the periodic case, we introduce the periodic
cell ℛ = bℤd , d = 1, 2, 3, and consider a 3D T-periodic supercell of size T × T × T with

¹ Note that the total number of potentials on a 256³ lattice is more than 16 million. The cluster size
in every space dimension is 2 (256 + 6) = 516 au, or ∼26 nanometers. (Here 2 au is the inter-atomic
distance, and 6 is the gap between the lattice and the boundary of a box.)

Figure 14.7: Left: Sum of Newton potentials on an 8 × 4 × 1 lattice generated in a volume with the 3D
grid of size 14 336 × 10 240 × 7168. Right: the absolute approximation error (about 8 ⋅ 10^−8) of the
rank-r Tucker representation.

Figure 14.8: Several mode vectors from the C2T approximation visualized along x-, y-, and z-axes for
a sum on a 16 × 8 × 4 lattice and the resulting 3D potential (the cross-section at level z = 0).

T = bL. The total electrostatic potential in ΩL is obtained by the respective summation


over the supercell ΩL for possibly large L. Then the electrostatic potential in any of
T-periods is obtained by replication of the respective data from ΩL .
The potential sum vcL (x) is designated at each elementary unit-cell in ΩL by the
same value (k-translation invariant). Consider the case d = 3. Supposing for simplicity
that L is odd, L = 2p + 1, the reference value of v0 (x) = vcL (x) will be computed at the
central cell x ∈ Ω0 , indexed by (p+1, p+1, p+1), by summation over all the contributions
from L3 elementary sub-cells in ΩL :
v0(x) = ∑_{ν=1}^{M0} ∑_{k1,k2,k3=1}^{L} Zν / ‖x − aν(k1, k2, k3)‖,   x ∈ Ω0.   (14.13)

Now the discretized potential can be computed as a tensor sum in (14.11).

Lemma 14.5 ([153]). The discretized potential vcL for the full sum over M0 charges can
be presented by a rank-(M0 R) canonical tensor. The computational cost is estimated by
O(M0 RnL), whereas the storage size is bounded by O(M0 Rn).

Figure 14.9 (left) shows the assembled canonical vectors for a lattice structure in
a periodic setting. Recall that in the limit of large L the lattice sum PcL of the Newton
kernels is known to converge only conditionally. The same is true for a sum in a box.
The maximum norm increases as

C1 log L,   C2 L,   and   C3 L²   (14.14)

for 1D, 2D, and 3D sums, respectively; see [153] for more detail. This issue is of spe-
cial significance in the periodic setting dealing with the limiting case L → ∞. In the
traditional Ewald-type summation techniques the regularization of lattice sums is im-
plemented by subtraction of the analytically precomputed constants describing the
asymptotic behavior in L.
To approach the limiting case, in our method, we compute PcL on a sequence of
large parameters L, 2L, 4L, etc. and then apply the Richardson extrapolation as de-
scribed in the following. As a result, we obtain the regularized tensor p̂L, obtained by
subtraction of the leading terms in (14.14) and restricted to the reference unit cell Ω0.

Figure 14.9: Periodic canonical vectors in the L × 1 × 1 lattice sum, L = 16 (left). Regularized potential
sum p̂L vs. m with L = 2^m for L × L × 1 (middle) and L × L × L lattice sums (right).
Denoting the target value of the potential by pL , the extrapolation formulas for the
linear (d = 2) and quadratic (d = 3) behavior take form

p̂L := 2pL − p2L   and   p̂L := (4pL − p2L)/3,

respectively.
The effect of Richardson extrapolation is illustrated in Figure 14.9. This figure in-
dicates that the potential sum computed at the same point as for the previous example
(in the case of L × L × 1 and L × L × L lattices) converges to the limiting values of p̂L after
applying the Richardson extrapolation (regularized sum).
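The two extrapolation formulas are easy to sanity-check on model sequences obeying the growth laws (14.14); in the sketch below the limit value p∞ and the constants c1, c2 are arbitrary, and the extrapolants cancel the leading growth term exactly.

```python
# model sequences with the growth laws of (14.14): p_L = p_inf + c1*L  (2D case)
# and p_L = p_inf + c2*L**2  (3D case)
p_inf, c1, c2 = 3.0, 0.25, 0.1

def p2d(L):
    return p_inf + c1 * L

def p3d(L):
    return p_inf + c2 * L ** 2

L = 64
hat_2d = 2 * p2d(L) - p2d(2 * L)          # removes the O(L) term: 2(p + cL) - (p + 2cL) = p
hat_3d = (4 * p3d(L) - p3d(2 * L)) / 3    # removes the O(L^2) term: (4p + 4cL^2 - p - 4cL^2)/3 = p
print(hat_2d, hat_3d)
```

When the sequence also contains lower-order terms, the same formulas remove only the leading growth, so the extrapolant converges to the limit as L increases, which is the behavior seen in Figure 14.9.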

14.4 QTT ranks of the assembled canonical vectors in the lattice sum
Assembled canonical vectors in the rank-R tensor representation (14.5) are defined
over large uniform grid of size NL . Hence, the numerical cost for evaluating each of
these 3R vectors scales as O(NL L), which might become too expensive for large L (recall
that NL = nL scales linearly in L). Using the QTT approximation [167], this cost can be
reduced to the logarithmic scale in NL , whereas the storage need will become O(log NL )
only. The QTT-rank estimates are based on three main ingredients:
– the global canonical tensor representation of 1/‖x‖, x ∈ ℝ3 , on a supercell [111, 30];
– QTT approximation to the Gaussian function (Proposition 14.6); and
– the rank estimate for the block QTT decomposition (Lemma 14.7).

The next statement presents the QTT-rank estimate for a Gaussian vector obtained by
uniform sampling of e^{−x²/(2p²)} on a finite interval [68]; see also Section 4.2.

Proposition 14.6. Given the uniform grid −a = x0 < x1 < ⋅⋅⋅ < xN = a, xi = −a + hi,
N = 2^L, on the interval [−a, a], the vector g = [gi] ∈ ℝ^N defined by its elements
gi = e^{−xi²/(2p²)}, i = 0, . . . , N − 1, and a fixed ε > 0, assume that e^{−a²/(2p²)} ≤ ε. Then there exists
a QTT approximation gr of accuracy ‖g − gr‖∞ ≤ cε with the ranks bounded by

rankQTT(gr) ≤ c log(p/ε),

where c does not depend on a, p, ε, or N.

The next lemma proves the important result that the QTT-rank of a modulated sum
of regularly shifted bumps (see, for example, Figure 14.9, left) does not exceed the prod-
uct of the QTT-rank of an individual bump and the QTT-rank of the modulating vector.

Lemma 14.7 ([153]). Let N = 2^L with L = L1 + L2, where L1, L2 ≥ 1, and assume that
the index set I := {1, 2, . . . , N} is split into n2 = 2^{L2} equal non-overlapping subintervals
I = ⋃_{k=1}^{n2} Ik, each of length n1 = 2^{L1}. Given an n1-vector x0 that obeys a rank-r0 QTT repre-
sentation, define the N-vectors xk, k = 1, . . . , n2, as

xk(i) = { x0(:)   for i ∈ Ik,
        { 0       for i ∈ I \ Ik,        (14.15)

and denote x = x1 + ⋅⋅⋅ + x_{n2}. Then for any choice of the N-vector f, we have

rankQTT (f ⊙ x) ≤ rankQTT (f)r0 .

Notice that Lemma 14.7 provides a constructive algorithm and rigorous proof of
the low-rank QTT decomposition for a certain class of Bloch functions [37] and Wannier-
type functions.
Figure 14.10 (left) illustrates the shapes of the assembled canonical vectors modulated
by a sine harmonic.
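Lemma 14.7 can be verified numerically in a setting where all ranks are exact: a periodically replicated sampled exponential has QTT ranks 1, a sampled sine vector has QTT ranks 2, so by the lemma their Hadamard product must have QTT ranks at most 2. The sketch below is our own toy check; `qtt_ranks` is a hypothetical TT-SVD helper, not library code, and the concrete parameters are arbitrary.

```python
import numpy as np

def qtt_ranks(v, eps=1e-10):
    """TT ranks of a length-2^d vector folded into a 2 x 2 x ... x 2 tensor (QTT)."""
    v = np.asarray(v, dtype=float)
    d = int(round(np.log2(v.size)))
    assert v.size == 2 ** d
    delta = eps * np.linalg.norm(v) / np.sqrt(max(d - 1, 1))
    ranks, M = [], v.reshape(2, -1)
    for _ in range(d - 1):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]   # tail[r] = ||s[r:]||
        r = max(1, int(np.count_nonzero(tail > delta)))
        ranks.append(r)
        M = (s[:r, None] * Vt[:r]).reshape(2 * r, -1)
    return ranks

d = 12
N = 2 ** d
n1 = 2 ** 6                                   # length of one subinterval I_k
i = np.arange(N)
x0 = np.exp(-0.1 * np.arange(n1))             # exact rank-1 QTT "bump" on one subinterval
x = np.tile(x0, N // n1)                      # x = x_1 + ... + x_{n2}: bump replicated on all I_k
f = np.sin(0.013 * i + 0.3)                   # modulating vector with exact QTT ranks 2

r0 = max(qtt_ranks(x0))
rx = max(qtt_ranks(x))
rf = max(qtt_ranks(f))
rfx = max(qtt_ranks(f * x))
print(r0, rx, rf, rfx)                        # the lemma predicts rfx <= rf * r0
```

Note that the rank of the replicated vector x stays equal to that of the single bump, independently of the number n2 of replicas — the same effect that keeps the assembled canonical vectors cheap for large L.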

Figure 14.10: Canonical vectors of the lattice sum modulated by a sine function (left). Right:
QTT-ranks of the canonical vectors of a 3D Newton kernel discretized on cubic grids of size
n³ = 16 384³, 32 768³, 65 536³, and 131 072³.

The following Lemma estimates the bounds for the average QTT ranks of the assem-
bled vectors in PcL in a periodic setting.

Lemma 14.8 ([153]). For a given tolerance ε > 0, suppose that the set of Gaussian func-
tions S := {gk = e^{−tk²‖x‖²}}, k = 0, 1, . . . , M, representing canonical vectors in the tensor decom-
position PR, is specified by the parameters in (6.3), and set e^{−tk²‖x‖²} = e^{−‖x‖²/(2pk²)}. Let us split
the set S into two subsets S = Sloc ∪ Sglob such that

Sloc := {gk : aε (gk ) ≤ b} and Sglob = S \ Sloc ,



where aε(gk) = √2 pk log^{1/2}(1/ε). Then the QTT-rank of each canonical vector vq,
q = 1, . . . , R, in (14.5), where R = M + 1, corresponding to Sloc, obeys the uniform-in-L
rank bound

rQTT ≤ C log(1/ε).

For vectors in Sglob , we have the rank estimate

rQTT ≤ C log(L/ε).

Proof. In our notation, we have 1/(√2pk ) = tk = (k log M)/M, k = 1, . . . , M (k = 0 is the


trivial case). We omit the constant factor √2 to obtain pk = M/(k log M).
For functions gk ∈ Sloc, the condition e^{−a²/(2pk²)} ≤ ε implies

O(1) = b ≥ aε(gk) = √2 pk log^{1/2}(1/ε),

justifying the uniform bound pk ≤ C, and then the rank estimate rQTT ≤ C log(1/ε) in
view of Proposition 14.6. Now we apply Lemma 14.7 to obtain the uniform in L rank
bound.
For globally supported functions in Sglob, we have bL ≥ aε ≃ pk log^{1/2}(1/ε) ≥ b.
Hence, we consider all these functions on the maximal support of the size of super-cell
bL and set a = bL. Using the trigonometric representation as in the proof of Lemma 2
in [68], we conclude that for each fixed k, the shifted Gaussians gk,ℓ(x) = e^{−tk²‖x−ℓb‖²}
(ℓ = 1, . . . , L) can be approximated by the shifted trigonometric series

Gr(x − bℓ) = ∑_{m=0}^{M} Cm p e^{−π²m²p²/(2a²)} cos(πm(x − bℓ)/a),   a = bL,

which all have the common trigonometric basis containing about


rankQTT(Gr) = O(log(pk/ε)) = O(log(bL/ε))
terms. Hence, the sum of shifted Gaussian vectors over L unit cells will be approx-
imated with the same QTT-rank bound as each individual term in this sum, which
proves the assertion.

Based on the previous statements, we arrive at the following result.

Theorem 14.9 ([153]). The tensor representation of vcL for the full lattice sum generated
by a single charge can be presented by the rank-R QTT-canonical tensor
PcL = ∑_{q=1}^{R} ( 𝒬 ∑_{k1=1}^{L} 𝒲ν(k1) p_q^(1) ) ⊗ ( 𝒬 ∑_{k2=1}^{L} 𝒲ν(k2) p_q^(2) ) ⊗ ( 𝒬 ∑_{k3=1}^{L} 𝒲ν(k3) p_q^(3) ),   (14.16)

where 𝒬 p_q^(ℓ) denotes the QTT tensor approximation of the canonical vector p_q^(ℓ). Here
the QTT-rank of each canonical vector is bounded by rQTT ≤ C log(L/ε). The computa-
tional cost and storage are estimated by O(R L r³_QTT) and O(R log²(L/ε)), respectively.

Figure 14.10, right, represents QTT-ranks of the canonical vectors of a single 3D


Newton kernel discretized on large cubic grids.
Figures 14.10 and 14.11 demonstrate that the average QTT-ranks of the assembled
canonical vectors for k = 1, . . . , R scale logarithmically both in L and in the total grid-
size n = NL .
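The logarithmic growth can be probed directly on a single discretized Gaussian. Below is a minimal pure-Python sketch (our own illustration, not code from [153]; `num_rank`, a Gaussian-elimination rank with an absolute tolerance, is only a rough stand-in for the SVD-based ε-rank): it samples e^(−x²) on a grid of size 2¹⁰ and computes the ranks of all QTT unfolding matrices.

```python
import math

def num_rank(mat, tol):
    # Numerical rank via Gaussian elimination with partial pivoting;
    # a rough proxy for the SVD-based epsilon-rank.
    m = [row[:] for row in mat]
    rows, cols = len(m), len(m[0])
    rank = 0
    for col in range(cols):
        if rank == rows:
            break
        piv = max(range(rank, rows), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) <= tol:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        for r in range(rank + 1, rows):
            f = m[r][col] / m[rank][col]
            for c in range(col, cols):
                m[r][c] -= f * m[rank][c]
        rank += 1
    return rank

d = 10
n = 1 << d                     # 2^10 grid points on [-5, 5]
v = [math.exp(-(-5.0 + 10.0 * i / (n - 1)) ** 2) for i in range(n)]

# ranks of the QTT unfolding matrices: reshape v into 2^k x 2^(d-k)
qtt_ranks = []
for k in range(1, d):
    cols = n >> k
    mat = [v[r * cols:(r + 1) * cols] for r in range(1 << k)]
    qtt_ranks.append(num_rank(mat, 1e-6))
print(qtt_ranks)
```

The observed unfolding ranks stay far below the generic bound min(2^k, 2^(d−k)) = 32 attained at the middle unfolding, consistent with an O(log(1/ε)) rank behavior.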

Figure 14.11: Left: QTT-ranks of the assembled canonical vectors vs. L for the fixed grid size N³ = 16 384³.
Right: average QTT-ranks over R canonical vectors vs. log L for 3D evaluation of the L × 1 × 1 chain of
Hydrogen atoms on N × N × N grids, N = 2048, 4096, 8192, 16 384.

14.5 Summation of long-range potentials on 3D lattices with defects
In this section, we describe a tensor method introduced in [149, 153] for the fast sum-
mation of long-range potentials on 3D lattices with multiple defects, such as vacancies
and impurities, and in the case of hexagonal symmetries. The resulting lattice sum is
calculated as a Tucker or canonical representation whose directional vectors are assem-
bled by the 1D summation of the generating vectors for the shifted reference tensor,
precomputed once on a large N × N × N representation grid in a 3D bounding box.
For lattices with defects, the overall potential is obtained as an algebraic sum of
several tensors, each representing the contribution of a certain cluster of individual
defects. This leads to an increase in the tensor rank of the resultant potential sum. For
rank reduction in the canonical format, the canonical-to-Tucker decomposition based
on the RHOSVD approximation [174] is applied; see Section 3.3. For the RHOSVD
approximation to a sum of canonical/Tucker tensors, stable error bounds in the
relative norm, in terms of the discarded singular values of the side matrices, are proven
in [153].

14.5.1 Sums of potentials on defected lattices in canonical format

We consider the sum of canonical tensors on a lattice with defects located at S sources.
The canonical rank of the resultant tensor may increase by a factor of S. The effective
rank of the perturbed sum may be reduced by using the RHOSVD approximation via
the Can → Tuck → Can algorithm (see [174]). This approach basically provides a com-
pressed tensor with a canonical rank quadratically proportional to that of the re-
spective Tucker approximation to the sum with defects.
Here, for the reader’s convenience, we briefly recall the basics of the RHOSVD and
C2T decomposition described in detail in Section 3.3.2. In what follows, we focus on
the stability conditions for the RHOSVD approximation and their applicability in the
summation of spherically symmetric interaction potentials. The canonical rank-R tensor
representation (2.13) can be written as the rank-(R, R, R) Tucker tensor by introducing
the diagonal Tucker core tensor ξ := diag{ξ1, . . . , ξR} ∈ ℝ^(R×R×R) such that ξν1,ν2,ν3 = 0
except when ν1 = ν2 = ν3, with ξν,ν,ν = ξν, ν = 1, . . . , R (see Figure 3.12):

    A = ξ ×1 A^(1) ×2 A^(2) ×3 A^(3).    (14.17)

Given the rank parameter r = (r1, r2, r3), to define the reduced rank-r HOSVD-type
Tucker approximation to the tensor in (2.13), we set nℓ = n and suppose for definiteness
that n ≤ R, so that the SVD of the side-matrix A^(ℓ) is given by

    A^(ℓ) = Z^(ℓ) Dℓ V^(ℓ)T = ∑_{k=1}^{n} σℓ,k zk^(ℓ) vk^(ℓ)T,    zk^(ℓ) ∈ ℝ^n,  vk^(ℓ) ∈ ℝ^R,

with the orthogonal matrices Z^(ℓ) = [z1^(ℓ), . . . , zn^(ℓ)] and V^(ℓ) = [v1^(ℓ), . . . , vn^(ℓ)],
ℓ = 1, 2, 3. Given rank parameters r1, . . . , r3 < n, introduce the truncated SVD of the
side-matrix A^(ℓ), Z0^(ℓ) Dℓ,0 V0^(ℓ)T (ℓ = 1, 2, 3), where Dℓ,0 = diag{σℓ,1, σℓ,2, . . . , σℓ,rℓ},
and Z0^(ℓ) ∈ ℝ^(n×rℓ) and V0^(ℓ) ∈ ℝ^(R×rℓ) represent the orthogonal factors given by the
respective sub-matrices in the SVD factors of A^(ℓ). Here, we recall the definition of the
RHOSVD tensor approximation (see Section 3.3): the RHOSVD approximation of A,
further denoted by A0(r), is defined as the rank-r Tucker tensor obtained by the
projection of A in the form (14.17) onto the orthogonal matrices of the dominating
singular vectors in Z0^(ℓ) (ℓ = 1, 2, 3).
The stability of the RHOSVD approximation is formulated in the following assertion.

Lemma 14.10 ([174]). Let the canonical decomposition (2.13) satisfy the stability condi-
tion

    ∑_{ν=1}^{R} ξν² ≤ C‖A‖².    (14.18)

Then the quasi-optimal RHOSVD approximation is robust in the relative norm:

    ‖A − A0(r)‖ ≤ C‖A‖ ∑_{ℓ=1}^{3} ( ∑_{k=rℓ+1}^{min(n,R)} σℓ,k² )^(1/2),

where σℓ,k (k = rℓ + 1, . . . , n) denote the truncated singular values.



Notice that the stability condition (14.18) is fulfilled, in particular, if:
(a) all canonical vectors in (2.13) are non-negative, which is the case for the sinc-
    quadrature based approximations to Green’s kernels via the integral transforms
    (6.8)–(6.11), since ak > 0;
(b) the partial orthogonality of the canonical vectors holds, that is, the rank-1 tensors
    aν^(1) ⊗ ⋅ ⋅ ⋅ ⊗ aν^(d) (ν = 1, . . . , R) are mutually orthogonal. We refer to [192] for
    various definitions of orthogonality for canonical tensors.
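Case (a) is easy to verify numerically: for non-negative canonical vectors, every cross term in the Gram representation of ‖A‖² is non-negative, so (14.18) holds with C = 1. A small pure-Python sketch (our own illustration; all names and the random data are ours):

```python
import math, random

random.seed(0)
n, R = 20, 6                       # illustrative mode size and canonical rank

def unit_nonneg(n):
    # random non-negative vector, normalized in the Euclidean norm
    v = [random.random() for _ in range(n)]
    s = math.sqrt(sum(x * x for x in v))
    return [x / s for x in v]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# side vectors a_nu^(l) and positive weights xi_nu of a canonical tensor A
A = [[unit_nonneg(n) for _ in range(R)] for _ in range(3)]
xi = [random.uniform(0.5, 2.0) for _ in range(R)]

# Frobenius norm via the Gram form:
# ||A||^2 = sum_{mu,nu} xi_mu xi_nu prod_l <a_mu^(l), a_nu^(l)>
normA2 = sum(xi[mu] * xi[nu]
             * dot(A[0][mu], A[0][nu]) * dot(A[1][mu], A[1][nu]) * dot(A[2][mu], A[2][nu])
             for mu in range(R) for nu in range(R))
sum_xi2 = sum(x * x for x in xi)
print(sum_xi2 <= normA2)
```

Since the diagonal Gram terms alone already contribute ∑ξν², the printed comparison must hold for any non-negative data.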

14.5.2 Tucker tensor format in summation on defected lattices

In this section, following [153], we analyze the assembled summation of Tucker/
canonical tensors on defected lattices in an algebraic framework. Denote the perturbed
Tucker tensor by Û. Let us introduce a set of k-indices on the lattice, 𝒮 := {k1, . . . , kS},
at which the unperturbed Tucker tensor U0 := TcL, initially given by summation over
the full rectangular lattice (14.10), is defected by the Tucker tensors Uk = Uks = Us
(s = 1, . . . , S) given by

    Us = ∑_{m=1}^{rs} bs,m us,m1^(1) ⊗ us,m2^(2) ⊗ us,m3^(3),    s = 1, . . . , S.    (14.19)

Without loss of generality, all Tucker tensors Us (s = 0, 1, . . . , S) can be assumed
orthogonal.
Now the perturbed Tucker tensor Û is obtained from the non-perturbed one U0 by
adding the sum of all defects Uk, k ∈ 𝒮:

    U0 ↦ Û = U0 + ∑_{s=1}^{S} Us,    (14.20)

which implies the simple upper rank estimates for the best Tucker approximation of Û:

    r̂ℓ ≤ r0,ℓ + ∑_{s=1}^{S} rs,ℓ    for ℓ = 1, 2, 3.

If the number of perturbed cells S is large enough, then the numerical computations
with a Tucker tensor of rank r̂ℓ become prohibitive, and a rank reduction procedure is
required.
In the case of the Tucker sum (14.20), we define the assembled side matrices Û^(ℓ) by
concatenation of the directional side-matrices of the individual tensors Us, s = 0, 1, . . . , S:

    Û^(ℓ) = [u1^(ℓ) ⋅ ⋅ ⋅ ur0,ℓ^(ℓ), u1^(ℓ) ⋅ ⋅ ⋅ ur1,ℓ^(ℓ), . . . , u1^(ℓ) ⋅ ⋅ ⋅ urS,ℓ^(ℓ)] ∈ ℝ^(n×(r0,ℓ+∑_{s=1,...,S} rs,ℓ)),    ℓ = 1, 2, 3.    (14.21)

Given the rank parameter r = (r1, r2, r3), introduce the truncated SVD of Û^(ℓ),

    Û^(ℓ) ≈ Z0^(ℓ) Dℓ,0 V0^(ℓ)T,    Z0^(ℓ) ∈ ℝ^(n×rℓ),  V0^(ℓ) ∈ ℝ^((r0,ℓ+∑_{s=1,...,S} rs,ℓ)×rℓ),

where Dℓ,0 = diag{σℓ,1, σℓ,2, . . . , σℓ,rℓ}. Here, instead of a fixed rank parameter, a
truncation threshold ε > 0 can be chosen.
The stability criterion for the RHOSVD approximation in Lemma 14.10 allows a nat-
ural extension to the generalized RHOSVD approximation applied to a sum of Tucker
tensors in (14.20).
The following theorem, proven in [153], provides an error estimate for the general-
ized RHOSVD approximation, which converts a sum of Tucker tensors into a single
Tucker tensor with fixed rank bounds or subject to a given tolerance ε > 0.

Theorem 14.11 (Tucker-sum-to-Tucker). Given a sum of Tucker tensors (14.20) and the
rank truncation parameter r = (r1, . . . , rd):
(a) Let σℓ,1 ≥ σℓ,2 ≥ ⋅ ⋅ ⋅ ≥ σℓ,min(n,R) be the singular values of the ℓ-mode side-matrix
    Û^(ℓ) ∈ ℝ^(n×R) (ℓ = 1, 2, 3) defined in (14.21). Then the generalized RHOSVD
    approximation U0(r), obtained by the projection of Û onto the dominating singular
    vectors Z0^(ℓ) of the Tucker side-matrices Û^(ℓ) ≈ Z0^(ℓ) Dℓ,0 V0^(ℓ)T, exhibits the
    error estimate

        ‖Û − U0(r)‖ ≤ |Û| ∑_{ℓ=1}^{d} ( ∑_{k=rℓ+1}^{min(n,r̂ℓ)} σℓ,k² )^(1/2),    where |Û|² = ∑_{s=0}^{S} ‖Us‖².    (14.22)

(b) Assume the stability condition ∑_{s=0}^{S} ‖Us‖² ≤ C‖Û‖² for the sum (14.20). Then
    the generalized RHOSVD approximation provides the quasi-optimal error bound

        ‖Û − U0(r)‖ ≤ C‖Û‖ ∑_{ℓ=1}^{d} ( ∑_{k=rℓ+1}^{min(n,r̂ℓ)} σℓ,k² )^(1/2).

The resultant Tucker tensor U0(r) can be considered as the initial guess for the ALS
iteration used to compute the best Tucker ε-approximation of a sum of Tucker tensors.
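The additive rank growth in (14.20)–(14.21) is transparent in the canonical format, where adding a defect tensor simply concatenates the lists of rank-1 terms (equivalently, the side matrices). A toy pure-Python sketch (our own illustration; the rank-3 result is the representation rank before any RHOSVD reduction):

```python
import random

random.seed(1)
n = 8                                   # illustrative 1D grid size

def randvec(n):
    return [random.uniform(-1.0, 1.0) for _ in range(n)]

# canonical tensors stored as lists of rank-1 terms (weight, v1, v2, v3)
U0 = [(1.0, randvec(n), randvec(n), randvec(n)) for _ in range(2)]   # unperturbed sum
U1 = [(-1.0, randvec(n), randvec(n), randvec(n))]                    # one defect (e.g. a vacancy)

def entry(terms, i, j, k):
    return sum(w * v1[i] * v2[j] * v3[k] for w, v1, v2, v3 in terms)

# adding the defect = concatenating the rank-1 terms (side-matrix concatenation),
# so the representation rank grows additively: 2 + 1 = 3
Uhat = U0 + U1
assert len(Uhat) == len(U0) + len(U1)

for _ in range(20):
    i, j, k = (random.randrange(n) for _ in range(3))
    assert abs(entry(Uhat, i, j, k) - entry(U0, i, j, k) - entry(U1, i, j, k)) < 1e-12
print("rank of the sum:", len(Uhat))
```

In the Tucker case of (14.21), the same concatenation acts on the side matrices; the generalized RHOSVD step then compresses the enlarged rank.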
Figure 14.12 (left) visualizes the result of the assembled Tucker summation of the
three-dimensional grid-based Newton potentials on a 16 × 16 × 1 lattice with a vacancy
and an impurity, each of 2 × 2 × 1 lattice size. Figure 14.12 (right) shows the corre-
sponding Tucker vectors along the x-axis, which distinctly display the local shapes of
the vacancy and the impurity.

14.5.3 Numerical examples for non-rectangular and composite lattices

Figure 14.12: Left: assembled grid-based Tucker sum of 3D Newton potentials on a lattice 16 × 16 × 1
with an impurity and a vacancy, both of size 2 × 2 × 1. Right: the Tucker vectors along the x-axis.

Though rectangular structures with lattice-type vacancies and impurities are the most
representative structures in crystalline-type systems, in many practically interesting
cases the physical lattice may have a non-rectangular geometry that does not fit exactly
the tensor-product structure of the canonical/Tucker data arrays; for example, hexago-
nal or parallelepiped-type lattices can be considered. Here, following [153], we discuss
how to apply tensor summation methods to certain classes of non-rectangular geome-
tries and show a few numerical examples demonstrating the required (minor) modifi-
cations of the basic assembled summation schemes.
It is worth noting that most interesting lattice structures (say, those arising in crys-
talline modeling) inherit a number of spatial symmetries, which allow us, first, to
classify and then to simplify the computational schemes for each particular case of
symmetry. In this regard, we mention the following class of lattice topologies, which
can be efficiently treated by our tensor summation techniques:
– The target lattice ℒ can be split into the union of several (few) sub-lattices ℒ =
  ⋃ ℒq such that each sub-lattice ℒq allows a 3D rectangular grid-structure. Numeri-
  cally, the summation then reduces to a sum of tensors corresponding to each of the ℒq.
– Defects in the target lattice may be distributed over rectangular subdomains (clus-
  ters) represented on a coarser scale.

For such lattice topologies, the assembled tensor summation algorithm applies inde-
pendently to each rectangular sub-lattice ℒq, and then the target tensor is obtained as
a direct sum of the tensors associated with the ℒq, accomplished with a subsequent
rank-reduction procedure. An example of such a geometry is given by the hexagonal
lattice presented in Figure 14.13 (rectangular along the third axis), which can be split
into a union of two rectangular sub-lattices ℒ1 (“green”) and ℒ2 (“blue”).
Numerically, this is implemented by summation of two tensors via concatenation of
the canonical vectors corresponding to the “blue” and “green” lattices, both living on
the same fine 3D Cartesian grid.

Figure 14.13: Hexagonal lattice is a union of two rectangular lattices, “green” and “blue”.

Figure 14.14: Left: Sum of potentials over the hexagonal lattice of the type shown in Figure 14.13.
Right: rotated view.

The following numerical results basically reproduce those in [153]. Figure 14.14 (left
and right) shows the resulting potential sum for the hexagonal lattice structure com-
posed of a sum of 7 × 7 × 1 “blue” and 7 × 7 × 1 “green” potentials. The rank of the tensor
representing the sum is two times larger than the rank of the single reference Newton
kernel.
In the case of regularly positioned vacancies, as in Figure 14.15, showing the result
of the assembled canonical summation of the grid-based Newton potentials on a
24 × 24 × 1 lattice with 6 × 6 × 1 vacancies (a two-level lattice), the resulting tensor rank
is also only two times larger than the rank of a single Newton potential.
Figure 14.16 illustrates the situation when defects are located in a compact subdo-
main. It represents the result of the assembled canonical sum of the Newton potentials
on L-shaped (left) and O-shaped (right) lattices. The resulting potential sum for the
L-shaped lattice is the difference of a full 24 × 18 × 1 lattice and a sub-lattice of size
12 × 9 × 1. For the O-shape, the resultant tensor is obtained as the difference between
the full lattice sum over the 12 × 12 × 1 lattice and the central 6 × 6 × 1 cluster. In both
cases, the total canonical tensor rank is two times larger than the rank of the single
reference potential.
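The subtraction principle behind the L- and O-shaped sums can be checked with a scalar toy computation (our own illustration; the algorithm of [153] performs the same subtraction on canonical tensors, at the price of a doubled rank, instead of looping over the nodes):

```python
import math

b = 2.0                                                    # lattice step
full = {(i, j) for i in range(12) for j in range(12)}      # 12 x 12 lattice
hole = {(i, j) for i in range(3, 9) for j in range(3, 9)}  # central 6 x 6 cluster

def lattice_sum(nodes, x, y):
    # sum of Newton-type potentials 1/r centered at the given lattice nodes
    return sum(1.0 / math.hypot(x - b * i, y - b * j) for i, j in nodes)

# O-shaped sum = full rectangular sum minus the sum over the removed cluster
x, y = 0.7, -1.3                                           # off-lattice evaluation point
o_direct = lattice_sum(full - hole, x, y)
o_diff = lattice_sum(full, x, y) - lattice_sum(hole, x, y)
assert abs(o_direct - o_diff) < 1e-10
```

The same identity applied per dimension to the assembled canonical vectors keeps the rank of the difference at twice the rank of the reference kernel.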

Figure 14.15: Assembled summation of 3D grid-based Newton potentials in canonical format on a
24 × 24 × 1 lattice with regular 6 × 6 × 1 vacancies.

Figure 14.16: Assembled summation of 3D grid-based Newton potentials in canonical format on a
24 × 18 × 1 lattice with L-shaped geometry (left), and on a 12 × 12 × 1 lattice with O-shaped geometry
obtained by subtracting the 6 × 6 × 1 vacancy sub-lattice (right).

For composite shapes of lattice geometries, one can use the canonical-to-Tucker trans-
form to reduce the canonical rank. In the case of complicated geometries, the Tucker
reference tensor for the Newton kernel may be preferable. For example, in the case of
the O-shaped domain, the maximal Tucker rank of the resultant tensor is 25, whereas
the respective ranks for the rectangular compounds are 17 and 15.
Since the lattice is not necessarily aligned with the 3D representation grid, it is
easy to assemble potentials centered independently of the lattice nodes, for example,
for modeling lattices with insertions whose inter-atomic displacements differ from
those of the main lattice. Figure 14.17 represents the result of the assembled canonical
summation of 3D grid-based Newton potentials on a 12 × 12 × 1 lattice with an impurity
of size 2 × 2 × 1 having interatomic distances different from the main lattice. Since the
impurity potentials are determined on the same fine NL × NL × NL representation grid,
variations in the inter-potential distances do not influence the numerical treatment of
defects.

Figure 14.17: Left: assembled canonical summation of 3D grid-based Newton potentials on a lattice
10 × 10 × 1 with an impurity of size 2 × 2 × 1. Right: the vertical projection.
We conclude that in all cases discussed above, the tensor summation approach
[153] can be gainfully applied. The overall numerical cost may depend on the geo-
metric structure and symmetries of the system under consideration, since violation
of the tensor-product rectangular structure of the lattice may lead to an increase in
the Tucker/canonical rank. This is clearly observed in the case of a moderate number
of defects distributed randomly. In all such cases, the RHOSVD approximation, com-
bined with the ALS iteration, serves for robust rank reduction in the Tucker format.
We also note that many of the crystalline-type structures belong to the face-centered
cubic or hexagonal symmetry types. Taking into account the facilities for easy numer-
ical treatment of multiple defects, we expect that the tensor summation method for
long-range potentials can be used in a wide range of applications.
Finally, we notice that our approach has a natural extension to higher-dimensional
lattices in ℝᵈ, d > 3, so that along with the canonical and Tucker tensors, the TT tensor
format [226] can be adapted.

14.6 Interaction energy of the long-range potentials on finite lattices
Fast and accurate computation of the interaction energy of long-range potentials
on finite lattices is one of the challenging tasks in computer modeling of macromolec-
ular structures such as quantum dots, nanostructures, and biological systems. In this
section, we recall the efficient scheme for the tensor-based calculation of the interaction
energy of long-range potentials on finite lattices proposed in [152]. For the nuclear
charges {Zk} centered at points xk, k ∈ 𝒦³, located on the L × L × L lattice ℒL = {xk}
with step-size b, the total interaction energy of these charges is defined as

    EL = (1/2) ∑_{k,j∈𝒦, k≠j} Zk Zj / ‖xj − xk‖,    i. e., for ‖xj − xk‖ ≥ b.    (14.23)

Notice that local density approximations for long-range and short-range energy func-
tionals have been addressed in [279].
The tensor summation scheme can be directly applied to this computational prob-
lem. For this discussion, we assume that all charges are equal, that is, Zk = Z. First,
notice that the rank-R reference tensor h⁻³P̃ defined in (14.4) approximates with high
accuracy O(h²) the Coulomb potential 1/‖x‖ in Ω̃L (for ‖x‖ ≥ b, as required for the
energy expression) on the fine 2n × 2n × 2n representation grid with mesh size h. Like-
wise, the tensor h⁻³PcL approximates the potential sum vcL(x) on the same fine repre-
sentation grid, including the lattice points xk.
We evaluate the energy expression (14.23) by using tensor sums as in (14.5), but
now applied to a small sub-tensor of the rank-R canonical reference tensor P̃, namely
P̃L := [P̃|xk] ∈ ℝ^(2L×2L×2L), obtained by tracing P̃ on the accompanying lattice
ℒ̃L = {xk} ∪ {xk′} ∈ Ω̃L of double size 2L × 2L × 2L. Here, P̃|xk denotes the tensor
entry corresponding to the kth lattice point designating the atomic center xk.
We are interested in the computation of the rank-R tensor P̂cL = [PcL|xk]k∈𝒦 ∈
ℝ^(L×L×L), where PcL|xk denotes the tensor entry corresponding to the kth lattice point
on ℒL. The tensor P̂cL can be computed at the expense O(L²) by

    P̂cL = ∑_{q=1}^{R} ( ∑_{k1∈𝒦} 𝒲(k1) p̃L,q^(1) ) ⊗ ( ∑_{k2∈𝒦} 𝒲(k2) p̃L,q^(2) ) ⊗ ( ∑_{k3∈𝒦} 𝒲(k3) p̃L,q^(3) ).

This leads to the representation of the energy sum (14.23) (with accuracy O(h²)) in the
form

    EL,T = (Z²h⁻³/2) ( ⟨P̂cL, 1⟩ − ∑_{k∈𝒦} P̃|xk=0 ),

where the first term in brackets represents the full canonical tensor lattice sum re-
stricted to the k-grid composing the lattice ℒL, whereas the second term introduces
the correction at the singular points xj − xk = 0. Here, 1 ∈ ℝ^(L×L×L) is the all-ones
tensor. By using the rank-1 tensor P0L = P̃|xk=0 1, the correction term can be repre-
sented by a simple tensor operation:

    ∑_{k∈𝒦} P̃|xk=0 = ⟨P0L, 1⟩.

Finally, the interaction energy EL allows the approximate representation

    EL ≈ EL,T = (Z²h⁻³/2) ( ⟨P̂cL, 1⟩ − ⟨P0L, 1⟩ ),    (14.24)
which can be implemented in O(L²) ≪ L³ log L complexity by tensor operations with
the rank-R canonical tensors in ℝ^(L×L×L).

Table 14.4: Comparison of times for the full (Tfull, of complexity O(L⁶)) and tensor-based (Ttens.)
calculation of the interaction energy sum for the lattice electrostatic potentials.

    L³      Tfull   Ttens.   EL,T          abs. err.
    24³     37      1.2      3.7 ⋅ 10⁶     2 ⋅ 10⁻⁸
    32³     250     1.5      1.5 ⋅ 10⁷     1.5 ⋅ 10⁻⁹
    48³     3374    2.8      1.12 ⋅ 10⁸    0
    64³     –       5.7      5.0 ⋅ 10⁸     –
    128³    –       13.5     1.6 ⋅ 10¹⁰    –
    256³    –       68.2     5.2 ⋅ 10¹¹    –
Table 14.4 illustrates the performance of the algorithm described above. We com-
pare the time of the exact computation by (14.23), of complexity O(L⁶), with the cal-
culation time obtained by using our scheme with the approximate tensor representa-
tion (14.24) on the fine representation grid with n = n0 L, n0 = 128. The tested lattice
systems are composed of Hydrogen atoms with interatomic distance 2.0 bohr. The geo-
metric size of the largest 3D lattice with 256³ potentials is about 524³ bohr³, or 26³ cubic
nanometers; see [152] for further details.
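The key to the fast evaluation is translation invariance: pairs of lattice points can be grouped by their difference vector, whose multiplicity factorizes as (L − |d1|)(L − |d2|)(L − |d3|), one factor per spatial dimension. The following pure-Python sketch (our own illustration of this counting identity, not the canonical-tensor implementation of [152]) checks the grouped sum against the direct O(L⁶) double sum (14.23) on a small lattice:

```python
import math

L, b, Z = 4, 2.0, 1.0
nodes = [(i, j, k) for i in range(L) for j in range(L) for k in range(L)]

# direct double sum (14.23): every unordered pair once, O(L^6) work
E_direct = 0.0
for a in range(len(nodes)):
    for c in range(a + 1, len(nodes)):
        r = b * math.dist(nodes[a], nodes[c])
        E_direct += Z * Z / r

# grouped sum: each difference vector d != 0 occurs prod_t (L - |d_t|) times;
# this separable multiplicity is what the tensor scheme exploits per dimension
E_fast = 0.0
for dx in range(-(L - 1), L):
    for dy in range(-(L - 1), L):
        for dz in range(-(L - 1), L):
            if dx == dy == dz == 0:
                continue
            count = (L - abs(dx)) * (L - abs(dy)) * (L - abs(dz))
            E_fast += count / math.sqrt(dx * dx + dy * dy + dz * dz)
E_fast *= Z * Z / (2.0 * b)   # each unordered pair is counted as d and -d

assert abs(E_direct - E_fast) < 1e-9 * E_direct
```

The grouped loop already costs only O(L³) here; applying the separable Gaussian factorization of 1/r per dimension is what reduces the cost further to the O(L²) of (14.24).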
15 Range-separated tensor format for many-particle systems
Numerical modeling of long-range electrostatic potentials in many-particle systems
leads to challenging computational problems, as was already mentioned in Section 14.
Well-recognized traditional approaches based on the Ewald summation [79], the fast
Fourier transform, or the fast multipole expansion [103], when applied to calculating
the interaction energy of a system (including the evaluation of the potential at only
N points sν), usually scale as O(N log N) for N-particle systems. These approaches
need large computer facilities for meshing up the result of Ewald sums. Computation of
long-range interaction potentials of large multiparticle systems is discussed, for exam-
ple, in [58, 199, 278, 139], and using grid-based approaches in [16, 312]. An Ewald-type
splitting of the Coulomb interaction into long- and short-range components was ap-
plied in density functional theory calculations [280].
A novel range-separated (RS) canonical/Tucker tensor format was recently intro-
duced in [24] for modeling multidimensional long-range interaction potentials in
multi-particle systems of general type. The main idea of the RS tensor format is the
independent grid-based low-rank representation of the localized and global parts of
the target tensor, which allows the efficient numerical approximation of N-particle in-
teraction potentials. The single reference potential, such as 1/‖x‖, is split into a sum of
localized and long-range low-rank canonical tensors represented on a fine 3D n × n × n
Cartesian grid. The smooth long-range contribution to the total potential sum is rep-
resented in the form of a low-rank canonical/Tucker tensor in O(n) storage. It is proven
that the resultant rank parameters depend only logarithmically on the number of par-
ticles N and the grid size n. Agglomeration of the short-range part in the sum is re-
duced to an independent treatment of N localized terms with almost disjoint effective
supports, calculated in O(N) operations. Last but not least, the RS tensor format
allows representing the collective potential of a multiparticle system at any point of the
fine n × n × n grid at O(1) cost.
The RS canonical/Tucker tensor representations reduce the cost of multi-linear al-
gebraic operations on the 3D potential sums arising in multi-dimensional data model-
ing by radial basis functions, for example, in the computation of the electrostatic po-
tential of a protein, in 3D integration and convolution transforms, in the computation
of gradients, forces, and the interaction energy of many-particle systems, and in the
approximation of d-dimensional scattered data, by reducing all of them to 1D calcula-
tions. The presentation here mainly follows [24] (see also the recent publication [28]).
For a given non-local generating kernel p(‖x‖), x ∈ ℝ³, the calculation of a weighted
sum of interaction potentials in a large N-particle system with the particle locations
sν ∈ ℝ³, ν = 1, . . . , N,

    P(x) = ∑_{ν=1}^{N} Zν p(‖x − sν‖),    Zν ∈ ℝ,  sν, x ∈ Ω = [−b, b]³,    (15.1)

leads to a computationally intensive numerical task. Indeed, the generating radial ba-
sis function p(‖x‖) is allowed to have a slow polynomial decay in 1/‖x‖, so that each
individual term in (15.1) contributes essentially to the total potential at each point in
Ω, thus predicting O(N) complexity for the straightforward summation at every fixed
target x ∈ ℝ³. Moreover, in general, the function p(‖x‖) has a singularity or a cusp at
the origin x = 0. Typical examples of the radial basis function p(‖x‖) are the Newton
kernel 1/‖x‖, the Slater kernel e^(−λ‖x‖), the Yukawa kernel e^(−λ‖x‖)/‖x‖, and other
Green’s kernels (see examples in Section 15.3.1).
The important ingredient of the RS approach is the splitting of a single reference
potential, say p(‖x‖) = 1/‖x‖, into a sum of localized and long-range low-rank canoni-
cal tensors represented on the grid Ωn . In this regard, it can be shown that the explicit
sinc-based canonical tensor decomposition of the generating reference kernel p(‖x‖)
by a sum of Gaussians implies the distinct separation of its long- and short-range parts.
Such range separation techniques can be gainfully applied to summation of a
large number of generally distributed potentials in (15.1). Indeed, a sum of the long-
range contributions can be represented by a single tensor living on the grid Ωn ⊂ Ω
by using the canonical-to-Tucker transform [174], which returns this part in the form
of a low-rank Tucker tensor. Hence, the smooth long-range contribution to the overall
sum is represented on the fine n × n × n grid Ωn in O(n) storage via the global canonical
or Tucker tensor with the separation rank that only weakly (logarithmically) depends
on the number of particles N. This important feature is confirmed by numerical tests
for the large clusters of generally distributed potentials in 3D; see Section 15.2.
In turn, the short-range contribution to the total sum is constructed by using a
single low-rank reference tensor with a small local support selected from the “short-
range” canonical vectors in the tensor decomposition of p(‖x‖). To that end, the whole
set of N short-range clusters is represented by replication and rescaling of the small-
size localized reference tensor, thus reducing the storage to the O(1)-parametrization
of the reference canonical tensor, and the list of coordinates and charges of particles.
Representation of the short-range part over n × n × n grid needs O(N n) computational
work for N-particle system. Such cumulated sum of the short-range components al-
lows the “local operations” in the RS-canonical format, making it particularly efficient
for tensor multilinear algebra.
The RS tensor formats provide a tool for the efficient numerical treatment of inter-
action potentials in many-particle systems, which, in some aspects, can be considered
as an alternative to the well-established multipole expansion method [103]. The par-
ticular benefit is the low-parametric representation of the collective interaction poten-
tial on large 3D Cartesian grid in the whole computational domain in the linear cost
O(n), thus outperforming the grid-based summation techniques based on the full-grid
O(n³)-representation in the volume. Both global and local summation schemes are
quite easy to implement. The prototype algorithms in MATLAB applied
on a laptop allow computing the electrostatic potential of large many-particle systems
on fine grids of size up to n3 = 1012 .

15.1 Tensor splitting of the kernel into long- and short-range parts
From the definition of the sinc-quadrature (6.6), (6.3), we can easily observe that the
full set of approximating Gaussians includes two classes of functions: those with
small “effective support” and the long-range functions. Clearly, functions from differ-
ent classes may require different tensor-based schemes for their efficient numerical
treatment. Hence, the idea of the new approach is the constructive implementation
of a range separation scheme that allows the independent efficient treatment of both
the long- and short-range parts in the approximating radial basis functions.
Without loss of generality, we further consider the case of the Newton kernel, so
that the sum in (6.6) reduces to k = 0, 1, . . . , M (due to symmetry argument). From
(6.3), we observe that the sequence of quadrature points {tk } can be split into two sub-
sequences
𝒯 := {tk | k = 0, 1, . . . , M} = 𝒯l ∪ 𝒯s

with
𝒯l := {tk | k = 0, 1, . . . , Rl } and 𝒯s := {tk | k = Rl + 1, . . . , M}. (15.2)

The set 𝒯l includes quadrature points tk condensed “near” zero, hence generating the
long-range Gaussians (low-pass filters), whereas 𝒯s accumulates the (increasing as
M → ∞) sequence of “large” sampling points tk with the upper bound C0² log²(M),
corresponding to the short-range Gaussians (high-pass filters). The quasi-optimal
choice of the constant C0 ≈ 3 was determined numerically in [30]. We further denote

    𝒦l := {k | k = 0, 1, . . . , Rl}    and    𝒦s := {k | k = Rl + 1, . . . , M}.

Splitting (15.2) generates the additive decomposition of the canonical tensor PR
into its short- and long-range parts:

    PR = PRs + PRl,

where

    PRs = ∑_{tk∈𝒯s} pk^(1) ⊗ pk^(2) ⊗ pk^(3),    PRl = ∑_{tk∈𝒯l} pk^(1) ⊗ pk^(2) ⊗ pk^(3).    (15.3)

The choice of the critical number Rl = #𝒯l − 1 (or, equivalently, Rs = #𝒯s = M − Rl)
that specifies the splitting 𝒯 = 𝒯l ∪ 𝒯s is determined by the active support of the short-
range components, such that one can cut off the vectors pk^(ℓ), tk ∈ 𝒯s, outside of the
sphere Bσ of radius σ > 0, subject to a certain threshold δ > 0. For fixed δ > 0, the
choice of Rs is uniquely defined by the (small) parameter σ, and vice versa. Given σ,
two basic criteria corresponding to (A) the max-norm and (B) the L1-norm estimates
can be applied:

    (A)  𝒯s = {tk : ak e^(−tk²σ²) ≤ δ}  ⇔  Rl = min k : ak e^(−tk²σ²) ≤ δ,    (15.4)

or

    (B)  𝒯s := {tk : ak ∫_{Bσ} e^(−tk²x²) dx ≤ δ}  ⇔  Rl = min k : ak ∫_{Bσ} e^(−tk²x²) dx ≤ δ.    (15.5)

Clearly, the sphere Bσ can be substituted by a small box of the corresponding size.
Quantitative estimates on the value of Rl can be easily calculated by using the explicit
equation (6.3) for the quadrature parameters. For example, in the case C0 = 3 and
a(t) = 1, criterion (A) implies that Rl solves the equation

    (3Rl log M / M)² σ² = log(hM/δ).

Criteria (15.4) and (15.5) can be slightly modified, depending on the particular applica-
tions to many-particle systems. For example, in electronic structure calculations, the
parameter σ can be associated with the typical inter-atomic distance in the molecular
system of interest (the Van der Waals distance).
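The splitting can be illustrated with a simplified quadrature (our own sketch: a plain trapezoidal discretization of 1/r = (2/√π) ∫₀^∞ e^(−r²t²) dt with constant weights, rather than the sinc quadrature (6.3) with varying ak; the step h, the number of points M, and the thresholds are illustrative):

```python
import math

# trapezoidal discretization of 1/r = (2/sqrt(pi)) * int_0^inf exp(-r^2 t^2) dt
h, M = 0.05, 400
t = [k * h for k in range(M + 1)]
w = [h * (0.5 if k == 0 else 1.0) for k in range(M + 1)]   # trapezoid weights

def gauss_sum(r, ks):
    # partial Gaussian sum over the quadrature indices in ks
    return (2.0 / math.sqrt(math.pi)) * sum(w[k] * math.exp(-(r * t[k]) ** 2) for k in ks)

for r in (1.0, 2.0):                       # the full sum reproduces 1/r
    assert abs(gauss_sum(r, range(M + 1)) - 1.0 / r) < 1e-8

# criterion (A): short-range indices are those with w_k * exp(-t_k^2 sigma^2) <= delta
sigma, delta = 0.9, 1e-4
Rl = max(k for k in range(M + 1) if w[k] * math.exp(-(t[k] * sigma) ** 2) > delta)

long_part = lambda r: gauss_sum(r, range(Rl + 1))
short_part = lambda r: gauss_sum(r, range(Rl + 1, M + 1))

# outside the effective support sigma the short-range part is negligible,
# while the two parts still sum up to the full kernel
assert short_part(2.0) < 1e-8
assert abs(long_part(2.0) + short_part(2.0) - 0.5) < 1e-8
```

Each quadrature index carries a separable Gaussian in 3D, so the index splitting at Rl is exactly the tensor splitting (15.3).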
Figures 15.1 and 15.2 illustrate the splitting (15.2) for the tensor PR computed on
the n × n × n grid with the parameters R = 20, Rl = 12, and Rs = 8, respectively. Fig-
ure 15.1 shows the long-range canonical vectors from PRl in (15.3), whereas Figure 15.2
displays the short-range part described by PRs. Following criterion (A) with δ ≈ 10⁻⁴,
the effective support for this splitting is determined by σ = 0.9. The complete New-
ton kernel simultaneously resolves both the short- and long-range behavior, whereas
the function values of the tensor PRs vanish exponentially fast away from the effective
support, as can be seen in Figure 15.2.
Inspection of the quadrature point distribution in (6.3) shows that the short- and
long-range subsequences are nearly equally balanced, so that one can expect approxi-
mately

Rs ≈ Rl = M/2. (15.6)

Figure 15.1: Long-range canonical vectors for n = 1024, R = 20, Rl = 12, and the corresponding
potential.

Figure 15.2: Short-range canonical vectors for n = 1024, R = 20, Rs = 8, and the corresponding
potential.

The optimal choice may depend on the particular application specified by the separa-
tion parameter σ > 0 and the required accuracy.
The main advantage of the range separation in the splitting (15.3) of the canonical
tensor PR is the opportunity for independent tensor representations of both sub-
tensors PRs and PRl, which leads to a simultaneous reduction of their complexity and
storage demands. Indeed, the effective local support characterized by σ > 0 includes
a much smaller number of grid points ns ≪ n compared with the global grid size.
Hence, the storage cost Stor(PRs ) for the canonical tensor representation of the short-
range part is estimated by

Stor(PRs ) ≤ Rs ns ≪ Rn.

Furthermore, the long-range part PRl approximates a global smooth function, which


can be represented in Ω on a coarser grid with the number of grid points nl ≪ n.
Hence, we gain from the reduced complexity estimate

Stor(PRl ) ≤ Rl nl ≪ Rn.

It is worth noting that the separate treatment of the smooth nonlocal and the non-smooth, locally supported tensor components allows not only the dramatic reduction of the storage costs, Rs ns + Rl nl = O(Rn), but also efficient bilinear tensor operations preserving the individual storage complexities, as will be shown in the next sections.
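The short/long-range splitting described above can be sketched in a few lines. The grid, the quadrature-like nodes t_k, the weights, and the thresholds below are illustrative stand-ins, not the actual sinc-quadrature data of (6.3):

```python
import numpy as np

# All parameters below (nodes t_k, weights, thresholds) are demo values.
n, b, M, sigma, delta = 64, 5.0, 30, 0.9, 1e-4

x = np.linspace(-b, b, n)            # 1D grid on [-b, b]
t = np.linspace(0.15, 9.0, M)        # quadrature-like nodes t_k
w = np.full(M, 0.3)                  # schematic positive weights
G = np.exp(-np.outer(t**2, x**2))    # skeleton vectors g_k(x) = exp(-t_k^2 x^2)

# exp(-t^2 x^2) drops below delta outside |x| = sqrt(log(1/delta))/t, so a
# term is classified as short-range if this effective support is below sigma
supp = np.sqrt(np.log(1.0 / delta)) / t
short = supp < sigma

P_short = (w[short, None] * G[short]).sum(axis=0)    # P_Rs-like part
P_long = (w[~short, None] * G[~short]).sum(axis=0)   # P_Rl-like part

# the short-range part is negligible away from the singularity at x = 0
edge_ratio = P_short[0] / P_short[n // 2]
print(int(short.sum()), int((~short).sum()), edge_ratio)
```

The two sub-sums reproduce the total exactly, while the short-range part decays to (numerical) zero outside its effective support, which is the mechanism exploited by the separate storage estimates above.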

15.2 Tensor summation of range-separated potentials


246 | 15 Range-separated tensor format for many-particle systems

In this section, following [24], we describe how the range-separated tensor representation of the generating potential function can be applied to the fast and accurate grid-based computation of a large sum of non-local potentials centered at arbitrary locations in the 3D volume. This task leads to the bottleneck computational problem in the modeling of large stationary and dynamical N-particle systems.

15.2.1 Quasi-uniformly separable point distributions

One of the main limitations for the use of direct grid-based canonical/Tucker approximations to large potential sums is the strong increase of the tensor rank, proportional to the number of particles N0 in a system. Figures 15.3 and 15.5 illustrate this effect for the electrostatic potential of a protein-type system consisting of N0 = 783 atoms.

Figure 15.3: The directional Tucker ranks computed by RHOSVD for a protein-type system with
n = 1024 (left) and n = 512 (right).

Given the generating kernel p(‖x‖), we consider the problem of efficiently calculating the weighted sum of a large number of single potentials located in a set 𝒮 of separably distributed points (sources) sν ∈ ℝ3, ν = 1, . . . , N0, embedded into the fixed bounding box Ω = [−b, b]3:

  P0(x) = ∑_{ν=1}^{N0} zν p(‖x − sν‖),  zν ∈ ℝ.    (15.7)

The function p(‖x‖) is allowed to have slow polynomial decay in 1/‖x‖ so that each
individual source contributes essentially to the total potential at each point in Ω.

Definition 15.1 (Well-separable point distribution). Given a constant σ∗ > 0, a set 𝒮 = {sν} of points in ℝd is called σ∗-separable if

  d(sν, sν′) := ‖sν − sν′‖ ≥ σ∗  for all ν ≠ ν′.    (15.8)

A family of point sets {𝒮1, . . . , 𝒮m} is called uniformly σ∗-separable if (15.8) holds for every set 𝒮m′, m′ = 1, 2, . . . , m, independently of the number of particles in the set 𝒮m′.

Condition (15.8) can be reformulated in terms of the so-called separation distance q𝒮 of the point set 𝒮:

  q𝒮 := min_{s∈𝒮} min_{sν∈𝒮\s} d(sν, s) ≥ σ∗.    (15.9)

Definition 15.1 on the separability of point distributions is fulfilled, in particular, in the case of large molecular systems (proteins, crystals, polymers, nano-clusters), where all atomic centers are strictly separated from each other by a certain fixed inter-atomic distance (e. g., the van der Waals distance). The same happens for lattice-type structures, where each atomic cluster within the unit cell is separated from its neighbors by a distance proportional to the lattice step-size.

Figure 15.4: Inter-particle distances in ascending order for a protein-type structure with 500 particles (left); zoom of the first 100 smallest inter-particle distances (right).

Figure 15.4 (left) shows the inter-particle distances in ascending order for a protein-type structure including 500 particles. The total number of distances equals N(N − 1)/2, where N is the number of particles. Figure 15.4 (right) indicates that the number of particles with small inter-particle distances is very moderate. In particular, for this example, the number of pairs with inter-particle distances less than 1 Å is about 0.04 % (≈110) of the total number of 2.495 ⋅ 10^5 distances.
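The separation distance (15.9) and the sorted distance profile of Figure 15.4 are straightforward to compute. In the sketch below, a random point cloud stands in for the atomic centers:

```python
import numpy as np

# A stand-in random point cloud plays the role of the atomic centers.
rng = np.random.default_rng(0)
S = rng.uniform(-10.0, 10.0, size=(100, 3))      # 100 "particles" in a box

# all pairwise distances ||s_i - s_j||, i < j  (N(N-1)/2 of them)
diff = S[:, None, :] - S[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))
iu = np.triu_indices(len(S), k=1)
pair_dists = np.sort(D[iu])                      # ascending, as in Figure 15.4

# separation distance (15.9): minimal nearest-neighbor distance over the set
np.fill_diagonal(D, np.inf)
q_S = D.min()
print(len(pair_dists), q_S)
```

By construction, q_S coincides with the smallest entry of the sorted pairwise-distance list.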
For ease of presentation, we further confine ourselves to the case of electrostatic potentials described by the Newton kernel p(‖x‖) = 1/‖x‖.

15.2.2 Low-rank representation to the sum of long-range terms

First, we describe the tensor summation method for calculating the collective interaction potential of a multi-particle system that includes only the long-range contribution from the generating kernel. We introduce the n × n × n rectangular grid Ωn in Ω = [−b, b]3 and the auxiliary 2n × 2n × 2n grid on the accompanying domain Ω̃ = 2Ω of double size. Conventionally, the canonical rank-R tensor representing the Newton kernel (by projection onto the n × n × n grid) is denoted by PR ∈ ℝn×n×n; see (6.6). Consider the splitting (15.3) applied to the reference canonical tensor PR and to its extended version P̃R = [p̃R(i1, i2, i3)], iℓ ∈ Iℓ, ℓ = 1, 2, 3, such that

  P̃R = P̃Rs + P̃Rl ∈ ℝ2n×2n×2n.

For technical reasons, we further assume that the tensor grid Ωn is fine enough that all charge centers 𝒮 = {sν} specifying the total electrostatic potential in (15.7) belong to the set of grid points, that is, sν = (sν,1, sν,2, sν,3)T = h(j1(ν), j2(ν), j3(ν))T ∈ Ωh with some indices 1 ≤ j1(ν), j2(ν), j3(ν) ≤ n.
The total electrostatic potential P0(x) in (15.7) is represented by a projected tensor P0 ∈ ℝn×n×n, which can be constructed by a direct sum of shift-and-windowing transforms of the reference tensor P̃R (see Chapter 14 for more detail):

  P0 = ∑_{ν=1}^{N0} zν 𝒲ν(P̃R) = ∑_{ν=1}^{N0} zν 𝒲ν(P̃Rs + P̃Rl) =: Ps + Pl.    (15.10)

The shift-and-windowing transform 𝒲ν maps the reference tensor P̃R ∈ ℝ2n×2n×2n onto its sub-tensor of smaller size n × n × n, obtained by first shifting the center of the tensor P̃R to the point sν and then tracing (windowing) the result onto the domain Ωn:

  𝒲ν : P̃R ↦ P(ν) = [p(ν)_{i1,i2,i3}],  p(ν)_{i1,i2,i3} := p̃R(i1 + j1(ν), i2 + j2(ν), i3 + j3(ν)),  iℓ ∈ Iℓ.
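A minimal sketch of the shift-and-windowing transform, with a rank-1 Gaussian as a stand-in for the rank-R reference canonical tensor and 0-based demo indices:

```python
import numpy as np

# The reference "tensor" is a rank-1 Gaussian on the double-size 2n-grid;
# real computations would use the canonical tensor of the Newton kernel.
n = 8
g = np.exp(-np.linspace(-2.0, 2.0, 2 * n) ** 2)
P_ref = np.einsum('i,j,k->ijk', g, g, g)          # (2n)^3 reference tensor

def shift_window(P_ref, j, n):
    """W_nu: the n^3 sub-tensor of P_ref whose entries are p(i + j)."""
    j1, j2, j3 = j
    return P_ref[j1:j1 + n, j2:j2 + n, j3:j3 + n]

# sum of two "charges" as in (15.10): P0 = sum_nu z_nu W_nu(P_ref)
centers = [(2, 3, 4), (5, 1, 6)]
charges = [1.0, -0.5]
P0 = sum(z * shift_window(P_ref, j, n) for z, j in zip(charges, centers))
print(P0.shape)
```

Each summand is just a view-like slice of the reference tensor, so no per-particle tensor ever has to be rebuilt from scratch.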

Notice that the Tucker rank of the full tensor sum P0 increases almost proportionally to the number N0 of particles in the system (see Figure 15.5, which presents the singular values of the side matrix in the canonical tensor P0). On the other hand, the canonical rank of the tensor P0 obeys only the pessimistic bound rank(P0) ≤ R N0.
To overcome this difficulty, in what follows, we consider the global tensor decomposition of only the long-range part in the tensor P0, defined by

  Pl = ∑_{ν=1}^{N0} zν 𝒲ν(P̃Rl) = ∑_{ν=1}^{N0} zν 𝒲ν( ∑_{k∈𝒦l} p̃k^(1) ⊗ p̃k^(2) ⊗ p̃k^(3) ).    (15.11)

The initial canonical rank of the tensor Pl equals Rl N0 and, again, may increase dramatically for a large number of particles N0. Since by construction the tensor Pl approximates a rather smooth function on the domain Ω, one may expect that the large initial rank can be reduced considerably to some value R∗, which remains almost independent of N0. The same beneficial property can be expected for the Tucker rank of Pl. The principal ingredient of our tensor approach is the rank reduction of the initial canonical sum Pl by application of the RHOSVD and the multigrid accelerated canonical-to-Tucker transform [174].

Figure 15.5: Mode-1 singular values of the side matrix in the full potential sum vs. the number of
particles N0 = 200, 400, 774 and grid-size n: n = 512 (left), n = 1024 (right).

To simplify the exposition, we suppose that the tensor entries in Pl are computed by collocation of the Gaussian sums at the centers of the grid cells. This provides a representation very close to that obtained by (6.6).

We consider the Gaussian in normalized form Gp(x) = e^{−x²/(2p²)}, so that the relation e^{−t_k² x²} = e^{−x²/(2p_k²)} holds; that is, we set t_k = 1/(√2 p_k) with t_k = k hM, k = 0, 1, . . . , M, where hM = C0 log M/M. Now criterion (B) on the bound of the L1-norm (see (15.5)) reads

  a_k ∫_a^∞ e^{−x²/(2p_k²)} dx ≤ ε/2 < 1,  a_k = hM.

The following theorem establishes an important result justifying the efficiency of range-separated formats applied to a class of radial basis functions p(r): the Tucker ε-rank of the long-range part of an accumulated sum of potentials computed in the bounding box Ω = [−b, b]3 remains almost uniformly bounded in the number of particles N0 (but depends on the size b of the domain).

Theorem 15.2 ([24]). Let the long-range part Pl in the total interaction potential (see (15.11)) correspond to the choice of the splitting parameter in (15.6) with M = O(log² ε). Then the total ε-rank r0 of the Tucker approximation to the canonical tensor sum Pl is bounded by

  |r0| := rank_Tuck(Pl) ≤ C b log^{3/2}(|log(ε/N0)|),

where the constant C does not depend on the number of particles N0.

Proof. The proof can be sketched by the following steps: First, we represent all shifted Gaussian functions contributing to the total sum in a fixed set of basis functions by using truncated Fourier series. Second, we prove that on the "long-range" index set k ∈ 𝒯l the parameter pk remains uniformly bounded in N0 from below, implying a uniform bound on the number of terms in the ε-truncated Fourier series. Finally, we take into account that the summation of elements presented in the fixed Fourier basis set does not enlarge the Tucker rank, but only affects the Tucker core. The dependence on b appears in explicit form.
Specifically, let us consider the rank-1 term in the splitting (15.3) with maximal index k ∈ 𝒯l. Taking into account the asymptotic choice M = log² ε (see (6.5)), where ε > 0 is the accuracy of the sinc-quadrature, relation (15.6) implies

  max_{k∈𝒯l} tk = Rl hM = (M/2) C0 log(M)/M ≈ log(M) = 2 log(|log(ε)|).    (15.12)
Now we consider the Fourier transform of the univariate Gaussian on [−b, b],

  Gp(x) = e^{−x²/(2p²)} = ∑_{m=0}^{M} αm cos(πmx/b) + η,  with |η| = |∑_{m=M+1}^{∞} αm cos(πmx/b)| < ε,

where

  αm = (∫_{−b}^{b} e^{−x²/(2p²)} cos(πmx/b) dx)/|Cm|²,  with |Cm|² = ∫_{−b}^{b} cos²(πmx/b) dx = 2b if m = 0, and b otherwise.

Following arguments in [68], one obtains

  αm = (p e^{−π²m²p²/(2b²)} − ξm)/|Cm|²,  where 0 < ξm < ε.

Truncating the coefficients αm at m = m0 such that αm0 ≤ ε leads to the bound

  m0 ≥ (√2 b/(π p)) log^{0.5}(p/((1 + |CM|²)ε)) = (√2 b/(π p)) log^{0.5}(p/((1 + b)ε)).
On the other hand, (15.12) implies

  1/pk ≤ c log(|log ε|), k ∈ 𝒯l,  i. e.,  1/pRl ≈ log(|log ε|),

which ensures the following estimate of m0:

  m0 = O(b log^{3/2}(|log ε|)).    (15.13)

Following [148], we represent the Fourier transform of the shifted Gaussians by

  Gp(x − xν) = ∑_{m=0}^{M} αm cos(πm(x − xν)/b) + ην,  |ην| < ε,

which requires only double the number of terms compared with the single Gaussian analyzed above. To compensate for the possible increase in |∑ν ην|, we refine ε ↦ ε/N0. These estimates also apply to all Gaussian functions present in the long-range sum, since they have larger values of pk than pRl. Indeed, in view of (15.6), the number of summands in the long-range part is of order Rl = M/2 = O(log² ε). Combining these arguments with (15.13) proves the resulting estimate.
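The rank mechanism behind this proof (fast decay of the cosine-series coefficients for wide Gaussians, essentially no decay for narrow ones, cf. Figure 15.6) can be reproduced with a simple quadrature sketch. The interval, grid, widths p, and the truncation threshold below are all illustrative:

```python
import numpy as np

# p = 1.5 plays the wide ("long-range") Gaussian, p = 0.05 the narrow
# ("short-range") one; grid and thresholds are demo values.
b, n, M = 10.0, 2048, 60
x = np.linspace(-b, b, n)
dx = x[1] - x[0]

def cos_coeffs(p):
    """Coefficients alpha_m of G_p(x) = exp(-x^2/(2 p^2)) on [-b, b]."""
    g = np.exp(-x ** 2 / (2.0 * p ** 2))
    coeffs = []
    for m in range(M):
        basis = np.cos(np.pi * m * x / b)
        norm2 = (basis ** 2).sum() * dx          # |C_m|^2: 2b if m = 0, else b
        coeffs.append((g * basis).sum() * dx / norm2)
    return np.abs(np.array(coeffs))

a_long = cos_coeffs(1.5)      # wide Gaussian: coefficients decay fast
a_short = cos_coeffs(0.05)    # narrow Gaussian: essentially no decay

m_long = int((a_long > 1e-6 * a_long[0]).sum())
m_short = int((a_short > 1e-6 * a_short[0]).sum())
print(m_long, m_short)
```

Only a handful of Fourier coefficients are essential for the wide Gaussian, while all of them remain essential for the narrow one, in line with the full-rank behavior of the short-range part.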

Figure 15.6 illustrates the very fast decay of the Fourier coefficients for the “long-
range” discrete Gaussians sampled on n-point grid (left) and the slow decay of Fourier
coefficients for the “short-range” Gaussians (right). In the latter case, almost all the
coefficients remain essential, resulting in the full rank decomposition. The grid size is
chosen as n = 1024.

Figure 15.6: Fourier coefficients of the long- (left) and short-range (right) discrete Gaussians.

Remark 15.3. Notice that for fixed σ > 0, the σ-separability of the point distributions (see Definition 15.1) implies that the volume of the computational box [−b, b]3 should increase proportionally to the number of particles N0, i. e., b = O(N0^{1/3}). Hence, Theorem 15.2 indicates that the number of entries in the Tucker core of size r1 × r2 × r3 can be estimated by C N0. This asymptotic cost remains of the same order in N0 as that for the short-range part in the potential sum.

Figure 15.7 (left) illustrates that the singular values of side matrices for the long-
range part (by choosing Rl = 12) exhibit fast exponential decay with a rate indepen-
dent of the number of particles N0 = 214, 405, 754. Figure 15.7 (right) zooms into the
first 50 singular values, which are almost identical for different values of N0 . The fast
decay in these singular values guarantees the low-rank RHOSVD-based Tucker decom-
position of the long-range part in the potential sum.
Table 15.1 shows the Tucker ranks of sums of the long-range ingredients in the electrostatic potentials of N-particle clusters. The Newton kernel is generated on the grid with n³ = 1024³ in the computational box of volume b³ = 40³ Å³, with accuracy ε = 10⁻⁴ and canonical rank 21. The particle clusters with 200, 400, and 782 atoms are taken as parts of a protein-like multiparticle system. The clusters of size 1728 and 4096 correspond to lattice structures of sizes 12 × 12 × 12 and 16 × 16 × 16, with randomly generated charges. The line "RS-canonical rank" shows the resulting rank after the canonical-to-Tucker and Tucker-to-canonical transforms with εC2T = 4 ⋅ 10⁻⁵ and εT2C = 4 ⋅ 10⁻⁶.

Figure 15.7: Mode-1 singular values of side matrices for the long-range part (Rl = 12) in the total potential vs. the number of particles N (left), and zoom of the first singular values (right).

Table 15.1: Tucker ranks and the RS-canonical rank of the multiparticle potential sum vs. the number of particles N for varying parameters Rℓ and Rs. Grid size n³ = 1024³.

  N                   200       400       782       1728      4096
  Ranks full can.     4200      8400      16 422    32 288    86 016
  Ranks long range    1800      3600      7038      15 552    36 864
  RS-Tucker ranks     21,16,18  22,19,23  24,22,24  23,24,24  24,24,24
  RS-canonical rank   254       292       362       207       243
  (Rℓ/Rs = 9/12)

Figure 15.8 shows the accuracy of the RS-canonical tensor approximation for a multiparticle cluster of 400 particles at the middle section of the computational box [−20, 20]³ Å, using an n × n × n 3D Cartesian grid with n = 1024 and step size h = 0.04 Å. The top-left panel shows the surface of the potential at the level z = 0, whereas the top-right panel shows the absolute error of the RS approximation with the ranks Rl = 15, Rs = 11, and the separation distance σ∗ = 1.5. The bottom panels visualize the long-range (left) and short-range (right) parts of the RS-tensor, respectively.
Figure 15.9 demonstrates the decay in singular values of the side matrices in the
canonical tensor representing potential sums of long-range parts for Rl = 10, 11,
and 12.
The proof of Theorem 15.2 indicates that the Tucker directional vectors living on large n^{⊗d} spatial grids are represented in the uniform Fourier basis with a small number of terms. Hence, following the arguments in [68] and [148], we are able to apply the low-rank QTT tensor approximation [167] to these long vectors (see [225] for the case of matrices). The QTT tensor compression makes it possible to reduce the representation complexity of the long-range part of an RS tensor to the logarithmic scale in the univariate grid size, O(log n).
The most time-consuming part of our scheme is the canonical-to-Tucker algorithm for computing the long-range part of the RS format tensors. Table 15.2 indicates almost linear scaling of the CPU time in the number of particles and in the univariate grid size n of the n × n × n representation grid. The last column shows the resulting ranks of the side matrices U(ℓ) in the canonical tensor U (see (15.16)). The asymptotically optimal complexity scaling of the RS decomposition and the required storage is the main motivation for applications of the RS tensor format.

Figure 15.8: Top: the potential sum at the middle plane of a cluster with 400 atoms (left) and the error of the RS-canonical approximation (right). Bottom: long-range part of the sum (left); short-range part of the sum (right).

Figure 15.9: Example of a potential surface at level z = 0 (left) for a sum of N0 = 200 particles computed using only their long-range parts with Rl = 12; decay in the singular values of the side matrices for the canonical tensor representing sums of long-range parts for Rl = 10, 11, and 12 (right).

Table 15.2: Times (sec) for the canonical-to-Tucker rank reduction vs. the number of particles N and grid size n³.

  N / n³   512³    1024³   2048³   4096³   8192³   16384³   R_RS,C
  100      0.9     1.5     2.3     4.1     6.0     12.2     183
  200      2.3     3.0     4.7     7.9     14.4    23.4     214
  400      5.2     7.0     8.7     16.1    32.9    71.7     227
  770      12.3    13.8    18.3    32.7    67.5    147.3    290

15.2.3 Range-separated canonical and Tucker tensor formats

In applications to many-particle modeling, the initial rank parameter R of the canonical tensor representation is proportional to the (large) number of particles N0, with a pre-factor of about 30, whereas the weights zk can be rather arbitrary. Notice that the sub-class of so-called orthogonal canonical tensors [192] allows stable multi-linear algebra, but suffers from poor approximation capacity. Another important class is specified by the case of "monotone" or all-positive canonical vectors (see [174, 153] for the definition), which is also the case in the decomposition of the elliptic Green's kernels.

Remark 15.4. The second class of all-positive vectors ensures the stability of the RHOSVD for problems such as (15.7) in the case of all positive (negative) weights (see Lemma 14.10 and the discussion thereafter).

The idea of how to get rid of the "curse of ranks", the critical bottleneck in applying tensor methods to problems such as (15.7), is suggested by the result of Theorem 15.2 on the almost uniform bound (in the number of particles N0) of the Tucker rank for the long-range part of a multi-particle potential. Thanks to this beneficial property, the new range-separated (RS) tensor formats were introduced in [24]. They are based on the aggregated composition of global low-rank canonical/Tucker tensors with locally supported canonical tensors living on non-intersecting index sub-sets embedded into the large corporate multi-index set ℐ = I1 × ⋅ ⋅ ⋅ × Id, Iℓ = {1, . . . , n}. Such a parametrization attempts to represent large multidimensional arrays with a storage cost linearly proportional to the number of cumulated inclusions (sub-tensors).

The structure of the range-separated canonical/Tucker tensor formats is specified by a composition of local-global low-parametric representations, which provide good approximation features in application to the grid-based representation of many-particle interaction potentials with multiple singularities.

Figure 15.10: Schematic illustration of effective supports of the cumulated canonical tensor (left);
short-range canonical vectors for k = 1, . . . , 11, presented in logarithmic scale (right).

Definition 15.5 (Cumulated canonical tensors, [24]). Given the index set ℐ, a set of multi-indices (sources) 𝒥 = {j(ν) := (j1(ν), j2(ν), . . . , jd(ν))}, ν = 1, . . . , N0, jℓ(ν) ∈ Iℓ, and a width index parameter γ ∈ ℕ such that the γ-vicinity of each point j(ν) ∈ 𝒥, that is, 𝒥γ(ν) := {j : |j − j(ν)| ≤ γ}, does not intersect all the others:

  𝒥γ(ν) ∩ 𝒥γ(ν′) = ⌀,  ν ≠ ν′.

A rank-R0 cumulated canonical tensor Û associated with 𝒥 and the width parameter γ is defined as a set of tensors that can be represented in the form

  Û = ∑_{ν=1}^{N0} cν Uν  with rank(Uν) ≤ R0,    (15.14)

where the rank-R0 canonical tensors Uν = [uj] vanish beyond the γ-vicinity of j(ν):

  uj = 0 for j ∈ ℐ \ 𝒥γ(ν),  ν = 1, . . . , N0.    (15.15)

Definition 15.5 describes a sum of short-range potentials having local (up to some threshold) non-intersecting supports. Given a particular point distribution, the effective support of the localized sub-tensors should be of a size close to the parameter σ∗ (see Definition 15.1), which introduces the σ∗-separable point distributions characterized by the separation parameter σ∗ > 0. In this case, we use the relation σ∗ ≈ γh, where h = 2b/n is the mesh size of the computational (n × ⋅ ⋅ ⋅ × n)-grid.
Figure 15.10 (left) illustrates the effective supports of a cumulated canonical tensor
in the non-overlapping case, whereas Figure 15.10 (right) presents the supports for
the first 11 short-range canonical vectors (selected from rank-24 reference canonical
tensor PR ), which allows choosing the parameter γ in separation criteria.

The separation criterion in Definition 15.5 leads to a rather "aggressive" strategy for selecting the short-range part PRs in the reference canonical tensor PR, allowing an easy implementation of the cumulated canonical tensor (non-overlapping case). However, in some cases, this may lead to an overestimation of the Tucker/canonical rank of the long-range tensor component. To relax the criterion in Definition 15.5, we propose a "soft" strategy that allows including a few (i. e., O(1) for large N0) neighboring particles in the local vicinity 𝒥γ(ν) of the source point sν, which can be achieved by increasing the overlap parameter γ > 0. This allows controlling the bound on the rank parameter of the long-range tensor almost uniformly in the system size N0. The following example illustrates this issue.

Example 15.6. Assume that the separation distance equals σ∗ = 0.8 Å, corresponding to the example in Figure 15.4 (right), and the given computational threshold is ε = 10⁻⁴. Then we find from Figure 15.10 (right) that the "aggressive" criterion in Definition 15.5 leads to choosing Rs = 10, since the value of the canonical vector with k = 11 at the point x = σ∗ is about 10⁻³. Hence, in order to control the required rank parameter Rl, we have to extend the overlap area to a larger parameter σ∗ and, hence, to a larger γ. This will lead to a small O(1)-overlap between the supports of the short-range tensor components, but without an asymptotic increase in the total complexity.

Table 15.3 presents the Tucker ranks r = (r1, r2, r3) for the long-range parts of N0-particle potentials. The reference Newton kernel is approximated on a 3D grid of size 2048³ with the rank R = 29 and accuracy ε𝒩 = 10⁻⁵. Here, the Tucker tensor is computed with the stopping criterion εT2C = 10⁻⁵ in the ALS iteration. It can be seen that for fixed Rl, the Tucker ranks increase only moderately with the system size N0.

Table 15.3: Tucker ranks r = (r1, r2, r3) for the long-range parts of N0-particle potentials.

  N0 / Rl   8         9         10        11        12        13
  200       10,10,11  13,12,12  18,15,16  23,19,21  32,24,27  42,30,34
  400       11,10,11  14,13,14  19,16,20  26,21,26  35,27,36  47,34,47
  782       11,11,12  15,14,15  20,18,20  28,26,27  39,35,37  52,46,50

Below, we distinguish a special subclass of uniform CCT tensors.

Definition 15.7 (Uniform CCT tensors, [24]). A CCT tensor in (15.14) is called uniform if all components Uν are generated by a single rank-R0 tensor U0 = ∑_{m=1}^{R0} μm û_m^(1) ⊗ ⋅ ⋅ ⋅ ⊗ û_m^(d) such that Uν|_{𝒥γ(ν)} = U0.

Now, we are in a position to define the range separated canonical and Tucker ten-
sor formats in ℝn1 ×⋅⋅⋅×nd . The RS canonical format is defined as follows.

Definition 15.8 (RS-canonical tensors, [24]). The RS-canonical tensor format specifies the class of d-tensors A ∈ ℝn1×⋅⋅⋅×nd that can be represented as a sum of a rank-R canonical tensor U ∈ ℝn1×⋅⋅⋅×nd and a (uniform) cumulated canonical tensor generated by U0 with rank(U0) ≤ R0 as in Definition 15.7 (or, more generally, in Definition 15.5):

  A = ∑_{k=1}^{R} ξk u_k^(1) ⊗ ⋅ ⋅ ⋅ ⊗ u_k^(d) + ∑_{ν=1}^{N0} cν Uν,    (15.16)

where diam(supp Uν) ≤ 2γ in the index size.

For a given grid point i ∈ ℐ = I1 × ⋅ ⋅ ⋅ × Id, we define the set of indices

  ℒ(i) := {ν ∈ {1, . . . , N0} : i ∈ supp Uν},

which labels all short-range tensors Uν whose effective support contains the grid point i.

Lemma 15.9 ([24]). The storage cost of an RS-canonical tensor is estimated by

  stor(A) ≤ dRn + (d + 1)N0 + dR0γ.

Given i ∈ ℐ, denote by u_{iℓ}^(ℓ) the row vector with index iℓ in the side matrix U(ℓ) ∈ ℝ^{nℓ×R}, and let ξ = (ξ1, . . . , ξR). Then the ith entry of the RS-canonical tensor A = [ai] can be calculated as a sum of long- and short-range contributions by

  ai = (⊙_{ℓ=1}^{d} u_{iℓ}^(ℓ)) ξ^T + ∑_{ν∈ℒ(i)} cν Uν(i)

at the expense of O(dR + 2dγR0) operations.

Proof. Definition 15.8 implies that each RS-canonical tensor is uniquely defined by the following parametrization: the rank-R canonical tensor U, the rank-R0 local reference canonical tensor U0 with mode size bounded by 2γ, and the list 𝒥 of the coordinates and weights of the N0 particles. Hence, the storage cost follows directly. To justify the representation complexity, we notice that, by the well-separability assumption (see Definition 15.1), we have #ℒ(i) = O(1) for all i ∈ ℐ. This proves the complexity bounds.
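The entry-evaluation formula of Lemma 15.9 can be sketched as follows; all sizes, centers, and weights are arbitrary demo values (a uniform CCT with one local reference tensor):

```python
import numpy as np

# Demo sizes; the local supports (mode size 2*gamma+1) are placed well apart.
rng = np.random.default_rng(2)
d, n, R = 3, 32, 6
gamma, R0, N0 = 2, 2, 4

U = [rng.standard_normal((n, R)) for _ in range(d)]        # long-range part
xi = rng.standard_normal(R)
U0 = [rng.standard_normal((2 * gamma + 1, R0)) for _ in range(d)]
mu = rng.standard_normal(R0)
centers = [(4, 4, 4), (12, 20, 8), (25, 9, 17), (20, 27, 27)]
c = rng.standard_normal(N0)

def entry(i):
    """a_i = (Hadamard product of side-matrix rows) xi^T + short-range terms."""
    rows = np.ones(R)
    for ell in range(d):
        rows = rows * U[ell][i[ell]]
    a = rows @ xi                                  # long-range contribution
    for nu, sc in enumerate(centers):              # only nu in L(i) contribute
        off = [i[ell] - sc[ell] + gamma for ell in range(d)]
        if all(0 <= o <= 2 * gamma for o in off):
            loc = np.ones(R0)
            for ell in range(d):
                loc = loc * U0[ell][off[ell]]
            a += c[nu] * (loc @ mu)
    return a

print(entry((4, 4, 4)), entry((0, 0, 0)))
```

Each entry costs O(dR) for the long-range part plus O(dR0) for the O(1) local tensors covering the point, with no full n^d array ever formed.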

Now we define the class of RS-Tucker tensors.

Definition 15.10 (RS-Tucker tensors, [24]). The RS-Tucker tensor format specifies the class of d-tensors A ∈ ℝn1×⋅⋅⋅×nd that can be represented as a sum of a rank-r Tucker tensor V and a (uniform) cumulated canonical tensor generated by U0 with rank(U0) ≤ R0 as in Definition 15.7 (or, more generally, in Definition 15.5):

  A = β ×1 V(1) ×2 V(2) ⋅ ⋅ ⋅ ×d V(d) + ∑_{ν=1}^{N0} cν Uν,    (15.17)

where each tensor Uν, ν = 1, . . . , N0, has local support, that is, diam(supp Uν) ≤ 2γ.

Similar to Lemma 15.9, the corresponding statement for the RS-Tucker tensors can
be proven.

Lemma 15.11 ([24]). The storage size for an RS-Tucker tensor does not exceed

  stor(A) ≤ r^d + drn + (d + 1)N0 + dR0γ.

Let the rℓ-vector v_{iℓ}^(ℓ) be the iℓth row of the matrix V(ℓ). Then the ith element of the RS-Tucker tensor A = [ai] can be calculated by

  ai = β ×1 v_{i1}^(1) ×2 v_{i2}^(2) ⋅ ⋅ ⋅ ×d v_{id}^(d) + ∑_{ν∈ℒ(i)} cν Uν(i)

at the expense of O(r^d + 2dγR0) operations.

Proof. In view of Definition 15.10, each RS-Tucker tensor is uniquely defined by the following parametrization: the rank-r = (r1, . . . , rd) Tucker tensor V ∈ ℝn1×⋅⋅⋅×nd, the rank-R0 local reference canonical tensor U0 with diam(supp U0) ≤ 2γ, the list 𝒥 of the coordinates of the N0 particle centers {sν}, and the N0 weights {cν}. This proves the complexity bounds.
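The Tucker part of the entry formula in Lemma 15.11 can be sketched analogously; the core, the factors, and the probed index are demo values:

```python
import numpy as np

# Demo Tucker tensor; only the contraction pattern matters here.
rng = np.random.default_rng(5)
n, r, d = 24, (3, 4, 5), 3

beta = rng.standard_normal(r)                       # Tucker core
V = [rng.standard_normal((n, r[ell])) for ell in range(d)]

def tucker_entry(i):
    """beta contracted with one row of each factor, at O(r^d) cost per entry."""
    return float(np.einsum('abc,a,b,c->', beta, V[0][i[0]], V[1][i[1]], V[2][i[2]]))

A = np.einsum('abc,ia,jb,kc->ijk', beta, V[0], V[1], V[2])   # full reference
i = (7, 0, 23)
print(tucker_entry(i), float(A[i]))
```

The per-entry cost involves only the core and three factor rows, independent of the grid size n.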

The main computational benefits of the new range-separated canonical/Tucker tensor formats are explained by the important uniform bounds on the Tucker rank of the long-range part in the large sum of interaction potentials (see Theorem 15.2 and the numerics in Section 15.2.2). Moreover, we have a low storage cost for RS-canonical/Tucker tensors, a cheap representation of each entry of an RS tensor, and the possibility of a simple implementation of multilinear algebra on these tensors (see the discussion at the end of this section).
The total rank of the sum of canonical tensors in Û (see (15.14)) may become large for larger N0 in view of the pessimistic bound rank(Û) ≤ N0R0. However, cumulated canonical tensors (CCT) have two beneficial features, which are particularly useful in the low-rank tensor representation of large potential sums.

Proposition 15.12 (Properties of CCT tensors).
(A) The rank of a CCT tensor Û is bounded by R0: rank_loc(Û) := maxν rank(Uν) ≤ R0.
(B) Local components in the CCT tensor (15.14) are "block orthogonal" in the sense that

  ⟨Uν, Uν′⟩ = 0,  ∀ν ≠ ν′.    (15.18)

(C) ‖Û‖ = ∑_{ν=1}^{N0} cν ‖Uν‖.

If R0 = 1, that is, Û is a conventional rank-N0 canonical tensor, then property (B) in Proposition 15.12 leads to the definition of orthogonal canonical tensors in [192]. Hence, in the case R0 > 1, we arrive at a generalization further called block orthogonal canonical tensors.
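Property (B) is elementary to verify numerically: two rank-1 tensors with non-intersecting γ-vicinities have exactly vanishing scalar product (the grid size, profile, and centers below are demo values):

```python
import numpy as np

# Demo values; the two vicinities (5 grid points wide) do not intersect.
n, gamma = 16, 2
g = np.exp(-np.linspace(-1.0, 1.0, 2 * gamma + 1) ** 2)   # local 1D profile

def local_tensor(center):
    """Rank-1 tensor supported only on the gamma-vicinity of `center`."""
    T = np.zeros((n, n, n))
    c1, c2, c3 = center
    T[c1 - gamma:c1 + gamma + 1,
      c2 - gamma:c2 + gamma + 1,
      c3 - gamma:c3 + gamma + 1] = np.einsum('i,j,k->ijk', g, g, g)
    return T

U1 = local_tensor((4, 4, 4))
U2 = local_tensor((11, 11, 11))
print(float((U1 * U2).sum()), float((U1 * U1).sum()))
```

Since the supports are disjoint index boxes, the Frobenius product vanishes exactly, not merely up to rounding.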

The rank bound R′ = rank(Û) ≤ N0R0 indicates that direct summation in (15.14) in the canonical/Tucker formats may lead to practically non-tractable representations. However, the block orthogonality property in Proposition 15.12(B) allows applying the stable RHOSVD approximation for the rank optimization (see Section 3.3). The stability of the RHOSVD in the case of orthogonal canonical tensors was analyzed in [174, 153]. Concerning the stability of the RHOSVD for the RS tensor format, see Remark 15.4.
In what follows, we prove the stability of such tensor approximation applied to
CCT representations.
Lemma 15.13 ([24]). Let the local canonical tensors be stable, that is, ∑_{m=1}^{R0} μm² ≤ C‖Uν‖² (see Definition 15.7). Then the rank-r RHOSVD-Tucker approximation Û(r) to the CCT Û provides the stable error bound

  ‖Û − Û(r)‖ ≤ C ∑_{ℓ=1}^{3} ( ∑_{k=rℓ+1}^{min(n,R′)} σℓ,k² )^{1/2} ‖Û‖,

where the σℓ,k denote the singular values of the side matrices U(ℓ); see (3.34).

Proof. We apply the general error estimate for the RHOSVD approximation [174] to obtain

  ‖Û − Û(r)‖ ≤ C ∑_{ℓ=1}^{3} ( ∑_{k=rℓ+1}^{min(n,R′)} σℓ,k² )^{1/2} ( ∑_{ν=1}^{N0} ∑_{m=1}^{R0} cν² μm² )^{1/2},

and then take into account property (C) of Proposition 15.12 to estimate

  ∑_{ν=1}^{N0} ∑_{m=1}^{R0} cν² μm² = ∑_{ν=1}^{N0} cν² ∑_{m=1}^{R0} μm² ≤ C ∑_{ν=1}^{N0} cν² ‖Uν‖² = C‖Û‖²,

which completes the proof.

The stability assumption in Lemma 15.13 is satisfied in the case of the constructive canonical tensor approximation to the Newton and other types of Green's kernels obtained by sinc-quadrature-based representations, where all canonical skeleton vectors are non-negative and monotone.

Remark 15.14. In the case of higher dimensions d > 3, the local canonical tensors can be combined with the global tensor train (TT) format [226] such that the simple canonical-to-TT transform can be applied. In this case, the RS-TT format can be introduced as a set of tensors represented as a sum of a CCT term and a global TT tensor. The complexity and structural analysis is completely similar to that for the RS-canonical and RS-Tucker formats.

We sketch the algebraic operations on RS tensors. Multilinear algebraic operations in the format of the RS-canonical/Tucker tensor parametrization can be implemented by using 1D vector operations applied to both the localized and the global tensor components. In particular, the following operations on RS canonical/Tucker tensors can be realized efficiently: (a) storage of a tensor; (b) real-space representation on a fine rectangular grid; (c) summation of many-particle interaction potentials represented on the fine tensor grid; (d) computation of scalar products; and (e) computation of gradients and forces.

Estimates on the storage complexity for the RS-canonical and RS-Tucker formats were presented in Lemmas 15.9 and 15.11. Items (b) and (c) were addressed earlier. Calculation of the scalar product of two RS-canonical tensors in the form (15.16), defined on the same set 𝒮 of particle centers, can be reduced to the standard calculation of the cross scalar products between all elementary canonical tensors presented in (15.16). Hence, the numerical cost can be estimated by O(R(R − 1)dn/2 + 2γRR0N0).
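The reduction of the canonical scalar product to 1D cross products can be sketched directly (random demo factors; the ranks and sizes are arbitrary):

```python
import numpy as np

# <A, B> = sum_{k,m} xi_k eta_m prod_l <u_k^(l), v_m^(l)>:
# the scalar product of canonical tensors reduces to 1D scalar products.
rng = np.random.default_rng(3)
n, RA, RB, d = 20, 4, 3, 3

UA = [rng.standard_normal((n, RA)) for _ in range(d)]
UB = [rng.standard_normal((n, RB)) for _ in range(d)]
xi, eta = rng.standard_normal(RA), rng.standard_normal(RB)

# Gram matrices of the 1D factors, multiplied elementwise over the modes
M = np.ones((RA, RB))
for ell in range(d):
    M *= UA[ell].T @ UB[ell]          # cross scalar products per mode
dot_canonical = xi @ M @ eta           # cost O(d n RA RB), not O(n^d)

# reference: assemble both tensors in full and take the Frobenius product
A = np.einsum('ak,bk,ck,k->abc', UA[0], UA[1], UA[2], xi)
B = np.einsum('am,bm,cm,m->abc', UB[0], UB[1], UB[2], eta)
print(dot_canonical, float((A * B).sum()))
```

The same pattern extends to the RS format: cross products between the global canonical parts and the few overlapping local tensors are accumulated separately, preserving the linear-in-n cost.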

15.3 Outline of possible applications


The RS tensor formats can be gainfully applied in computational problems, includ-
ing functions with multiple local singularities or cusps, Green kernels with essentially
non-local behavior, and in various approximation problems by means of radial basis
functions. In this section, we follow [24] and sketch how the RS tensor representations
can be applied to some computationally extensive problems, such as grid representa-
tion of multidimensional scattered data, interaction energy of charged many-particle
system, computation of gradients and forces for nano-particle potentials, and con-
struction of approximate boundary/interface conditions in the Poisson–Boltzmann
equation describing the electrostatic potential of proteins.

15.3.1 Multidimensional data modeling

Here we and briefly describe the model reduction approach to the problem of multi-
dimensional data fitting based on the RS tensor approximation. The problems of mul-
tidimensional scattered data modeling and data mining are known to lead to compu-
tationally intensive simulations. We refer to [42, 141, 34, 84, 129] for the discussion of
most commonly used computational approaches in this field of numerical analysis.
The mathematical problems in scattered data modeling are concerned with the
approximation of a multivariate function f : ℝd → ℝ (d ≥ 2) by using samples given at
a certain finite set 𝒳 = {x1 , . . . , xN } ⊂ ℝd of pairwise distinct points; see, e. g., [42]. The
function f may describe the surface of a solid body, the solution of a PDE, many-body
potential field, multiparametric characteristics of physical systems, or some other
multidimensional data.
In a particular problem setting, one may be interested in recovering f from a given
sampling vector f|𝒳 = (f (x1 ), . . . , f (xN )) ∈ ℝN . One of the traditional ways to tackle this
problem is based on constructing a suitable functional interpolant PN : ℝd → ℝ,
satisfying PN|𝒳 = f|𝒳 =: f, that is,

PN (xj ) = f (xj ), ∀1≤j≤N (15.19)

or approximating the sampling vector f|𝒳 on the set 𝒳 in the least squares sense. We
consider the approach based on using radial basis functions (RBFs) providing the tra-
ditional tools for multivariate scattered data interpolation. To that end, the radial ba-
sis function (RBF) interpolation approach deals with a class of interpolants PN in the
form
PN (x) = ∑_{j=1}^{N} cj p(‖x − xj ‖) + Q(x),   Q is some smooth function,   (15.20)

where p : [0, ∞) → ℝ is a fixed radial function, and ‖ ⋅ ‖ is the Euclidean norm on ℝd .


To fix the idea, here we consider the particular version of (15.20) by setting Q = 0. No-
tice that the interpolation ansatz PN in (15.20) has the same form as the multi-particle
interaction potential in (15.7). This observation indicates that the numerical treatment
of various problems based on the use of interpolant PN can be handled by using the
same tools of model reduction via rank-structured RS tensor approximation.
The particular choice of RBFs described in [42, 141] includes functions p(r) in the
form
r^ν ,   (1 + r^2 )^ν   (ν ∈ ℝ),   exp(−r^2 ),   r^2 log(r).

For our tensor-based approach, the common feature of all these function classes is
the existence of low-rank tensor approximations to the grid-based discretization of the
RBF p(‖x‖) = p(x1 , . . . , xd ), x ∈ ℝd , where we set r = ‖x‖. We can add to the above examples a few traditional RBFs commonly used in quantum chemistry, such as the Coulomb potential 1/r, the Slater function exp(−λr), the Yukawa potential exp(−λr)/r, and the class of Matérn RBFs, traditionally applied in stochastic modeling [219, 206]. Other examples are given by the Lennard-Jones (Van der Waals), dipole–dipole interaction, and Stokeslet potentials (see [205]), given by p(r) = 4ϵ[(σ/r)^{12} − (σ/r)^6 ], p(r) = 1/r^3 , and the 3 × 3 matrix P(‖x‖) = I/r + (xx^T)/r^3 for x ∈ ℝ3 , respectively.
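The low-rank separability of such kernels on a grid can be checked directly for the Newton kernel via the classical exponential-sum representation 1/r = (2/√π) ∫₀^∞ exp(−r²t²) dt: a trapezoidal (sinc-type) quadrature in the substituted variable t = e^u yields a canonical tensor whose 1D factors are Gaussians. The grid and quadrature parameters in the sketch below are illustrative and unoptimized, so the rank is far larger than the optimized ranks (such as R = 25 or 29) reported later in this section.

```python
import numpy as np

# Nodes for 1/r = (2/sqrt(pi)) * int_0^inf exp(-r^2 t^2) dt, after the
# substitution t = e^u, discretized by the trapezoidal (sinc) rule in u.
h = 0.25
u = h * np.arange(-120, 21)            # quadrature grid; t_q = e^{u_q}
t = np.exp(u)
w = (2.0 / np.sqrt(np.pi)) * h * t     # weights; the canonical rank is R = len(u)

x = np.linspace(-3.0, 3.0, 61)         # 1D grid in each of the d = 3 directions
U = np.exp(-np.outer(x**2, t**2))      # canonical factor matrix (same in each mode)

def newton_entry(i, j, k):
    """Entry (i,j,k) of the rank-R canonical tensor approximating 1/||x||."""
    return np.sum(w * U[i] * U[j] * U[k])

# compare one tensor entry with the exact kernel value
r = np.sqrt(x[5]**2 + x[17]**2 + x[40]**2)
assert abs(newton_entry(5, 17, 40) * r - 1.0) < 1e-6
```

Splitting the quadrature index set by the node size t_q then gives exactly the short/long-range separation used by the RS format: small t_q produce wide (long-range) Gaussians, large t_q produce narrow (short-range) ones.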
In the context of numerical data modeling, we shall focus on the following com-
putational tasks:
(A) Fixed coefficient vector c = (c1 , . . . , cN )^T ∈ ℝN : the efficient representation and
storage of the interpolant in (15.20), sampled on a fine tensor grid in ℝd , that allows the O(1)-fast point evaluation of PN in the whole volume Ω and the computation of various integral-differential operations on that interpolant, such as gradients, forces, scalar products, convolution integrals, etc.
(B) Finding the coefficient vector c that solves the interpolation problem (15.19).

We look at problems (A) and (B) with the intent of applying the RS tensor representation to the interpolant PN (x). The point is that the representation (15.20) can be viewed
as the many-particle interaction potential (with charges cj ) considered in the previous sections. Hence, the RS tensor approximation can be successfully applied if the d-dimensional tensor approximating the RBF p(‖x‖), x ∈ ℝd , on a tensor grid admits a low-rank canonical representation that can be split into short- and long-range parts. This can be proven for the functions listed above (see the example in Section 6.1 for the Newton kernel 1/‖x‖). Notice that the Gaussian is already a rank-1 separable function.
Problem (A). We consider the particular choice of the set 𝒳 ⊂ [0, 1]d , which
can be represented by using nearly optimal point sampling. The so-called optimal point sets give rise to a trade-off between the separation distance q_𝒳 = min_{s∈𝒳} min_{s_ν ∈ 𝒳∖{s}} d(s_ν , s) (see (15.9)) and the fill distance h_{𝒳,Ω} = max_{y∈Ω} d(𝒳 , y), thereby solving the problem (see [42])

q_𝒳 / h_{𝒳,Ω} → max .

We choose the set of points 𝒳 as a subset of the n^{⊗d} square grid Ωh with the mesh-size h = 1/(n − 1), such that the separation distance satisfies σ∗ = q_𝒳 ≥ αh, α ≥ 1. Here, N ≤ N0 = n^d . The square grid Ωh is an example of an almost optimal point set (see the discussion in [141]). The construction below also applies to nonuniform rectangular grids.
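The two distances can be computed directly for small point sets. The following brute-force sketch (illustrative names; 2D for brevity, with probe points approximating the maximum over Ω) confirms that for the full square grid q_𝒳 = h and h_{𝒳,Ω} = h/√2:

```python
import numpy as np
from itertools import product

def separation_distance(X):
    """q_X = minimal distance between distinct points of X."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # exclude the zero self-distances
    return D.min()

def fill_distance(X, probes):
    """h_{X,Omega} ~ max over probe points y in Omega of dist(y, X)."""
    D = np.linalg.norm(probes[:, None, :] - X[None, :, :], axis=-1)
    return D.min(axis=1).max()

n = 9
h = 1.0 / (n - 1)
grid1d = np.linspace(0.0, 1.0, n)
X = np.array(list(product(grid1d, repeat=2)))               # full square grid
probes = np.array(list(product(np.linspace(0, 1, 33), repeat=2)))

qX = separation_distance(X)
hX = fill_distance(X, probes)
assert np.isclose(qX, h)                 # neighboring grid points are h apart
assert hX <= h / np.sqrt(2) + 1e-9       # worst probe sits at a cell center
```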
Now, we are in a position to apply the RS tensor representation to the total inter-
polant PN . Let PR be the n × n × n (say, for d = 3) rank-R tensor representing the RBF
p(‖ ⋅ ‖), which allows the RS splitting by (15.3) generating the global RS representation
(15.10). Then PN can be represented by the tensor PN in the RS-Tucker (15.17) or RS-
canonical (15.16) formats. The storage cost scales linearly in both N and n: O(N + dR_l n).
Problem (B). The interpolation problem (15.19) reduces to solving the linear system of equations for the unknown coefficient vector c = (c1 , . . . , cN )^T ∈ ℝN ,

Ap,𝒳 c = f, where Ap,𝒳 = [p(‖xi − xj ‖)]1≤i,j≤N ∈ ℝN×N (15.21)

with the symmetric matrix Ap,𝒳 . Here, without loss of generality, we assume that the
RBF p(‖ ⋅ ‖) is continuous. The solvability conditions for the linear system (15.21) with
the matrix Ap,𝒳 are discussed, for example, in [42]. We consider two principal cases.
Case (A). We assume that the point set 𝒳 coincides with the set of grid points in Ωh , that is, N = n^d . Introducing the d-tuple multi-indices i = (i1 , . . . , id ) and j = (j1 , . . . , jd ),
we reshape the matrix Ap,𝒳 into the tensor form
A_{p,𝒳} ↦ A = [a(i1 , j1 , . . . , id , jd )] ∈ ⨂_{ℓ=1}^{d} ℝ^{n×n} ,

which corresponds to folding of an N-vector into a d-dimensional n^{⊗d} tensor. This d-level Toeplitz matrix is generated by the tensor PR obtained by collocation of the RBF p(‖ ⋅ ‖)
on the grid Ωh . Splitting the rank-R canonical tensor PR into a sum of short- and long-range terms

PR = P_{Rs} + P_{Rl}   with   P_{Rl} = ∑_{k=1}^{R_l} p_k^{(1)} ⊗ ⋅ ⋅ ⋅ ⊗ p_k^{(d)}

allows representing the matrix A in the RS form as a sum of low-rank canonical tensors
A = ARs + ARl . Here, the first one corresponds to the diagonal (nearly diagonal in the
case of “soft” separation strategy) matrix by assumption on the locality of PRs . The
second matrix takes the form of Rl -term Kronecker product sum

A_{Rl} = ∑_{k=1}^{R_l} A_k^{(1)} ⊗ ⋅ ⋅ ⋅ ⊗ A_k^{(d)} ,

where each “univariate” matrix A_k^{(ℓ)} ∈ ℝ^{n×n} , ℓ = 1, . . . , d, takes the symmetric Toeplitz form generated by the first column vector p_k^{(ℓ)} . The storage complexity of the resultant RS representation of the matrix A is estimated by O(N + dR_l n).
Now, we let the coefficient vector c ∈ ℝN be represented as the d-dimensional n^{⊗d} tensor c ↦ C ∈ ℝ^{n^{⊗d}} . Then the matrix–vector multiplication AC = (A_{Rs} + A_{Rl})C implemented in tensor formats can be accomplished in O(cN + dR_l N log n) operations,


that is, with the asymptotically optimal cost in the number of sampling points N. The
reason is that the matrix A_{Rs} has the diagonal form, whereas the matrix–vector multiplication between the Toeplitz matrices A_k^{(ℓ)} constituting the Kronecker factors A_{Rl} and the corresponding n-columns (fibers) of the tensor C can be implemented by 1D FFT in O(n log n) operations. One can customarily enhance this scheme by introducing a low-rank tensor structure for the target vector (tensor) C.
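The O(n log n) Toeplitz matrix–vector product underlying this estimate can be sketched as follows (a generic circulant-embedding routine, not the book's code; names are illustrative): a symmetric n × n Toeplitz matrix is embedded into a 2n × 2n circulant, which the FFT diagonalizes.

```python
import numpy as np

def toeplitz_matvec(c, x):
    """Multiply the symmetric Toeplitz matrix with first column c by x in
    O(n log n): embed into a circulant of size 2n and use one FFT pair."""
    n = len(c)
    # first column of the embedding circulant: [c0, ..., c_{n-1}, 0, c_{n-1}, ..., c1]
    circ = np.concatenate([c, [0.0], c[-1:0:-1]])
    y = np.fft.ifft(np.fft.fft(circ) * np.fft.fft(x, 2 * n))
    return y[:n].real

# verify against the dense O(n^2) product
rng = np.random.default_rng(1)
n = 64
c = rng.standard_normal(n)
x = rng.standard_normal(n)
T = np.array([[c[abs(i - j)] for j in range(n)] for i in range(n)])
assert np.allclose(toeplitz_matvec(c, x), T @ x)
```

Applying this routine fiber-by-fiber to C, once per Kronecker factor and canonical term, gives the stated O(dR_l N log n) bound.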
Case (B). This construction can be generalized to the situation where 𝒳 is a subset of Ωh , i. e., N < n^d . In this case, the complexity again scales linearly in N if N = O(n^d ).
When N ≪ n^d , the matrix–vector operation applies to the vector C that vanishes beyond the small set 𝒳 . In this case, the corresponding block-diagonal sub-matrices in A_k^{(ℓ)} lose the Toeplitz form, thus resulting in a slight increase in the overall cost, O(N^{1+1/d} ).
In both cases (A) and (B) the new rank-structured matrix construction can be
applied within any favorable preconditioned iteration for solving the linear system
(15.21).

15.3.2 Interaction energy for many-particle systems

Consider the calculation of the interaction energy (IE) for a charged multiparticle
system. In the case of lattice-structured systems, the fast tensor-based computation
scheme for IE was described in [152]. Here we follow [24].
Recall that the interaction energy of the total electrostatic potential generated by
the system of N charged particles located at xk ∈ ℝ3 (k = 1, . . . , N) is defined by the
weighted sum

EN = EN (x1 , . . . , xN ) = (1/2) ∑_{j=1}^{N} zj ∑_{k=1, k≠j}^{N} zk / ‖xj − xk ‖ ,   (15.22)

where zk denotes the particle charge. Letting σ > 0 be the minimal physical distance between the centers of particles, we arrive at σ-separable systems (see Definition 15.1). The double sum in (15.22) involves only particle positions with ‖xj − xk ‖ ≥ σ. Hence, the quantity in (15.22) is computable also for singular kernels such as p(r) = 1/r.
We observe that the quantity of interest EN can be recast in terms of the intercon-
nection matrix Ap,𝒳 defined by (15.21) with p(r) = 1/r, 𝒳 = {x1 , . . . , xN },

EN = (1/2) ⟨(A_{p,𝒳} − diag A_{p,𝒳} )z, z⟩,   where z = (z1 , . . . , zN )^T .   (15.23)
Hence, EN can be calculated by using the approach already addressed in the previous
section.
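For reference, the direct evaluation of (15.22)–(15.23) can be sketched as follows (the O(N²) baseline that the rank-structured approach is designed to avoid; names are illustrative):

```python
import numpy as np

def interaction_energy(X, z):
    """E_N = 1/2 sum_j z_j sum_{k != j} z_k / ||x_j - x_k||, cf. (15.22)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)        # drop the k = j terms (1/inf = 0)
    # equivalently: 0.5 * <(A - diag A) z, z> with A_{jk} = 1/||x_j - x_k||
    return 0.5 * z @ (1.0 / D) @ z

# two unit charges at distance 2: E = 1/2 * (1/2 + 1/2) = 0.5
X = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
z = np.array([1.0, 1.0])
assert np.isclose(interaction_energy(X, z), 0.5)
```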
To fix the idea, we recall that the reference canonical tensor PR approximating the
single Newton kernel on an n×n×n tensor grid Ωh in the computational box Ω = [−b, b]3
is represented by (6.6), where h > 0 is the fine mesh size. For ease of exposition, we
further assume that the particle centers xk are located exactly at some grid points in
Ωh (otherwise, an additional approximation error may be introduced) such that each
point xk inherits some multi-index ik ∈ ℐ , and the origin x = 0 corresponds to the
central point n0 = (n/2, n/2, n/2) on the grid. In turn, the canonical tensor P0 approxi-
mating the total interaction potential PN (x) (x ∈ Ω) for the N-particle system,
PN (x) = ∑_{k=1}^{N} zk / ‖x − xk ‖  ⇝  P0 = Ps + Pl ∈ ℝ^{n×n×n} ,

is represented by (15.10) as a sum of short- and long-range tensor components. Now, the tensor P0 = P0 (xh ) can be defined as a function of the discrete variable xh at each point xh ∈ Ωh and, in particular, in the vicinity of each particle center xk , that is, at the grid points xk + he, where the directional vector e = (e1 , e2 , e3 )^T is specified by some choice of 3D coordinates eℓ ∈ {−1, 0, 1} for ℓ = 1, 2, 3. This allows introducing the useful notation P0 (xk + he), which can be applied to all tensors living on Ωh .
The following lemma describes the tensor scheme for calculating EN by utilizing
the long-range part Pl only in the tensor representation of PN (x).

Lemma 15.15 ([24]). Let the effective support of the short-range components in the ref-
erence potential PR not exceed σ > 0. Then the interaction energy EN of the N-particle
system can be calculated by using only the long-range part in the total potential sum
EN = EN (x1 , . . . , xN ) = (1/2) ∑_{j=1}^{N} zj (Pl (xj ) − zj P_{Rl} (x = 0))   (15.24)

in O(dRl N) operations, where Rl is the canonical rank of the long-range component.

Proof. Similarly to [152], where the case of lattice-structured systems was analyzed, we
show that the interior sum in (15.22) can be obtained from the tensor P0 traced onto
the centers of particles xk , where the term corresponding to xj = xk is removed:

∑_{k=1, k≠j}^{N} zk / ‖xj − xk ‖  ⇝  P0 (xj ) − zj PR (x = 0).

Here, the value of the reference canonical tensor PR (see (6.6)) is evaluated at the origin
x = 0, i. e., corresponding to the multi-index n0 = (n/2, n/2, n/2). Hence, we arrive at
the tensor approximation

EN ⇝ (1/2) ∑_{j=1}^{N} zj (P0 (xj ) − zj PR (x = 0)).   (15.25)

Now, we split P0 into the long-range part (15.11) and the remaining short-range po-
tential to obtain P0 (xj ) = Ps (xj ) + Pl (xj ), and the same for the reference tensor PR . By
assumption, the short-range part Ps (xj ) at point xj in (15.25) consists only of the local
term PRs (x = 0) = zj PR (x = 0). Due to the corresponding cancellations in the right-
hand side of (15.25), we find that EN depends only on Pl , leading to the final tensor
representation in (15.24).
We arrive at the linear complexity scaling O(dRl N) taking into account the O(dRl )
cost of the point evaluation for the canonical tensor Pl .
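The cancellation argument of the lemma can be checked numerically with a simple analytic stand-in for the RS splitting: the Ewald-type decomposition 1/r = erfc(γr)/r + erf(γr)/r, whose long-range part is smooth with value 2γ/√π at r = 0. The parameter γ and the test lattice below are hypothetical choices, not values from the book; once γσ is large enough that the short-range parts of distinct particles do not overlap, the energy computed from the long-range part alone matches the double sum (15.22).

```python
from math import erf, erfc, pi, sqrt
import numpy as np

gamma = 6.0   # splitting parameter: erfc(gamma * sigma) ~ 2e-17 for sigma = 1

def p_long(r):
    """Smooth long-range part of 1/r; finite at the origin."""
    return 2.0 * gamma / sqrt(pi) if r == 0.0 else erf(gamma * r) / r

rng = np.random.default_rng(2)
# random charges on a 4x4x4 unit lattice (minimal distance sigma = 1)
X = np.array([[i, j, k] for i in range(4) for j in range(4) for k in range(4)], float)
z = rng.standard_normal(len(X))

# exact energy by the double sum (15.22)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
np.fill_diagonal(D, np.inf)
E_exact = 0.5 * z @ (1.0 / D) @ z

# energy from the long-range part only, cf. (15.24):
# E_N = 1/2 sum_j z_j * (P_l(x_j) - z_j * p_long(0))
Pl = np.array([sum(z[k] * p_long(np.linalg.norm(X[j] - X[k])) for k in range(len(X)))
               for j in range(len(X))])
E_long = 0.5 * z @ (Pl - z * p_long(0.0))
assert abs(E_long - E_exact) < 1e-12 * max(1.0, abs(E_exact))
```

In the RS format, Pl above is exactly the quantity available as a low-rank canonical tensor, so each point evaluation costs O(dR_l) instead of O(N).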

Table 15.4 presents the error of the energy computation by (15.25) using the RS tensor format with Rl = 14 and Rs = 13.

Table 15.4: Absolute and relative errors in the interaction energy of N-particle clusters computed by
RS-tensor approximation with Rl = 14 (Rs = 13).

grid size, h                   N                  100          200          400          782

                               Exact EN           −8.4888      −18.1712     −35.9625     −90.2027
8192^3 , h = 6.8 ⋅ 10^{−3}     (EN − EN,T )/EN    10^{−4}      2 ⋅ 10^{−4}  2 ⋅ 10^{−4}  10^{−4}
16 384^3 , h = 3.4 ⋅ 10^{−3}   (EN − EN,T )/EN    2 ⋅ 10^{−4}  10^{−4}      10^{−4}      10^{−5}

Table 15.5 presents the approximation error in EN computed by the RS tensor representation (15.24) for different values of the system size. The grid size is fixed at n^3 = 4096^3 with h = 0.0137; the canonical rank of the reference tensor is R = 29. The short-range rank of the RS tensor is taken as Rs = 10.
Table 15.5: Error in the interaction energy of clusters of N particles computed by the RS tensor ap-
proach (Rs = 10).

N                 200          300          400            500            600            700

Exact EN          −17.91       −26.47       −35.56         −47.1009       −62.32         −77.47
(EN − EN,T )/EN   6 ⋅ 10^{−5}  9 ⋅ 10^{−7}  3.8 ⋅ 10^{−5}  2.4 ⋅ 10^{−4}  3.0 ⋅ 10^{−4}  2.0 ⋅ 10^{−4}

Table 15.6 shows the results for several clusters of particles generated by random assignment of charges zj to finite lattices of sizes 8^3 , 12^3 , 16 × 16 × 8, and 16^3 . The Newton kernel is approximated with εN = 10^{−4} on a grid of size 4096^3 with rank R = 25. The interaction energy was computed using only the long-range part with Rl = 12. For the rank reduction, the multigrid C2T algorithm [174] is applied with the rank truncation parameters εC2T = 10^{−5} and εT2C = 10^{−6} . The box size is about 40 × 40 × 40 atomic units with mesh size h = 0.0098.

Table 15.6: Errors in the interaction energy of clusters of N particles computed by RS tensor approxi-
mation with the long-range rank parameter Rl = 12 (Rs = 13).

N of particles    512        1728        2048        4096

Exact EN          51.8439    −133.9060   −138.5562   −207.8477
(EN − EN,T )/EN   0.0022     0.001       0.0016      0.001

Table 15.6 illustrates that the relative accuracy of energy calculations by using the RS tensor format remains of order 10^{−3} , almost independent of the cluster size. The Tucker ranks increase only slightly with the system size N. The computation time for the tensor Pl remains almost constant, whereas the point evaluation time for this tensor (with pre-computed data) increases linearly in N (see Lemma 15.15).

15.3.3 Gradients and forces

Computation of electrostatic forces and gradients of the interaction potential in multiparticle systems is a computationally extensive problem. Algorithms based on the Ewald summation technique were discussed in [63, 133]. Here we describe the alternative approach using the RS tensor format proposed in [24].
First, we consider the computation of gradients. Given an RS-canonical tensor A as in (15.16) with the width parameter γ > 0, the discrete gradient ∇h = (∇1 , . . . , ∇d )^T applied to the long-range part in A at the grid points of Ωh can be calculated simultaneously as an R-term canonical tensor by applying simple one-dimensional finite-difference (FD) operations to the long-range part of A = As + Al ,

∇h Al = ∑_{k=1}^{R} ξk (G_k^{(1)} , . . . , G_k^{(d)} )^T ,   (15.26)

with tensor entries

G_k^{(ℓ)} = u_k^{(1)} ⊗ ⋅ ⋅ ⋅ ⊗ ∇ℓ u_k^{(ℓ)} ⊗ ⋅ ⋅ ⋅ ⊗ u_k^{(d)} ,

where ∇ℓ (ℓ = 1, . . . , d) is the univariate FD differentiation scheme (using backward or central differences). The numerical complexity of the representation (15.26) can be estimated by O(dRn), provided that the canonical rank is almost uniformly bounded in the number of particles. The gradient operator applies locally to each short-range term in (15.16), which amounts to the complexity O(dR0 γN).
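A minimal sketch of (15.26) (illustrative names; numpy central differences standing in for a particular FD scheme): since differentiation acts linearly on the mode-ℓ vectors, differentiating one 1D factor per direction reproduces the gradient of the assembled tensor while keeping the canonical rank R per component.

```python
import numpy as np

def grad_canonical(weights, factors, h):
    """Discrete gradient of a rank-R canonical tensor, cf. (15.26): for each
    direction, replace the mode-l factor matrix by its 1D finite difference."""
    d = len(factors)
    grads = []
    for ell in range(d):
        new = [F.copy() for F in factors]
        new[ell] = np.gradient(factors[ell], h, axis=0)   # d/dx_ell column-wise
        grads.append((weights, new))
    return grads   # d canonical tensors, one per partial derivative

# verify against differentiating the assembled full tensor
rng = np.random.default_rng(3)
n, R, h = 20, 3, 0.1
w = rng.standard_normal(R)
F = [rng.standard_normal((n, R)) for _ in range(3)]
full = sum(w[k] * np.einsum('i,j,k->ijk', F[0][:, k], F[1][:, k], F[2][:, k]) for k in range(R))
wx, Fx = grad_canonical(w, F, h)[0]
gx = sum(wx[k] * np.einsum('i,j,k->ijk', Fx[0][:, k], Fx[1][:, k], Fx[2][:, k]) for k in range(R))
assert np.allclose(gx, np.gradient(full, h, axis=0))
```

Only dRn entries of 1D vectors are differentiated, matching the O(dRn) estimate above.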
The gradient of an RS-Tucker tensor can be calculated in a completely similar way.
Furthermore, in the setting of Section 15.3.2, the force vector Fj on particle j is obtained by differentiating the electrostatic potential energy EN (x1 , . . . , xN ) with respect to xj ,

Fj = −(∂/∂xj ) EN = −∇|_{xj} EN ,

which can be calculated explicitly (see [133]) in the form

Fj = zj ∑_{k=1, k≠j}^{N} zk (xj − xk ) / ‖xj − xk ‖^3 .

The Ewald summation technique for force calculations was presented in [64, 133]. In
principle, it is possible to construct the RS tensor representation for this vector field
directly by using the radial basis function p(r) = 1/r^2 .
However, here we describe an alternative approach based on numerical differentiation of the energy functional by using the RS tensor representation of the N-particle interaction potential on a fine spatial grid. The differentiation in the RS-tensor format with respect to xj is based on the explicit representation (15.24), which can be rewritten in the form

EN (x1 , . . . , xN ) = ÊN (x1 , . . . , xN ) − (1/2) (∑_{j=1}^{N} zj^2 ) P_{Rl} (x = 0),   (15.27)

where ÊN (x1 , . . . , xN ) = (1/2) ∑_{j=1}^{N} zj Pl (xj ) denotes the “non-calibrated” interaction energy
with the long-range tensor component Pl . In the following discussion, for definiteness,
we set j = N. Since the second term in (15.27) does not depend on the particle positions,
it can be omitted in calculation of variations in EN with respect to xN . Hence, we arrive
at the representation for the first difference in direction ei , i = 1, 2, 3,

EN (x1 , . . . , xN ) − EN (x1 , . . . , xN − hei ) = ÊN (x1 , . . . , xN ) − ÊN (x1 , . . . , xN − hei ).


The straightforward implementation of the above relation for the three directions e1 = (1, 0, 0)^T , e2 = (0, 1, 0)^T , and e3 = (0, 0, 1)^T reduces to four calls of the basic procedure for computing the tensor Pl , corresponding to four different dispositions of the points x1 , . . . , xN , leading to a cost of order O(dRn).
However, the factor of four can be reduced to merely one by taking into account that the two canonical/Tucker tensors Pl computed for the particle positions (x1 , . . . , xN−1 , xN ) and (x1 , . . . , xN−1 , xN − he) differ only in a small part (since the positions x1 , . . . , xN−1 remain fixed). This requires only minor modifications compared with repeating the full calculation of ÊN (x1 , . . . , xN ).
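The finite-difference force evaluation can be sketched as follows (an O(N²) toy energy standing in for the tensor-format evaluation; names are illustrative). Note that every pair (j, k) appears twice in the double sum (15.22), so differentiating the energy gives F_j = z_j ∑_{k≠j} z_k (x_j − x_k)/‖x_j − x_k‖³ with no remaining factor 1/2, and the central differences reproduce this:

```python
import numpy as np

def energy(X, z):
    """Direct double-sum energy, cf. (15.22) (stand-in for the tensor evaluation)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    return 0.5 * z @ (1.0 / D) @ z

def force_fd(X, z, j, h=1e-6):
    """F_j = -grad_{x_j} E_N by central differences; only particle j moves."""
    F = np.zeros(3)
    for i in range(3):
        Xp, Xm = X.copy(), X.copy()
        Xp[j, i] += h
        Xm[j, i] -= h
        F[i] = -(energy(Xp, z) - energy(Xm, z)) / (2 * h)
    return F

rng = np.random.default_rng(4)
X = rng.standard_normal((5, 3)) * 3.0
z = rng.standard_normal(5)
j = 2
# analytic force on particle j
diff = X[j] - np.delete(X, j, axis=0)
zk = np.delete(z, j)
F_exact = z[j] * np.sum(zk[:, None] * diff / np.linalg.norm(diff, axis=1)[:, None]**3, axis=0)
assert np.allclose(force_fd(X, z, j), F_exact, atol=1e-5)
```

In the RS setting, each energy call is replaced by the cheap long-range tensor evaluation (15.24), and only the terms involving the displaced particle need to be recomputed.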

15.3.4 Regularization scheme for the Poisson–Boltzmann equation

Following [24], we describe the application scheme for the Poisson–Boltzmann equation (PBE), commonly used for numerical modeling of the electrostatic potential of proteins [135, 209].
Consider a solvated biomolecular system modeled by dielectrically separated
domains with singular Coulomb potentials distributed in the molecular region. For
schematic representation, we consider the system occupying a rectangular domain Ω
with boundary 𝜕Ω (see Figure 15.11). The solute (molecule) region is represented by
Ωm , and the solvent region by Ωs .

Figure 15.11: Computational domain for PBE.

The linearized Poisson–Boltzmann equation takes the form (see [209])

−∇ ⋅ (ϵ∇u) + κ^2 u = ρf   in Ω,   (15.28)

where u denotes the target electrostatic potential of the protein, and ρf = ∑_{k=1}^{N} zk δ(‖x − xk ‖) is the scaled singular charge distribution supported at the points xk in Ωm , where δ is the Dirac delta. Here, ϵ = 1 and κ = 0 in Ωm , whereas in the solvent region Ωs , we have κ ≥ 0 and ϵ ≥ 1. The boundary conditions on the external boundary 𝜕Ω can be
specified depending on the particular problem setting. For definiteness, we impose
the simplest Dirichlet boundary condition u|𝜕Ω = 0. The interface conditions on the
interior boundary Γ = 𝜕Ωm arise from the dielectric theory:

[u] = 0,   [ϵ ∂u/∂n] = 0   on Γ.   (15.29)

The practically useful solution methods for the PBE are based on regularization
schemes aimed at removing the singular component from the potentials in the govern-
ing equation. Among others, we consider one of the most commonly used approaches
based on the additive splitting of the potential only in the molecular region Ωm (see
[209]). To that end, we introduce the additive splitting

u = u^r + u^s ,   where u^s = 0 in Ωs ,

where the singular component satisfies the equation

−ϵm Δu^s = ρf   in Ωm ;   u^s = 0 on Γ.   (15.30)

Now, equation (15.28) can be transformed into an equation for the regular potential u^r :

−∇ ⋅ (ϵ∇u^r ) + κ^2 u^r = 0   in Ω,   (15.31)

[u^r ] = 0,   [ϵ ∂u^r/∂n] = −ϵm ∂u^s/∂n   on Γ.

To facilitate solving equation (15.30) with singular data, we define the singular potential U in free space by

−ϵm ΔU = ρf   in ℝ^3 ,

and introduce its restriction U^s onto Ωm ,

U^s = U|_{Ωm}   in Ωm ;   U^s = 0 in Ωs .

Then we have u^s = U^s + u^h , where the harmonic function u^h compensates the discontinuity of U^s on Γ,

Δu^h = 0   in Ωm ;   u^h = −U^s on Γ.

The advantages of this formulation are:
(a) the absence of singularities in the solution u^r , and
(b) the localization of the solution splitting to the domain Ωm only.
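The splitting can be illustrated by a 1D analog (an entirely hypothetical toy, not from the book): for −u″ = δ(x − a) on (0, 1) with zero Dirichlet data, the free-space singular potential U(x) = −|x − a|/2 satisfies −U″ = δ(x − a), and a harmonic (here linear) correction restores the boundary values:

```python
import numpy as np

a = 0.3    # location of the point charge (hypothetical 1D setting)

def U(x):
    """Free-space singular potential: -U'' = delta(x - a) on the real line."""
    return -0.5 * np.abs(x - a)

def u_h(x):
    """Harmonic correction: u_h'' = 0 with u_h = -U on the boundary {0, 1}."""
    return -U(0.0) + x * (-U(1.0) + U(0.0))

x = np.linspace(0.0, 1.0, 101)
u = U(x) + u_h(x)
# exact Dirichlet Green's function of -d^2/dx^2 on (0, 1)
u_exact = np.where(x < a, (1 - a) * x, a * (1 - x))
assert np.allclose(u, u_exact)
```

The singular part is known in closed form, and only the smooth correction requires a numerical solve, which is exactly the structure exploited by the RS tensor representation of U in 3D.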
Calculating the singular potential U, which may include a sum of hundreds or even thousands of single Newton kernels in 3D, leads to a challenging computational problem. In the considered approach, it can be represented on large tensor grids with controlled precision by using the range-separated tensor formats described above. The long-range component in the formatted parametrization remains smooth and allows a global low-rank representation.
Notice that the short-range part in the tensor representation of U does not con-
tribute to the right-hand side in the interface conditions on Γ in equation (15.31). This
crucial simplification is possible since the physical distance between the atomic cen-
ters in protein modeling is bounded from below by the fixed constant σ > 0, whereas
the effective support of the localized parts in the tensor representation of U can be chosen as half of σ.
Moreover, all normal derivatives can be easily calculated by differentiation of the univariate canonical vectors in the long-range part of the electrostatic potential U, precomputed on a fine tensor grid in ℝ^3 (see Section 15.3.3). Hence, the numerical cost to build up the interface conditions in (15.31) becomes negligible compared with the solution of equation (15.31). We conclude with the following:

Proposition 15.16. Let the effective support of the short-range components in the reference potential PR not exceed σ/2. Then the interface conditions in the regularized formulation (15.31) of the PBE depend only on the low-rank long-range component in the free-space electrostatic potential of the system. The numerical cost to build up the interface conditions on Γ in (15.31) does not depend on the number of particles N.

An important characterization of a protein molecule is the electrostatic solvation energy [209], which is the difference between the electrostatic free energy in the solvated state (described by the PBE) and the electrostatic free energy in the absence of solvent, that is, EN . The electrostatic solvation energy can thus be computed in the framework of the new regularized formulation (15.31) of the PBE.
The particular numerical schemes for solving the PBE by using the RS tensor for-
mat are considered in [26]. An accurate tensor representation of the right-hand side
in the PBE can be modelled by using the range-separated splitting of the Dirac delta
introduced in [172].
Bibliography
[1] P.-A. Absil, R. Mahoni, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds.
Princeton University Press, Princeton, 2008.
[2] E. Acar, T. G. Kolda, and D. M. Dunlavy. A scalable optimization approach for fitting canonical
tensor decompositions. J. Chemom., 25 (2), 67–86, 2011.
[3] J. Almlöf. Direct methods in electronic structure theory. In D. R. Yarkony , ed., Modern
Electronic Structure Theory, vol. II, World Scientific, Singapore, pp. 110–151, 1995.
[4] F. Aquilante, L. Gagliardi, T. B. Pedersen, and R. Lindh. Atomic Cholesky decompositions:
a route to unbiased auxiliary basis sets for density fitting approximation with tunable accuracy
and efficiency. J. Chem. Phys., 130, 154107, 2009.
[5] D. Z. Arov and I. P. Gavrilyuk. A method for solving initial value problems for linear differential
equations in Hilbert space based on the Cayley transform. Numer. Funct. Anal. Optim.,
14 (5–6), 456–473, 1993.
[6] P. Y. Ayala and G. E. Scuseria. Linear scaling second-order Møller–Plesset theory in the atomic
orbital basis for large molecular systems. J. Chem. Phys., 110 (8), 3660–3671, 1999.
[7] M. Bachmayr. Adaptive Low-Rank Wavelet Methods and Applications to Two-Electron
Schrödinger Equations. PhD dissertation, RWTH Aachen, 2012.
[8] M. Bachmayr and W. Dahmen. Adaptive near-optimal rank tensor approximation for
high-dimensional operator equations. Found. Comput. Math., 15 (4), 2015.
[9] B. W. Bader and T. G. Kolda. Algorithm 862: MATLAB tensor classes for fast algorithm
prototyping. ACM Trans. Math. Softw., 32 (4), 2006.
[10] J. Ballani and L. Grasedyck. A projection method to solve linear systems in tensor format.
Numer. Linear Algebra Appl., 20 (1), 27–43, 2013.
[11] J. Ballani, L. Grasedyck, and M. Kluge. Black box approximation of tensors in hierarchical
Tucker format. Linear Algebra Appl., 428, 639–657, 2013.
[12] M. Barrault, E. Cancés, W. Hager, and C. Le Bris. Multilevel domain decomposition for
electronic structure calculations. J. Comput. Phys., 222, 86–109, 2007.
[13] P. Baudin, J. Marin, I. G. Cuesta, and A. M. S. de Meras. Calculation of excitation energies from
the CC2 linear response theory using Cholesky decomposition. J. Chem. Phys., 140, 104111,
2014.
[14] M. Bebendorf. Adaptive cross approximation of multivariate functions. Constr. Approx. 34 (2),
149–179, 2011.
[15] M. Bebendorf and S. Rjasanow. Adaptive low-rank approximation of collocation matrices.
Computing, 70 (1), 1–24, 2003.
[16] T. Beck. Real-space mesh techniques in density-functional theory. Rev. Mod. Phys., 72,
1041–1080, 2000.
[17] N. H. F. Beebe and J. Linderberg. Simplifications in the generation and transformation of
two-electron integrals in molecular calculations. Int. J. Quant. Chem., 12 (7), 683–705,
1977.
[18] R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, 1957.
[19] P. Benner, V. Mehrmann, and H. Xu. A new method for computing the stable invariant
subspace of a real Hamiltonian matrix. J. Comput. Appl. Math., 86, 17–43, 1997.
[20] P. Benner, H. Faßbender, and M. Stoll. Solving large-scale quadratic eigenvalue problems
with Hamiltonian eigenstructure using a structure-preserving Krylov subspace method.
Electron. Trans. Numer. Anal., 29, 212–229, 2008.
[21] P. Benner, A. Onwunta, and M. Stoll. Low-rank solution of unsteady diffusion equations with
stochastic coefficients. SIAM/ASA J. Uncertain. Quantificat., 3 (1), 622–649, 2015.

https://doi.org/10.1515/9783110365832-016
[22] P. Benner, H. Faßbender, and C. Yang. Some remarks on the complex J-symmetric
eigenproblem. Preprint, Max Planck Institute Magdeburg, MPIMD/15-12, July 2015,
http://www2.mpi-magdeburg.mpg.de/preprints/2015/12/
[23] P. Benner, V. Khoromskaia, and B. N. Khoromskij. A reduced basis approach for calculation
of the Bethe–Salpeter excitation energies using low-rank tensor factorizations. Mol. Phys.,
114 (7–8), 1148–1161, 2016.
[24] P. Benner, V. Khoromskaia, and B. N. Khoromskij. Range-separated tensor formats for
numerical modeling of many-particle interaction potentials. arXiv:1606.09218 (39 pp.), 2016.
[25] P. Benner, S. Dolgov, V. Khoromskaia, and B. N. Khoromskij. Fast iterative solution of the
Bethe–Salpeter eigenvalue problem using low-rank and QTT tensor approximation. J. Comput.
Phys., 334, 221–239, 2017.
[26] P. Benner, V. Khoromskaia, B. N. Khoromskij, C. Kweyu, and M. Stein. Application of the
range-separated tensor format in solution of the Poisson–Boltzmann equation. Manuscript,
2017.
[27] P. Benner, V. Khoromskaia, B. N. Khoromskij, and C. Yang. Computing the density of states for
optical spectra by low-rank and QTT tensor approximation. arXiv:1801.03852, 2017.
[28] P. Benner, V. Khoromskaia, and B. N. Khoromskij. Range-separated tensor format for
many-particle modeling. SIAM J. Sci. Comput., 40 (2), A1034–A1062, 2018.
[29] A. Bensoussan, J.-L. Lions, and G. Papanicolaou. Asymptotic Analysis for Periodic Structures.
North-Holland, Amsterdam, 1978.
[30] C. Bertoglio, and B. N. Khoromskij. Low-rank quadrature-based tensor approximation of the
Galerkin projected Newton/Yukawa kernels. Comput. Phys. Commun., 183 (4), 904–912, 2012.
[31] G. Beylkin and M. J. Mohlenkamp. Numerical operator calculus in higher dimensions. Proc.
Natl. Acad. Sci. USA, 99, 10246–10251, 2002.
[32] G. Beylkin and M. J. Mohlenkamp. Algorithms for numerical analysis in high dimension. SIAM
J. Sci. Comput., 26 (6), 2133–2159, 2005.
[33] G. Beylkin, M. J. Mohlenkamp, and F. Pérez. Approximating a wavefunction as an
unconstrained sum of Slater determinants. J. Math. Phys., 49, 032107, 2008.
[34] G. Beylkin, J. Garcke, and M. J. Mohlenkamp, Multivariate regression and machine learning
with sums of separable functions. SIAM J. Sci. Comput., 31 (3), 1840–1857, 2009.
[35] F. A. Bischoff, E. F. Valeev. Computing molecular correlation energies with guaranteed
precision. J. Chem. Phys., 139 (11), 114106, 2013.
[36] T. Blesgen, V. Gavini, and V. Khoromskaia. Tensor product approximation of the electron
density of large aluminium clusters in OFDFT. J. Comput. Phys., 231 (6), 2551–2564, 2012.
[37] A. Bloch. Les theoremes de M. Valiron sur les fonctions entieres et la theorie
de l’uniformisation. Ann. Fac. Sci. Univ. Toulouse, 17 (3), 1–22, 1925, ISSN 0240-2963.
[38] S. F. Boys, G. B. Cook, C. M. Reeves, and I. Shavitt. Automatic fundamental calculations of
molecular structure. Nature, 178, 1207–1209, 1956.
[39] D. Braess. Nonlinear Approximation Theory. Springer-Verlag, Berlin, 1986.
[40] D. Braess. Asymptotics for the approximation of wave functions by exponential-sums.
J. Approx. Theory, 83, 93–103, 1995.
[41] S. Brenner and R. Scott. The Mathematical Theory of Finite Element Methods. Springer, Berlin,
1994.
[42] M. D. Buhmann. Radial Basis Functions. Cambridge University Press, Cambridge, 2003.
[43] H. J. Bungartz, and M. Griebel. Sparse grids. Acta Numer., 1–123, 2004.
[44] E. Cancés and C. Le Bris. On the convergence of SCF algorithms for the Hartree–Fock
equations. ESAIM: M2AN, 34 (4), 749–774, 2000.
[45] E. Cancés and C. Le Bris. Mathematical modeling of point defects in materials science. Math.
Models Methods Appl. Sci., 23, 1795–1859, 2013.
[46] E. Cancés, A. Deleurence, and M. Lewin. A new approach to the modeling of local defects in
crystals: the reduced Hartree–Fock case. Commun. Math. Phys., 281, 129–177, 2008.
[47] E. Cancés, V. Ehrlacher, and Y. Maday. Periodic Schrödinger operator with local defects and
spectral pollution. SIAM J. Numer. Anal., 50 (6), 3016–3035, 2012.
[48] E. Cancés, V. Ehrlacher, and T. Leliévre. Greedy algorithms for high-dimensional
non-symmetric linear problems. ESAIM Proc. 41, 95–131, 2013.
[49] J. D. Carroll and J. Chang. Analysis of individual differences in multidimensional scaling via an
N-way generalization of ‘Eckart–Young’ decomposition. Psychometrika, 35, 283–319, 1970.
[50] J. D. Carroll, S. Pruzansky, and J. B. Kruskal. CANDELINC: A general approach to
multidimensional analysis of many-way arrays with linear constraints on parameters.
Psychometrika, 45, 3–24, 1980.
[51] M. E. Casida. Time-dependent density-functional response theory for molecules. In
D. P. Chong, ed., Recent Advances in Density Functional Methods, Part I, World Scientific,
Singapore, 155–192, 1995.
[52] S. R. Chinnamsetty, M. Espig, W. Hackbusch, B. N. Khoromskij, and H.-J. Flad. Kronecker
tensor product approximation in quantum chemistry. J. Chem. Phys., 127, 084110, 2007.
[53] P. G. Ciarlet and C. Le Bris, eds. Handbook of Numerical Analysis, vol. X, Computational
Chemistry. Elsevier, Amsterdam, 2003.
[54] A. Cichocki and Sh. Amari. Adaptive Blind Signal and Image Processing: Learning Algorithms
and Applications. Wiley, New York, 2002.
[55] A. Cichocki, N. Lee, I. Oseledets, A.-H. Phan, Q. Zhao, and D. P. Mandic. Tensor networks
for dimensionality reduction and large-scale optimization: Part 1. Low-rank tensor
decompositions. Found. Trends Mach. Learn., 9 (4–5), 249–429, 2016.
[56] C. Cramer and D. Truhlar. Density functional theory for transition metals and transition metal
chemistry. Phys. Chem. Chem. Phys., 11 (46), 10757–10816, 2009.
[57] W. Dahmen, R. DeVore, L. Grasedyck, and E. Süli. Tensor-sparsity of solutions to high-dimensional
elliptic partial differential equations. Found. Comput. Math., 16 (4), 813–874, 2016.
[58] T. Darden, D. York, and L. Pedersen. Particle mesh Ewald: an O(N log N) method for Ewald
sums in large systems. J. Chem. Phys., 98, 10089–10091, 1993.
[59] L. De Lathauwer. Signal Processing Based on Multilinear Algebra. PhD thesis, Katholieke
Universiteit Leuven, 1997.
[60] L. De Lathauwer, B. De Moor, and J. Vandewalle. On the best rank-1 and rank-(R1 , . . . , RN )
approximation of higher-order tensors. SIAM J. Matrix Anal. Appl., 21, 1324–1342, 2000.
[61] L. De Lathauwer, B. De Moor, and J. Vandewalle. A multilinear singular value decomposition.
SIAM J. Matrix Anal. Appl., 21, 1253–1278, 2000.
[62] V. De Silva and L.-H. Lim. Tensor rank and the ill-posedness of the best low-rank
approximation problem. SIAM J. Matrix Anal. Appl., 30 (3), 1084–1127, 2008.
[63] M. Deserno and C. Holm. How to mesh up Ewald sums. I. A theoretical and numerical
comparison of various particle mesh routines. J. Chem. Phys., 109 (18), 7678–7693, 1998.
[64] M. Deserno and C. Holm. How to mesh up Ewald sums. II. A theoretical and numerical
comparison of various particle mesh routines. J. Chem. Phys., 109 (18), 7694–7701, 1998.
[65] S. Dolgov. Tensor Product Methods in Numerical Simulation of High-Dimensional
Dynamical Problems. PhD thesis, University of Leipzig, 2014.
http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-151129
[66] S. V. Dolgov and B. N. Khoromskij. Two-level Tucker-TT-QTT format for optimized tensor
calculus. SIAM J. Matrix Anal. Appl., 34 (2), 593–623, 2013.
[67] S. Dolgov and B. N. Khoromskij. Simultaneous state-time approximation of the chemical
master equation using tensor product formats. Numer. Linear Algebra Appl., 22 (2), 197–219,
2015.
[68] S. V. Dolgov, B. N. Khoromskij, and I. Oseledets. Fast solution of multi-dimensional parabolic
problems in the TT/QTT formats with initial application to the Fokker–Planck equation. SIAM J.
Sci. Comput., 34 (6), A3016–A3038, 2012.
[69] S. V. Dolgov, B. N. Khoromskij, and D. Savostyanov. Superfast Fourier transform using QTT
approximation. J. Fourier Anal. Appl., 18 (5), 915–953, 2012.
[70] S. Dolgov, B. N. Khoromskij, D. Savostyanov, and I. Oseledets. Computation of extreme
eigenvalues in higher dimensions using block tensor train format. Comput. Phys. Commun.,
185 (4), 1207–1216, 2014.
[71] S. Dolgov, B. N. Khoromskij, A. Litvinenko, and H. G. Matthies. Computation of the response
surface in the tensor train data format. SIAM J. Uncertain. Quantificat., 3, 1109–1135,
2015.
[72] R. Dovesi, R. Orlando, C. Roetti, C. Pisani, and V. R. Saunders. The periodic Hartree–Fock
method and its implementation in the CRYSTAL code. Phys. Status Solidi (b), 217, 63, 2000.
[73] D. A. Drabold and O. F. Sankey. Maximum entropy approach for linear scaling in the electronic
structure problem. Phys. Rev. Lett., 70, 3631–3634, 1993.
[74] F. Ducastelle and F. Cyrot-Lackmann. Moments developments and their application to the
electronic charge distribution of d bands. J. Phys. Chem. Solids, 31, 1295–1306, 1970.
[75] T. H. Dunning, Jr. Gaussian basis sets for use in correlated molecular calculations. I. The
atoms boron through neon and hydrogen. J. Chem. Phys., 90, 1007–1023, 1989.
[76] A. Durdek, S. R. Jensen, J. Juselius, P. Wind, T. Flå, and L. Frediani. Adaptive order polynomial
algorithm in a multi-wavelet representation scheme. Appl. Numer. Math., 92, 40–53, 2015.
[77] V. Ehrlacher, C. Ortner, and A. V. Shapeev. Analysis of boundary conditions for crystal defect
atomistic simulations. Arch. Ration. Mech. Anal., 222 (3), 1217–1268, 2016.
[78] L. Eldén and B. Savas. A Newton–Grassmann method for computing the best multilinear
rank-(r1 , r2 , r3 ) approximation of a tensor. SIAM J. Matrix Anal. Appl., 31 (2), 248–271, 2009.
[79] P. P. Ewald. Die Berechnung optischer und elektrostatischer Gitterpotentiale. Ann. Phys., 64,
253, 1921.
[80] H.-J. Flad, W. Hackbusch, and R. Schneider. Best N-term approximation in electronic structure
calculations: I. One-electron reduced density matrix. ESAIM: M2AN 40, 49–61, 2006.
[81] H.-J. Flad, B. N. Khoromskij, D. V. Savostyanov, and E. E. Tyrtyshnikov. Verification of the
cross 3d algorithm on quantum chemistry data. Russ. J. Numer. Anal. Math. Model., 4, 1–16,
2008.
[82] H.-J. Flad, R. Schneider, and B.-W. Schulze. Asymptotic regularity of solutions to Hartree–Fock
equations with Coulomb potential. Math. Methods Appl. Sci., 31 (18), 2172–2201, 2008.
[83] H.-J. Flad, W. Hackbusch, B. N. Khoromskij, and R. Schneider. Concepts of data-sparse
tensor-product approximation in many-particle modeling. In V. Olshevsky, E. Tyrtyshnikov,
eds., Matrix Methods: Theory, Algorithms, Applications (Dedicated to the Memory of Gene
Golub), World Scientific Publishing, Singapore, pp. 313–347, 2010.
[84] B. Fornberg and N. Flyer. A Primer on Radial Basis Functions with Applications to the
Geosciences. CBMS–NSF Regional Conference Series in Applied Mathematics, vol. 87, SIAM,
Philadelphia, 2015.
[85] L. Frediani and D. Sundholm. Real-space numerical grid methods in quantum chemistry.
Phys. Chem. Chem. Phys., 17, 31357–31359, 2015.
[86] L. Frediani, E. Fossgaard, T. Flå, and K. Ruud. Fully adaptive algorithms for multivariate
integral equations using the non-standard form and multiwavelets with applications to
the Poisson and bound-state Helmholtz kernels in three dimensions. Mol. Phys., 111 (9–11),
1143–1160, 2013.
[87] S. Friedland, V. Mehrmann, A. Miedlar, and M. Nkengla. Fast low rank approximation of
matrices and tensors. Electron. J. Linear Algebra, 22, 1031–1048, 2011.
[88] M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman,
G. Scalmani, V. Barone, B. Mennucci, G. A. Petersson, et al. Gaussian Development Version
Revision H.1, Gaussian Inc., Wallingford, CT, 2009.
[89] I. P. Gavrilyuk. Super exponentially convergent approximation to the solution of the
Schrödinger equation in abstract setting. Comput. Methods Appl. Math., 10 (4), 345–358,
2010.
[90] I. P. Gavrilyuk and B. N. Khoromskij. Quantized-TT-Cayley transform to compute dynamics and
spectrum of high-dimensional Hamiltonians. Comput. Methods Appl. Math., 11 (3), 273–290,
2011.
[91] I. P. Gavrilyuk, W. Hackbusch, and B. N. Khoromskij. ℋ-matrix approximation for the operator
exponential with applications. Numer. Math., 92, 83–111, 2002.
[92] I. P. Gavrilyuk, W. Hackbusch, and B. N. Khoromskij. Data-sparse approximation to
operator-valued functions of elliptic operator. Math. Comput., 73, 1297–1324, 2003.
[93] I. P. Gavrilyuk, W. Hackbusch, and B. N. Khoromskij. Hierarchical tensor-product
approximation to the inverse and related operators in high-dimensional elliptic problems.
Computing, 74, 131–157, 2005.
[94] I. P. Gavrilyuk, W. Hackbusch, and B. N. Khoromskij. Tensor-product approximation to elliptic
and parabolic solution operators in higher dimensions. Computing, 74, 131–157, 2005.
[95] I. P. Gavrilyuk, W. Hackbusch, and B. N. Khoromskij. Data-sparse approximation to a class of
operator-valued functions. Math. Comput., 74, 681–708, 2005.
[96] I. P. Gavrilyuk, W. Hackbusch, and B. N. Khoromskij. Data-sparse approximation of a class of
operator-valued functions. Math. Comput. 74, 681–708, 2005.
[97] L. Genovese, A. Neelov, S. Goedecker, T. Deutsch, S. A. Ghasemi, A. Willand, D. Caliste,
O. Zilberberg, M. Rayson, A. Bergman, and R. Schneider. Daubechies wavelets as a basis
set for density functional pseudopotential calculations. J. Chem. Phys., 129, 014109, 2008.
[98] G. H. Golub, C. F. Van Loan. Matrix Computations, 4th edn. Johns Hopkins University Press,
Baltimore, 2013.
[99] S. A. Goreinov, E. E. Tyrtyshnikov, and N. L. Zamarashkin. A theory of pseudoskeleton
approximations. Linear Algebra Appl., 261, 1–21, 1997.
[100] L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal.
Appl., 31, 2029–2054, 2010.
[101] L. Grasedyck. Polynomial approximation in hierarchical Tucker format by vector tensorization.
Preprint 43, DFG/SPP1324, RWTH Aachen, 2010.
[102] L. Grasedyck, D. Kressner, and C. Tobler. A literature survey of low-rank tensor approximation
techniques. GAMM-Mitt., 36 (1), 53–78, 2013.
[103] L. Greengard and V. Rokhlin. A fast algorithm for particle simulations. J. Comput. Phys., 73,
325–348, 1987.
[104] W. H. Greub. Multilinear Algebra, 2nd edn. Springer, Berlin, 1978.
[105] M. Griebel and J. Hamaekers. Sparse grids for the Schrödinger equation. Modél. Math. Anal.
Numér., 41, 215–247, 2007.
[106] M. Griebel and J. Hamaekers. Tensor product multiscale many-particle spaces with
finite-order weights for the electronic Schrödinger equation. Z. Phys. Chem., 224, 527–543,
2010.
[107] E. Gross and W. Kohn. Time-dependent density-functional theory. Adv. Quantum Chem., 21,
255–291, 1990.
[108] W. Hackbusch. Efficient convolution with the Newton potential in d dimensions. Numer. Math.,
110 (4), 449–489, 2008.
[109] W. Hackbusch. Convolution of hp-functions on locally refined grids. IMA J. Numer. Anal., 29,
960–985, 2009.
[110] W. Hackbusch. Tensor Spaces and Numerical Tensor Calculus. Springer, Berlin, 2012.
[111] W. Hackbusch and B. N. Khoromskij. Low-rank Kronecker product approximation to
multi-dimensional nonlocal operators. Part I. Separable approximation of multi-variate
functions. Computing, 76, 177–202, 2006.
[112] W. Hackbusch and B. N. Khoromskij. Low-rank Kronecker-product approximation to
multi-dimensional nonlocal operators. Part II. HKT representation of certain operators.
Computing, 76, 203–225, 2006.
[113] W. Hackbusch and B. N. Khoromskij. Tensor-product approximation to operators and
functions in high dimension. J. Complex., 23, 697–714, 2007.
[114] W. Hackbusch and B. N. Khoromskij. Tensor-product approximation to multi-dimensional
integral operators and Green’s functions. SIAM J. Matrix Anal. Appl., 30 (3), 1233–1253, 2008.
[115] W. Hackbusch and S. Kühn. A new scheme for the tensor representation. J. Fourier Anal.
Appl., 15, 706–722, 2009.
[116] W. Hackbusch, B. N. Khoromskij, and E. E. Tyrtyshnikov. Hierarchical Kronecker tensor-product
approximations. J. Numer. Math., 13, 119–156, 2005.
[117] W. Hackbusch, B. N. Khoromskij, and E. Tyrtyshnikov. Approximate iteration for structured
matrices. Numer. Math., 109, 365–383, 2008.
[118] W. Hackbusch, B. N. Khoromskij, S. Sauter, and E. Tyrtyshnikov. Use of tensor formats in
elliptic eigenvalue problems. Numer. Linear Algebra Appl., 19 (1), 133–151, 2012.
[119] W. Hackbusch and R. Schneider. Tensor spaces and hierarchical tensor representations.
In S. Dahlke, W. Dahmen, et al., eds, Lecture Notes in Computational Science and Engineering,
vol. 102, Springer, Berlin, 2014.
[120] N. Hale and L. N. Trefethen. Chebfun and numerical quadrature. Sci. China Math., 55 (9),
1749–1760, 2012.
[121] H. Harbrecht, M. Peters, and R. Schneider. On the low-rank approximation by the pivoted
Cholesky decomposition. Appl. Numer. Math., 62 (4), 428–440, 2012.
[122] R. J. Harrison, G. I. Fann, T. Yanai, Z. Gan, and G. Beylkin. Multiresolution quantum chemistry:
basic theory and initial applications. J. Chem. Phys., 121 (23), 11587–11598, 2004.
[123] D. R. Hartree. The Calculation of Atomic Structure. Wiley, New York, 1957.
[124] R. Haydock, V. Heine, and M. J. Kelly. Electronic structure based on the local atomic
environment for tight-binding bands. J. Phys. C, Solid State Phys., 5, 2845–2858, 1972.
[125] M. Head-Gordon, J. A. Pople, and M. Frisch. MP2 energy evaluation by direct methods. Chem.
Phys. Lett., 153 (6), 503–506, 1988.
[126] L. Hedin. New method for calculating the one-particle Green’s function with application to the
electron–gas problem. Phys. Rev. 139, A796, 1965.
[127] T. Helgaker, P. Jørgensen, and N. Handy. A numerically stable procedure for calculating
Møller–Plesset energy derivatives, derived using the theory of Lagrangians. Theor. Chim.
Acta, 76, 227–245, 1989.
[128] T. Helgaker, P. Jørgensen, and J. Olsen. Molecular Electronic-Structure Theory. Wiley, New
York, 1999.
[129] J. S. Hesthaven, G. Rozza, and B. Stamm. Certified Reduced Basis Methods for Parametrized
Partial Differential Equations. Springer, Berlin, 2016.
[130] N. Higham. Analysis of the Cholesky decomposition of a semi-definite matrix. In M. G. Cox and
S. J. Hammarling, eds, Reliable Numerical Computations, Oxford University Press, Oxford,
pp. 161–185, 1990.
[131] F. L. Hitchcock. The expression of a tensor or a polyadic as a sum of products. J. Math. Phys.,
6, 164–189, 1927.
[132] F. L. Hitchcock. Multiple invariants and generalized rank of a p-way matrix or tensor. J. Math.
Phys., 7, 39–79, 1927.
[133] R. W. Hockney and J. W. Eastwood. Computer Simulation Using Particles. IOP, Bristol, 1988.
[134] E. G. Hohenstein, R. M. Parrish, and T. J. Martinez. Tensor hypercontraction density fitting.
Quartic scaling second- and third-order Møller–Plesset perturbation theory. J. Chem. Phys.,
137, 044103, 2012.
[135] M. Holst, N. Baker, and F. Wang. Adaptive multilevel finite element solution of the
Poisson–Boltzmann equation: algorithms and examples. J. Comput. Chem., 21, 1319–1342,
2000.
[136] S. Holtz, T. Rohwedder, and R. Schneider. On manifolds of tensors of fixed TT-rank. Numer.
Math., 120 (4), 701–731, 2012.
[137] S. Holtz, T. Rohwedder, and R. Schneider. The alternating linear scheme for tensor
optimization in the tensor train format. SIAM J. Sci. Comput., 34 (2), A683–A713, 2012.
[138] T. Huckle, K. Waldherr, and T. Schulte-Herbrüggen. Computations in quantum tensor
networks. Linear Algebra Appl., 438, 750–781, 2013.
[139] P. H. Hünenberger. Lattice-sum methods for computing electrostatic interactions in molecular
simulations. AIP Conf. Proc., 492, 17, 1999.
[140] M. Ishteva, L. De Lathauwer, P.-A. Absil, and S. Van Huffel. Differential-geometric Newton
method for the best rank-(R1 , R2 , R3 ) approximation of tensors. Numer. Algorithms, 51 (2),
179–194, 2009.
[141] A. Iske. Multiresolution Methods in Scattered Data Modeling. Springer, Berlin, 2004.
[142] V. Kazeev and B. N. Khoromskij. Explicit low-rank QTT representation of Laplace operator and
its inverse. SIAM J. Matrix Anal. Appl., 33 (3), 742–758, 2012.
[143] V. Kazeev, B. N. Khoromskij, and E. E. Tyrtyshnikov. Multilevel Toeplitz matrices generated by
tensor-structured vectors and convolution with logarithmic complexity. SIAM J. Sci. Comput.
35 (3), A1511–A1536, 2013.
[144] V. Kazeev, M. Khammash, M. Nip, and Ch. Schwab. Direct solution of the chemical master
equation using quantized tensor trains. PLoS Comput. Biol. 10 (3), 2014.
[145] V. Khoromskaia. Computation of the Hartree–Fock exchange in the tensor-structured format.
Comput. Methods Appl. Math., 10 (2), 1–16, 2010.
[146] V. Khoromskaia. Numerical Solution of the Hartree–Fock Equation by Multilevel
Tensor-Structured Methods. PhD dissertation, TU Berlin, 2010.
https://depositonce.tu-berlin.de/handle/11303/3016
[147] V. Khoromskaia. Black-box Hartree–Fock solver by tensor numerical methods. Comput.
Methods Appl. Math., 14 (1), 89–111, 2014.
[148] V. Khoromskaia and B. N. Khoromskij. Grid-based lattice summation of electrostatic
potentials by assembled rank-structured tensor approximation. Comput. Phys. Commun.,
185, 3162–3174, 2014.
[149] V. Khoromskaia and B. N. Khoromskij. Tucker tensor method for fast grid-based summation of
long-range potentials on 3D lattices with defects. arXiv:1411.1994, 2014.
[150] V. Khoromskaia and B. N. Khoromskij. Møller–Plesset (MP2) energy correction using tensor
factorizations of the grid-based two-electron integrals. Comput. Phys. Commun., 185, 2–10,
2014.
[151] V. Khoromskaia and B. N. Khoromskij. Tensor approach to linearized Hartree–Fock equation
for lattice-type and periodic systems. Preprint 62/2014, Max-Planck Institute for Mathematics
in the Sciences, Leipzig. arXiv:1408.3839, 2014.
[152] V. Khoromskaia and B. N. Khoromskij. Tensor numerical methods in quantum chemistry:
from Hartree–Fock to excitation energies. Phys. Chem. Chem. Phys., 17 (47), 31491–31509,
2015.
[153] V. Khoromskaia and B. N. Khoromskij. Fast tensor method for summation of long-range
potentials on 3D lattices with defects. Numer. Linear Algebra Appl., 23, 249–271, 2016.
[154] V. Khoromskaia and B. N. Khoromskij. Block circulant and Toeplitz structures in the linearized
Hartree–Fock equation on finite lattices: tensor approach. Comput. Methods Appl. Math.,
17 (3), 431–455, 2017.
[155] V. Khoromskaia, B. N. Khoromskij, and R. Schneider. QTT representation of the Hartree and
exchange operators in electronic structure calculations. Comput. Methods Appl. Math., 11 (3),
327–341, 2011.
[156] V. Khoromskaia, D. Andrae, and B. N. Khoromskij. Fast and accurate 3D tensor calculation of
the Fock operator in a general basis. Comput. Phys. Commun., 183, 2392–2404, 2012.
[157] V. Khoromskaia, B. N. Khoromskij, and R. Schneider. Tensor-structured factorized calculation
of two-electron integrals in a general basis. SIAM J. Sci. Comput., 35 (2), A987–A1010,
2013.
[158] V. Khoromskaia, B. N. Khoromskij, and F. Otto. A numerical primer in 2D stochastic
homogenization: CLT scaling in the representative volume element. Preprint 47/2017,
Max-Planck Institute for Math. in the Sciences, Leipzig 2017.
[159] B. N. Khoromskij. Data-sparse elliptic operator inverse based on explicit approximation to the
Green function. J. Numer. Math., 11 (2), 135–162, 2003.
[160] B. N. Khoromskij. An Introduction to Structured Tensor-Product Representation of Discrete
Nonlocal Operators. Lecture Notes, vol. 27, Max-Planck Institute for Mathematics in the
Sciences, Leipzig, 2005.
[161] B. N. Khoromskij. Structured rank-(r1 , . . . , rd ) decomposition of function-related tensors in ℝd .
Comput. Methods Appl. Math., 6 (2), 194–220, 2006.
[162] B. N. Khoromskij. Structured data-sparse approximation to high order tensors arising from
the deterministic Boltzmann equation. Math. Comput., 76, 1292–1315, 2007.
[163] B. N. Khoromskij. On tensor approximation of Green iterations for Kohn–Sham equations.
Comput. Vis. Sci., 11, 259–271, 2008.
[164] B. N. Khoromskij. Tensor-structured preconditioners and approximate inverse of elliptic
operators in ℝd . Constr. Approx., 30, 599–620, 2009.
[165] B. N. Khoromskij. O(d log N)-quantics approximation of N-d tensors in high-dimensional
numerical modeling. Preprint 55/2009, Max-Planck Institute for Mathematics in the Sciences,
Leipzig 2009.
http://www.mis.mpg.de/publications/preprints/2009/prepr2009-55.html
[166] B. N. Khoromskij. Fast and accurate tensor approximation of a multivariate convolution with
linear scaling in dimension. J. Comput. Appl. Math., 234, 3122–3139, 2010.
[167] B. N. Khoromskij. O(d log N)-quantics approximation of N-d tensors in high-dimensional
numerical modeling. Constr. Approx., 34 (2), 257–289, 2011.
[168] B. N. Khoromskij. Introduction to tensor numerical methods in scientific computing. Lecture
Notes, Preprint 06-2011, University of Zuerich, Institute of Mathematics, 2011, pp. 1–238,
http://www.math.uzh.ch/fileadmin/math/preprints/06_11.pdf
[169] B. N. Khoromskij. Tensors-structured numerical methods in scientific computing: survey on
recent advances. Chemom. Intell. Lab. Syst., 110, 1–19, 2012.
[170] B. N. Khoromskij. Tensor numerical methods for high-dimensional PDEs: basic theory and
initial applications. ESAIM, 48, 1–28, 2014.
[171] B. N. Khoromskij. Tensor Numerical Methods in Scientific Computing. Research Monograph,
De Gruyter Verlag, Berlin, 2018.
[172] B. N. Khoromskij. Operator-Dependent Approximation of the Dirac Delta by Using
Range-Separated Tensor Format. Manuscript, 2018.
[173] B. N. Khoromskij and V. Khoromskaia. Low rank Tucker-type tensor approximation to classical
potentials. Cent. Eur. J. Math., 5 (3), 523–550, 2007 (Preprint 105/2006 Max-Planck Institute
for Mathematics in the Sciences, Leipzig 2006).
[174] B. N. Khoromskij and V. Khoromskaia. Multigrid tensor approximation of function related
arrays. SIAM J. Sci. Comput., 31 (4), 3002–3026, 2009.
[175] B. N. Khoromskij and S. Miao. Superfast wavelet transform using QTT approximation. I: Haar
wavelets. Comput. Methods Appl. Math., 14 (4), 537–553, 2014.
[176] B. N. Khoromskij and I. Oseledets. Quantics-TT collocation approximation of
parameter-dependent and stochastic elliptic PDEs. Comput. Methods Appl. Math., 10 (4),
34–365, 2010.
[177] B. N. Khoromskij and I. Oseledets. DMRG+QTT approach to the computation of ground
state for the molecular Schrödinger operator. Preprint 68/2010, Max-Planck Institute for
Mathematics in the Sciences, Leipzig, 2010.
[178] B. N. Khoromskij and I. Oseledets. Quantics-TT approximation of elliptic solution operators in
higher dimensions. Russ. J. Numer. Anal. Math. Model., 26 (3), 303–322, 2011.
[179] B. N. Khoromskij and I. Oseledets. Quantics-TT approximation of elliptic solution operators in
higher dimensions. Preprint 79/2009, Max-Planck Institute for Mathematics in the Sciences,
Leipzig 2009.
[180] B. N. Khoromskij and S. Repin. A fast iteration method for solving elliptic problems
with quasiperiodic coefficients. Russ. J. Numer. Anal. Math. Model., 30 (6), 329–344,
2015.
[181] B. N. Khoromskij and S. Repin. Rank structured approximation method for quasi-periodic
elliptic problems. Comput. Methods Appl. Math. 17 (3), 457–477, 2017.
[182] B. N. Khoromskij and Ch. Schwab. Tensor approximation of multi-parametric elliptic
problems in SPDEs. SIAM J. Sci. Comput., 33 (1), 364–385, 2011.
[183] B. Khoromskij and A Veit. Efficient computation of highly oscillatory integrals by using QTT
tensor approximation. Comput. Methods Appl. Math., 16 (1), 145–159, 2016.
[184] B. N. Khoromskij and G. Wittum. Numerical Solution of Elliptic Differential Equations by
Reduction to the Interface. Research Monograph, LNCSE, vol. 36, Springer-Verlag, Berlin,
2004.
[185] B. N. Khoromskij, A. Litvinenko, and H. G. Matthies. Application of hierarchical matrices for
computing the Karhunen–Loève expansion. Computing, 84, 49–67, 2009.
[186] B. N. Khoromskij, V. Khoromskaia, S. R. Chinnamsetty, and H.-J. Flad. Tensor decomposition
in electronic structure calculations on 3D Cartesian grids. J. Comput. Phys., 228, 5749–5762,
2009.
[187] B. N. Khoromskij, V. Khoromskaia, and H.-J. Flad. Numerical solution of the Hartree–Fock
equation in multilevel tensor-structured format. SIAM J. Sci. Comput., 33 (1), 45–65, 2011.
[188] B. N. Khoromskij, S. Sauter, and A. Veit. Fast quadrature techniques for retarded potentials
based on TT/QTT tensor approximation. Comput. Methods Appl. Math., 11 (3), 342–362,
2011.
[189] B. N. Khoromskij, K. K. Naraparaju, and J. Schneider. Quantized-CP approximation and sparse
tensor interpolation of function generated data. arXiv:1707.04525, 2017.
[190] A. V. Knyazev. Toward the optimal preconditioned eigensolver: locally optimal block
preconditioned conjugate gradient method. SIAM J. Sci. Comput., 23 (2), 517–541, 2001.
[191] O. Koch and Ch. Lubich. Dynamical low rank approximation. SIAM J. Matrix Anal. Appl., 29 (2),
434–454, 2007.
[192] T. Kolda. Orthogonal tensor decompositions. SIAM J. Matrix Anal. Appl., 23, 243–255, 2001.
[193] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Rev., 51 (3),
455–500, 2009.
[194] S. Körbel, P. Boulanger, I. Duchemin, X. Blase, M. A. L. Marques, and S. Botti. Benchmark
many-body GW and Bethe–Salpeter calculations for small transition metal molecules.
J. Chem. Theory Comput., 10 (9), 3934–3943, 2014.
[195] D. Kressner and C. Tobler. Preconditioned low-rank methods for high-dimensional elliptic PDE
eigenvalue problems. Comput. Methods Appl. Math., 11 (3), 363–381, 2011.
[196] D. Kressner, M. Steinlechner, and A. Uschmajew. Low-rank tensor methods with subspace
correction for symmetric eigenvalue problems. SIAM J. Sci. Comput., 36 (5), A2346–A2368,
2014.
[197] P. M. Kroonenberg and J. De Leeuw. Principal component analysis of three-mode data by
means of alternating least squares algorithms. Psychometrika, 45, 69–97, 1980.
[198] J. B. Kruskal. Three-way arrays: rank and uniqueness of trilinear decompositions, with
applications to arithmetic complexity and statistics. Linear Algebra Appl., 18, 95–138, 1977.
[199] K. N. Kudin and G. E. Scuseria. Revisiting infinite lattice sums with the periodic fast multipole
method. J. Chem. Phys., 121, 2886–2890, 2004.
[200] L. Laaksonen, P. Pyykkö, and D. Sundholm. Fully numerical Hartree–Fock methods for
molecules. Comput. Phys. Rep., 4, 313–344, 1986.
[201] J. M. Landsberg. Tensors: Geometry and Applications. American Mathematical Society,
Providence, RI, 2012.
[202] S. Lang. Linear Algebra, 3rd edn. Springer, Berlin, 1987.
[203] C. Le Bris. Computational chemistry from the perspective of numerical analysis. Acta Numer.,
363–444, 2005.
[204] L. Lin, Y. Saad, and Ch. Yang. Approximating spectral densities of large matrices. SIAM Rev.,
58, 34, 2016.
[205] D. Lindbo and A.-K. Tornberg. Fast and spectrally accurate Ewald summation for 2-periodic
electrostatic systems. J. Chem. Phys., 136, 164111, 2012.
[206] A. Litvinenko, D. Keyes, V. Khoromskaia, B. Khoromskij, and H. Matthies. Tucker Tensor
analysis of Matérn functions in spatial statistics. arXiv:1711.06874, 2017.
[207] M. Lorenz, D. Usvyat, and M. Schütz. Local ab initio methods for calculating optical band
gaps in periodic systems. I. Periodic density fitted local configuration interaction singles
method for polymers. J. Chem. Phys., 134, 094101, 2011.
[208] S. A. Losilla, D. Sundholm, and J. Juselius. The direct approach to gravitation and
electrostatics method for periodic systems. J. Chem. Phys., 132 (2), 024102, 2010.
[209] B. Z. Lu, Y. C. Zhou, M. J. Holst, and J. A. McCammon. Recent progress in numerical methods
for Poisson–Boltzmann equation in biophysical applications. Commun. Comput. Phys., 3 (5),
973–1009, 2008.
[210] Ch. Lubich. On variational approximations in quantum molecular dynamics. Math. Comput.,
74, 765–779, 2005.
[211] Ch. Lubich. From Quantum to Classical Molecular Dynamics: Reduced Models and Numerical
Analysis. Zurich Lectures in Advanced Mathematics, EMS, Zurich, 2008.
[212] Ch. Lubich and I. V. Oseledets. A projector-splitting integrator for dynamical low-rank
approximation. BIT Numer. Math., 54 (1), 171–188, 2014.
[213] Ch. Lubich, T. Rohwedder, R. Schneider, and B. Vandereycken. Dynamical approximation
of hierarchical Tucker and tensor-train tensors. SIAM J. Matrix Anal. Appl., 34 (2), 470–494,
2013.
[214] Ch. Lubich, I. V. Oseledets, and B. Vandereycken. Time integration of tensor trains. SIAM J.
Numer. Anal., 53 (2), 917–941, 2015.
[215] J. Lund and K. L. Bowers. Sinc Methods for Quadrature and Differential Equations. SIAM,
Philadelphia, 1992.
[216] F. R. Manby. Density fitting in second-order linear-r12 Møller–Plesset perturbation theory.
J. Chem. Phys., 119 (9), 4607–4613, 2003.
[217] F. R. Manby, P. J. Knowles, and A. W. Lloyd. The Poisson equation in density fitting for the
Kohn–Sham Coulomb problem. J. Chem. Phys., 115, 9144–9148, 2001.
[218] G. I. Marchuk and V. V. Shaidurov. Difference Methods and Their Extrapolations. Applications
of Mathematics, Springer, New York, 1983.
[219] H. G. Matthies, A. Litvinenko, O. Pajonk, B. L. Rosic, and E. Zander. Parametric and uncertainty
computations with tensor product representations. In: Uncertainty Quantification in Scientific
Computing, Springer, Berlin, pp. 139–150, 2012.
[220] V. Maz'ya and G. Schmidt. Approximate Approximations. Mathematical Surveys and Monographs,
vol. 141, AMS, Providence, 2007.
[221] H.-D. Meyer, F. Gatti, and G. A. Worth. Multidimensional Quantum Dynamics: MCTDH Theory
and Applications. Wiley–VCH, Weinheim, 2009.
[222] C. Møller and M. S. Plesset. Note on an approximation treatment for many-electron systems.
Phys. Rev., 46, 618, 1934.
[223] K. K. Naraparaju and J. Schneider. Generalized cross approximation for 3d-tensors. Comput.
Vis. Sci., 14 (3), 105–115, 2011.
[224] G. Onida, L. Reining, A. Rubio. Electronic excitations: density-functional versus many-body
Green’s-function approaches. Rev. Mod. Phys., 74 (2), 601, 2002.
[225] I. V. Oseledets. Approximation of 2d × 2d matrices using tensor decomposition. SIAM J. Matrix
Anal. Appl., 31 (4), 2130–2145, 2010.
[226] I. V. Oseledets. Tensor-train decomposition. SIAM J. Sci. Comput., 33 (5), 2295–2317, 2011.
[227] I. V. Oseledets. Constructive representation of functions in low-rank tensor formats. Constr.
Approx., 37 (1), 1–18, 2013.
[228] I. V. Oseledets and S. V. Dolgov. Solution of linear systems and matrix inversion in the
TT-format. SIAM J. Sci. Comput., 34 (5), A2718–A2739, 2012.
[229] I. V. Oseledets and E. E. Tyrtyshnikov. Breaking the curse of dimensionality, or how to use
SVD in many dimensions. SIAM J. Sci. Comput., 31 (5), 3744–3759, 2009.
[230] I. Oseledets and E. E. Tyrtyshnikov. TT-cross approximation for multidimensional arrays.
Linear Algebra Appl., 432 (1), 70–88, 2010.
[231] I. V. Oseledets, D. V. Savostyanov, and E. E. Tyrtyshnikov. Tucker dimensionality reduction
of three-dimensional arrays in linear time. SIAM J. Matrix Anal. Appl., 30 (3), 939–956,
2008.
[232] I. V. Oseledets et al. Tensor Train Toolbox, 2014. https://github.com/oseledets/TT-Toolbox
[233] R. Parrish, E. G. Hohenstein, T. J. Martinez, and C. D. Sherrill. Tensor hypercontraction. II.
Least-squares renormalization. J. Chem. Phys., 137, 224106, 2012.
[234] K. A. Peterson, D. E. Woon, and T. H. Dunning, Jr. Benchmark calculations with correlated
molecular wave functions. IV. The classical barrier height of the H + H2 → H2 + H reaction.
J. Chem. Phys., 100, 7410–7415, 1994.
[235] C. Pisani, M. Schütz, S. Casassa, D. Usvyat, L. Maschio, M. Lorenz, and A. Erba. CRYSCOR:
a program for the post-Hartree–Fock treatment of periodic systems. Phys. Chem. Chem. Phys.,
14, 7615–7628, 2012.
[236] E. L. Pollock and J. Glosli. Comments on P3M, FMM, and the Ewald method for large periodic
Coulombic systems. Comput. Phys. Commun., 95, 93–110, 1996.
[237] R. Polly, H.-J. Werner, F. R. Manby, and P. J. Knowles. Fast Hartree–Fock theory using density
fitting approximations. Mol. Phys., 102, 2311–2321, 2004.
[238] P. Pulay. Improved SCF convergence acceleration. J. Comput. Chem., 3, 556–560, 1982.
[239] M. Rakhuba and I. Oseledets. Fast multidimensional convolution in low-rank tensor formats
via cross approximation. SIAM J. Sci. Comput., 37 (2), A565–A582, 2015.
[240] M. Rakhuba and I. Oseledets. Grid-based electronic structure calculations: the tensor
decomposition approach. J. Comput. Phys., 312, 19–30, 2016.
[241] G. Rauhut, P. Pulay, and H.-J. Werner. Integral transformation with low-order scaling for large local
second-order Møller–Plesset calculations. J. Comput. Chem., 19, 1241–1254, 1998.
282 | Bibliography

[242] H. Rauhut, R. Schneider, and Z. Stojanac. Low rank tensor recovery via iterative hard
thresholding. Linear Algebra Appl., 523, 220–262, 2017.
[243] E. Rebolini, J. Toulouse, and A. Savin. Electronic excitation energies of molecular systems
from the Bethe–Salpeter equation: Example of H2 molecule. In S. Ghosh, P. Chattaraj, eds,
Concepts and Methods in Modern Theoretical Chemistry, vol. 1: Electronic Structure and
Reactivity, p. 367, 2013.
[244] E. Rebolini, J. Toulouse, and A. Savin. Electronic excitations from a linear-response
range-separated hybrid scheme. Mol. Phys., 111, 1219, 2013.
[245] E. Rebolini, J. Toulouse, A. M. Teale, T. Helgaker, and A. Savin. Calculating excitation
energies by extrapolation along adiabatic connections. Phys. Rev. A, 91, 032519,
2015.
[246] M. Reed and B. Simon. Functional Analysis. Academic Press, San Diego, 1972.
[247] S. Reine, T. Helgaker, and R. Lindh. Multi-electron integrals. WIREs Comput. Mol. Sci., 2,
290–303, 2012.
[248] L. Reining, V. Olevano, A. Rubio, and G. Onida. Excitonic effects in solids described by
time-dependent density-functional theory. Phys. Rev. Lett., 88 (6), 66404, 2002.
[249] T. Rohwedder and R. Schneider. Error estimates for the coupled cluster method. ESAIM:
M2AN, 47 (6), 1553–1582, 2013.
[250] T. Rohwedder and A. Uschmajew. On local convergence of alternating schemes for
optimization of convex problems in the tensor train format. SIAM J. Numer. Anal., 51 (2),
1134–1162, 2013.
[251] E. Runge and E. K. U. Gross. Density-functional theory for time-dependent systems. Phys. Rev.
Lett., 52 (12), 997, 1984.
[252] E. E. Salpeter and H. A. Bethe. A relativistic equation for bound-state problems. Phys. Rev.,
84 (6), 1232–1242, 1951.
[253] G. Sansone, B. Civalleri, D. Usvyat, J. Toulouse, K. Sharkas, and L. Maschio. Range-separated
double-hybrid density-functional theory applied to periodic systems. J. Chem. Phys., 143,
102811, 2015.
[254] B. Savas and L.-H. Lim. Quasi-Newton methods on Grassmanians and multilinear
approximations of tensors. SIAM J. Sci. Comput., 32 (6), 3352–3393, 2010.
[255] D. V. Savostyanov. Fast revealing of mode ranks of tensor in canonical form. Numer. Math.,
Theory Methods Appl., 2 (4), 439–444, 2009.
[256] D. V. Savostyanov and I. V. Oseledets. Fast adaptive interpolation of multi-dimensional
arrays in tensor train format. In Multidimensional (ND) Systems, 7th International Workshop,
University of Poitiers, France, 2011, doi:10.1109/nDS.2011.6076873
[257] D. V. Savostyanov, S. V. Dolgov, J. M. Werner, and I. Kuprov. Exact NMR simulation of
protein-size spin systems using tensor train formalism. Phys. Rev. B, 90, 085139,
2014.
[258] G. Schaftenaar and J. H. Noordik. Molden: a pre- and post-processing program for molecular
and electronic structures. J. Comput.-Aided Mol. Des., 14, 123–134, 2000.
[259] W. G. Schmidt, S. Glutsch, P. H. Hahn, and F. Bechstedt. Efficient O(N²) method to solve the
Bethe–Salpeter equation. Phys. Rev. B, 67, 085307, 2003.
[260] R. Schneider. Analysis of the projected coupled cluster method in electronic structure
calculation. Numer. Math., 113, (3), 433–471, 2009.
[261] R. Schneider and A. Uschmajew. Approximation rates for the hierarchical tensor format in
periodic Sobolev spaces. J. Complex., 30 (2), 56–71, 2014.
[262] R. Schneider and A. Uschmajew. Convergence results for projected line-search methods on
varieties of low-rank matrices via Lojasiewicz inequality. SIAM J. Optim., 25 (1), 622–646,
2015.
[263] R. Schneider, Th. Rohwedder, J. Blauert, and A. Neelov. Direct minimization for calculating
invariant subspaces in density functional computations of the electronic structure. J. Comput.
Math., 27 (2–3), 360–387, 2009.
[264] U. Schollwöck. The density-matrix renormalization group in the age of matrix product states,
Ann. Phys., 326 (1), 96–192, 2011.
[265] K. L. Schuchardt, B. T. Didier, T. Elsethagen, L. Sun, V. Gurumoorthi, J. Chase, J. Li, and
T. L. Windus. Basis set exchange: a community database for computational sciences, J. Chem.
Inf. Model., 47, 1045–1052, 2007.
[266] C. Schwab and R.-A. Todor. Karhunen–Loève approximation of random fields by generalized
fast multipole methods. J. Comput. Phys., 217, 100–122, 2006.
[267] H. Sekino, Y. Maeda, T. Yanai, and R. J. Harrison. Basis set limit Hartree–Fock and density
functional theory response property evaluation by multiresolution multiwavelet basis.
J. Chem. Phys., 129, 034111, 2008.
[268] Y. Shao, L. F. Molnar, Y. Jung, J. Kussmann, C. Ochsenfeld, S. T. Brown, et al. Advances in
methods and algorithms in a modern quantum chemistry program package. Phys. Chem.
Chem. Phys., 8 (27), 3172–3191, 2006.
[269] J. Sherman and W. J. Morrison. Adjustment of an inverse matrix corresponding to a change in one
element of a given matrix. Ann. Math. Stat., 21 (1), 124–127, 1950.
[270] A. Smilde, R. Bro, and P. Geladi. Multi-Way Analysis. Wiley, New York, 2004.
[271] F. Stenger. Numerical Methods Based on Sinc and Analytic Functions. Springer-Verlag, Berlin,
1993.
[272] G. Strang. Introduction to Linear Algebra, 5th edn. Wellesley–Cambridge Press, Wellesley,
2016.
[273] G. Strang and G. J. Fix. An Analysis of the Finite Element Method. Prentice-Hall, Inc., NJ,
1973.
[274] R. E. Stratmann, G. E. Scuseria, and M. J. Frisch. An efficient implementation of
time-dependent density-functional theory for the calculation of excitation energies of large
molecules. J. Chem. Phys., 109, 8218, 1998.
[275] E. Süli and D. F. Mayers. An Introduction to Numerical Analysis. Cambridge University Press,
Cambridge, 2003.
[276] D. Sundholm, P. Pyykkö, and L. Laaksonen. Two-dimensional fully numerical molecular
calculations. X. Hartree–Fock results for He2, Li2, Be2, HF, OH−, N2, CO, BF, NO+, and CN−.
Mol. Phys., 56, 1411–1418, 1985.
[277] A. Szabo and N. Ostlund. Modern Quantum Chemistry. Dover Publications, New York, 1996.
[278] A. Y. Toukmaji and J. Board Jr. Ewald summation techniques in perspective: a survey. Comput.
Phys. Commun., 95, 73–92, 1996.
[279] J. Toulouse and A. Savin. Local density approximation for long-range or for short-range energy
functionals? J. Mol. Struct., Theochem, 762, 147, 2006.
[280] J. Toulouse, F. Colonna, and A. Savin. Long-range – short-range separation of the
electron–electron interaction in density-functional theory. Phys. Rev. A, 70, 062505, 2004.
[281] L. N. Trefethen. Spectral Methods in MATLAB. SIAM, Philadelphia, 2000.
[282] L. N. Trefethen and D Bau III. Numerical Linear Algebra. SIAM, Philadelphia, 1997.
[283] L. N. Trefethen and M. Embree. Spectra and Pseudospectra: The Behavior of Nonnormal
Matrices and Operators. Princeton University Press, Princeton and Oxford, 2005.
[284] L. R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31,
279–311, 1966.
[285] I. Turek. A maximum-entropy approach to the density of states within the recursion method.
J. Phys. C, 21, 3251–3260, 1988.
[286] E. E. Tyrtyshnikov. Mosaic-skeleton approximations. Calcolo, 33, 47–57, 1996.
[287] E. E. Tyrtyshnikov. Incomplete cross approximation in the mosaic-skeleton method.
Computing, 64, 367–380, 2000.
[288] E. E. Tyrtyshnikov. Tensor approximations of matrices generated by asymptotically smooth
functions. Sb. Math., 194 (5–6), 941–954, 2003 (translated from Mat. Sb., 194 (6), 146–160,
2003).
[289] E. E. Tyrtyshnikov. Kronecker-product approximations for some function-related matrices.
Linear Algebra Appl., 379, 423–437, 2004.
[290] J. L. M. Van Dorsselaer and M. E. Hochstenbach. Computing probabilistic bounds for extreme
eigenvalues of symmetric matrices with the Lanczos method. SIAM J. Matrix Anal. Appl., 22,
837–852, 2000.
[291] C. F. Van Loan and J. P. Vokt. Approximating matrices with multiple symmetries. SIAM J. Matrix
Anal. Appl., 36 (3), 974–993, 2015.
[292] J. VandeVondele, M. Krack, F. Mohamed, M. Parrinello, Th. Chassaing, and J. Hutter.
QUICKSTEP: fast and accurate density functional calculations using a mixed Gaussian and
plane waves approach. Comput. Phys. Commun., 167, 103–128, 2005.
[293] F. Verstraete, D. Porras, and J. I. Cirac. DMRG and periodic boundary conditions: a quantum
information perspective. Phys. Rev. Lett., 93 (22), 227205, 2004.
[294] G. Vidal. Efficient classical simulation of slightly entangled quantum computations. Phys. Rev.
Lett., 91 (14), 147902, 2003.
[295] E. Voloshina, D. Usvyat, M. Schütz, Y. Dedkov, and B. Paulus. On the physisorption of water on
graphene: a CCSD(T) study. Phys. Chem. Chem. Phys., 13, 12041–12047, 2011.
[296] L.-W. Wang. Calculating the density of states and optical-absorption spectra of large quantum
systems by the plane-wave moments method. Phys. Rev. B, 49, 10154–10158, 1994.
[297] H. Wang and M. Thoss. Multilayer formulation of the multiconfiguration time-dependent
Hartree theory. J. Chem. Phys., 119, 1289–1299, 2003.
[298] H.-J. Werner, F. R. Manby, and P. J. Knowles. Fast linear scaling second order Møller–Plesset
perturbation theory (MP2) using local and density fitting approximations. J. Chem. Phys., 118,
8149–8160, 2003.
[299] H.-J. Werner, P. J. Knowles, G. Knizia, F. R. Manby, and M. Schütz. Molpro: a general-purpose
quantum chemistry program package. WIREs Comput. Mol. Sci., 2, 242–253, 2012.
[300] H.-J. Werner, P. J. Knowles, et al. MOLPRO, Version 2002.10, A Package of Ab Initio Programs
for Electronic Structure Calculations.
[301] J. C. Wheeler and C. Blumstein. Modified moments for harmonic solids. Phys. Rev. B, 6,
4380–4382, 1972.
[302] S. R. White. Density-matrix algorithms for quantum renormalization groups. Phys. Rev. B,
48 (14), 10345–10356, 1993.
[303] S. Wilson. Universal basis sets and Cholesky decomposition of the two-electron integral
matrix. Comput. Phys. Commun., 58, 71–81, 1990.
[304] P. Wind, W. Klopper, and T. Helgaker. Second order Møller–Plesset perturbation theory with
terms linear in interelectronic coordinates and exact evaluation of three-electron integrals.
Theor. Chem. Acc., 107, 173–179, 2002.
[305] T. Yanai, G. Fann, Z. Gan, R. Harrison, and G. Beylkin. Multiresolution quantum chemistry:
Hartree–Fock exchange. J. Chem. Phys., 121 (14), 6680–6688, 2004.
[306] Y. Yang, Y. Kurashige, F. R. Manby, and G. K. L. Chan. Tensor factorizations of local
second-order Møller–Plesset theory. J. Chem. Phys., 134, 044123, 2011.
[307] H. Yserentant. The hyperbolic cross space approximation of electronic wavefunctions. Numer.
Math., 105, 659–690, 2007.
[308] H. Yserentant. Regularity and Approximability of Electronic Wave Functions. Lecture Notes in
Mathematics Series, Springer-Verlag, Berlin, 2010.
[309] E. Zeidler. Applied Functional Analysis: Applications to Mathematical Physics. Springer,
Berlin, 1995.
[310] T. Zhang and G. Golub. Rank-one approximation to high order tensors. SIAM J. Matrix Anal.
Appl., 23, 534–550, 2001.
[311] J. Zienau, L. Clin, B. Doser, and C. Ochsenfeld. Cholesky-decomposed densities in
Laplace-based second-order Møller–Plesset perturbation theory. J. Chem. Phys., 130, 204112,
2009.
[312] M. Zuzovski, A. Boag, and A. Natan. An auxiliary grid method for the calculation of
electrostatic terms in density functional theory on a real-space grid. Phys. Chem. Chem.
Phys., 17, 31550–31557, 2015.
Index
1D density fitting 145
3D integral-differential operator 105
3D lattices with multiple defects 230
3D tensor product convolution 119
adaptive cross approximation 211
algorithm of fast TESC Hartree–Fock solver 163
ALS iteration 30, 54, 55
analytic approximation methods 39
assembled canonical vectors 220
assembled tensor summation of potentials 215, 217
assembled Tucker tensor summation of potentials 222
assembled Tucker vectors 223
average QTT rank bounds 228
average QTT ranks 147, 154, 177, 198, 210
best rank r approximation 18
Bethe–Salpeter equation (BSE) 6, 179
block-circulant structure 170
BSE matrix 182
BSE system matrix 180
canonical tensor format 24, 63, 100, 248
canonical vectors 259
canonical-to-Tucker approximation 65
canonical-to-Tucker (C2T) transform 62, 236
Cholesky decomposition 15, 173
Cholesky factorization 173
collective electrostatic potential 215, 241
collocation-projection discretization 88
compact molecules 165
compatibility condition 66
computational box 117, 158, 217
contracted product 21
contracted product tensor representation 63
convolution integrals 87, 112
convolution matrix 148
core Hamiltonian 106, 129, 159
Coulomb matrix 112, 119
Coulomb operator 116, 161
cumulated canonical tensors (CCT) 255, 258
curse of dimensionality 1, 20, 63
density of states (DOS) 6, 201
dielectric function 183
direct tensor summation of potentials 132
Dirichlet boundary conditions 157
discrete-tensor product convolution 90
DOS calculations 208
double amplitudes tensor 173
electron density 107
electron repulsion integrals 141
electrostatic potential 99
Ewald summation method 216
exchange operator 107, 112, 117, 162
exchange potential operator 120
excitation energies 179, 184
exponential convergence 45
extended systems 168
fast Fourier transform (FFT) 250
fast TESC Hartree–Fock solver 157
finite 3D lattices 7, 217, 231
free-space electrostatic potential 270
Frobenius norm 20, 44
function-related tensor 37, 45, 54
Gaussian basis functions 129, 158
Gaussian basis set 117
Gaussian function 100, 249
Gaussian-type orbitals 114
generalized RHOSVD approximation 233
Grassmann manifold 27
ground-state energy calculations 166
Hadamard product 34, 35
Hardy space 41, 43
Hartree potential 107, 112, 118
Hartree–Fock (HF) equation 105
Helmholtz potential 46
hexagonal lattice structure 235
hierarchical dimension splitting 79
hierarchical Tucker (HT) 79
hierarchical Tucker tensor format 5
higher order singular value decomposition (HOSVD) 2, 19, 28
HOMO–LUMO gap 183
initial guess 54, 64
interaction energy 215
interaction energy of multiparticle systems 237, 264
288 | Index

Kronecker product 12
Laplace operator 130, 159
Laplace transform 99, 103
large finite lattice clusters 215
lattice-type systems 171
local basis functions 158
long-range canonical vectors 244
long-range electrostatic potentials 7, 241
Lorentzian broadening 204
many-particle systems 216, 241
matrix product states 79
matrix trace 205
matrix–matrix multiplication 11
maximum energy principle 70
mixed Tucker-to-canonical approximation 74
mixed two-level Tucker-canonical transform 73
modeling of multi-particle systems 241
molecular orbitals 106
molecular orbitals basis 181
Møller–Plesset (MP2) energy correction 171
most important fibers 70
multidimensional long-range interaction potentials 241
multidimensional scattered data 260
multidimensional tensor-product convolution 87, 90, 92
multigrid canonical-to-Tucker algorithm 69
multigrid canonical-to-Tucker transform 248
multigrid Tucker tensor decomposition 3, 54
multilevel Hartree–Fock solver 110, 112
multilevel SCF 122
multilevel tensor-structured Hartree–Fock solver 4
multilevel tensor-truncated DIIS 124
multiplicative tensor formats 65
multivariate Newton kernel 99
Newton kernel 94, 141
nonlinear eigenvalue problem 106
nuclear potential operator 134, 160
orthogonal side matrices 26
orthogonal Tucker matrices 65
periodic cell 224
piecewise constant basis functions 218
Poisson–Boltzmann equation (PBE) 268
problem-adapted small basis 186
q-adic folding (reshaping) 83
QTT interpolant 209
QTT interpolation of DOS 211
QTT tensor approximation 209, 227
QTT tensor format 159, 252
QTT-rank estimates 227
QTT-Tucker format 79
quantics tensor train (QTT) 5, 82
radial basis functions 261
random perturbation 62
range-separated canonical/Tucker tensor formats 254
range-separated (RS) tensor format 7, 241
rank-structured TEI 161
rank-structured tensor 23
recompression of sinc-approximation 94
reduced basis approach 185
reduced basis model 187
reduced higher order singular value decomposition (RHOSVD) 2, 62, 64, 79, 145
redundancy-free factorization of TEI 148
redundant free basis for TEI 148
reference canonical tensor 264
response function 183
RHOSVD 248
RHOSVD approximation 231, 259
RHOSVD stability condition 232
RHOSVD-type factorization 150
RHOSVD-type Tucker approximation 66
Richardson extrapolation 96, 119, 137, 226
RS-canonical tensor 257
RS-Tucker tensor format 257
scalar product 9, 20, 34
scalar product of canonical tensors 35
self-consistent field (SCF) iteration 107, 121, 164
separable approximation of the 3D Newton kernel 99
Sherman–Morrison–Woodbury formula 180, 186, 194
shift-and-windowing transform 248
shifting-windowing operator 218
short-range canonical vectors 245
simplified BSE problem 187
sinc-quadrature approximation 41, 95, 99, 103
sinc-quadrature based 37, 259
sinc-quadrature methods 93
single-hole tensor 30, 32, 65, 67
singular value decomposition (SVD) 13
skeleton vectors 145
Slater function 43, 45
splitting of a reference potential 242
static screened interaction matrix 183
Stiefel manifold 27
storage demands 33, 53, 67, 72, 80, 83, 102, 149, 151, 177, 258
summation of potentials 215
summation of potentials on composite lattices 233
Tamm–Dancoff approximation (TDA) 180, 186
TEI in molecular orbital basis 172
tensor decomposition 248
tensor numerical methods 4, 157
tensor product 9
tensor representation of a Newton kernel 102
tensor train (TT) 79
tensor train (TT) format 5
tensor-based Hartree–Fock calculations 109
tensor-based Hartree–Fock solver 4
three-dimensional convolution operators 117
total electrostatic potential 248
truncated Cholesky decomposition 146
truncated Cholesky factorization of TEI 152
truncated HOSVD 29
Tucker core tensor 26, 30, 64, 68, 76, 78
Tucker decomposition algorithm 30
Tucker tensor approximation 101
Tucker tensor decomposition 2, 19
Tucker tensor decomposition algorithm 48
Tucker tensor format 25
Tucker tensor ranks 251
Tucker-to-canonical (T2C) transform 73, 76
two-electron integrals (TEI) 4, 108, 141
two-level Tucker tensor format 91
unfolding of a tensor 20
Yukawa potential 43, 94
