Computing f-Divergences and Distances of
High-Dimensional Probability Density Functions
Alexander Litvinenko (RWTH Aachen, Germany),
joint work with
Youssef Marzouk, Hermann G. Matthies, Marco Scavino, Alessio Spantini
Plan
1. Motivating examples, working diagram
2. Basics: pdf, pcf, FFT
3. Theoretical background
4. Computation of moments and divergences
5. Tensor formats
6. Algorithms
7. Numerics
Motivation: How to compute in high dimensions?
Let $\xi \in \mathbb{R}^d$ be a random vector $\xi = (\xi_1, \dots, \xi_d)$ with pdf $p_\xi$.
Entropy is the expectation of the negative logarithm of the pdf:
$$h(p_\xi) := \mathbb{E}\left(-\ln p_\xi(y)\right) = \int_{\mathbb{R}^d} -\ln(p_\xi(y))\, p_\xi(y)\, dy. \qquad (1)$$
How to compute the KLD and other divergences? The classical formulas are given only in terms of pdfs, which are usually unknown.
Divergence        $D_\bullet(p\|q)$
KLD               $\int \left(\log(p(x)/q(x))\right) p(x)\, dx$
Hellinger dist.   $\frac{1}{2} \int \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 dx$
Bhattacharyya     $-\log \int \sqrt{p(x)\,q(x)}\, dx$
Motivation: stochastic PDEs
$$-\nabla \cdot (\kappa(x,\omega)\,\nabla u(x,\omega)) = f(x,\omega), \qquad x \in G \subset \mathbb{R}^d,\ \omega \in \Omega$$
Write first the Karhunen–Loève expansion and then, for uncorrelated random variables, the polynomial chaos expansion:
$$u(x,\omega) = \sum_{i=1}^{K} \sqrt{\lambda_i}\,\phi_i(x)\,\xi_i(\omega) = \sum_{i=1}^{K} \sqrt{\lambda_i}\,\phi_i(x) \sum_{\alpha \in \mathcal{J}} \xi_i^{(\alpha)} H_\alpha(\theta(\omega)) \qquad (2)$$
$$= \sum_{i=1}^{K} \sqrt{\lambda_i}\,\phi_i(x) \sum_{\alpha_1=1}^{p_1} \cdots \sum_{\alpha_M=1}^{p_M} \xi_i^{(\alpha_1,\dots,\alpha_M)} \prod_{j=1}^{M} h_{\alpha_j}(\theta_j), \qquad (3)$$
where $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_M, \dots)$.
How to compute the pdf of $u(x,\omega)$ from Eq. (2)?
How to compute the KLD and other divergences if the pdf is not available?
There are two ways to compute the f-divergence, KLD, and entropy.
Connection of pcf and pdf
The probability characteristic function (pcf) $\varphi_\xi$ is defined as
$$\varphi_\xi(t) := \mathbb{E}\left(\exp(\mathrm{i}\langle \xi, t\rangle)\right) = \int_{\mathbb{R}^d} p_\xi(y)\,\exp(\mathrm{i}\langle y, t\rangle)\,dy =: \mathcal{F}^{[d]}(p_\xi)(t),$$
where $t = (t_1, t_2, \dots, t_d) \in \mathbb{R}^d$, $\langle y, t\rangle = \sum_{j=1}^{d} y_j t_j$, and $\mathcal{F}^{[d]}$ is the probabilist's d-dimensional Fourier transform. Conversely,
$$p_\xi(y) = \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} \exp(-\mathrm{i}\langle t, y\rangle)\,\varphi_\xi(t)\,dt = \mathcal{F}^{[-d]}(\varphi_\xi)(y). \qquad (4)$$
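A minimal 1D sanity check of this transform pair for the standard normal, approximating the defining integral by quadrature on an equidistant grid (all grid parameters below are illustrative choices, not from the slides):

```python
import numpy as np

# pdf -> pcf for N(0,1) via quadrature of the defining integral; the exact
# pcf is exp(-t^2/2), so the difference shows only the quadrature error.
L, M = 20.0, 512
x = np.linspace(-L/2, L/2, M, endpoint=False)   # equidistant grid, spacing dx
dx = L / M
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)      # pdf of N(0,1)

t = np.linspace(-5.0, 5.0, 11)
phi_quad = np.array([np.sum(p * np.exp(1j * x * tt)) * dx for tt in t])
print(np.max(np.abs(phi_quad - np.exp(-t**2 / 2))))   # ~1e-16
```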
Discrete low-rank representation of pcf
We try to find an approximation
$$\varphi_\xi(t) \approx \tilde\varphi_\xi(t) = \sum_{\ell=1}^{R} \bigotimes_{\nu=1}^{d} \varphi_{\ell,\nu}(t_\nu), \qquad (5)$$
where the $\varphi_{\ell,\nu}(t_\nu)$ are one-dimensional functions. Then we can get
$$p_\xi(y) \approx \tilde p_\xi(y) = \mathcal{F}^{[-d]}(\tilde\varphi_\xi)(y) = \sum_{\ell=1}^{R} \bigotimes_{\nu=1}^{d} \mathcal{F}_1^{-1}(\varphi_{\ell,\nu})(y_\nu),$$
where $\mathcal{F}_1^{-1}$ is the one-dimensional inverse Fourier transform.
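The computational point of (5): the d-dimensional inverse transform acts factor by factor, so only one-dimensional FFTs are ever needed. A small sketch with random stand-in factors (not an actual pcf) checks this for d = 3:

```python
import numpy as np

# The d-dim inverse FFT of a rank-R separable sum equals the sum of outer
# products of 1D inverse FFTs: cost R*d*M*log(M) instead of M^d*log(M^d).
rng = np.random.default_rng(0)
d, R, M = 3, 2, 64
phi = [[rng.standard_normal(M) for _ in range(d)] for _ in range(R)]

p_factors = [[np.fft.ifft(f) for f in term] for term in phi]   # 1D transforms

full = sum(np.einsum('i,j,k->ijk', *term) for term in phi)     # assemble (d=3)
approx = sum(np.einsum('i,j,k->ijk', *term) for term in p_factors)
print(np.allclose(np.fft.ifftn(full), approx))                 # True
```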
Discrete representation of the pdf
The discrete representation of the pdf and the pcf is based on equidistant grid vectors
$$\hat x_{i_\nu,\nu} = \hat x_{1,\nu} + (i_\nu - 1)\,\Delta x_\nu$$
(with increment $\Delta x_\nu$) of size $M_\nu$ in each dimension $1 \le \nu \le d$ of $\mathbb{R}^d$.
With $V = \prod_{\nu=1}^{d} M_\nu\,\Delta x_\nu$, the trapezoidal integration rule has weights $\frac{V}{N}$.
The whole grid is
$$\hat X = \bigtimes_{\nu=1}^{d} \hat x_\nu.$$
$P := p_\xi(\hat X)$ denotes the tensor $P \in \bigotimes_{\nu=1}^{d} \mathbb{R}^{M_\nu} =: \mathcal{T}$, $\dim\mathcal{T} = \prod_{\nu=1}^{d} M_\nu =: N$, whose components are the evaluations of the pdf $p_\xi$ on the grid $\hat X$.
Discrete representation of the pcf
Dual grid:
$$\hat T = \bigtimes_{\nu=1}^{d} \hat t_\nu, \qquad \hat t_\nu := (\hat t_{1,\nu}, \dots, \hat t_{M_\nu,\nu}), \quad \hat t_{M_\nu,\nu} = \pi/\Delta x_\nu;$$
the equidistant spacing of the dual grid in dimension $\nu$ is $2\pi/L_\nu$, where $L_\nu = M_\nu\,\Delta x_\nu$.
The origin belongs to the dual grid: $0 \in \hat T$, i.e. there is a multi-index $j^0 = (j^0_1, \dots, j^0_d)$ such that $(\hat t_{j^0_1,1}, \dots, \hat t_{j^0_d,d}) = 0 = (0, \dots, 0)$.
The pcf on the dual grid is represented through the tensor $\Phi := \varphi_\xi(\hat T) \in \mathcal{T}$.
Thus we deal with $P := p_\xi(\hat X)$ for the pdf and $\Phi := \varphi_\xi(\hat T)$ for the pcf.
Notation, Moments and Covariance
$x \in \mathbb{R}^d$, $x^{\otimes k} = \bigotimes_{j=1}^{k} x$. Random vector $\xi: \Omega \to \mathcal{V} = \mathbb{R}^d$.
The expectation operator is denoted by $\mathbb{E}(\cdot)$; the mean is $\bar\xi := \mathbb{E}(\xi) = \int_\Omega \xi(\omega)\,\mathbb{P}(d\omega) \in \mathbb{R}^d$, and $\tilde\xi := \xi - \bar\xi$.
The moments $X_k$ and the central moments $\Xi_k$ of $\xi$ of order $k$:
$$X_k = \mathbb{E}\left(\xi^{\otimes k}\right) \in (\mathbb{R}^d)^{\otimes k}, \qquad \Xi_k = \mathbb{E}\left(\tilde\xi^{\otimes k}\right) \in (\mathbb{R}^d)^{\otimes k}.$$
The covariance matrix is $\Sigma_\xi = \operatorname{cov}\xi = \Xi_2 = X_2 - \bar\xi \otimes \bar\xi \in (\mathbb{R}^d)^{\otimes 2}$.
The mixed and mixed central moments (of $\xi$ and a second random vector $\eta \in \mathbb{R}^n$) are
$$Y_{k,\ell} = \mathbb{E}\left(\xi^{\otimes k} \otimes \eta^{\otimes\ell}\right) \quad\text{and}\quad \Upsilon_{k,\ell} = \mathbb{E}\left(\tilde\xi^{\otimes k} \otimes \tilde\eta^{\otimes\ell}\right) \in (\mathbb{R}^d)^{\otimes k} \otimes (\mathbb{R}^n)^{\otimes\ell}.$$
The covariance is also written $\operatorname{cov}(\xi,\eta) = \Upsilon_{1,1} = Y_{1,1} - \bar\xi \otimes \bar\eta$.
Higher order moments
$$(-\mathrm{i}\,\partial_{t_k})\,\varphi_\xi(t) = \int_{\mathbb{R}^d} x_k \exp(\mathrm{i}\langle t, x\rangle)\,p_\xi(x)\,dx = \mathcal{F}^{[d]}\left(x_k\,p_\xi(x)\right)(t).$$
Further, denoting the tensor of $k$-th derivatives by $\mathrm{D}^k\varphi_\xi(t) = \left(\frac{\partial^k}{\partial t_{i_1}\cdots\partial t_{i_k}}\varphi_\xi(t)\right)$, one has
$$(-\mathrm{i})^k\,\mathrm{D}^k\varphi_\xi(0) = \int_{\mathbb{R}^d} x^{\otimes k}\,p_\xi(x)\,dx = \mathcal{F}^{[d]}\left(x^{\otimes k}\,p_\xi(x)\right)(0) = X_k. \qquad (6)$$
The second characteristic function (cumulant generating function), whose derivative tensors of order $k$ are essentially the cumulants $K_k$ of $\xi$, is defined as the point-wise logarithm of the pcf:
$$\chi_\xi(t) := \log(\varphi_\xi(t)) = \log\left(\mathbb{E}(\exp(\mathrm{i}\langle t, \xi\rangle))\right), \qquad (7)$$
with $(-\mathrm{i})^k\,\mathrm{D}^k\chi_\xi(0) =: K_k$.
Moment generating function
It is defined as
$$M_\xi(t) := \mathbb{E}(\exp(\langle t, \xi\rangle)) = \int_{\mathbb{R}^d} \exp(\langle t, x\rangle)\,p_\xi(x)\,dx = \mathcal{L}_d(p_\xi)(-t) = \varphi_\xi(-\mathrm{i}\,t),$$
where $\mathcal{L}_d(p_\xi)(t) = \int \exp(\langle -t, x\rangle)\,p_\xi(x)\,dx$ is the two-sided d-dimensional Laplace transform of $p_\xi$. Then
$$\mathrm{D}^k M_\xi(0) = \int_{\mathbb{R}^d} x^{\otimes k}\,p_\xi(x)\,dx = X_k, \qquad k \in \mathbb{N}_0. \qquad (8)$$
The cumulant generating function is the point-wise logarithm of the moment generating function $M_\xi$:
$$K_\xi(t) := \log(M_\xi(t)) = \log\left(\mathbb{E}(\exp(\langle t, \xi\rangle))\right), \qquad (9)$$
with $\mathrm{D}^k K_\xi(0) = K_k$.
Representation of a 3D tensor in the CP tensor format
A full tensor $w \in \mathbb{R}^{n_1\times n_2\times n_3}$ is represented as a sum of tensor products,
$$w \approx \sum_{i=1}^{r} w_{i,1} \otimes w_{i,2} \otimes w_{i,3},$$
with factor vectors $w_{i,k} \in \mathbb{R}^{n_k}$, $i = 1, \dots, r$, $k = 1, 2, 3$ (drawn as lines in the accompanying figure).
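A minimal NumPy sketch of the CP format with illustrative sizes; the full tensor is assembled only to show what the factor vectors represent:

```python
import numpy as np

# Rank-r CP representation of a 3D tensor: r triples of factor vectors
# replace the n1*n2*n3 full array.
rng = np.random.default_rng(1)
n1, n2, n3, r = 40, 50, 60, 5
W = [[rng.standard_normal(n) for n in (n1, n2, n3)] for _ in range(r)]

w_full = sum(np.einsum('i,j,k->ijk', *term) for term in W)
print(w_full.shape)                              # (40, 50, 60): 120000 entries
print(sum(v.size for term in W for v in term))   # only 750 stored numbers
```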
Computation of QoIs
For tensors $P, F$ representing the pdf $p(x)$ and a function $f(\cdot)$ evaluated on the grid, one obtains
$$\int p(x)\,dx \approx S(P) := \frac{V}{N}\,\langle P\,|\,\mathbb{1}\rangle_{\mathcal{T}}, \qquad (10)$$
where $\mathbb{1} = (1_{i_1,\dots,i_d})$ — the tensor of all ones — satisfies $r \odot \mathbb{1} = r$ for any $r \in \mathcal{T}$, with $\odot$ the point-wise (Hadamard) product. Similarly,
$$\mathbb{E}(f(\xi)) = \int_{\mathbb{R}^d} f(x)\,p_\xi(x)\,dx \approx S(F \odot P) = \frac{V}{N}\,\langle F\,|\,P\rangle_{\mathcal{T}}.$$
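A sketch of why (10) is cheap in a low-rank format: for CP factors the inner product with the all-ones tensor splits into products of one-dimensional sums (the constant weight V/N is omitted; the factors are random stand-ins):

```python
import numpy as np

# <P|1> for a CP tensor: sum over terms of the product of 1D factor sums.
rng = np.random.default_rng(2)
d, R, M = 3, 4, 50
P = [[rng.random(M) for _ in range(d)] for _ in range(R)]

S_lowrank = sum(np.prod([f.sum() for f in term]) for term in P)   # O(R*d*M)
S_full = sum(np.einsum('i,j,k->ijk', *term) for term in P).sum()  # O(M^d)
print(np.allclose(S_lowrank, S_full))      # True
```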
Computation of QoIs
Differential entropy, requiring the point-wise logarithm of $P$:
$$h(p) := \mathbb{E}_p(-\log p) \approx \mathbb{E}_P(-\log P) = -\frac{V}{N}\,\langle \log(P)\,|\,P\rangle.$$
The f-divergence of $p$ from $q$ and its discrete approximation are then defined as
$$D_f(p\|q) := \mathbb{E}_q\!\left(f\!\left(\frac{p}{q}\right)\right) \approx \mathbb{E}_Q\!\left(f(P \odot Q^{\odot -1})\right) = \frac{V}{N}\,\langle f(P \odot Q^{\odot -1})\,|\,Q\rangle.$$
List of some typical divergences and distances.
Divergence                $D_\bullet(p\|q)$
KLD, $D_{KL}$:            $\int \left(\log(p(x)/q(x))\right) p(x)\,dx = \mathbb{E}_p(\log(p/q))$
Hellinger, $(D_H)^2$:     $\frac{1}{2}\int \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 dx$
Bregman, $D_\phi$:        $\int \left[(\phi(p(x)) - \phi(q(x))) - (p(x) - q(x))\,\phi'(q(x))\right] dx$
Bhattacharyya, $D_{Bh}$:  $-\log \int \sqrt{p(x)\,q(x)}\,dx$
Discrete approximations for divergences above
Divergence    discrete approximation of $D_\bullet(p\|q)$
KLD:          $\frac{V}{N}\left(\langle \log(P)\,|\,P\rangle - \langle \log(Q)\,|\,P\rangle\right)$
$(D_H)^2$:    $\frac{V}{2N}\,\langle P^{\odot 1/2} - Q^{\odot 1/2}\,|\,P^{\odot 1/2} - Q^{\odot 1/2}\rangle$
$D_\phi$:     $S\left((\phi(P) - \phi(Q)) - (P - Q) \odot \phi'(Q)\right)$
$D_{Bh}$:     $-\log\left(\frac{V}{N}\,\langle P^{\odot 1/2}\,|\,Q^{\odot 1/2}\rangle\right)$
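A full-grid sanity check of the discrete KLD and squared Hellinger formulas for two 2D Gaussians (grid parameters are illustrative; in high dimensions the same Hadamard expressions act on low-rank tensors instead):

```python
import numpy as np

# Discrete KLD and Hellinger^2 on a d=2 tensor grid, plain NumPy.
M, L = 128, 20.0
x = np.linspace(-L/2, L/2, M, endpoint=False)
X, Y = np.meshgrid(x, x, indexing='ij')
VN = (L / M) ** 2                           # quadrature weight V/N

def gauss(mu, sig):
    return np.exp(-((X - mu)**2 + (Y - mu)**2) / (2 * sig**2)) / (2 * np.pi * sig**2)

P, Q = gauss(0.0, 1.0), gauss(0.5, 1.5)
kld = VN * np.sum((np.log(P) - np.log(Q)) * P)
hell2 = VN / 2 * np.sum((np.sqrt(P) - np.sqrt(Q))**2)
print(kld, hell2)    # compare with the closed-form Gaussian values
```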
Algorithms
Below we list algorithms that approximate $f(p_\xi(y))$ by $f(P)$, where the $f$'s considered are
$$f(\cdot) \in \{\operatorname{sign}(\cdot),\ (\cdot)^{-1},\ \sqrt{\cdot},\ \sqrt[m]{\cdot},\ (\cdot)^k,\ \log(\cdot),\ \exp(\cdot),\ (\cdot)^2,\ |\cdot|\}, \qquad (11)$$
with $k > 0$ and $P = p_\xi(\hat X) = \sum_{j=1}^{r_p} \bigotimes_{\nu=1}^{d} p_{j,\nu}$.
Available methods:
1. TT-cross
2. iterative methods (e.g., Newton's algorithm)
3. power series
4. quadrature rules for the Dunford–Cauchy contour integral
5. others (e.g., sinc quadrature)
Iterative methods
We want to compute $f(w)$ for some function $f: \mathcal{T} \to \mathcal{T}$. Suppose we have an iteration function $\Psi_f$ that uses only operations from the Hadamard algebra on $\mathcal{T}$. It is iterated,
$$v_{i+1} = \Psi_f(v_i),$$
and converges to a fixed point $\Psi_f(v_*) = v_*$. When started with a $v_0$ depending on $w$, the fixed point is
$$\lim_{i\to\infty} v_i = v_* = \Psi_f(v_*) = f(w).$$
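A minimal sketch of such an iteration driver on plain NumPy arrays (in a low-rank format each application of Ψ_f would additionally be followed by a rank truncation); the cosine fixed point serves as a toy Ψ_f:

```python
import numpy as np

def fixed_point(psi, v0, tol=1e-12, maxit=200):
    """Iterate v <- psi(v) until successive iterates stop changing."""
    v = v0
    for _ in range(maxit):
        v_new = psi(v)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

print(fixed_point(np.cos, np.array(1.0)))   # 0.7390851..., the cosine fixed point
```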
Computing the point-wise inverse $w^{\odot -1}$
Let $F(x) := w - x^{\odot -1}$. Applying Newton's method to $F(x)$ for approximating the inverse of a given tensor $w$, one obtains the iteration function
$$\Psi_{\odot -1}(v) = v \odot (2\cdot\mathbb{1} - w \odot v),$$
with the initial condition $v_0 = \alpha \cdot w$, chosen to bring $w \odot v_0$ close to $\mathbb{1}$.
The iteration converges if the initial iterate $v_0$ satisfies $\|\mathbb{1} - w \odot v_0\|_\infty < 1$. A possible candidate for the starting value is $v_0 = \alpha w$ with $\alpha < (1/\|w\|_\infty)^2$; for such a $v_0$, the convergence condition $\|\mathbb{1} - \alpha w^{\odot 2}\|_\infty < 1$ is always satisfied.
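A sketch of this Newton iteration on a full array, with α chosen by the rule above (tensor size and entries are illustrative):

```python
import numpy as np

# Newton iteration for the point-wise inverse: v <- v ⊙ (2·1 − w ⊙ v).
rng = np.random.default_rng(3)
w = rng.uniform(0.5, 2.0, size=(20, 20, 20))   # a positive full tensor

alpha = 1.0 / np.max(np.abs(w))**2             # gives ||1 − α·w^2||_inf < 1
v = alpha * w
for _ in range(30):
    v = v * (2.0 - w * v)
print(np.max(np.abs(v - 1.0 / w)))             # ~1e-16, quadratic convergence
```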
Computing the point-wise $\sqrt{w}$ via Newton iteration
Let $F(x) := x^{\odot 2} - w = 0$. The Newton iteration is
$$\Psi_{\sqrt{\cdot}}(v) = \frac{1}{2}\cdot(v + v^{\odot -1} \odot w), \qquad (12)$$
with initial condition $v_0 = (w + \mathbb{1})/2$. Note that each step requires the point-wise inverse $v^{\odot -1}$, which motivates the inversion-free Newton–Schulz variant below.
Computing the point-wise $\sqrt{w}$ via Newton–Schulz iteration
Let $F(x) := x^{\odot 2} - w = 0$. An alternative is the Newton–Schulz iteration, which computes both $v_*^+ = \sqrt{w} = w^{\odot 1/2}$ and $v_*^- = (\sqrt{w})^{\odot -1} = w^{\odot -1/2}$.
We set $V_0 = [y_0, z_0] = [\alpha\cdot w, \mathbb{1}] \in \mathcal{T}^2$ and use the auxiliary function $A(y,z) = 3\cdot\mathbb{1} - z \odot y$:
$$\Psi_{\sqrt{\cdot}}\begin{bmatrix} y \\ z \end{bmatrix} = \frac{1}{2}\begin{bmatrix} y \odot A(y,z) \\ A(y,z) \odot z \end{bmatrix}. \qquad (13)$$
The iteration converges to $V_* = [v_*^+, v_*^-] = [\sqrt{y_0}, (\sqrt{y_0})^{\odot -1}]$ if $\|\mathbb{1} - y_0\|_\infty < 1$, which can be achieved with a scaling factor $\alpha < 1/\|w\|_\infty$.
As the initial iterate was scaled, the fixed point of the iteration is $v_*^+ = \sqrt{\alpha}\cdot\sqrt{w}$ and $v_*^- = (1/\sqrt{\alpha})\cdot(\sqrt{w})^{\odot -1}$. Thus the final result is $\sqrt{w} = (1/\sqrt{\alpha})\cdot v_*^+$ and $(\sqrt{w})^{\odot -1} = \sqrt{\alpha}\cdot v_*^-$.
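A sketch of the coupled iteration (13) on full arrays, including the rescaling of the result (data and sizes are illustrative):

```python
import numpy as np

# Coupled Newton-Schulz iteration (13): division-free point-wise sqrt and
# inverse sqrt, followed by the rescaling described above.
rng = np.random.default_rng(4)
w = rng.uniform(0.5, 2.0, size=(20, 20, 20))

alpha = 0.99 / np.max(np.abs(w))       # ensures ||1 − y0||_inf < 1
y, z = alpha * w, np.ones_like(w)
for _ in range(30):
    a = 3.0 - z * y                    # A(y, z) = 3·1 − z ⊙ y
    y, z = 0.5 * y * a, 0.5 * a * z
sqrt_w = y / np.sqrt(alpha)            # undo the scaling of the fixed point
inv_sqrt_w = z * np.sqrt(alpha)
print(np.max(np.abs(sqrt_w - np.sqrt(w))),
      np.max(np.abs(inv_sqrt_w - 1.0 / np.sqrt(w))))
```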
Computing log(w)
Assume $w > 0$. See the review of methods in Higham'01 and Higham'12.
For the algorithms to work well, $w$ has to be close to the identity $\mathbb{1}$, which can be achieved by taking roots: for $\lambda > 0$ one has $\log\left(w^{\odot\lambda}\right) = \lambda \log w$.
Truncated Taylor series (radius of convergence $\|x\|_\infty < 1$):
$$\log(\mathbb{1} - x) = -\sum_{n=1}^{\infty} \frac{1}{n}\cdot x^{\odot n},$$
where $x := \mathbb{1} - w$. If $w$ is not near the identity, one may use the relation $\log(w) = 2^k\,\log\left(w^{\odot 1/2^k}\right)$, where $w^{\odot 1/2^k} \to \mathbb{1}$ as $k$ increases.
Computing the power function $w \mapsto w^{\odot m}$
Denoting the power function by $w \mapsto w^{\odot m} = \Psi_{pow}(m, w)$, one also wants to use it for negative powers; for $m < 0$ this is simply $\Psi_{pow}(m, w) = \Psi_{pow}(-m, w^{\odot -1})$.
The recursive formula (binary exponentiation):
$$\Psi_{pow}(m, w) = \begin{cases} w \odot \Psi_{pow}(m-1, w), & m > 1 \text{ and odd}; \\ \Psi_{pow}(\tfrac{m}{2}, w) \odot \Psi_{pow}(\tfrac{m}{2}, w), & m \text{ even}; \\ w, & m = 1. \end{cases} \qquad (14)$$
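A sketch of the recursion (14); here plain division stands in for the point-wise inverse iteration when reducing negative powers:

```python
import numpy as np

# Recursion (14): point-wise integer powers by repeated squaring, so w^m
# costs O(log m) Hadamard products (each of which would be followed by a
# rank truncation in a tensor format). Requires m != 0.
def psi_pow(m, w):
    if m < 0:
        return psi_pow(-m, 1.0 / w)    # division stands in for the inverse iteration
    if m == 1:
        return w
    if m % 2 == 1:
        return w * psi_pow(m - 1, w)
    half = psi_pow(m // 2, w)          # compute the half power once, then square
    return half * half

w = np.array([1.0, 2.0, 3.0])
print(psi_pow(5, w), psi_pow(-2, w))   # [1. 32. 243.] [1. 0.25 0.111...]
```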
Computing $w^{\odot 1/m}$ via Newton's method
See Section 7 in N. Higham's book. Assume $w \ge \mathbb{1}$.
Newton's method for $F(x) = x^{\odot m} - w = 0$: the iteration function, with $v_0 = w$, reads
$$\Psi_{m\text{-root}}(v) = \frac{1}{m}\left((m-1)\cdot v + \Psi_{pow}(1-m, v) \odot w\right). \qquad (15)$$
If $m \ge 2$, this involves a negative power $v^{\odot(1-m)} = \Psi_{pow}(1-m, v)$. The algorithm converges for all $w \ge \mathbb{1}$.
Computing $w^{\odot 1/m}$ via a coupled iteration
With the auxiliary function $A(y,z) = (1/m)\cdot((m+1)\cdot\mathbb{1} - z)$:
$$\Psi_{m\text{-root}}\begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} y \odot A(y,z) \\ \Psi_{pow}(m, A(y,z)) \odot z \end{bmatrix}, \qquad (16)$$
where $y_i \to w^{\odot -1/m}$ and $z_i \to w^{\odot 1/m}$.
The starting values are $V_0 = [y_0, z_0] = [\alpha\cdot\mathbb{1},\ \alpha^m\cdot w] \in \mathcal{T}^2$, with $\alpha \le (\|w\|_\infty/\sqrt{2})^{-1/m}$. For scaling purposes it is best used with $m = 2^k$.
Computing $w^{\odot 1/m}$ via Tsai's algorithm
Another way of computing the m-th root is Tsai's algorithm (Tsai'88, Lorin'21), which uses the auxiliary function $B(y) = (2\cdot\mathbb{1} + (m-2)\cdot y) \odot (\mathbb{1} + (m-1)\cdot y)^{\odot -1}$:
$$\Psi_{Tsai}\begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} y \odot \Psi_{pow}(m, B(y)) \\ z \odot B(y) \end{bmatrix}, \qquad (17)$$
with starting value $V_0 = [w, \mathbb{1}]$. Then $z_i \to w^{\odot 1/m}$.
Computing log(w) via Gregory’s series
The series converges for all $w > 0$. Setting $z = (\mathbb{1} - w) \odot (\mathbb{1} + w)^{\odot -1}$, one has
$$\log w = -2\sum_{k=0}^{\infty} \frac{1}{2k+1}\cdot z^{\odot(2k+1)}. \qquad (18)$$
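A sketch of (18) on a full array; the point-wise inverse of 1 + w is plain division here, but would again be the Newton iteration above when w is stored in a tensor format:

```python
import numpy as np

# Gregory's series (18) for the point-wise logarithm.
rng = np.random.default_rng(5)
w = rng.uniform(0.5, 2.0, size=(20, 20, 20))

z = (1.0 - w) / (1.0 + w)              # |z| < 1 entry-wise for every w > 0
log_w, term = np.zeros_like(w), z.copy()
for k in range(30):
    log_w += term / (2 * k + 1)
    term = term * z * z                # z^(2k+1) -> z^(2k+3)
log_w *= -2.0
print(np.max(np.abs(log_w - np.log(w))))   # small; slower if w is far from 1
```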
Computing $\exp w$
See the book of N. Higham, Chapter 10:
$$u_{r,s} = \left(\sum_{k=0}^{r} \frac{1}{k!\,s^k}\, w^{\odot k}\right)^{\odot s}. \qquad (19)$$
Here $\lim_{r\to\infty} u_{r,s} = \lim_{s\to\infty} u_{r,s} = \exp w$. It is advantageous to take $s$ from the series of powers of 2, $s = 1, 2, 4, \dots, 2^k$; then the $s$-th power can be computed by repeated squaring. For the scaling, the best choice is $s \approx \|w\|_\infty$, so that $\|w/s\|_\infty$ is of order one.
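A sketch of (19) with s a power of two, so the outer s-th power is computed by repeated squaring (illustrative data; r = 11 terms):

```python
import numpy as np

# Formula (19): truncated Taylor series of exp(w/s) followed by repeated
# squaring, with s a power of two of the order of ||w||_inf.
rng = np.random.default_rng(6)
w = rng.uniform(-3.0, 3.0, size=(20, 20, 20))

s = 2 ** int(np.ceil(np.log2(max(np.max(np.abs(w)), 1.0))))
u, term = np.ones_like(w), np.ones_like(w)
for k in range(1, 12):                 # r = 11 terms of the series for exp(w/s)
    term = term * w / (s * k)
    u = u + term
for _ in range(int(round(np.log2(s)))):  # the s-th power by squaring
    u = u * u
print(np.max(np.abs(u - np.exp(w))))   # small truncation error
```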
Series of numerical examples
1. The KLD is computed with the analytical formula and with the AMEn-cross algorithm from the TT-Toolbox.
2. Hellinger distances are computed with well-known analytical formulas and with the AMEn-cross algorithm.
3. Where the pdf is not known analytically, the d-variate elliptically contoured α-stable distributions are chosen and accessed via their pcfs.
4. KLD and Hellinger distances are reported for different values of d, n, and the parameter α.
Example 1: KLD for two Gaussian distributions
$\mathcal{N}_1 := \mathcal{N}(\mu_1, C_1)$ and $\mathcal{N}_2 := \mathcal{N}(\mu_2, C_2)$, where $C_1 := \sigma_1^2 I$, $C_2 := \sigma_2^2 I$, $\mu_1 = (1.1, \dots, 1.1)$ and $\mu_2 = (1.4, \dots, 1.4) \in \mathbb{R}^d$, $d \in \{16, 32, 64\}$, $\sigma_1 = 1.5$, $\sigma_2 = 22.1$.
The well-known analytical formula is
$$2\,D_{KL}(\mathcal{N}_1\|\mathcal{N}_2) = \operatorname{tr}(C_2^{-1}C_1) + (\mu_2-\mu_1)^T C_2^{-1}(\mu_2-\mu_1) - d + \log\frac{|C_2|}{|C_1|}.$$
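For isotropic covariances the formula reduces to scalar terms; a few lines reproduce the reference value used in the next table:

```python
import numpy as np

# Reproducing D_KL ≈ 35.08 for d = 16 from the closed-form Gaussian
# formula above (isotropic covariances C_i = sigma_i^2 * I).
d, s1, s2, dm = 16, 1.5, 22.1, 1.4 - 1.1
dkl = 0.5 * (d * s1**2 / s2**2          # tr(C2^{-1} C1)
             + d * dm**2 / s2**2        # Mahalanobis term
             - d
             + d * np.log(s2**2 / s1**2))
print(dkl)                              # 35.08..., doubling as d doubles
```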
Comparison of KLDs computed via two methods
$D_{KL}$ computed via TT tensors (AMEn algorithm) and via the analytical formula for various values of $d$. TT tolerance $= 10^{-6}$ (the stopping difference between consecutive iterations).

d                  16       32       64
n                  2048     2048     2048
D_KL (exact)       35.08    70.16    140.32
D_KL (TT approx.)  35.08    70.16    140.32
abs. error         4.0e-7   2.43e-5  1.4e-5
rel. error         1.1e-8   3.46e-8  8.1e-8
comp. time, sec.   1.0      5.0      18.7
Example 2 — Hellinger distance (for Gaussian distributions)
$$D_H(\mathcal{N}_1, \mathcal{N}_2)^2 = 1 - K_{1/2}(\mathcal{N}_1, \mathcal{N}_2),$$
where the Bhattacharyya coefficient $K_{1/2}$ is
$$K_{1/2}(\mathcal{N}_1, \mathcal{N}_2) = \frac{\det(C_1)^{1/4}\,\det(C_2)^{1/4}}{\det\!\left(\frac{C_1+C_2}{2}\right)^{1/2}} \cdot \exp\!\left(-\frac{1}{8}(\mu_1-\mu_2)^T \left(\frac{C_1+C_2}{2}\right)^{-1}(\mu_1-\mu_2)\right).$$
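Again for the isotropic Gaussians of Example 1, a few lines reproduce the reference value from this formula:

```python
import numpy as np

# Closed-form Hellinger distance, reproducing D_H ≈ 0.99999 for d = 16.
d, s1, s2, dm = 16, 1.5, 22.1, 1.4 - 1.1
avg = (s1**2 + s2**2) / 2
K = (s1 * s2 / avg) ** (d / 2) * np.exp(-d * dm**2 / (8 * avg))
print(np.sqrt(1 - K))                   # 0.9999999...
```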
The Hellinger distance $D_H$ is computed via TT tensors (AMEn) and via the analytical formula. TT tolerance $= 10^{-6}$.

d                 16       32       64
n                 2048     2048     2048
D_H (exact)       0.99999  0.99999  0.99999
D_H (TT approx.)  0.99992  0.99999  0.99999
abs. error        3.5e-5   7.1e-5   1.4e-4
rel. error        2.5e-5   5.0e-5   1.0e-4
comp. time, sec.  1.7      7.5      30.5

The AMEn algorithm is able to compute the Hellinger distance $D_H$ between two multivariate Gaussian distributions for $d \in \{16, 32, 64\}$ and $n = 2048$; the exact and approximate values agree up to the small errors shown.
Example 3: α-stable distribution
The pcf of a d-variate elliptically contoured α-stable distribution is given by
$$\varphi_\xi(t) = \exp\!\left(\mathrm{i}\langle t, \mu\rangle - \langle t, Ct\rangle^{\alpha/2}\right).$$
AMEn tolerance $= 10^{-9}$.
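This pcf is not separable (the term ⟨t, Ct⟩^{α/2} couples all directions), which is why a cross approximation is used. A sketch of the entry-wise evaluator such a routine samples (grid and sizes are illustrative):

```python
import numpy as np

# Entry-wise evaluator of this pcf (mu = 0, C = I): the kind of function a
# cross-approximation routine such as amen_cross queries at multi-indices.
alpha, d, n = 1.5, 8, 65
t_grid = np.linspace(-10.0, 10.0, n)     # an illustrative 1D dual grid

def pcf_entries(idx):
    """idx: (batch, d) integer array of multi-indices into the dual grid."""
    t = t_grid[idx]                      # (batch, d) grid coordinates
    return np.exp(-np.sum(t**2, axis=1) ** (alpha / 2))

print(pcf_entries(np.array([[n // 2] * d])))   # [1.]: the pcf equals 1 at t = 0
```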
Example 3: KLD between two α-stable distributions
with $\alpha_1 = 2.0$, $\alpha_2 = 1.9$ ($\mu_1 = \mu_2 = 0$, $C_1 = C_2 = I$).

d              16     16    16    16     16    16    16    32    32    32
n              8      16    32    64     128   256   512   64    128   256
D_KL(2.0,1.9)  0.016  0.06  0.06  0.062  0.06  0.06  0.06  0.09  0.14  0.12
time, sec.     0.8    3     8.9   14     22    61    207   46    100   258
max TT rank    40     57    79    79     59    79    77    80    78    79
mem., MB       1.8    7     34    54     73    158   538   160   313   626

AMEn tolerance $= 10^{-9}$.
Full storage vs low-rank
$d = 32$ and $n = 256$ mean that the amount of data in full storage mode would be $N = n^d = 256^{32} \approx 1.16\cdot 10^{77}$ entries, i.e. about $10^{78}$ bytes. In the TT low-rank approximation it is ca. 626 MB and fits on a laptop.
Assuming a 1 GHz notebook, the KLD computation in full mode would require ca. $1.2\cdot 10^{68}$ sec, or more than $3\cdot 10^{60}$ years. Even with a perfect speed-up on a parallel supercomputer with, say, 1,000,000 processors, this would still require more than $3\cdot 10^{54}$ years; compare this with the estimated age of the universe of ca. $1.4\cdot 10^{10}$ years.
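The arithmetic behind these estimates, as a short sketch:

```python
# Back-of-the-envelope check of the storage and time estimates above.
n, d = 256, 32
N = n ** d                              # number of entries of the full tensor
print(f"{N:.2e} entries")               # 1.16e+77
print(f"{8 * N:.1e} bytes")             # ≈ 9.3e77 ≈ 1e78 bytes at 8 B/entry
secs = N / 1e9                          # one operation per entry at 1 GHz
print(f"{secs / 3.15e7:.1e} years")     # ≈ 3.7e60 years
```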
Example 4: DKL(α1, α2) between two α-stable distributions
for several pairs $(\alpha_1, \alpha_2)$ and fixed $d = 8$ and $n = 64$.

(α1, α2)          (2.0, 0.5)  (2.0, 1.0)  (2.0, 1.5)  (2.0, 1.9)  (1.5, 1.4)  (1.0, 0.4)
D_KL(α1, α2)      2.27        0.66        0.3         0.03        0.031       0.6
comp. time, sec.  8.4         7.8         7.5         8.5         11          8.7
max. TT rank      78          74          76          76          80          79
memory, MB        28.5        28.5        27.1        28.5        35          29.5

$\mu_1 = \mu_2 = 0$, $C_1 = C_2 = I$. AMEn tolerance $= 10^{-12}$.
Example 5: Hellinger distance $D_H(\alpha_1, \alpha_2)$
for the d-variate elliptically contoured α-stable distribution with $\alpha_1 = 1.5$ and $\alpha_2 = 0.9$ for different $d$ and $n$; $\mu_1 = \mu_2 = 0$, $C_1 = C_2 = I$.

d                 16     16     16     16     16     16     32     32     32     32
n                 8      16     32     64     128    256    16     32     64     128
D_H(1.5, 0.9)     0.218  0.223  0.223  0.223  0.219  0.223  0.180  0.176  0.175  0.176
comp. time, sec.  2.8    3.7    7.5    19     53     156    11     21     62     117
max. TT rank      79     76     76     76     79     76     75     71     75     74
memory, MB        7.7    17     34     71     145    283    34     66     144    285

AMEn tolerance is $10^{-9}$.
Example 6: DH vs. TT (AMEn) tolerances
TT (AMEn) tolerance  1e-7    1e-8    1e-9    1e-10   1e-14
D_H(1.5, 0.9)        0.1645  0.1817  0.176   0.1761  0.1802
comp. time, sec.     43      86      103     118     241
max. TT rank         64      75      75      78      77
memory, MB           126     255     270     307     322

Computation of $D_H(\alpha_1, \alpha_2)$ between two α-stable distributions ($\alpha_1 = 1.5$, $\alpha_2 = 0.9$) for different AMEn tolerances; $n = 128$, $d = 32$, $\mu_1 = \mu_2 = 0$, $C_1 = C_2 = I$.
Conclusion
We provided numerical ways to compute
1. the entropy, KLD, and f-divergences in low-rank tensor format;
2. the functions $f(\cdot) \in \{\operatorname{sign}(\cdot),\ (\cdot)^{-1},\ \sqrt{\cdot},\ \sqrt[m]{\cdot},\ (\cdot)^k,\ \log(\cdot),\ \exp(\cdot),\ (\cdot)^2,\ |\cdot|\}$ of the pcf and pdf;
3. low-rank approximations, which reduce complexity and storage from the exponential $O(n^d)$ to linear in $d$, e.g. $O(d\,n\,r^2)$ for the TT format.
Literature
1. A. Litvinenko, Y. Marzouk, H.G. Matthies, M. Scavino, A. Spantini, Computing f-Divergences and Distances of High-Dimensional Probability Density Functions – Low-Rank Tensor Approximations, Numerical Linear Algebra with Applications, 2022; e2467. https://doi.org/10.1002/nla.2467
2. M. Espig, W. Hackbusch, A. Litvinenko, H.G. Matthies, E. Zander, Iterative algorithms for the post-processing of high-dimensional data, Journal of Computational Physics 410, 109396, 2020. https://doi.org/10.1016/j.jcp.2020.109396
3. S. Dolgov, A. Litvinenko, D. Liu, Kriging in tensor train data format, Conf. Proc., 3rd Int. Conf. on Uncertainty Quantification in CSE, pp. 309-329, 2019. https://files.eccomasproceedia.org/papers/e-books/uncecomp_2019.pdf
4. A. Litvinenko, D. Keyes, V. Khoromskaia, B.N. Khoromskij, H.G. Matthies, Tucker tensor analysis of Matérn functions in spatial statistics, Computational Methods in Applied Mathematics 19(1), pp. 101-122, 2019. https://doi.org/10.1515/cmam-2018-0022
5. A. Litvinenko, R. Kriemann, M.G. Genton, Y. Sun, D.E. Keyes, HLIBCov: Parallel hierarchical matrix approximation of large covariance matrices and likelihoods with applications in parameter identification, MethodsX 7, 100600, 2020. https://doi.org/10.1016/j.mex.2019.07.001
6. A. Litvinenko, Y. Sun, M.G. Genton, D.E. Keyes, Likelihood approximation with hierarchical matrices for large spatial datasets, Computational Statistics & Data Analysis 137, pp. 115-132, 2019. https://doi.org/10.1016/j.csda.2019.02.002
Acknowledgement
Funding information:
Alexander von Humboldt-Stiftung, Deutsche
Forschungsgemeinschaft, Gay-Lussac Humboldt Research Award