This paper has been accepted for publication in the proceedings of the 2nd IEEE Global Conference on Signal and Information Processing (GlobalSIP), which was held in Atlanta, GA, USA in December 2014.

Copyright 2014 IEEE. Published in the 2nd IEEE Global Conference on Signal and Information Processing (GlobalSIP 2014), scheduled for 3-5 December 2014 in Atlanta, GA, USA. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 908-562-3966.
Learning Multidimensional Fourier Series With Tensor Trains
Sander Wahls*, Visa Koivunen†, H. Vincent Poor‡, and Michel Verhaegen*

*Delft Center for Systems and Control, TU Delft, The Netherlands. Email: {s.wahls,m.verhaegen}@tudelft.nl
†Department of Signal Processing and Acoustics, Aalto University, Finland. Email: [email protected]
‡Department of Electrical Engineering, Princeton University, USA. Email: [email protected]
Abstract—How to learn a function from observations of inputs and noisy outputs is a fundamental problem in machine learning. Often, an approximation of the desired function is found by minimizing a risk functional over some function space. The space of candidate functions should contain good approximations of the true function, but it should also be such that the minimization of the risk functional is computationally feasible. In this paper, finite multidimensional Fourier series are used as candidate functions. Their impressive approximative capabilities are illustrated by showing that Gaussian-kernel estimators can be approximated arbitrarily well over any compact set of bandwidths with a fixed number of Fourier coefficients. However, the solution of the associated risk minimization problem is computationally feasible only if the dimension d of the inputs is small because the number of required Fourier coefficients grows exponentially with d. This problem is addressed by using the tensor train format to model the tensor of Fourier coefficients under a low-rank constraint. An algorithm for least-squares regression is derived and the potential of this approach is illustrated in numerical experiments. The computational complexity of the algorithm grows only linearly both with the number of observations N and the input dimension d.

[...] where $\Sigma = \operatorname{diag}(\sigma_1,\ldots,\sigma_d) > 0$ is a positive-definite weighting matrix and $\|x\|_\Sigma^2 := x^T\Sigma x$, is a popular example for a space of approximation functions. The space $\mathcal{G}_\Sigma$ is $N$-dimensional, which is why the minimization of the risk (2) becomes infeasible for large-scale data sets. In order to reduce the computational complexity, Rahimi and Recht [5] have proposed to replace $\mathcal{G}_\Sigma$ with a lower-dimensional space that is in some sense close to $\mathcal{G}_\Sigma$. With samples $\omega_1,\ldots,\omega_D$ taken from a normal distribution with zero mean and covariance $2\Sigma$, and samples $b_1,\ldots,b_D$ taken from the uniform distribution on $[0,2\pi]$, they proposed to replace $\mathcal{G}_\Sigma$ with [6, p. 3]

$$\mathcal{R}_{\Sigma,D} := \Big\{ f(x) = \sum_{i=1}^{D} \alpha_i \cos(\omega_i^T x + b_i) \;:\; \alpha_i \in \mathbb{R},\; i = 1,\ldots,D \Big\}, \qquad \|f\|_{\mathcal{R}_{\Sigma,D}}^2 := \sum_{i=1}^{D} \alpha_i^2\, e^{\|x[i]\|_\Sigma^2}.$$
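As a concrete illustration of the random-feature construction just described, the following minimal NumPy sketch draws frequencies and phases as above and fits a ridge regressor on the resulting cosine features. The function names, the ridge solver, and the isotropic choice Σ = σI are illustrative assumptions and are not taken from the paper or its published code.

```python
import numpy as np

def random_fourier_features(X, D, sigma, rng=None):
    """Map inputs X (N x d) to D cosine features, following the construction
    of Rahimi and Recht sketched above with uniform weights Sigma = sigma * I."""
    rng = np.random.default_rng(0) if rng is None else rng
    N, d = X.shape
    # Frequencies from a zero-mean normal with covariance 2*Sigma = 2*sigma*I.
    W = rng.normal(scale=np.sqrt(2.0 * sigma), size=(D, d))
    # Phases from the uniform distribution on [0, 2*pi].
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.cos(X @ W.T + b)  # N x D feature matrix

def fit_rff_ridge(X, y, D=200, sigma=1.0, lam=1e-3):
    """Least-squares fit of the coefficients alpha_i with a small ridge penalty."""
    Phi = random_fourier_features(X, D, sigma)
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)
```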
Using the multidimensional Fourier series of the indicator function $I_X$ of $X$ [12, Ch. 8.1], we find that the quadratic norm satisfies

$$\|f\|_T^2 := \int_X f(x)\overline{f(x)}\,dx = \sum_{l,k\in I^d} c_l \bar{c}_k \int_{\mathbb{R}^d} e^{2\pi i (k-l)^T x/p}\, I_X(x)\,dx = \sum_{l,k\in I^d} c_l \bar{c}_k \prod_{i=1}^{d} \operatorname{sinc}\!\left(\frac{k_i - l_i}{p}\right). \qquad (6)$$
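Equation (6) says that the quadratic norm is a quadratic form in the Fourier coefficients whose kernel factorizes into one sinc matrix per dimension. The NumPy sketch below evaluates it for a generic coefficient array; the concrete index set and the assumption that X is a unit cube centred at the origin (so that the entries reduce to the normalized sinc) are illustrative choices, not taken from the paper.

```python
import numpy as np

def sinc_gram(idx, p):
    """One-dimensional Gram matrix [sinc((k - l)/p)]_{l,k} over the index set idx."""
    idx = np.asarray(idx, dtype=float)
    return np.sinc((idx[None, :] - idx[:, None]) / p)  # np.sinc(x) = sin(pi x)/(pi x)

def quadratic_norm(C, idx, p):
    """||f||_T^2 from (6) for a coefficient array C with one axis per dimension."""
    S = sinc_gram(idx, p)
    T = C
    for _ in range(C.ndim):
        # Contract the sinc matrix onto the leading axis; after C.ndim steps the
        # original axis order is restored because each result is appended last.
        T = np.tensordot(T, S, axes=([0], [0]))
    return float(np.real(np.vdot(C, T)))
```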
The following proposition demonstrates that $T_{m,p}$ provides arbitrarily good approximations of whole families of Gaussian kernels if the parameters $m$ and $p$ are chosen large enough. [...]

$$\sup_{x\in X} |g(x) - f(x)| \le \sum_{j=1}^{M}\sum_{i=1}^{N} |\alpha_{i,j}|\,|\beta_j|. \qquad (8)$$

Proof: Since all elements in $E$ are positive-definite and $E$ is [...]

[...] such that $c_l = G_1(l_1)\cdots G_d(l_d)$ [11]. We denote the set of all such tensor trains by $T^r_m$. The corresponding subset of $T_{m,p}$ is

$$T^r_{m,p} := \big\{ f(x) = \llbracket C, D(x)\rrbracket \;:\; C \in T^r_m \big\}.$$

At this point, note that $D(x)$ is a tensor train of rank one because its entries factor into products of univariate terms. The next lemma shows how functions in $T^r_{m,p}$ can be evaluated efficiently.

Lemma 3 (In part from [11], p. 2309). Consider two tensor trains

$$C = [c_l]_{l\in I^d} \in T^r_m, \qquad c_l = G_1(l_1)\cdots G_d(l_d), \qquad (10)$$
$$Z = [z_l]_{l\in I^d} \in T^1_m, \qquad z_l = z_1(l_1)\cdots z_d(l_d), \quad [...]$$
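The conclusion of Lemma 3 is not reproduced above, but the kind of evaluation it enables can be sketched: contracting the coefficient tensor train against a rank-one tensor such as D(x) reduces to a product of small matrices, one per core. The helper below is a generic illustration of that contraction, not the paper's implementation; the Fourier choice of the factors mentioned in the comment is likewise only an assumption consistent with the series used here.

```python
import numpy as np

def tt_dot_rank_one(cores, factors):
    """Contract a tensor train (cores[k] has shape (|I|, r_{k-1}, r_k), holding the
    matrices G_k(l)) with a rank-one tensor whose k-th factor is the vector factors[k]."""
    acc = np.ones((1, 1))
    for G, z in zip(cores, factors):
        # Gamma_k = sum_l z[l] * G_k(l): collapse the mode index of the k-th core.
        Gamma = np.tensordot(z, G, axes=([0], [0]))
        acc = acc @ Gamma
    return acc.item()  # 1 x 1 result, since the boundary ranks are one

# For f(x) = <<C, D(x)>> one would choose the factors as univariate Fourier samples,
# e.g. factors[k] = np.exp(2j * np.pi * idx * x[k] / p) over the index set idx.
```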
[...] has the well-known property that $AXB = C$ for arbitrary matrices $A$, $B$, $C$ and $X$ of compatible dimensions if and only if $(B^T \otimes A)\operatorname{vec}(X) = \operatorname{vec}(C)$ [16]. Thus, with $B = I_r$, one obtains $\operatorname{vec}(\Gamma_k) = (I_r \otimes H_k)\operatorname{vec}(G_k^L)$. We find that [...]
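The vectorization identity quoted above is standard and easy to check numerically; the snippet below is only such a sanity check (note that it uses the column-major vec convention that the identity assumes).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 2))

vec = lambda M: M.reshape(-1, order="F")  # column-major vectorization

# AXB = C  <=>  (B^T kron A) vec(X) = vec(C)
assert np.allclose(np.kron(B.T, A) @ vec(X), vec(A @ X @ B))
```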
$$\operatorname{tr}\Big\{\sum_{l_k,s_k\in I} A_k^{*} G_k^{*}(s_k)\, B_k^{*} B_k\, G_k(l_k)\, A_k \operatorname{sinc}\Big(\frac{s_k - l_k}{p}\Big)\Big\}
= \operatorname{tr}\Big\{\sum_{s\in I} A_k^{*} G_k^{*}(s)\Big(\sum_{l\in I}\operatorname{sinc}\Big(\frac{s-l}{p}\Big) B_k^{*} B_k\, G_k(l)\Big) A_k\Big\}
= \operatorname{tr}\big\{A_k^{*} (G_k^{L})^{*} (S^{*}S \otimes B_k^{*}B_k)\, G_k^{L} A_k\big\}
= \big\|(S\otimes B_k)\, G_k^{L} A_k\big\|_F^2 = \text{RHS of (13)}.$$

$$A_k A_k^{*} = \sum_{l_{k+1},s_{k+1}\in I}\cdots\sum_{l_d,s_d\in I} G_{k+1}(l_{k+1})\cdots G_d(l_d)\, G_d^{*}(s_d)\cdots G_{k+1}^{*}(s_{k+1})\, \operatorname{sinc}\Big(\frac{s_d - l_d}{p}\Big)\cdots\operatorname{sinc}\Big(\frac{s_{k+1} - l_{k+1}}{p}\Big), \qquad (14)$$

$$B_k^{*} B_k = \sum_{l_{k-1},s_{k-1}\in I}\cdots\sum_{l_1,s_1\in I} G_{k-1}^{*}(s_{k-1})\cdots G_1^{*}(s_1)\, G_1(l_1)\cdots G_{k-1}(l_{k-1})\, \operatorname{sinc}\Big(\frac{s_1 - l_1}{p}\Big)\cdots\operatorname{sinc}\Big(\frac{s_{k-1} - l_{k-1}}{p}\Big). \qquad (15)$$

Remark 5. The matrix $[\operatorname{sinc}(\frac{l-s}{p})]_{l,s}$ is positive semi-definite by (6).

Remark 6. The matrices $\Gamma_1,\ldots,\Gamma_d$ in Lemma 3 can be computed using $O(dmr^2)$ floating point operations (flops). Forming $L_k$ and $R_k$ then takes $O(dr^2)$ flops because $\Gamma_1$ and $\Gamma_d$ are vectors. The computation of $A_k$ and $B_k$ in Lemma 4 requires $O(dm^2r^3)$ flops.
B. Risk Minimization Over Low-rank Coefficient Tensors

The risk (2) is in general not convex over $F = T^r_{m,p}$. We propose to use an alternating least squares approach as in [14] and [15] to find a local minimum. For the quadratic loss $\ell(x,y) = |x-y|^2$, the risk (2) can be rewritten as follows. Let $f(x) = \llbracket C, D(x)\rrbracket$, where the coefficient tensor train $C = [c_l]$ is given by $c_l = G_1(l_1)\cdots G_d(l_d)$. Then, for any $k$ and with $Z_i := D(x[i])$, Lemmas 3 and 4 show that the risk can be written as

$$R_{\mathrm{emp}}(f) = \frac{1}{N}\sum_{j=1}^{N} \big|y[j] - \llbracket C, D(x[j])\rrbracket\big|^2 + \lambda\|f\|_{T_{m,p}}^2 = \frac{1}{N}\left\| \begin{bmatrix} y[1]\\ \vdots\\ y[N]\\ 0 \end{bmatrix} - \begin{bmatrix} R_k(Z_1)^T \otimes L_k(Z_1)H_k(Z_1)\\ \vdots\\ R_k(Z_N)^T \otimes L_k(Z_N)H_k(Z_N)\\ \sqrt{N\lambda}\, A_k^T \otimes S \otimes B_k \end{bmatrix} \operatorname{vec}(G_k^L) \right\|^2. \qquad (16)$$

The size of the coefficient matrix in (16) is $(N + mr^2)\times mr^2$ for $k \neq 1, d$. Thus, a single core $\{G_k(l)\}_{l\in I}$ of the tensor $C$ can be updated by solving the linear least squares problem to minimize (16). In the alternating least squares approach, the cores are updated sequentially. In one iteration of the algorithm, first $G_1$ is updated by minimizing (16), then $G_2$ is updated in the same way, etc., until $G_d$ has been updated. The iterations are repeated until convergence.
An important implementation detail arises because the representation $c_l = G_1(l_1)\cdots G_d(l_d)$ of a tensor train is highly non-unique. To avoid numerical problems, the tensor train should be stored using a canonical representation. A representation $c_l = G_1(l_1)\cdots G_d(l_d)$ [...] instead be directly orthonormalized using, e.g., the modified Gram-Schmidt method. Algorithm 1 provides an overview of the procedure.

Remark 7. Algorithm 1 converges locally around a local minimum if the Hessian of this minimum has maximal rank [15, Corollary 2.9].

Remark 8. The minimization of (16) in a least-squares sense requires $O((N + mr^2)m^2r^4)$ flops using standard techniques. The orthonormalization of a core requires $O(mr^3)$ flops. Remark 6 implies that forming the coefficient matrix in (16) in general requires $O(d(N + mr)mr^2)$ flops. In Algorithm 1, however, where the cores are updated sequentially, it is possible to do this more efficiently. The matrices $L_k$ can be updated as $L_k = L_{k-1}\Gamma_{k-1}$. The matrices $R_1,\ldots,R_d$ can be efficiently precomputed at the beginning of each iteration (when $k = 1$) using the formula $R_{j-1} = \Gamma_j R_j$ because the $R_{k+1},\ldots,R_d$ are independent of $\Gamma_k$. A similar strategy may be used to cope with the regularization matrices $A_k$ and $B_k$. In this way, the costs of finding the coefficient matrix in (16) can be reduced to $O((N + mr)mr^2)$ flops. Then, the total cost of updating one core is $O((N + mr^2)m^2r^4)$ flops, and a complete iteration in Algorithm 1 can be carried out using only $O(d(N + mr^2)m^2r^4)$ flops.
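To make the sweep just described concrete, here is a schematic NumPy sketch of updating one core at a time by linear least squares and cycling over the cores. It is only an illustration of the control flow under simplified assumptions: it omits the regularization block of (16), the efficient L_k/R_k bookkeeping of Remark 8, and the orthonormalization step, and none of the names come from the paper's code.

```python
import numpy as np

def gamma(core, z):
    # Gamma_k = sum_l z[l] * G_k(l); core has shape (|I|, r_left, r_right).
    return np.tensordot(z, core, axes=([0], [0]))

def als_sweeps(cores, Z, y, n_iter=10, lam=1e-6):
    """Schematic alternating least squares over the cores of a tensor train.
    Z[i][k] is the univariate feature vector of sample i in dimension k and y[i]
    the corresponding output. The regularization block of (16), the efficient
    L_k/R_k updates of Remark 8, and the orthonormalization step are omitted."""
    y = np.asarray(y)
    d, N = len(cores), len(y)
    for _ in range(n_iter):
        for k in range(d):
            rows = []
            for i in range(N):
                # Partial products of the core matrices to the left/right of core k.
                L = np.ones((1, 1))
                for j in range(k):
                    L = L @ gamma(cores[j], Z[i][j])
                R = np.ones((1, 1))
                for j in range(d - 1, k, -1):
                    R = gamma(cores[j], Z[i][j]) @ R
                # The prediction is linear in core k: the coefficient of
                # G_k[l, a, b] is Z[i][k][l] * L[0, a] * R[b, 0].
                rows.append(np.kron(Z[i][k], np.kron(L.ravel(), R.ravel())))
            A = np.vstack(rows)
            g = np.linalg.solve(A.conj().T @ A + lam * np.eye(A.shape[1]),
                                A.conj().T @ y)
            cores[k] = g.reshape(cores[k].shape)
    return cores
```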
IV. NUMERICAL EXPERIMENTS

Setup: We have benchmarked Algorithm 1 for minimizing (2) with $F = T^r_{m,p}$ and $\ell(x,y) = |x-y|^2$ against standard kernel ridge regression [21] (KRR, $F = \mathcal{G}_\Sigma$) and random Fourier features (RFF, $F = \mathcal{R}_{\Sigma,D}$) for several data sets that have been downloaded from [22]. Each data set has first been randomly permuted and then partitioned into a training data set (70% of the data) and a testing data set (30% of the data). Parameters which are not given in Figure 1 have been chosen by performing a grid search. Each combination of the parameters was evaluated by performing a 5-fold cross validation on the training data. The predictors with respect to the best parameters were then trained on the training data and evaluated on the test data. The reported errors are average values taken over 10 experiments.

Implementation Details: The inputs $x[1],\ldots,x[N]$ have been rescaled (all with the same scalar) such that $x[1],\ldots,x[N] \in X$ with $X$ as in (1). Uniform weights $\Sigma = \sigma I$ have been used in order to keep the grid search feasible. The dimension of the random Fourier features was chosen equal to the number of floats needed to store the tensor train used in Algorithm 1: $D = 2mr + (d-2)mr^2$. When random initializations were used (Alg. 1 and RFF), three different initializations have been evaluated and the one with the smallest training error was used. Algorithm 1 always performed 10 iterations. The source code is available online at http://bitbucket.com/wahls/mdfourier.
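The evaluation protocol above (70/30 split, grid search scored by 5-fold cross validation on the training part, averaging over repeated runs) can be sketched generically as follows; the helper names and the scoring convention are illustrative and independent of the paper's code.

```python
import numpy as np

def kfold_indices(n, k=5, rng=None):
    """Split n sample indices into k folds after a random permutation."""
    rng = np.random.default_rng(0) if rng is None else rng
    return np.array_split(rng.permutation(n), k)

def grid_search(fit, score, X, y, grid, k=5):
    """Return the parameter combination with the best average validation error.
    `fit(Xtr, ytr, params)` returns a model; `score(model, Xval, yval)` returns an
    error to be minimized (both are placeholders for any of the compared predictors)."""
    best_params, best_err = None, np.inf
    for params in grid:
        errs = []
        for val_idx in kfold_indices(len(y), k):
            tr_idx = np.setdiff1d(np.arange(len(y)), val_idx)
            model = fit(X[tr_idx], y[tr_idx], params)
            errs.append(score(model, X[val_idx], y[val_idx]))
        if np.mean(errs) < best_err:
            best_params, best_err = params, float(np.mean(errs))
    return best_params
```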
Results: The results are reported in Figure 1. Algorithm 1 has been able to perform similarly to kernel ridge regression with less resources ($D$ floats instead of $N$) on all three data sets. Algorithm 1 performed better than random Fourier features that have been provided the same amount of memory in all cases. On the airfoil and yacht data sets, the test error could be reduced significantly by a factor of three and five, respectively. The high average test error for random Fourier features on the airfoil data set was caused by a single experiment (out of ten).

V. CONCLUSION

The numerical experiments have confirmed the approximative capabilities of multidimensional Fourier series even if a low-rank constraint is placed on the tensor of Fourier coefficients. The proposed algorithm performs as well as kernel ridge regression and better [...]

[...]
$$\sup_{\sigma\in[\underline{\sigma},\bar{\sigma}]} \frac{\sqrt{\pi}}{p\sqrt{\sigma}}\Big(1 + \frac{2\sigma p^2}{m\pi^2}\Big) \le \cdots \le 2\sqrt{\frac{\pi}{\underline{\sigma}}}
\;\Longrightarrow\;
\sup_{\sigma\in[\underline{\sigma},\bar{\sigma}]} E_2(m,p,\sigma) \le \sup_{\sigma\in[\underline{\sigma},\bar{\sigma}]} e^{-\frac{m^2\pi^2}{4p^2\sigma}}\, \sup_{\sigma\in[\underline{\sigma},\bar{\sigma}]} \frac{\sqrt{\pi}}{p\sqrt{\sigma}}\Big(1 + \frac{2\sigma p^2}{m\pi^2}\Big). \qquad (18)$$

The claim now follows from Lemma 9, (17) and (18):
$$\sup\;\sup\;\Big| e^{-\sigma x^2} - \sum_{l} b_l\, e^{2\pi i l x/p}\Big| \le \cdots$$

[...] $\sup_{\sigma\in[\underline{\sigma},\bar{\sigma}]} \exp\!\big(-\sigma(2p-1)^2/4\big) + \exp\!\big(-\sigma(2p+1)^2/4\big) \le 8\varepsilon$ and $\sup_{\sigma\in[\underline{\sigma},\bar{\sigma}]} 1/(\sigma(2p-1)) + 1/(\sigma(2p+1)) \le 1$. Then, [...]

Let us now fix arbitrary $x, y \in X$, and define $e_i(x,y,\Sigma) := e^{-\sigma_i(x_i-y_i)^2} - g_{\sigma_i}(x_i-y_i)$. Note that $|e_i| \le \varepsilon/2$ by (19). The [...]