Neumayer 2023 A
Abstract. Although Lipschitz-constrained neural networks have many applications in machine learning, the
design and training of expressive Lipschitz-constrained networks is very challenging. Since the pop-
ular rectified linear-unit networks have provable disadvantages in this setting, we propose using
learnable spline activation functions with at least three linear regions instead. We prove that our
choice is universal among all componentwise 1-Lipschitz activation functions in the sense that no
other weight-constrained architecture can approximate a larger class of functions. Additionally, our
choice is at least as expressive as the recently introduced non-componentwise Groupsort activation
function for spectral-norm-constrained weights. The theoretical findings of this paper are consistent
with previously published numerical results.
Key words. deep learning, learnable activations, universality, robustness, Lipschitz continuity, linear splines
DOI. 10.1137/22M1504573
Received by the editors June 27, 2022; accepted for publication (in revised form) January 19, 2023; published
electronically May 15, 2023. Sebastian Neumayer and Alexis Goujon contributed equally to this work.
https://doi.org/10.1137/22M1504573
Funding: The research leading to these results was supported by the European Research Council (ERC) under
the European Union's Horizon 2020 programme (H2020), grant agreement 101020573 (Project FunLearn), and by
the Swiss National Science Foundation, grant 200020 184646/1.
Biomedical Imaging Group, École polytechnique fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
([email protected], [email protected], [email protected], [email protected]).
A popular strategy for obtaining Lipschitz-constrained NNs is to bound the norm of the weights in every layer,
which yields models with a Lipschitz constant bounded by the product of the norms of the weights. However,
this estimate is, in general, quite pessimistic, especially for deep models. Consequently, this
additional structural constraint often leads to vanishing gradients [22] and a seriously reduced
expressivity of the model. Remarkably, the commonly used rectified linear-unit (ReLU) activation aggravates
the situation. For instance, it is shown in [20] that ReLU NNs with \infty-norm
weight constraints have a second-order total variation that is bounded independently of the
depth. Further, it is proven in [1] that, under spectral norm constraints, any scalar-valued
ReLU NN \Phi with \| \nabla \Phi \| 2 = 1 a.e. is necessarily linear. To circumvent the described issues, sev-
eral new activation functions have been proposed recently, such as Groupsort [1] or the related
Householder [30] activation functions. Note that, contrary to ReLU, all of these activation
functions are multivariate. Analyzing the expressivity of the resulting NNs and determining
their applicability in practice is an active area of research.
It is by no means trivial to specify which class of functions can be approximated by a
generic NN with 1-Lipschitz layers. Ideally, given a compact set D \subset \BbbR d equipped with the p-
norm, it is desirable to approximate all scalar-valued 1-Lipschitz functions, which are denoted
by Lip1,p (D). The first result in this direction was provided in [1], where the authors show
that the use of the Groupsort activation function and \infty -norm-constrained weights indeed al-
lows for the universal approximation of Lip1,p (D). The behavior of such NNs was then further
investigated in [11, 32]. Unfortunately, the proof strategies published so far cannot be general-
ized to other norms and not even partial results are known for this very challenging problem.
Therefore, being able to compare the approximation capabilities of different architectures is an
important first step. For example, the approximation of the absolute value function, for which
an exact representation with ReLU is impossible, provides a classic benchmark to compare
architectures. From a practical perspective, Groupsort NNs have yielded promising results
and compare favorably against ReLU NNs with similar architectures [1].
Currently, the most substantial results in this area rely on multivariate activation func-
tions. Although the ReLU activation function is indeed too limiting, we claim that the class of
componentwise activation functions ought not to be dismissed off-hand. Following this idea,
we analyze deep spline NNs, whose activation functions are learnable linear splines [3, 5, 36].
Since bounds on the Lipschitz constant of compositions are usually too pessimistic, our ratio-
nale is to increase the expressivity of the activation function while still being able to efficiently
control its Lipschitz constant. As reported first in [6], Lipschitz-constrained deep spline NNs
perform well in practice and a more systematic comparison against other frameworks can be
found in [12]. In this work, we shed light on the theoretical benefits of these NNs over ReLU-
like NNs. In particular, we prove that the choice of learnable linear spline activation functions
with three regions is universal among all componentwise 1-Lipschitz activation functions. In
other words, no other weight-constrained NN with componentwise activation functions can
approximate a larger class of functions. Moreover, for the spectral-norm constraint, which
is commonly used in practice, we show that deep spline NNs are at least as expressive as
Groupsort NNs.
Outline and contributions. In section 2, we revisit 1-Lipschitz continuous piecewise-linear
(CPWL) functions and 1-Lipschitz NNs. In particular, we show that they can approximate
any function in Lip1,p (D). Since the construction of 1-Lipschitz NNs is nontrivial, we briefly
discuss two architectures for this task, namely deep spline and Groupsort NNs. Then, in
section 3, we extend some known results on the limitations of weight-constrained NNs with
ReLU activation functions. More precisely, we show that ReLU-like NNs cannot represent
certain simple functions for any p-norm weight constraint. Based on a second-order to-
tal variation argument, we further show that they cannot be universal approximators for
\infty -norm weight constraints. Next, in section 4, we study the approximation properties of
deep spline NNs. Here, we prove our main result, according to which deep spline NNs with
three linear regions achieve the maximum expressivity among NNs with componentwise acti-
vation functions. Further, we discuss the relation between deep spline and Groupsort NNs.
Finally, we draw conclusions in section 5.
2. Lipschitz-constrained NNs. In this paper, we investigate feedforward NN architectures
that consist of K \in \BbbN layers with widths n_1, \dots, n_K and that are given by mappings \Phi: \BbbR^d \rightarrow \BbbR^{n_K}
of the form

(2.1)    \Phi(x) := A_K \circ \sigma_{K-1,\alpha_{K-1}} \circ A_{K-1} \circ \sigma_{K-2,\alpha_{K-2}} \circ \cdots \circ \sigma_{1,\alpha_1} \circ A_1(x).

Here, the affine functions A_k: \BbbR^{n_{k-1}} \rightarrow \BbbR^{n_k} are given by

(2.2)    A_k(x) := W_k x + b_k, \quad k = 1, \dots, K,

with weight matrices W_k \in \BbbR^{n_k \times n_{k-1}}, n_0 = d, and bias vectors b_k \in \BbbR^{n_k}. For multilayer per-
ceptrons, W_k is learned as a full matrix, while for convolutional NNs, W_k is parametrized via
a convolution operator whose kernel is learned. The model includes parameterized nonlinear
activation functions \sigma_{k,\alpha_k}: \BbbR^{n_k} \rightarrow \BbbR^{n_k} with corresponding parameters \alpha_k, k = 1, \dots, K-1.
For the case of componentwise activation functions, we have that \sigma_{k,\alpha_k}(x) = (\sigma_{k,\alpha_k,j}(x_j))_{j=1}^{n_k}.
We sometimes drop the index k in the activation function \sigma_{k,\alpha_k} to simplify the notation. The
complete parameter set of the NN is denoted by u := (W_k, b_k, \alpha_k)_{k=1}^{K} and the NN by \Phi(\cdot, u)
whenever the dependence on the parameters is explicitly needed. For an illustration, see
Figure 2.1. Architecture (2.1) results in a CPWL function whenever the activation functions
themselves are CPWL functions such as the ReLU. Next, we investigate the approximation
properties of this architecture under Lipschitz constraints on \Phi(\cdot, u).
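For concreteness, the following Python sketch (ours, purely illustrative and not part of the original text) evaluates an NN of the form (2.1)-(2.2) with componentwise activation functions; all helper names are our own choices.

import numpy as np

def forward(x, weights, biases, activations):
    """Evaluate (2.1): A_K o sigma_{K-1} o ... o sigma_1 o A_1(x).

    weights, biases: lists of length K defining the affine maps (2.2);
    activations: list of length K-1 of componentwise functions.
    """
    K = len(weights)
    for k in range(K - 1):
        x = activations[k](weights[k] @ x + biases[k])
    return weights[K - 1] @ x + biases[K - 1]

# Toy example with d = 3 and widths (4, 2); the activation here is ReLU for illustration only.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
bs = [np.zeros(4), np.zeros(2)]
acts = [lambda z: np.maximum(z, 0.0)]
print(forward(rng.standard_normal(3), Ws, bs, acts))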
2.1. Universality of 1-Lipschitz ReLU networks. First, we briefly revisit the approxima-
tion of Lipschitz functions by CPWL functions, for which we give a precise definition.
Definition 2.1. A continuous function f: \BbbR^d \rightarrow \BbbR^n is called continuous and piecewise linear
if there exist a finite set \{f^m : m = 1, \dots, M\} of affine functions, also called affine pieces, and
closed sets (\Omega_m)_{m=1}^{M} \subset \BbbR^d with nonempty and pairwise-disjoint interiors, also called projection
regions [33], such that \cup_{m=1}^{M} \Omega_m = \BbbR^d and f|_{\Omega_m} = f^m|_{\Omega_m}.
Assume that we are given a collection of tuples (xi , yi ) \in \BbbR d \times \BbbR , i = 1, . . . , N , which can
be interpreted as samples from a function f : \BbbR d \rightarrow \BbbR . Let
(2.3)    L^p_{x,y} := \max_{i \neq j} \frac{|y_i - y_j|}{\|x_i - x_j\|_p}
denote the Lipschitz constant associated with these points. Then, a first natural question is
whether it is always possible to find an interpolating CPWL function g with p-norm Lipschitz
constant Lipp (g) = Lpx,y .
Figure 2.1. Illustration of a feedforward NN of the form (2.1) with four inputs and two outputs.
Proposition 2.2. For the tuples (x_i, y_i) \in \BbbR^d \times \BbbR, i = 1, \dots, N, and p \in [1, +\infty], there exists
a CPWL function f with \mathrm{Lip}_p(f) = L^p_{x,y} such that f(x_i) = y_i for all i = 1, \dots, N.
Since we are unaware of a proof for general p, we provide one below.
Proof. Let q be such that 1/p + 1/q = 1. For p < +\infty, define u_{i,j} \in \BbbR^d as the vector
given by

(2.4)    (u_{i,j})_k = \mathrm{sgn}\bigl((x_j - x_i)_k\bigr) |(x_j - x_i)_k|^{p-1}, \quad k = 1, \dots, d.

If p = +\infty, we choose k_0 with \|x_i - x_j\|_\infty = |(x_i - x_j)_{k_0}|, and define (u_{i,j})_{k_0} = \mathrm{sgn}\bigl((x_j - x_i)_{k_0}\bigr)
with all other components of u_{i,j} set to 0. This saturates Hölder's inequality with

(2.5)    \langle u_{i,j}, x_j - x_i \rangle = \sum_{k=1}^{d} |(u_{i,j})_k (x_j - x_i)_k| = \|u_{i,j}\|_q \|x_j - x_i\|_p,

where we used that u_{i,j} and (x_j - x_i) have components with the same sign. For i \neq j, we
define the linear function

(2.6)    f_{i,j}(x) = y_i + \frac{y_j - y_i}{\|x_j - x_i\|_p \|u_{i,j}\|_q} \langle u_{i,j}, x - x_i \rangle,

which is such that f_{i,j}(x_i) = y_i and \mathrm{Lip}_p(f_{i,j}) = |y_j - y_i| / \|x_j - x_i\|_p, as \sup_{\|x\|_p \leq 1} \langle u_{i,j}, x \rangle = \|u_{i,j}\|_q.
Next, set f_i(x) = \max_{j \neq i} f_{i,j}(x), for which it holds that f_i(x_i) = y_i and \mathrm{Lip}_p(f_i) = \max_j |y_j - y_i| / \|x_j - x_i\|_p.
Then, we define f(x) = \min_i f_i(x) and directly obtain that f(x_j) \leq y_j for any
j = 1, \dots, N. However, we also have that f_i(x_j) \geq f_{i,j}(x_j) = y_j for all i \neq j, which then
implies that f(x_j) = y_j for any j = 1, \dots, N. Further, we directly get that
\mathrm{Lip}_p(f) = L^p_{x,y}. Finally, by recalling that the maximum and the minimum of any num-
ber of CPWL functions is CPWL as well [33], we conclude that f is CPWL and the claim
follows.
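As an illustration of the construction in the proof of Proposition 2.2, the following sketch (ours; for p = 2, where u_{i,j} is proportional to x_j - x_i) builds f(x) = min_i max_{j != i} f_{i,j}(x) and checks the interpolation property numerically.

import numpy as np

def minmax_interpolant(X, y):
    """CPWL interpolant f(x) = min_i max_{j != i} f_{i,j}(x) from Proposition 2.2 (p = 2)."""
    def f(x):
        vals = []
        for i in range(len(y)):
            cand = []
            for j in range(len(y)):
                if j == i:
                    continue
                d = X[j] - X[i]            # for p = 2, u_{i,j} is proportional to x_j - x_i
                cand.append(y[i] + (y[j] - y[i]) * (d @ (x - X[i])) / (d @ d))
            vals.append(max(cand))
        return min(vals)
    return f

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2))
y = rng.standard_normal(5)
f = minmax_interpolant(X, y)
print([abs(f(X[i]) - y[i]) < 1e-12 for i in range(5)])   # interpolation check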
Figure 2.2. Interpolation based on a triangulation: Let x_1, x_2, x_3 \in \BbbR^2 be input data points (blue dots)
with corresponding target values y_1 = 0, y_2 = 1, and y_3 = 1. The gray curves depict the \ell_p unit balls for
p \in \{1, 2, 3, 4, +\infty\}. For the left plot, we set p > 1 and get L^p_{x,y} = 1. For the right plot, we set p = 1 and also get
L^p_{x,y} = 1. The unique affine function g: \BbbR^2 \rightarrow \BbbR interpolating the data is the simplest CPWL function that fits
the data. For any point x lying between x_2 and x_3 (red dot), it holds that g(x) = 1; hence |g(x_1) - g(x)| = 1.
However, in both settings, x lies in the interior of the corresponding \ell_p unit ball centered at x_1, which implies
that \|x_1 - x\|_p < 1. Hence, \mathrm{Lip}_p(g) > L^p_{x,y}, and g does not interpolate the data with the minimal Lipschitz constant.
Remark 2.3. The d-dimensional construction is more involved than the one-dimensional
(1D) case, for which a simple interpolation is sufficient. A natural way to fit the data in
any dimension is to form a triangulation with vertices (xi )N i=1 . Then, with the use of the
CPWL hat basis functions of the triangulation, one can directly form an interpolating CPWL
function. Unfortunately, the Lipschitz constant of this function can exceed Lpx,y . An example
of this issue is provided in Figure 2.2.
Since the maximum and minimum of finitely many affine functions can be represented by
ReLU NNs, the same holds true for the CPWL function constructed in Proposition 2.2. This
directly leads us to a well-known corollary.
Corollary 2.4. Let D \subset \BbbR d be compact, and let p \in [1, +\infty ]. Then, the ReLU NNs \Phi : D \rightarrow \BbbR
with Lipp (\Phi ) \leq 1 are dense in Lip1,p (D).
Since computing the Lipschitz constant of a generic NN is NP-hard, Corollary 2.4 has
limited practical relevance. To circumvent this issue, either algorithms that provide tight
estimates, or special architectures with simple yet sharp bounds, are necessary. In this paper,
we pursue the second direction. To this end, we introduce tools to build Lipschitz-constrained
architectures in the remainder of this section and investigate the universality of these archi-
tectures in section 4.
2.2. 1-Lipschitz network architectures. A first step toward Lipschitz-constrained NNs
is to constrain the norm of the weights. As we are aiming for 1-Lipschitz NNs, we always
constrain them by one, but remark that other values are possible as well. If we further impose
that all activation functions \sigma k,\alpha are 1-Lipschitz, then the resulting NN is also 1-Lipschitz.
Operator-norm constraints. The p \rightarrow q operator norm is given for W \in \BbbR^{n \times m} and p, q \in [1, +\infty] by

\|W\|_{p,q} := \sup_{\|x\|_p \leq 1} \|Wx\|_q,

and we set \|\cdot\|_p := \|\cdot\|_{p,p}. Note that \|\cdot\|_1 and \|\cdot\|_\infty correspond to the maximum \ell_1 norm of
the columns and rows of W, respectively. The norm \|\cdot\|_2, also known as the spectral norm,
corresponds to the largest singular value of W . To obtain a nonexpansive NN of the form
(2.1) in the p-norm sense, the weight matrices can be constrained as
(2.9)    \|W_k\|_p \leq 1, \quad k = 1, \dots, K,
which we shall henceforth refer to as p-norm-constrained weights. For matrices W \in \BbbR 1,n it
holds that \| W \| p = \| W T \| q with 1/p + 1/q = 1. In other words, if we interpret these matrices
as vectors, then we have to constrain the q-norm instead. In the case of scalar-valued NNs,
we can also constrain the weights as \| Wk \| q \leq 1, k = 2, . . . , K, and \| W1 \| p,q \leq 1, since all
standard norms are identical in \BbbR . There exist several methods to enforce such constraints in
the training stage [14, 25, 29]; see Remark 2.5 for more details.
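To illustrate how a spectral-norm constraint can be enforced in practice (a sketch under our own simplifying choices, not the specific schemes of [14, 25, 29]), the largest singular value can be estimated by power iteration and the weight rescaled whenever the estimate exceeds one.

import numpy as np

def spectral_norm(W, n_iter=50):
    """Estimate ||W||_2 (largest singular value) via power iteration on W^T W."""
    v = np.random.default_rng(0).standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = W.T @ (W @ v)
        v /= np.linalg.norm(v)
    return np.linalg.norm(W @ v)

def normalize_spectral(W):
    """Rescale W so that ||W||_2 <= 1; a simple surrogate for an exact projection."""
    return W / max(spectral_norm(W), 1.0)

W = np.random.default_rng(2).standard_normal((64, 32))
print(spectral_norm(normalize_spectral(W)))   # close to 1 after rescaling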
Orthonormality constraints. Instead of imposing \|W\|_2 \leq 1, we can also require that either
W^T W = \mathrm{Id} or W W^T = \mathrm{Id}, depending on the shape of W. This constraint corresponds to
imposing that either W or W^T lies in the so-called Stiefel manifold. Compared to the spectral-
norm constraint, the orthonormality constraint enforces all singular values of W to be unity.
From a computational perspective, this approach is more challenging than the previous one
but helps to mitigate the problem of vanishing gradients in deep NNs. For more details,
including possible implementations, we refer to [17, 18, 19].
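As an example (ours, not the specific algorithms of [17, 18, 19]), a matrix can be mapped to the Stiefel manifold by replacing W = U S V^T with U V^T, which sets all singular values to one.

import numpy as np

def stiefel_project(W):
    """Replace W = U S V^T by U V^T so that all singular values equal one."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

W = np.random.default_rng(3).standard_normal((64, 32))
Q = stiefel_project(W)
print(np.allclose(Q.T @ Q, np.eye(32)))   # columns are orthonormal since 64 >= 32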
Remark 2.5. Many of the implementations for the schemes of section 2.2 enforce the p-norm
constraint or orthonormality only approximately. For theoretical guarantees, it is, however,
necessary to ensure that the constraint is satisfied exactly. In practice, this means that
sufficient numerical accuracy or additional postprocessing after training might be necessary.
2.3. Special activation functions. While the quest for optimal activation functions in the
last decade leaves us with many choices, the 1-Lipschitz constraint changes the picture, and the
relevance of each activation function must be reassessed. In section 3, we provide results that
explain why the ReLU activation function is actually not suited in a Lipschitz-constrained set-
ting. Hence, we need to resort to other activation functions that lead to increased expressivity
of the resulting NN. There is a fundamental conceptual difference between componentwise and
general multivariate activation functions. In particular, finding a good trade-off in terms of
representational power and computational complexity is necessary. In the following, we briefly
discuss two corresponding families of activation functions, which have been shown experimen-
tally to be well suited in the constrained setting. Then, we further explore their usability in
the norm-constrained case and investigate the relations between the two approaches.
Deep spline NNs. A deep spline NN [4, 5, 36] uses learnable componentwise linear-spline
activation functions; see Figure 2.3. It is known that deep spline NNs are solutions of a
functional optimization problem; namely, the training of a neural network with free-form
activation functions whose second-order total-variation is regularized [36]. A linear-spline
activation function is fully characterized by its linear regions and the corresponding values at
the boundaries. In the unconstrained setting, any linear spline can be implemented by means
of a scalar one-hidden-layer ReLU NN as
(2.10)    x \mapsto \sum_{m=1}^{M} u_m \, \mathrm{ReLU}(v_m x + b_m),
Figure 2.3. Linear spline with seven knots (also known as breakpoints) and eight linear regions.
where um , vm , bm \in \BbbR and M \in \BbbN . This parameterization, however, lacks expressivity under
p-norm constraints on the weights, as it is not able to produce linear splines with second-
order total variation greater than 1, as discussed in Lemma 3.2 and section 3.2. Instead, it is
more convenient to rely on local B-spline atoms [5]. In practice, the linear-spline activation
functions have a fixed number of uniformly spaced breakpoints---typically between 10 and
50---and are expressed as a weighted sum of cardinal B-splines. This amounts to adding a
learnable parameter for each breakpoint and two additional ones to set the slope at both ends
for a linear extrapolation. This local parameterization yields an evaluation complexity that
remains independent of the number of breakpoints, in contrast with (2.10). The B-spline
framework can easily be adapted to learn 1-Lipschitz activation functions via the use of a
suitable projector on the B-spline coefficients [12].
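The following sketch (ours; a simplification of the parameterization used in [5, 12]) evaluates a linear-spline activation from values prescribed at uniformly spaced breakpoints, with linear extrapolation outside, and enforces 1-Lipschitz continuity by clipping the slopes between consecutive breakpoints.

import numpy as np

def lipschitz_linear_spline(knots, values):
    """1-Lipschitz linear spline with uniformly spaced knots and linear extrapolation."""
    h = knots[1] - knots[0]
    slopes = np.clip(np.diff(values) / h, -1.0, 1.0)              # clip slopes to [-1, 1]
    vals = np.concatenate(([values[0]], values[0] + h * np.cumsum(slopes)))

    def sigma(x):
        x = np.asarray(x, dtype=float)
        idx = np.clip(np.searchsorted(knots, x) - 1, 0, len(knots) - 2)
        return vals[idx] + slopes[idx] * (x - knots[idx])
    return sigma

knots = np.linspace(-2.0, 2.0, 21)                                 # 21 breakpoints
coeffs = np.random.default_rng(4).standard_normal(21)
sigma = lipschitz_linear_spline(knots, coeffs)
x = np.linspace(-3.0, 3.0, 1000)
print(np.max(np.abs(np.diff(sigma(x)) / np.diff(x))))              # empirical slope <= 1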
Among weight-constrained NNs with componentwise activation functions, deep spline NNs
achieve the optimal representational power.
Lemma 2.6. Let (x_n, y_n) \in \BbbR^d \times \BbbR^p, n = 1, \dots, N, and let \Phi be an NN with K layers, parameter
set u, p-norm weight constraints, and 1-Lipschitz activation functions. Then, there exists a
deep spline NN, denoted by \mathrm{DS}, with the same architecture, where the activation functions are
replaced by 1-Lipschitz linear splines with no more than (N - 1) linear regions, such that

(2.11)    \Phi(x_n, u) = \mathrm{DS}(x_n, u) \quad \text{for } n = 1, \dots, N.
Proof. On the data points (x_n, y_n)_{n=1}^{N}, the activation functions of \Phi are evaluated for at
most N different values. Hence, the result directly follows by interpolating these values using
a linear spline, which yields 1-Lipschitz linear-spline activation functions.
This result is somehow still unsatisfying as the number of linear regions grows with the
number of training points. Later, we show that linear-spline activation functions with three
linear regions are actually sufficient. This amounts to six tunable parameters per activation
function.
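For concreteness, one possible six-parameter encoding of such an activation function (ours, purely illustrative) uses two knots, the value at the left knot, and three slopes clipped to [-1, 1]; the soft-thresholding function below is one instance.

import numpy as np

def three_region_spline(t1, t2, v1, s_left, s_mid, s_right):
    """1-Lipschitz linear spline with knots t1 < t2 and slopes clipped to [-1, 1]."""
    s = np.clip([s_left, s_mid, s_right], -1.0, 1.0)
    v2 = v1 + s[1] * (t2 - t1)                                     # value at the right knot

    def sigma(x):
        x = np.asarray(x, dtype=float)
        return np.where(
            x < t1, v1 + s[0] * (x - t1),
            np.where(x < t2, v1 + s[1] * (x - t1), v2 + s[2] * (x - t2)))
    return sigma

soft_threshold = three_region_spline(-1.0, 1.0, 0.0, 1.0, 0.0, 1.0)
print(soft_threshold(np.array([-2.0, 0.0, 2.0])))                  # [-1.  0.  1.]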
Groupsort. The sort operation takes a vector of dimension n and simply outputs its compo-
nents sorted in ascending order. This operation has complexity \scrO (n log(n)), which is slightly
worse than the linear complexity of componentwise activation functions. The Groupsort acti-
vation function [1] is a generalization of this operation: it splits the preactivation into groups
of prescribed length and performs the sort operation within each group. This results in near-
linear complexity when the group lengths are small enough. If the group length is two, then
the activation function is known as the MaxMin or norm-preserving orthogonal-permutation
linear unit [10]. Let us remark that any Groupsort activation function can be written
as a composition of MaxMin activation functions, i.e., larger group lengths do not increase the
theoretical expressivity. Although not obvious at first glance, the Groupsort activation func-
tion is actually a CPWL operation. The rationale for this activation function is to perform a
nonlinear and norm-preserving operation, which mitigates the issue of vanishing gradients in
deep constrained architectures. More precisely, we have that the Jacobian of the Groupsort
activation function is a.e. given by a permutation matrix, which is indeed an orthogonal ma-
trix. Motivated by this observation, this approach was recently generalized [30] to yield the
Householder activation functions \sigma_v: \BbbR^d \rightarrow \BbbR^d with v \in \BbbR^d, \|v\|_2 = 1, given by

(2.12)    \sigma_v(z) = \begin{cases} z & \text{if } v^T z > 0,\\ (\mathrm{Id} - 2vv^T) z & \text{otherwise.} \end{cases}
On the hyperplane that separates the two cases (i.e., v^T z = 0), we have that (\mathrm{Id} - 2vv^T)z = z -
2(v^T z)v = z. Thus, \sigma_v is continuous and, moreover, its Jacobian is either \mathrm{Id} or (\mathrm{Id} - 2vv^T), which
are both square orthogonal matrices. For practical purposes, the authors of [30] recommend
using groups of length 2. This construction can be iterated to obtain higher-order Householder
activation functions with more linear regions.
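The following sketch (ours) implements the Groupsort operation and the Householder activation (2.12) and checks numerically that both preserve the Euclidean norm, in line with the discussion above.

import numpy as np

def groupsort(z, group_size=2):
    """Sort the entries of z within consecutive groups (MaxMin for group_size = 2)."""
    z = np.asarray(z, dtype=float)
    return np.sort(z.reshape(-1, group_size), axis=1)[:, ::-1].reshape(-1)

def householder(z, v):
    """Householder activation (2.12): identity if v^T z > 0, reflection otherwise."""
    v = v / np.linalg.norm(v)
    return z if v @ z > 0 else z - 2 * (v @ z) * v

rng = np.random.default_rng(6)
z = rng.standard_normal(6)
v = rng.standard_normal(6)
print(np.isclose(np.linalg.norm(groupsort(z)), np.linalg.norm(z)))       # permutation preserves the 2-norm
print(np.isclose(np.linalg.norm(householder(z, v)), np.linalg.norm(z)))  # reflection preserves the 2-norm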
3. Limitations of certain architectures. In this section, we provide results that explain
why the use of activation functions that are more complex than the ReLU is indeed necessary
for weight-constrained NNs.
3.1. Diminishing Jacobians. Componentwise and monotone activation functions are detri-
mental to the expressivity of NNs with spectral-norm-constrained weights [1, Thm. 1]. Here,
we generalize this result to NNs with p-norm-constrained weights and certain CPWL activa-
tion functions, along with a more precise characterization. In particular, we also cover the
case where \| J\Phi \| p is not 1 a.e.
Proposition 3.1. Let p \in (1, +\infty], let I \subset \BbbR be a closed interval, and let \sigma: \BbbR \rightarrow \BbbR be a
CPWL activation function satisfying
\bullet \sigma(x) = x + b, b \in \BbbR, for x \in I,
\bullet |\sigma'(x)| < 1 for x \notin I.
Then, any NN \Phi: \BbbR^d \rightarrow \BbbR of the form (2.1) with p-norm-constrained weights and activation
function \sigma has at most one affine region \Omega_i with \|J\Phi|_{\Omega_i}\|_p = 1.
Proof. We proceed via induction over the number K of layers of \Phi. For K = 1, the
mapping is affine and the statement holds trivially. Now, assume that the result holds for
some K \geq 1. Let

(3.1)    \Phi_{K+1} = A_{K+1} \circ \sigma \circ A_K \circ \cdots \circ \sigma \circ A_1,

which we decompose as \Phi_{K+1} = \Phi_K \circ h with \Phi_K = A_{K+1} \circ \sigma \circ A_K \circ \cdots \circ \sigma \circ A_2 and h = \sigma \circ A_1.
The induction assumption implies that \|J\Phi_K\|_p < 1 on all affine regions except possibly one.
The corresponding affine function f_K^1: \BbbR^{n_1} \rightarrow \BbbR with projection region \Omega_K \subset \BbbR^{n_1} takes the
form x \mapsto v^T x + c, where v \in \BbbR^{n_1} is such that \|v\|_q \leq 1, 1/p + 1/q = 1, and c \in \BbbR. Now, we
define the set

(3.2)    \Omega_{K+1} = \{x \in \BbbR^d : (A_1(x))_l \in I \text{ for any } l \text{ s.t. } v_l \neq 0\} \cap h^{-1}(\Omega_K).

By construction, \Phi_{K+1} is affine on \Omega_{K+1} and coincides with \Phi_K \circ (A_1 + b) on this set. Any
other affine piece of \Phi_{K+1} can be written in the form f_K^i \circ h^j, where f_K^i and h^j are affine
pieces of \Phi_K and h, respectively. For this composition, either of the following holds:
(i) It holds that f_K^i \neq f_K^1, which results in \|J(f_K^i \circ h^j)\|_p < 1 due to \|Jf_K^i\|_p < 1.
(ii) It holds that f_K^i = f_K^1. Further, note that Jh^j = \mathrm{diag}(d) W_1 for some d \in \BbbR^{n_1} with
entries |d_l| \leq 1. Due to the definition of \Omega_{K+1}, there exists l^* such that v_{l^*} \neq 0 and
|d_{l^*}| < 1. Hence, the Jacobian of the affine piece is given by \tilde{v}^T W_1 with \tilde{v} = \mathrm{diag}(d) v.
Since p \neq 1, we get that q < +\infty and \|\tilde{v}\|_q < \|v\|_q \leq 1. Consequently, \|J(f_K^i \circ h^j)\|_p =
\|\tilde{v}^T W_1\|_p \leq \|\tilde{v}\|_q \|W_1\|_p < 1.
This concludes the induction argument.
For p > 1, Proposition 3.1 implies that ReLU NNs with p-norm constraints on the weights
can reproduce neither the absolute value nor a whole family of simple functions, including
the triangular hat function (also known as the B-spline of degree 1) and the soft-thresholding
function. Further, this result suggests that activation functions with more than one region of
maximal slope are better suited for this approximation framework. Learnable spline activation
functions, in particular, can have this property.
3.2. Limited expressivity. A meaningful metric for the expressivity of a model is its ability
to produce functions with high variations. In this section, we investigate the impact of the
Lipschitz constraint on the maximal second-order total variation of such an NN. Note that
we partially rely on results from [20] for our proofs. The second-order total variation of a
function f: \BbbR \rightarrow \BbbR is defined as \mathrm{TV}^{(2)}(f) := \|\mathrm{D}^2 f\|_{\scrM}, where \|\cdot\|_{\scrM} is the total-variation norm
related to the space \scrM of bounded Radon measures, and \mathrm{D} is the distributional derivative
operator. The space of functions with bounded second-order total variation is denoted by

(3.3)    \mathrm{BV}^{(2)}(\BbbR) = \{f: \BbbR \rightarrow \BbbR \text{ s.t. } \mathrm{TV}^{(2)}(f) < +\infty\}.
For more details, we refer the reader to [7, 36]. Further, we recall that TV(2) is a seminorm
that, for a CPWL function on the real line, is given by the finite sum of its absolute slope
changes. Based on Lemma 3.2, we infer for the p-norm-constrained setting that, in general, a
linear-spline activation function cannot be replaced with a one-layer ReLU NN without losing
expressivity.
Lemma 3.2. Let f: \BbbR \rightarrow \BbbR be parameterized by a one-hidden-layer NN with componentwise
activation function \sigma and p-norm-constrained weights, p \in [1, +\infty]. If \sigma \in \mathrm{BV}^{(2)}(\BbbR), then

(3.4)    \mathrm{TV}^{(2)}(f) \leq \mathrm{TV}^{(2)}(\sigma).
Proof. Let f be given by x \mapsto u^T \sigma(wx + b) = \sum_{n=1}^{N} u_n \sigma(w_n x + b_n) with u := (u_1, \dots, u_N) \in \BbbR^N,
w := (w_1, \dots, w_N) \in \BbbR^N, and b := (b_1, \dots, b_N) \in \BbbR^N. The p-norm weight constraints imply
that \|w\|_p \leq 1 and \|u\|_q \leq 1 with 1/p + 1/q = 1. Since \mathrm{TV}^{(2)} is a seminorm, we get

(3.5)    \mathrm{TV}^{(2)}(f) \leq \sum_{n=1}^{N} |u_n| \mathrm{TV}^{(2)}(\sigma(w_n \cdot + b_n)) \leq \sum_{n=1}^{N} |u_n w_n| \mathrm{TV}^{(2)}(\sigma) \leq \mathrm{TV}^{(2)}(\sigma),

where we used that \sum_{n=1}^{N} |u_n w_n| \leq \|u\|_q \|w\|_p \leq 1 by Hölder's inequality.
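The bound (3.4) can be checked numerically. The sketch below (ours) approximates TV^(2) of a CPWL function on a fine grid as the sum of absolute slope changes and evaluates it for a random one-hidden-layer ReLU network with \infty-norm-constrained weights, for which TV^(2)(\sigma) = 1.

import numpy as np

def tv2_on_grid(f, a=-5.0, b=5.0, n=20001):
    """Approximate TV^(2)(f) on [a, b] as the sum of absolute slope changes on a grid."""
    t = np.linspace(a, b, n)
    slopes = np.diff(f(t)) / np.diff(t)
    return np.sum(np.abs(np.diff(slopes)))

rng = np.random.default_rng(7)
N = 50
relu = lambda x: np.maximum(x, 0.0)                        # TV^(2)(ReLU) = 1
w = rng.uniform(-1.0, 1.0, N)                              # ||w||_inf <= 1   (p = inf)
u = rng.standard_normal(N); u /= np.linalg.norm(u, 1)      # ||u||_1  <= 1   (q = 1)
b = rng.uniform(-3.0, 3.0, N)
f = lambda t: np.array([u @ relu(w * ti + b) for ti in np.atleast_1d(t)])
print(tv2_on_grid(f))                                      # empirically <= TV^(2)(ReLU) = 1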
Producing functions with large second-order total variation by increasing the depth is, however,
not achievable by ReLU NNs with \infty-norm-constrained weights [20, Thm. 1]. As shown in
Proposition 3.3, this has a drastic impact on the size of the class of functions that can be
approximated by such ReLU NNs.
Proposition 3.3. Let D \subset \BbbR d be compact with nonempty interior. Then, there exists f \in
Lip1,\infty (D) that cannot be approximated by ReLU NNs \Phi : \BbbR d \rightarrow \BbbR with architecture (2.1), and
\infty -norm-constrained weights.
Proof. By [20, Thm. 1], we know that, for any u \in \BbbR^d with \|u\|_\infty = 1 and any ReLU NN
\Phi with \infty-norm-constrained weights, it holds that

\mathrm{TV}^{(2)}(\Phi \circ \varphi_u) \leq 2,

where \varphi_u: \BbbR \rightarrow \BbbR^d with t \mapsto tu. Let (\Phi_n)_{n \in \BbbN} be a sequence of ReLU NNs with \infty-norm-
constrained weights that converges uniformly to some function \Phi on D. Since D has nonempty interior, we
can pick u \in \BbbR^d with \|u\|_\infty = 1 such that \varphi_u^{-1}(D) contains an open interval I \subset \BbbR. Then,
(\Phi_n \circ \varphi_u)_{n \in \BbbN} converges uniformly to \Phi \circ \varphi_u on I. Since \mathrm{TV}^{(2)} is lower semicontinuous with
respect to uniform convergence [7, Prop. 3.14], we infer that the restriction to I satisfies

\mathrm{TV}^{(2)}\bigl((\Phi \circ \varphi_u)|_I\bigr) \leq \liminf_{n \rightarrow \infty} \mathrm{TV}^{(2)}\bigl((\Phi_n \circ \varphi_u)|_I\bigr) \leq 2.

In other words, any f \in \mathrm{Lip}_{1,\infty}(D) with \mathrm{TV}^{(2)}(f \circ \varphi_u) > 2 cannot be approximated by \infty-
norm-constrained ReLU NNs. However, there exist sawtooth-like functions on I that have
this property, with an explicit example constructed in Proposition 3.4.
Unlike ReLU networks, deep spline networks can produce arbitrarily complex mappings
thanks to the composition operation, even in the norm-constrained setting.
Proposition 3.4. Let C > 0, p \in [1, +\infty], I \subset \BbbR open, and u \in \BbbR^d. Then, there exists an NN
\Phi: \BbbR^d \rightarrow \BbbR with architecture (2.1), p-norm-constrained weights, and 1-Lipschitz linear-spline
activation functions with one knot such that, for \varphi_u: I \rightarrow \BbbR^d with \varphi_u(t) = tu, it holds that
\mathrm{TV}^{(2)}(\Phi \circ \varphi_u) \geq C.
Proof. Pick b \in \BbbR, c > 0 such that [b - c, b + c] \subset I. Let \sigma_1: x \mapsto |x - b| - c/2, \sigma_k: x \mapsto |x| - c/2^k
for k = 2, \dots, m, and F_m = \sigma_m \circ \cdots \circ \sigma_1. The function F_m is a sawtooth-like
CPWL function with 2^m linear regions whose knots are all contained in [b - c, b + c]. Further, it holds for all
t \in \BbbR that |F_m'(t)| = 1, and the sign of the slope differs between neighboring regions. Consequently,
\mathrm{TV}^{(2)}(F_m) = 2(2^m - 1), which exceeds C for m large enough.
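A sketch (ours) of this construction: composing m one-knot 1-Lipschitz splines yields the sawtooth F_m, whose numerically computed second-order total variation matches 2(2^m - 1) and thus grows exponentially with the depth m.

import numpy as np

def sawtooth(m, b=0.0, c=1.0):
    """F_m = sigma_m o ... o sigma_1 with sigma_1(x) = |x - b| - c/2 and sigma_k(x) = |x| - c/2^k."""
    def F(x):
        z = np.abs(np.asarray(x, dtype=float) - b) - c / 2
        for k in range(2, m + 1):
            z = np.abs(z) - c / 2**k
        return z
    return F

def tv2_on_grid(f, a, b, n=200001):
    t = np.linspace(a, b, n)
    slopes = np.diff(f(t)) / np.diff(t)
    return np.sum(np.abs(np.diff(slopes)))

for m in (1, 2, 3, 4, 5):
    print(m, tv2_on_grid(sawtooth(m), -1.5, 1.5), 2 * (2**m - 1))   # numerical TV^(2) vs. 2(2^m - 1)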
Since g_2 has three linear regions and g_1 has the same number of linear regions as g, we can
limit our discussion to functions g with \lim_{x \rightarrow \pm\infty} |g'(x)| = 1.
Case 1. There exists some a_j, j \in \{2, \dots, m-1\}, such that the function g has an extremum
at a_j when restricted to (-\infty, a_j] or [a_j, +\infty). As all possible cases are similar, we only provide
the construction for g(a_j) being a maximum of g on (-\infty, a_j]. To this end, we define the
functions \tilde{g}_1, \tilde{g}_2 as

(4.3)    \tilde{g}_1(x) = \begin{cases} g(x) & \text{for } x \leq a_j,\\ g(a_j) + (x - a_j) & \text{otherwise} \end{cases}

and

(4.4)    \tilde{g}_2(x) = \begin{cases} x & \text{for } x \leq g(a_j),\\ g(x + a_j - g(a_j)) & \text{otherwise,} \end{cases}

which are both 1-Lipschitz piecewise-linear functions with at most m linear regions and satisfy
\lim_{x \rightarrow \pm\infty} |\tilde{g}_i'(x)| = 1. Further, it holds that g = \tilde{g}_2 \circ \tilde{g}_1, so that we can apply the induction
assumption to conclude the argument.
Case 2. Case 1 does not apply and \lim_{x \rightarrow +\infty} g'(x)/g'(-x) = 1. In the following, we reduce
this to Case 1. We only provide the construction for \lim_{x \rightarrow -\infty} g'(x) = 1, the other case being
similar. Here, it holds that g(a_1) \geq g(a_i) \geq g(a_m) for all i = 1, \dots, m, and we now define the
functions \tilde{g}_1, \tilde{g}_2 as

(4.5)    \tilde{g}_1(x) = \begin{cases} g(x) & \text{for } x < a_1,\\ 2g(a_1) - g(x) & \text{for } a_1 \leq x \leq a_m,\\ g(x) + 2(g(a_1) - g(a_m)) & \text{otherwise} \end{cases}

and

(4.6)    \tilde{g}_2(x) = \begin{cases} x & \text{for } x < g(a_1),\\ 2g(a_1) - x & \text{for } g(a_1) \leq x \leq 2g(a_1) - g(a_m),\\ 2(g(a_m) - g(a_1)) + x & \text{otherwise.} \end{cases}

Clearly, both of the functions satisfy \lim_{x \rightarrow \pm\infty} |\tilde{g}_i'(x)| = 1 and are 1-Lipschitz. Here, the first
function has m + 1 linear regions and the second one has three. Further, the first function
now fits Case 1, and it remains to show that g = \tilde{g}_2 \circ \tilde{g}_1. However, this follows immediately
from g(a_1) \leq \tilde{g}_1(x) \leq 2g(a_1) - g(a_m) for x \in [a_1, a_m].
Case 3. Case 1 does not apply and \lim_{x \rightarrow +\infty} g'(x)/g'(-x) = -1. This case can be reduced
to either Case 1 or Case 2. We assume that \lim_{x \rightarrow -\infty} g'(x) = 1 and note that the other case is
again similar. Then, it holds that \min\{g(a_1), g(a_m)\} \geq g(a_i) for all i = 1, \dots, m, and we choose
a^* \in \arg\max_{x \in \BbbR} g(x) \in \{a_1, a_m\}. Next, we define the functions \tilde{g}_1, \tilde{g}_2 as

(4.7)    \tilde{g}_1(x) = \begin{cases} g(x) & \text{for } x < a^*,\\ 2g(a^*) - g(x) & \text{otherwise} \end{cases}

and

(4.8)    \tilde{g}_2(x) = \begin{cases} x & \text{for } x < g(a^*),\\ 2g(a^*) - x & \text{otherwise.} \end{cases}
(4.9)    \max_{x \in D} \|\Phi(x) - A_{K+1} \circ \Psi_2 \circ \Psi_1(x)\|_p \leq \max_{x \in D} \|\sigma_{\alpha_K} \circ \Phi_K(x) - \Psi_2 \circ \Psi_1(x)\|_p
         \leq \max_{x \in D} \bigl(\|\sigma_{\alpha_K} \circ \Phi_K(x) - \Psi_2 \circ \Phi_K(x)\|_p + \|\Psi_2 \circ \Phi_K(x) - \Psi_2 \circ \Psi_1(x)\|_p\bigr)
         \leq \epsilon/2 + \max_{x \in D} \|\Phi_K(x) - \Psi_1(x)\|_p \leq \epsilon.
Whether deep spline NNs with p-norm-constrained weights are universal approximators for \mathrm{Lip}_{1,p}(D)
is part of ongoing research, and it appears to be a very challenging problem.
4.2. Groupsort versus linear-spline activation functions. In this section, we discuss how
Groupsort NNs and deep spline NNs can be expressed in terms of each other. Here, the
situation differs depending on the applied weight constraint. First, we revisit a framework
specifically tailored to Groupsort NNs, where the weights in architecture (2.1) satisfy \| Wk \| \infty \leq
1, k = 2, . . . , K, and \| W1 \| p,\infty \leq 1. Then, the expression of an arbitrary deep spline NN
using a Groupsort NN is made possible due to the following universality result proved in
[1, Thm. 3].
Proposition 4.4. Let D \subset \BbbR d be compact, and let p \in [1, +\infty ]. The Groupsort NNs with
architecture (2.1), group size at least 2, and weight constraints \| Wk \| \infty \leq 1, k = 2, . . . , K, and
\| W1 \| p,\infty \leq 1 are dense in Lip1,p (D).
Proposition 4.4, according to which density holds for all p \in [1, +\infty], can be misleading, as
p has little to do with the involved norm constraints. All weights but the first one have
to fulfill an \infty-norm constraint, which is rarely used in practice. This somewhat limits the
practical relevance of the result. Nevertheless, it would be interesting to know whether a similar result
also holds for deep spline NNs. Let us remark that the proof of Proposition 4.4 relies heavily
on the maximum operation and the chosen norms, which makes it difficult to generalize it to
other norm constraints or activation functions.
Now, we discuss the case of spectral-norm constraints, which are the usual choice in
practice. For this setting, let us recall that it holds that

(4.10)    \max(x_1, x_2) = \frac{x_1 + x_2 + |x_1 - x_2|}{2}.

Hence, in the case of spectral-norm-constrained weights, the MaxMin activation function can
be written as the deep spline NN \mathrm{MaxMin}(x) = W_2 \sigma_1(W_1 x), where

(4.11)    W_1 = W_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \quad \text{and} \quad \sigma_1(x) = \begin{pmatrix} x_1 \\ |x_2| \end{pmatrix}.
This can be extended to any Groupsort operation since the MaxMin operation has the same
expressivity as Groupsort under any p-norm constraint [1]. We are not aware of any results
for the reverse direction, i.e., to express a deep spline NN using a Groupsort NN with spectral-
norm-constrained weights.
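A quick numerical check (ours) of the identity MaxMin(x) = W_2 \sigma_1(W_1 x) with the matrices and the spline activation from (4.11):

import numpy as np

W = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)     # W_1 = W_2; orthogonal, so spectral norm 1
sigma1 = lambda z: np.array([z[0], np.abs(z[1])])           # componentwise 1-Lipschitz linear spline

def maxmin(x):
    return np.array([max(x), min(x)])

rng = np.random.default_rng(8)
for _ in range(5):
    x = rng.standard_normal(2)
    assert np.allclose(W @ sigma1(W @ x), maxmin(x))
print("MaxMin(x) = W_2 sigma_1(W_1 x) verified on random samples")
print(np.linalg.svd(W, compute_uv=False))                   # both singular values equal 1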
5. Conclusions and open problems. In this paper, we have shown that neural networks
(NNs) with linear-spline activation functions with at least three linear regions can approximate
the maximal class of functions among all NNs with p-norm weight constraints and compo-
nentwise activation functions. However, it remains an open question whether these NNs are
universal approximators of Lip1,p (D), D \subset \BbbR d , compact. While this problem appears to be
very challenging, our result could be a first step toward its solution. The comparison of linear-spline
and non-componentwise activation functions involves subtle considerations. It is so far
unclear which choice leads to more expressive NNs. For the spectral norm, deep spline NNs
are at least as expressive as Groupsort NNs, but for \infty -norm-constrained weights the opposite
is true. The further investigation of the problem of universality under different constraints ap-
pears to be a promising research topic that may lead to better trainable Lipschitz-constrained
NN architectures.
Regarding the question of universality, we mainly focused on the approximation of scalar-
valued functions f : \BbbR d \rightarrow \BbbR . This also reflects the current state of research, where most results
are only formulated for scalar-valued NNs. The extension of these results to vector-valued
functions appears highly nontrivial and is a topic for future research. Finally, we want to
remark that little is known about the optimal structure for deep spline and Groupsort NNs,
namely, whether it is preferable to design deep or wide architectures.
REFERENCES
[1] C. Anil, J. Lucas, and R. Grosse, Sorting out Lipschitz function approximation, in Proceedings of the
36th International Conference on Machine Learning, Proceedings of Machine Learning Research 97,
PMLR, 2019, pp. 291--301, https://openreview.net/pdf?id=ryxY73AcK7.
[2] M. Arjovsky, S. Chintala, and L. Bottou, Wasserstein generative adversarial networks, in Proceed-
ings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning
Research 70, PMLR, 2017, pp. 214--223, https://proceedings.mlr.press/v70/arjovsky17a.html.
[3] S. Aziznejad, H. Gupta, J. Campos, and M. Unser, Deep neural networks with trainable activa-
tions and controlled Lipschitz constant, IEEE Trans. Signal Process., 68 (2020), pp. 4688--4699,
https://doi.org/10.1109/TSP.2020.3014611.
[4] S. Aziznejad and M. Unser, Deep spline networks with control of Lipschitz regularity, in Proceedings
of the IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2019, pp.
3242--3246, https://doi.org/10.1109/ICASSP.2019.8682547.
[5] P. Bohra, J. Campos, H. Gupta, S. Aziznejad, and M. Unser, Learning activation functions in
deep (spline) neural networks, IEEE Open J. Signal Process., 1 (2020), pp. 295--309, https://doi.org
/10.1109/OJSP.2020.3039379.
[6] P. Bohra, D. Perdios, A. Goujon, S. Emery, and M. Unser, Learning Lipschitz-controlled
activation functions in neural networks for Plug-and-Play image reconstruction methods, in
NeurIPS 2021 Workshop on Deep Learning and Inverse Problems, 2021, https://openreview.net/forum?id=efCsbTzQTbH.
[7] K. Bredies and M. Holler, Higher-order total variation approaches and generalisations, Inverse Prob-
lems, 36 (2020), 123001, https://doi.org/10.1088/1361-6420/ab8f80.
[8] L. Bungert, R. Raab, T. Roith, L. Schwinn, and D. Tenbrinck, CLIP: Cheap Lipschitz training of
neural networks, in Scale Space and Variational Methods in Computer Vision, Springer, Cham, 2021,
pp. 307--319, https://doi.org/10.1007/978-3-030-75549-2_25.
[9] O. Calin, Deep Learning Architectures: A Mathematical Approach, Springer, Cham, 2020,
https://doi.org/10.1007/978-3-030-36721-3
[10] A. Chernodub and D. Nowicki, Norm-Preserving Orthogonal Permutation Linear Unit Activation
Functions (OPLU), preprint, https://arxiv.org/abs/1604.02313, 2016.
[11] J. E. Cohen, T. P. Huster, and R. Cohen, Universal Lipschitz Approximation in Bounded Depth
Neural Networks, preprint, https://arxiv.org/abs/1904.04861, 2019.
[12] S. Ducotterd, A. Goujon, P. Bohra, D. Perdios, S. Neumayer, and M. Unser, Improv-
ing Lipschitz-Constrained Neural Networks by Learning Activation Functions, preprint,
https://arxiv.org/abs/2210.16222, 2022.
[13] M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. Pappas, Efficient and accurate
estimation of Lipschitz constants for deep neural networks, in Advances in Neural Informa-
tion Processing Systems, Vol. 32, Curran Associates, Red Hook, NY, 2019, pp. 11427--11438,
https://openreview.net/forum?id=rkxGbHBe8S.
[14] H. Gouk, E. Frank, B. Pfahringer, and M. Cree, Regularisation of neural networks by enforcing
Lipschitz continuity, Mach. Learn., 110 (2021), pp. 393--416. https://doi.org/10.1007/s10994-020-
05929-w.
[32] U. Tanielian, M. Sangnier, and G. Biau, Approximating Lipschitz continuous functions with Group-
Sort neural networks, in Proceedings of the 24th International Conference on Artificial Intelligence
and Statistics, PMLR, 2021, pp. 442--450, http://proceedings.mlr.press/v130/tanielian21a.html.
[33] J. M. Tarela, E. Alonso, and M. V. Martínez, A representation method for PWL functions oriented
to parallel processing, Math. Comput. Model., 13 (1990), pp. 75--83, https://doi.org/10.1016/0895-7177(90)90090-A.
[34] M. Terris, A. Repetti, J. Pesquet, and Y. Wiaux, Building firmly nonexpansive convolutional
neural networks, in Proceedings of the IEEE International Conference on Acoustics, Speech and
Signal Processing, IEEE, 2020, pp. 8658--8662, https://doi.org/10.1109/ICASSP40776.2020.9054731.
[35] Y. Tsuzuku, I. Sato, and M. Sugiyama, Lipschitz-margin training: Scalable certification of
perturbation invariance for deep neural networks, in Advances in Neural Information Process-
ing Systems 31, Curran Associates, Red Hook, NY, 2018, pp. 6542--6551, https://proceedings.
neurips.cc/paper_files/paper/2018/file/485843481a7edacbfce101ecb1e4d2a8-Paper.pdf.
[36] M. Unser, A representer theorem for deep neural networks, J. Mach. Learn. Res., 20 (2019), 110,
http://jmlr.org/papers/v20/18-418.html.
[37] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, Plug-and-play priors for model based
reconstruction, in Proceedings of the IEEE Global Conference on Signal and Information Processing,
IEEE, 2013, pp. 945--948, https://doi.org/10.1109/GlobalSIP.2013.6737048.
[38] A. Virmaux and K. Scaman, Lipschitz regularity of deep neural networks: Analysis and effi-
cient estimation, in Advances in Neural Information Processing Systems 31, Curran Associates,
Red Hook, NY, 2018, pp. 3839--3848, https://proceedings.neurips.cc/paper_files/paper/2018/file/
d54e99a6c03704e95e6965532dec148b-Paper.pdf.