UNIVERSAL APPROXIMATION THEOREM FOR EQUIVARIANT MAPS BY GROUP CNNS
Anonymous authors
Paper under double-blind review
ABSTRACT
1 INTRODUCTION
Deep neural networks have been widely used as models to approximate underlying functions in
various machine learning tasks. The expressive power of fully-connected deep neural networks was
first mathematically guaranteed by the universal approximation theorem in Cybenko (1989), which
states that any continuous function on a compact domain can be approximated with any precision
by an appropriate neural network with sufficient width and depth. Beyond the classical result stated
above, several types of variants of the universal approximation theorem have also been investigated
under different conditions.
Among a wide variety of deep neural networks, convolutional neural networks (CNNs) have
achieved impressive performance in real applications. In particular, almost all state-of-the-art
models for image recognition are based on CNNs. These successes are closely related to the property
that CNNs commute with translations on the pixel coordinates; that is, CNNs preserve translational
symmetry in image data. In general, this kind of property is known as equivariance, which is a
generalization of invariance. When a data distribution has some symmetry and the task to be solved
relates to that symmetry, data processing is desired to be equivariant with respect to the symmetry.
In recent years, different types of symmetry have been considered for different tasks, and it has been
proven that CNNs can approximate arbitrary equivariant data processing for specific symmetries.
These results are mathematically captured as universal approximation theorems for equivariant
maps and establish the theoretical validity of the use of CNNs.
In order to handle symmetric structures in a theoretically rigorous manner, we have to carefully
consider the structure of the data space on which data distributions are defined. For example, in
image recognition tasks, image data are often supposed to have translation symmetry. When an
image is acquired, the image sensor has only finitely many pixels, and the image is represented by a
finite-dimensional vector in a Euclidean space R^d, where d is the number of pixels. However, the
finiteness of pixels stems from the limitations of the image sensor, and the raw scene behind the
image is more naturally modeled as an element of R^S with continuous spatial coordinates S, where
R^S is the set of functions from S to R. The element of R^S is then regarded as a functional
representation of the image data in R^d. In this paper, in order to appropriately formulate data
symmetry, we treat both the typical data representation in finite-dimensional settings and the
functional representation in infinite-dimensional settings in a unified manner.
The paper is organized as follows. In Section 2, we introduce the definition of group equivariant
maps and show the essential property that equivariant maps are in one-to-one correspondence with
theoretically tractable maps called generators. In Section 3, we define fully-connected neural
networks (FNNs) and group convolutional neural networks (CNNs) between function spaces; this
formulation is suitable for representing data symmetry. Then, we provide the main theorem, called
the conversion theorem, which converts FNNs into CNNs. In Section 4, using the conversion theorem,
we derive universal approximation theorems for non-linear equivariant maps by group CNNs. In
particular, this is the first universal approximation theorem for equivariant maps in
infinite-dimensional settings. We note that finite and infinite groups are handled in a unified
manner. In Section 5, we provide concluding remarks and mention future work.
2 GROUP EQUIVARIANCE
2.1 PRELIMINARIES
In this section, we introduce group equivariant maps and show their basic properties. First, we define
group equivariance.
Definition 1 (Group Equivariance). Suppose that a group G acts on sets S and T. Then, a map
F : R^S → R^T is called G-equivariant when F[g · x] = g · F[x] holds for any g ∈ G and x ∈ R^S.
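As a concrete check of Definition 1, the following is a minimal numerical sketch (ours, not part of the paper): take G = Z_n acting on S = T = [n] by cyclic shifts; a circular correlation with a fixed filter commutes with the action and is therefore G-equivariant.

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)
x = rng.normal(size=n)   # a "function" x in R^S with S = [n]
w = rng.normal(size=n)   # a fixed filter

def act(g, x):
    """Action of g in Z_n on R^S: (g . x)(s) = x(s - g mod n)."""
    return np.roll(x, g)

def F(x):
    """Circular correlation with w: F[x](t) = sum_s w(s) x(s + t mod n)."""
    return np.array([np.sum(w * np.roll(x, -t)) for t in range(n)])

# G-equivariance: F[g . x] = g . F[x] for every g in G.
for g in range(n):
    assert np.allclose(F(act(g, x)), act(g, F(x)))
```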
The following theorem shows that equivariant maps can be represented by their generators.
Theorem 3 (Degree of Freedom of Equivariant Maps). Let a group G act on sets S and T, and let
B ⊂ T be a base space. Then, a G-equivariant map F : R^S → R^T is in one-to-one correspondence
with its generator F_B.
¹ A function f on a locally compact space S is said to vanish at infinity if, for any ϵ > 0, there exists a compact subset K ⊂ S such that sup_{s ∈ S\K} |f(s)| < ϵ.
² The choice of the base space is not unique in general. However, the topological structure of a base space can be induced by the quotient space S/G.
³ We note that T_g ◦ T_{g′} = T_{g′g} and that the group translation operator is the action of G on R^S from the right.
Figure 1: An example of an equivariant map from RGB images to gray-scale images. An RGB
image x is represented by values (i.e., a function) on 2-dimensional spatial coordinates with RGB
channels. This corresponds to the case where the index set is S = R² × [3]. Similarly, a
gray-scale image F[x] after equivariant processing F : R^S → R^T is represented by values on
2-dimensional spatial coordinates with a single gray-scale channel. This corresponds to the case
where the index set is T = R². In this figure, the group action is the translation of G = R² on the
2-dimensional spatial coordinates.
Guss & Salakhutdinov (2019) provide the following lemma, which is useful for handling bounded
affine maps.
Lemma 4 (Integral Form, Guss & Salakhutdinov (2019)). Suppose that S and T are locally compact,
σ-compact, Hausdorff, measurable spaces. For a bounded linear map W : C(S) → C(T), there exist
a Borel regular measure µ on S and a weak* continuous family of functions {w(t, ·)}_{t∈T} ⊂ L¹_µ(S)
such that the following holds for any x ∈ C(S):

W[x](t) = ∫_S w(t, s) x(s) dµ(s).
To use the integral form, we assume in the following that the input and output spaces of A are the
classes of continuous maps C(S) and C(T) instead of R^S and R^T, respectively. Using the integral
form, a bounded affine map A is represented by

A_{µ,w,b}[x](t) = ∫_S w(t, s) x(s) dµ(s) + b(t).   (2)
In particular, when S and T are finite sets with cardinality d and d′, the function spaces C(S) and
C(T) are identified with the finite-dimensional Euclidean spaces R^d and R^{d′}, and thus an affine
map A : R^d → R^{d′} is parameterized by a weight matrix W = [w(t, s)]_{s∈[d], t∈[d′]} : R^d → R^{d′}
and a bias vector b = [b(t)]_{t∈[d′]} ∈ R^{d′}, and (2) induces the following form, which is often
used in the literature on neural networks:

A[x](t) = ∑_{s=1}^{d} w(t, s) x(s) + b(t).   (3)
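As a numerical illustration (ours, not part of the paper), the following sketch evaluates the integral form (2) by a Riemann sum on S = T = [0, 1] with the Lebesgue measure, and confirms that on a finite grid it is exactly the dense-layer form (3); the Gaussian kernel and linear bias are illustrative assumptions.

```python
import numpy as np

# Discretize S = T = [0, 1]; mu is the Lebesgue measure (Riemann sum weights 1/m).
m = 200
grid = np.linspace(0.0, 1.0, m)
dmu = 1.0 / m

w = lambda t, s: np.exp(-10.0 * (t - s) ** 2)   # kernel w(t, s)
b = lambda t: 0.5 * t                            # bias b(t)
x = np.sin(2 * np.pi * grid)                     # input x in C(S)

# Equation (2): A[x](t) = int_S w(t, s) x(s) dmu(s) + b(t), approximated on the grid.
A_x = np.array([np.sum(w(t, grid) * x) * dmu + b(t) for t in grid])

# On the grid this is exactly the finite form (3): a weight matrix plus a bias vector.
W = w(grid[:, None], grid[None, :]) * dmu
assert np.allclose(A_x, W @ x + b(grid))
```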
A continuous function ρ : R → R induces the activation map α_ρ : C(S) → C(S) defined by
α_ρ(x) := ρ ◦ x ∈ C(S) for x ∈ C(S). For brevity, we denote α_ρ simply by ρ. Then, we can
define fully-connected neural networks in general settings.
Definition 5 (Fully-connected Neural Networks). Let L ∈ N. A fully-connected neural network
with L layers is a composition of bounded affine maps (A_1, . . . , A_L) and an activation map ρ,
represented by

ϕ := A_L ◦ ρ ◦ A_{L−1} ◦ · · · ◦ ρ ◦ A_1,   (4)

where A_ℓ : C(S_{ℓ−1}) → C(S_ℓ) are affine maps for some sequence of sets {S_ℓ}_{ℓ=0}^{L}. We denote
by N_FNN(ρ, L; S_0, S_L) the set of all fully-connected neural networks from C(S_0) to C(S_L) with L
layers and an activation function ρ.
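Definition 5 composes bounded affine maps with a pointwise activation between consecutive layers. A minimal finite-dimensional sketch (ours), where each C(S_ℓ) is identified with R^{d_ℓ} as in (3):

```python
import numpy as np

def fnn(layers, rho):
    """Compose affine maps (W, b) as in (4): A_L o rho o A_{L-1} o ... o rho o A_1."""
    def phi(x):
        for i, (W, b) in enumerate(layers):
            x = W @ x + b
            if i < len(layers) - 1:   # no activation after the last affine map
                x = rho(x)
        return x
    return phi

rng = np.random.default_rng(2)
dims = [5, 16, 3]                    # d_0, d_1, d_2
layers = [(rng.normal(size=(dims[i + 1], dims[i])), rng.normal(size=dims[i + 1]))
          for i in range(len(dims) - 1)]
phi = fnn(layers, np.tanh)
y = phi(rng.normal(size=dims[0]))    # y in R^3
```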
We denote by µ_ϕ the measure of the affine map A_1 in the first layer of a fully-connected neural
network ϕ. This measure µ_ϕ is used to describe a condition in the main theorem (Theorem 9).
In particular, each biased G-convolution C_{ν,v,b} is G-equivariant. Conversely, Cohen et al. (2019)
showed that a G-equivariant linear map is represented by some G-convolution without the bias term
when G is locally compact and unimodular, and the action of the group is transitive (i.e., B consists
of only a single element).
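To illustrate this equivariance (our sketch; the finite group, the counting measure as the invariant measure, and the absence of a bias are illustrative assumptions, not the paper's general C_{ν,v,b}), the following checks that a group convolution C[x](g) = ∑_{h∈G} v(h⁻¹g) x(h) over G = S_3 commutes with left translation:

```python
import itertools
import numpy as np

# The symmetric group S_3, with elements as tuples (g maps i to g[i]).
G = list(itertools.permutations(range(3)))
idx = {g: i for i, g in enumerate(G)}

def mul(g, h):
    """Composition (g h)(i) = g(h(i))."""
    return tuple(g[h[i]] for i in range(3))

def inv(g):
    """Inverse permutation."""
    out = [0] * 3
    for i, gi in enumerate(g):
        out[gi] = i
    return tuple(out)

def conv(v, x):
    """Group convolution C[x](g) = sum_h v(h^{-1} g) x(h), with the counting measure."""
    return np.array([sum(v[idx[mul(inv(h), g)]] * x[idx[h]] for h in G) for g in G])

def translate(gp, x):
    """Left translation: (g' . x)(g) = x(g'^{-1} g)."""
    return np.array([x[idx[mul(inv(gp), g)]] for g in G])

rng = np.random.default_rng(3)
v = rng.normal(size=len(G))   # filter v on G
x = rng.normal(size=len(G))   # signal x on G
for gp in G:                  # equivariance: C[g' . x] = g' . C[x]
    assert np.allclose(conv(v, translate(gp, x)), translate(gp, conv(v, x)))
```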
In this section, we introduce the main theorem (Theorem 9), which is an essential part of obtaining
universal approximation theorems for equivariant maps by group CNNs.
Theorem 9 (Conversion Theorem). Suppose that a group G acts on sets S and T. We assume the
following condition:
(C1) there exist base spaces B_S ⊂ S, B_T ⊂ T, and two subgroups H_T ⩽ H_S ⩽ G such that
S = G/H_S × B_S and T = G/H_T × B_T.
Further, suppose that E ⊂ C0(S) is compact and that an FNN ϕ : E → C0(B_T) with a Lipschitz
activation function ρ satisfies the condition (C2), which requires the existence of a locally finite
G-invariant measure ν on S compatible with the measure µ_ϕ of the first layer.
Then, for any ϵ > 0, there exists a CNN Φ : E → C0(T) with the activation function ρ such that the
number of layers of Φ equals that of ϕ and

‖R_{B_T} ◦ Φ − ϕ‖_∞ ≤ ϵ.   (7)

Moreover, for any G-equivariant map F : C0(S) → C0(T), the following holds:

‖F|_E − Φ‖_∞ ≤ ‖F_{B_T}|_E − ϕ‖_∞ + ϵ.   (8)
Inapplicable Cases. We explain some cases where the conversion theorem cannot be applied. First,
similar to the above discussion, we consider the setting where S = T and the actions of G on S
and T are the same. We note that, even if the actions of G1 and G2 on S satisfy the conditions in
the conversion theorem, a common invariant measure for both G1 and G2 may not exist. Then,
a group G including G1 and G2 as subgroups does not satisfy (C2). For example, there does
not exist a common invariant measure for the actions of translation and scaling on a Euclidean
space. In particular, the action of the general linear group GL(d) on the Euclidean space does not
admit a locally finite left-invariant measure on R^d. Thus, the conversion theorem cannot be applied
to this case. Next, as we saw above, our model can handle convolutions on permutation groups, but
not on general finite groups. This depends on whether [n] can be represented as a quotient of G, as
we will see later. This is also the case for tensor representations of permutations, which require a
different formulation.
Lastly, we consider the case where the actions of G on S and T differ. Here, S and T may or may
not be equal. As a representative case, we consider the invariant case. When the stabilizer in
T satisfies H_T = G, a G-equivariant map F : C0(S) → C0(T) is said to be G-invariant. However,
because of the condition H_T ⩽ H_S in (C1), the conversion theorem cannot be applied to the
invariant case as long as H_S ≠ G. This kind of restriction is similar to existing studies, where the
invariant case is handled separately from the equivariant case (Keriven & Peyré (2019); Maehara &
NT (2019); Sannai et al. (2019)). In fact, we can show that the inequality (7) never holds for
non-trivial invariant cases (i.e., H_S ≠ G and H_T = G) as follows: from H_T = G, we have B_T = T
and R_{B_T} = id, and thus (7) reduces to ‖Φ − ϕ‖_∞ ≤ ϵ. Here, we note that ϕ is an FNN, which is
not invariant in general, whereas Φ is a CNN, which is invariant. Thus, Φ cannot approximate a
non-invariant ϕ within a small error ϵ, which implies that (7) does not hold for small ϵ. However,
whether (8) holds for the invariant case is an open problem.
Remarks on Conditions (C1) and (C2). We now discuss the conditions (C1) and (C2).
In (C1), the subgroup H_S ⩽ G (resp. H_T) represents the stabilizer group of the action of G on S
(resp. T). Thus, (C1) requires that the stabilizer group at every point in S (resp. T) is isomorphic to
the common subgroup H_S (resp. H_T). When the group action satisfies some moderate conditions,
such a requirement is known to be satisfied for most points in the set. As a theoretical result, the
principal orbit type theorem (cf. Theorem 1.32, Meinrenken (2003)) guarantees that, if the group
action on a manifold S is proper and S/G is connected, there exist a dense subset S′ ⊂ S and a
subgroup H_S ⊂ G called a principal stabilizer such that the stabilizer group at every point in S′ is
isomorphic to H_S.
Further, (C1) assumes that the sets S and T have the direct product form of a coset space G/H and
a base space B. The case where the base space B consists of a single point is equivalent to the
condition that the set is homogeneous. In this sense, (C1) can be regarded as a relaxation of the
homogeneity condition. In many practical cases, a set S on which G acts can be regarded as such
a direct product. For example, when the action is transitive, the direct product decomposition
trivially holds with a base space consisting of a single point. Even when the set S itself is
not rigorously represented by the direct product form, after removing some "small" subset N ⊂ S,
the complement S \ N can often be represented in the direct form. For example, when G = O(d)
acts on the set S = R^d by rotation around the origin N = {0}, S \ N has a direct product form as
mentioned above. In applications, removing only the small subset N is expected to be negligible.
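To make the O(d) example concrete, the direct product form is the familiar polar-coordinate decomposition; in our rendering (a standard fact, not stated in this form in the text):

R^d \ {0} ≅ S^{d−1} × R_{>0} = O(d)/O(d−1) × R_{>0},

so (C1) holds on S \ N with G = O(d), stabilizer H_S = O(d−1) (the rotations fixing a point on the unit sphere), and base space B_S = R_{>0} parameterizing the radius.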
Next, we provide some remarks on the condition (C2). Let us consider two representative settings of
a set S. The first case is the setting where S is finite. When a G-invariant measure ν has a positive
value on every singleton in S, ν satisfies (C2) for an arbitrary measure µ_ϕ on S. In particular,
the counting measure on S is invariant and satisfies (C2). The second case is the setting where S
is a Euclidean space R^d and µ_ϕ is the Lebesgue measure. Then, (C2) is satisfied with invariant
measures on the Euclidean space for various group actions, including translation, rotation, scaling,
and the Euclidean group.
Here, we give a general method to construct ν in (C2) for a compact-group action. When µ_ϕ is
locally finite and continuous with respect to the action of a compact group G, the measure
ν := ν_G ∗ µ_ϕ on S, for a Haar measure ν_G on G, satisfies (C2), where
(ν_G ∗ µ_ϕ)(A) := ∫_G µ_ϕ(g^{−1} · A) dν_G(g).
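For a finite group acting on a finite set, this averaging construction is easy to make concrete; a minimal sketch (ours), where a measure on S = [n] is a nonnegative weight vector and ν_G is the normalized counting (Haar) measure on G = Z_n acting by shifts:

```python
import numpy as np

n = 6
rng = np.random.default_rng(4)
mu = rng.uniform(size=n)                      # an arbitrary measure on S = [n]

# nu = nu_G * mu: average mu over the action of G = Z_n (cyclic shifts),
# with nu_G uniform on G. For a transitive action the result is uniform on S.
nu = np.mean([np.roll(mu, g) for g in range(n)], axis=0)

# nu is G-invariant: shifting it by any g leaves it unchanged.
for g in range(n):
    assert np.allclose(np.roll(nu, g), nu)
```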
Since C0(S) = R^{|S|} for a finite set S, we obtain the following theorem by combining Theorem 9
with Theorem 10.
Theorem 11 (Universal Approximation for Equivariant Continuous Maps by CNNs). Let an activation
function ρ : R → R be non-constant, bounded, and Lipschitz continuous. Suppose that a finite group
G acts on finite sets S and T and that (C1) in Theorem 9 holds. Let F : R^{|S|} → R^{|T|} be a
G-equivariant continuous map. For any compact set E ⊂ R^{|S|} and ϵ > 0, there exists a two-layer
convolutional neural network Φ_E ∈ N_CNN(ρ, 2; |S|, |T|) such that ‖F|_E − Φ_E‖_∞ < ϵ.
We note that Petersen & Voigtlaender (2020) obtained a similar result to Theorem 11 in the case of
finite groups.
Universality of DeepSets. DeepSets is a family of invariant/equivariant models that take sets as
input and is known to have universality for invariant/equivariant functions under set permutations
(Zaheer et al. (2017); Ravanbakhsh (2020)). The equivariant model is a stack of affine
transformations with weight W = λE + γ1 (E is the identity matrix and 1 is the all-one matrix) and
bias b = c · (1, . . . , 1)^⊤, followed by an activation function. Here, we prove the universality
of DeepSets as a corollary of Theorem 11. First, we show that the equivariant model of DeepSets
coincides with the one we are dealing with by setting S, T, G, H, and B as follows. We set
S = T = [n], G = S_n, H = Stab(1) := {s ∈ S_n | s(1) = 1}, and B = {∗}, where {∗} is a singleton.
Then, Stab(1) is a subgroup of G and its left coset space is G/H = [n]. As a set, S_n/Stab(1) is
equal to [n], and the canonical S_n-action on S_n/Stab(1) is equivalent to the permutation action
on [n]. Therefore, C(G/H × B) = C([n]) = R^n holds, and the equivariant model of our paper
coincides with that of DeepSets.
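A minimal numpy sketch (ours) of the DeepSets equivariant layer described above, with a check of permutation equivariance; the parameter values and the tanh activation are illustrative assumptions:

```python
import numpy as np

n = 5
rng = np.random.default_rng(5)
lam, gam, c = 0.7, -0.3, 0.1
W = lam * np.eye(n) + gam * np.ones((n, n))   # W = lambda*E + gamma*1 (all-one matrix)
b = c * np.ones(n)                             # b = c * (1, ..., 1)^T

def layer(x):
    return np.tanh(W @ x + b)

x = rng.normal(size=n)
perm = rng.permutation(n)
assert np.allclose(layer(x[perm]), layer(x)[perm])   # permutation equivariance
```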
Theorem 12. For any permutation equivariant function F : R^n → R^n, a compact set E ⊂ R^n, and
ϵ > 0, there is an equivariant model of DeepSets (or equivalently, our model) Φ_E : E → R^n such
that ‖Φ_E(x) − F|_E(x)‖_∞ < ϵ.
Guss & Salakhutdinov (2019) derived a universal approximation theorem for continuous maps
by FNNs in infinite-dimensional settings. However, their theorem assumes that the index set S in
the input layer and T in the output layer are compact. Combining the conversion theorem with it,
we can derive a corresponding universal approximation theorem for equivariant maps with respect
to compact groups. However, the compactness condition on S and T is a crucial shortcoming when
handling the actions of non-compact groups such as translations or scalings. To overcome this
obstacle, we show a novel universal approximation theorem for Lipschitz maps by FNNs as follows.
Theorem 13 (Universal Approximation for Lipschitz Maps by FNNs). Let an activation function
ρ : R → R be continuous and non-polynomial. Let S ⊂ R^d and T ⊂ R^{d′} be domains. Let
F : C0(S) → C0(T) be a Lipschitz map. Then, for any compact E ⊂ C0(S) and ϵ > 0, there exist
N ∈ N and a two-layer fully-connected neural network ϕ_E = A_2 ◦ ρ ◦ A_1 ∈ N_FNN(ρ, 2; S, T) such
that A_1[·] = W^{(1)}[·] + b^{(1)} : E → C0([N]) = R^N, A_2[·] = W^{(2)}[·] + b^{(2)} : R^N → C0(T),
µ_{ϕ_E} is the Lebesgue measure, and ‖F|_E − ϕ_E‖_∞ < ϵ.
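To convey the two-layer architecture in Theorem 13, here is a schematic sketch (ours; the grid, the Gaussian output functions, and the ReLU activation are illustrative assumptions): A_1 computes N integral features of the input function with respect to the Lebesgue measure, and A_2 combines them into a function on T.

```python
import numpy as np

# Schematic of phi_E = A2 o rho o A1 from Theorem 13 with S = T = R,
# discretized on a grid; mu_phi is the Lebesgue measure (Riemann weights ds).
m, N = 400, 16
rng = np.random.default_rng(6)
grid = np.linspace(-5.0, 5.0, m)
ds = grid[1] - grid[0]

W1 = rng.normal(size=(N, m))   # rows: weight functions w^(1)_i sampled on S
b1 = rng.normal(size=N)
centers = np.linspace(-4.0, 4.0, N)
W2 = np.exp(-(grid[None, :] - centers[:, None]) ** 2)  # rows: output functions in C0(T)

def phi(x):
    h = np.maximum(W1 @ x * ds + b1, 0.0)  # A1 then rho (ReLU): C0(S) -> R^N
    return h @ W2                           # A2: R^N -> C0(T), sampled on the grid

y = phi(np.exp(-grid ** 2))                 # output function on T, sampled on the grid
```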
We provide the proof of Theorem 13 in the appendix. We note that S ⊂ R^d and T ⊂ R^{d′} in
Theorem 13 are allowed to be non-compact, unlike the result in Guss & Salakhutdinov (2019).
Combining Theorem 9 with Theorem 13, we obtain the following theorem.
Theorem 14 (Universal Approximation for Equivariant Lipschitz Maps by CNNs). Let an activation
function ρ : R → R be Lipschitz continuous and non-polynomial. Suppose that a group G acts on
S ⊂ R^d and T ⊂ R^{d′}, and that (C1) and (C2) in Theorem 9 hold for the Lebesgue measure µ_ϕ. Let
F : C0(S) → C0(T) be a G-equivariant Lipschitz map. Then, for any compact set E ⊂ C0(S)
and ϵ > 0, there exists a two-layer convolutional neural network Φ_E ∈ N_CNN(ρ, 2; S, T) such that
‖F|_E − Φ_E‖_∞ < ϵ.
Lastly, we mention universal approximation theorems for some concrete groups. When the group G is
the Euclidean group E(d) or the special Euclidean group SE(d), Theorem 14 shows that group CNNs
are universal approximators of G-equivariant maps. Although Yarotsky (2018) showed that group
CNNs can approximate SE(2)-equivariant maps, our result for d ≥ 3 has not been shown in existing
studies. Since Euclidean groups can represent 3D motions and point clouds, Theorem 14 can provide
a theoretical guarantee for 3D data processing with group CNNs. As another example, when the group
G is SO⁺(d, 1), G acts on the upper half plane H^{d+1} := {(x_1, . . . , x_{d+1}) ∈ R^{d+1} | x_{d+1} > 0},
which has been shown to be suitable for word representations in NLP (Nickel & Kiela (2017)). Since
the action of G preserves the distance on H^{d+1}, group convolution with SO⁺(d, 1) may be useful
for NLP.
5 CONCLUSION
We have considered universal approximation theorems for equivariant maps by group CNNs. To
prove the theorems, we showed that an equivariant map is uniquely determined by its generator.
Thus, once a fully-connected neural network approximating the generator is obtained, the conversion
theorem yields an approximator of the equivariant map in the form of a group CNN. In this way,
the universal approximation of equivariant maps by group CNNs is obtained through the universal
approximation of the generator by FNNs. We have described FNNs and group CNNs in an abstract
way. In particular, we provided a novel universal approximation theorem by FNNs in the
infinite-dimensional setting, where the support of the input functions is unbounded. Using this
result, we obtained the universal approximation theorem for equivariant maps for non-compact groups.
We mention future work. In Theorem 14, we assumed the sets S and T to be subspaces of Euclidean
spaces. However, in the conversion theorem (Theorem 9), the sets S and T do not need to be
subspaces of Euclidean spaces and may have a more general topological structure. Thus, if there is
a universal approximation theorem in non-Euclidean spaces (Courrieu (2005); Kratsios (2019)), we
may be able to combine it with the conversion theorem and derive its equivariant version. Next, we
note the problem of computational complexity. Although group convolution can be implemented by,
e.g., discretization and localization as in Finzi et al. (2020), such implementations cannot be
applied to high-dimensional groups due to their high computational cost. To use group CNNs for
actual machine-learning problems, it is necessary to construct effective architectures for practical
implementation.
REFERENCES
Andrew R Barron. Approximation and estimation bounds for artificial neural networks. Machine
learning, 14(1):115–133, 1994.
Taco S Cohen, Mario Geiger, and Maurice Weiler. A general theory of equivariant CNNs on homo-
geneous spaces. In Advances in Neural Information Processing Systems, pp. 9142–9153, 2019.
Pierre Courrieu. Function approximation on non-Euclidean spaces. Neural Networks, 18(1):91–102,
2005.
George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control,
signals and systems, 2(4):303–314, 1989.
Marc Finzi, Samuel Stanton, Pavel Izmailov, and Andrew Gordon Wilson. Generalizing convolu-
tional neural networks for equivariance to Lie groups on arbitrary continuous data. arXiv preprint
arXiv:2002.12880, 2020.
Ken-Ichi Funahashi. On the approximate realization of continuous mappings by neural networks.
Neural networks, 2(3):183–192, 1989.
Robert Gens and Pedro M Domingos. Deep symmetry networks. In Advances in Neural Information
Processing Systems, pp. 2537–2545, 2014.
Jonathan Gordon, Wessel P Bruinsma, Andrew YK Foong, James Requeima, Yann Dubois, and
Richard E Turner. Convolutional conditional neural processes. arXiv preprint arXiv:1910.13556,
2019.
William H Guss and Ruslan Salakhutdinov. On universal approximation by neural networks
with uniform guarantees on approximation of infinite dimensional maps. arXiv preprint
arXiv:1910.01545, 2019.
Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are uni-
versal approximators. Neural networks, 2(5):359–366, 1989.
Nicolas Keriven and Gabriel Peyré. Universal invariant and equivariant graph neural networks. In
Advances in Neural Information Processing Systems, pp. 7092–7101, 2019.
Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural
networks to the action of compact groups. arXiv preprint arXiv:1802.03690, 2018.
Anastasis Kratsios. The universal approximation property: Characterizations, existence, and a
canonical topology for deep-learning. arXiv preprint arXiv:1910.03344, 2019.
Mateusz Krukowski. Fréchet-Kolmogorov-Riesz-Weil's theorem on locally compact groups via
Arzelà-Ascoli's theorem. arXiv preprint arXiv:1801.01898, 2018.
Věra Kůrková. Kolmogorov's theorem and multilayer neural networks. Neural Networks, 5(3):
501–506, 1992.
Takanori Maehara and Hoang NT. A simple proof of the universality of invariant/equivariant graph
neural networks. arXiv preprint arXiv:1910.03802, 2019.
Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivari-
ant graph networks. In International Conference on Learning Representations, 2019a. URL
https://openreview.net/forum?id=Syx72jC9tm.
Haggai Maron, Ethan Fetaya, Nimrod Segol, and Yaron Lipman. On the universality of invariant
networks. Proceedings of the 36th International Conference on Machine Learning, 97, 2019b.
Haggai Maron, Or Litany, Gal Chechik, and Ethan Fetaya. On learning sets of symmetric elements.
arXiv preprint arXiv:2002.08599, 2020.
Eckhard Meinrenken. Group actions on manifolds. Lecture Notes, University of Toronto, Spring,
2003, 2003.
Maximillian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representa-
tions. In Advances in neural information processing systems, pp. 6338–6347, 2017.
Philipp Petersen and Felix Voigtlaender. Equivalence of approximation by convolutional neural
networks and fully-connected networks. Proceedings of the American Mathematical Society, 148
(4):1567–1581, 2020.
Siamak Ravanbakhsh. Universal equivariant multilayer perceptrons. arXiv preprint
arXiv:2002.02912, 2020.
Akiyoshi Sannai, Yuuki Takai, and Matthieu Cordonnier. Universal approximations of permutation
invariant/equivariant functions by deep neural networks. arXiv preprint arXiv:1903.01939, 2019.
John Shawe-Taylor. Building symmetries into feedforward networks. In 1989 First IEE Interna-
tional Conference on Artificial Neural Networks,(Conf. Publ. No. 313), pp. 158–162. IET, 1989.
Sho Sonoda and Noboru Murata. Neural network with unbounded activation functions is universal
approximator. Applied and Computational Harmonic Analysis, 43(2):233–268, 2017.
Dmitry Yarotsky. Universal approximations of invariant maps by neural networks. arXiv preprint
arXiv:1804.10306, 2018.
Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, and
Alexander J Smola. Deep sets. In Advances in Neural Information Processing Systems, pp.
3391–3401, 2017.