SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels

Matthias Fey∗, Jan Eric Lenssen∗, Frank Weichert, Heinrich Müller

Department of Computer Graphics
TU Dortmund University
{matthias.fey,janeric.lenssen}@udo.edu
∗ Both authors contributed equally to this work.
arXiv:1711.08920v2 [cs.CV] 23 May 2018

Abstract
We present Spline-based Convolutional Neural Networks (SplineCNNs), a variant of deep neural networks for irregularly structured and geometric input, e.g., graphs or meshes. Our main contribution is a novel convolution operator based on B-splines that makes the computation time independent of the kernel size, owing to the local support property of the B-spline basis functions. As a result, we obtain a generalization of the traditional CNN convolution operator by using continuous kernel functions parametrized by a fixed number of trainable weights. In contrast to related approaches that filter in the spectral domain, the proposed method aggregates features purely in the spatial domain. In addition, SplineCNN allows fully end-to-end training of deep architectures, using only the geometric structure as input instead of handcrafted feature descriptors.
For validation, we apply our method to tasks from the fields of image graph classification, shape correspondence and graph node classification, and show that it outperforms or is on par with state-of-the-art approaches while being significantly faster and having favorable properties like domain-independence. Our source code is available on GitHub (footnote 1).

Footnote 1: https://github.com/rusty1s/pytorch_geometric

1. Introduction
Most achievements obtained by deep learning methods over the last years heavily rely on properties of the convolution operation in convolutional neural networks [14]: local connectivity, weight sharing and shift invariance. Since those layers are defined on inputs with a grid-like structure, they are not trivially portable to non-Euclidean domains like discrete manifolds or (embedded) graphs. However, a large amount of data in practical tasks naturally comes in the form of such irregular structures, e.g. graph data or meshes. Transferring the high performance of traditional convolutional neural networks to this kind of data holds the potential for large improvements in several relevant tasks.

Recently, a set of methods brought together under the term geometric deep learning [3] emerged, which aim to achieve this transfer by defining convolution operations for deep neural networks that can handle irregular input data. Existing work in this field can loosely be divided into two subsets: the spectral and the spatial filtering approaches. The former is based on spectral graph theory [5], where eigenvalues of a graph's Laplacian matrix are interpreted as frequencies of node signals [22]. They are filtered in the spectral domain, analogously to Fourier domain filtering of traditional signals. The latter subset, the spatial approaches, performs convolution in local Euclidean neighborhoods w.r.t. local positional relations between points, represented for example as polar, spherical or Cartesian coordinates, as shown in Figure 1.

Figure 1: Examples for spatial aggregation in geometric deep learning with trainable, continuous kernel functions, showing methods for (a) image graph representations (filtering of graphs) and (b) meshes (filtering of meshes).

Contribution. We present Spline-based Convolutional Neural Networks (SplineCNNs), a variant of deep neural networks for irregularly structured data. The main contribution is a trainable, spatial, continuous convolution kernel that leverages properties of B-spline bases to efficiently filter geometric input of arbitrary dimensionality. We show
that our method

• can be applied to different kinds of irregularly structured data, e.g., arbitrary (embedded) graphs and meshes,
• uses spatial geometric relations of the input,
• allows for end-to-end training without using handcrafted feature descriptors, and
• improves on or matches the state of the art in geometric learning tasks.

In addition, we provide an efficient GPGPU algorithm and implementation that allows for fast training and inference computation.

2. Related work
Deep learning on graphs. The history of geometric deep learning began with attempts to generalize convolutional neural networks for graph inputs. A large number of successful approaches are based on spectral graph theory. Bruna et al. [4] introduced convolution-like operators on spectral graphs, interpreting the eigenvectors of the Laplacian as Fourier basis. As an extension, Henaff et al. [9] suggest using spline interpolation for smoothing kernels in the spectral domain. Defferrard et al. [6] approximate spectral filters with Chebyshev polynomials, providing a more efficient filtering algorithm, whose kernel size determines the range of aggregated local K-neighborhoods. This approach was further simplified by Kipf and Welling [12], who consider only the one-neighborhood for one filter application. A filter based on the Cayley transform was proposed as an alternative to the Chebyshev approximation by Levie et al. [15]. Together with a trainable zooming parameter, this results in a more stable and flexible spectral filter.

It should be noted that all these spectral approaches assume that information is only encoded in the connectivity, edge weights and node features of the input. While this is true for general graphs, it does not hold for embedded graphs or meshes, where additional information is given by relative positions of nodes, which we consider with our method.

A downside of many spectral approaches is the fact that they use domain-dependent Fourier bases, which restricts generalization to inputs with identical graph connectivity. Yi et al. [25] tackle this problem by applying a spectral transformer network that synchronizes the spectral domains. Since our approach works directly in the spatial domain, it is not prone to this problem.

For the shape correspondence task on meshes, which we also analyze in this work, Litany et al. [16] present a siamese network using a soft error criterion based on geodesic distances between nodes. We compare our method against this specialized method.

Local descriptors for discrete manifolds. The issue of not representing local positional relations can be tackled by using methods that extract representations for local Euclidean neighborhoods from discrete manifolds. Based on the intrinsic shape descriptors of Kokkinos et al. [13], Masci et al. [17] present such a method for extraction of two-dimensional Euclidean neighborhoods from meshes and propose a convolution operation locally applied on these neighborhoods. Boscaini et al. [2] improve this approach by introducing a patch rotation method to align extracted patches based on the local principal curvatures of the input mesh.

Our convolution operator can, but does not have to, receive those local representations as inputs. Therefore, our approach is orthogonal to improvements in this field.

Spatial continuous convolution kernels. While the first continuous convolution kernels for graphs work in the spectral domain (e.g. [9, 6, 20]), spatial continuous convolution kernels for irregularly structured data were introduced recently as a special case in the fields of neural message passing and self-attention mechanisms [8, 23, 18]. Furthermore, Monti et al. [18] presented the MoNet framework for interpreting different kinds of inputs as directed graphs, which we build upon in our work. We show that our kernels match or exceed the accuracy of the trainable Gaussian mixture model (GMM) kernels of MoNet, while being able to be trained directly on the geometric structure.

3. SplineCNN
We define SplineCNNs as a class of deep neural networks that are built using a novel type of spline-based convolutional layer. This layer receives irregularly structured data, which is mapped to a directed graph, as input. In the spatial convolutional layer, node features are aggregated using a trainable, continuous kernel function, which we define in this section.

3.1. Preliminaries
Input graphs. Similar to the work of Monti et al. [18], we expect the input of our convolution operator to be a directed graph $G = (\mathcal{V}, \mathcal{E}, \mathbf{U})$ with $\mathcal{V} = \{1, \ldots, N\}$ being the set of nodes, $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ the set of edges, and $\mathbf{U} \in [0, 1]^{N \times N \times d}$ containing $d$-dimensional pseudo-coordinates $u(i, j) \in [0, 1]^d$ for each directed edge $(i, j) \in \mathcal{E}$. Note that $\mathbf{U}$ can be interpreted as an adjacency matrix with $d$-dimensional, normalized entries $u(i, j) \in [0, 1]^d$ if $(i, j) \in \mathcal{E}$ and 0 otherwise. Also, $\mathbf{U}$ is usually sparse with $E = |\mathcal{E}| \ll N^2$ entries. For a node $i \in \mathcal{V}$ its neighborhood set is denoted by $\mathcal{N}(i)$.
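As an illustration of the input graph model, the following minimal sketch stores the sparse pseudo-coordinates as a dictionary keyed by directed edge and builds Cartesian pseudo-coordinates for a small embedded graph. The function names, the dictionary layout, the reading of N(i) as the set of targets of edges leaving i, and the particular scaling to [0, 1]^d (dividing by the largest absolute offset so that a zero offset maps to 0.5) are assumptions made for this example; the text only requires that each u(i, j) lies in a fixed interval.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (origin i, target j)


def cartesian_pseudo_coordinates(
    positions: List[Tuple[float, ...]],
    edges: List[Edge],
) -> Dict[Edge, Tuple[float, ...]]:
    """Map each directed edge (i, j) to a vector u(i, j) in [0, 1]^d.

    The raw relation is the offset of the target j relative to the
    origin i. Scaling by the largest absolute offset and shifting by
    0.5 is an assumed normalization, chosen only for illustration.
    """
    offsets = {
        (i, j): tuple(b - a for a, b in zip(positions[i], positions[j]))
        for i, j in edges
    }
    max_abs = max((abs(c) for off in offsets.values() for c in off), default=0.0)
    if max_abs == 0.0:
        max_abs = 1.0  # all offsets are zero; avoid division by zero
    return {
        edge: tuple(0.5 + c / (2.0 * max_abs) for c in off)
        for edge, off in offsets.items()
    }


def neighborhoods(num_nodes: int, edges: List[Edge]) -> List[List[int]]:
    """N(i), taken here as all nodes j with a directed edge (i, j)."""
    nbrs: List[List[int]] = [[] for _ in range(num_nodes)]
    for i, j in edges:
        nbrs[i].append(j)
    return nbrs


# Three nodes embedded in the plane, connected in both directions.
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
edges = [(0, 1), (1, 0), (0, 2), (2, 0)]
U = cartesian_pseudo_coordinates(positions, edges)
N = neighborhoods(len(positions), edges)
```

Only the E existing edges are stored, which mirrors the remark that U is usually sparse.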
Input node features. Let $f : \mathcal{V} \to \mathbb{R}^{M_{in}}$, with $f(i) \in \mathbb{R}^{M_{in}}$, denote a vector of $M_{in}$ input features for each node $i \in \mathcal{V}$. For each $1 \le l \le M_{in}$ we refer to the set $\{f_l(i) \mid i \in \mathcal{V}\}$ as an input feature map.

B-spline basis functions. In addition to the input graph and node features, let $((N^m_{1,i})_{1 \le i \le k_1}, \ldots, (N^m_{d,i})_{1 \le i \le k_d})$ denote $d$ open B-spline bases of degree $m$, based on uniform, i.e. equidistant, knot vectors (cf. Piegl et al. [19]), with $k = (k_1, \ldots, k_d)$ defining our $d$-dimensional kernel size.

3.2. Main concept
Our convolution operator aggregates node features in local neighborhoods weighted by a trainable, continuous kernel function. The node features $f(i)$ represent features on an irregular geometric structure, whose spatial relations are locally defined by the pseudo-coordinates in $\mathbf{U}$. Therefore, when locally aggregating feature values in a node's neighborhood, the content of $\mathbf{U}$ is used to determine how the features are aggregated and the content of $f(i)$ defines what is aggregated. We argue that common inputs for geometric deep learning tasks can be mapped to this model while preserving relevant information:

• For graphs, $\mathcal{V}$ and $\mathcal{E}$ are given and $\mathbf{U}$ can contain edge weights or, for example, features like the node degree of the target nodes.
• For discrete manifolds, $\mathcal{V}$ contains points of the discrete manifold, $\mathcal{E}$ represents connectivity in local Euclidean neighborhoods and $\mathbf{U}$ can contain local relational information like polar, spherical or Cartesian coordinates of the target point with respect to the origin point for each edge.

We place no restriction on the values of $\mathbf{U}$, except that they lie in a fixed interval range. Therefore, meshes, for example, can be interpreted either as embedded three-dimensional graphs or as two-dimensional manifolds, using local Euclidean neighborhood representations like those obtained by the work of Boscaini et al. [2]. Also, either polar/spherical coordinates or Cartesian coordinates can be used, as shown in Figure 2. Independent of the type of coordinates stored in $\mathbf{U}$, our trainable, continuous kernel function, which we define in the following section, maps each $u(i, j)$ to a scalar that is used as a weight for feature aggregation.

Figure 2: Possibilities for pseudo-coordinates $u$: two- and three-dimensional Cartesian, polar and spherical coordinates, i.e. $u(i, j) = (x, y)$, $u(i, j) = (x, y, z)$, $u(i, j) = (\rho, \theta)$ and $u(i, j) = (\rho, \theta, \varphi)$. Values for scaling and translation of the coordinates $u$ to the interval $[0, 1]^d$ are omitted.

3.3. Convolution operator
We begin with the definition of a continuous kernel function using B-spline bases, which is parametrized by a constant number of trainable control values. The local support property of B-spline basis functions [19], which states that basis functions evaluate to zero for all inputs outside of a known interval, proves to be advantageous for efficient computation and scalability.

Figure 3 visualizes the following kernel construction method for differing B-spline basis degree $m$. We introduce a trainable parameter $w_{p,l} \in \mathbf{W}$ for each element $p$ from the Cartesian product $\mathcal{P} = (N^m_{1,i})_i \times \cdots \times (N^m_{d,i})_i$ of the B-spline bases and each of the $M_{in}$ input feature maps, indexed by $l$. This results in $M_{in} \cdot K$ trainable parameters, where $K = \prod_{i=1}^{d} k_i$ is the number of elements of $\mathcal{P}$.

We define our continuous convolution kernel as functions $g_l : [a_1, b_1] \times \cdots \times [a_d, b_d] \to \mathbb{R}$ with

$$g_l(u) = \sum_{p \in \mathcal{P}} w_{p,l} \cdot B_p(u), \tag{1}$$

with $B_p$ being the product of the basis functions in $p$:

$$B_p(u) = \prod_{i=1}^{d} N^m_{i,p_i}(u_i). \tag{2}$$

One way to interpret this kernel is to see the trainable parameters $w_{p,l}$ as control values for the height of a $(d+1)$-dimensional B-spline surface, from which a weight is sampled for each neighboring point $j$, depending on $u(i, j)$. However, in contrast to traditional $(d+1)$-dimensional B-spline approximation, we only have one-dimensional control points and approximate functions $g_l : [a_1, b_1] \times \cdots \times [a_d, b_d] \to \mathbb{R}$ instead of curves. The definition range of $g_l$ is the interval in which the partition of unity property of the B-spline bases holds [19]. Therefore, $a_i$ and $b_i$ depend on B-spline degree $m$ and kernel size $(k_1, \ldots, k_d)$. We scale the spatial relation vectors $u(i, j)$ to exactly match this interval, cf. Figure 3.

Given our kernel functions $g = (g_1, \ldots, g_{M_{in}})$ and input node features $f$, we define our spatial convolution operator for a node $i$ as

$$(f \star g)(i) = \frac{1}{|\mathcal{N}(i)|} \sum_{l=1}^{M_{in}} \sum_{j \in \mathcal{N}(i)} f_l(j) \cdot g_l(u(i, j)). \tag{3}$$

Similar to traditional CNNs, the convolution operator can be used as a module in a deep neural network architecture,
which we do in our SplineCNNs. To this end, the operator is applied $M_{out}$ times on the same input data with different trainable parameters, to obtain a convolutional layer that produces $M_{out}$ output feature maps. It should be highlighted that, in contrast to self-attention methods, we train an individual set of weights for each combination of input and output feature map.

Figure 3: Examples of our continuous convolution kernel $g_l(u)$ over $[a_1, b_1] \times [a_2, b_2]$ for B-spline basis degrees (a) $m = 1$ (linear B-spline basis functions) and (b) $m = 2$ (quadratic B-spline basis functions) for kernel dimensionality $d = 2$. The heights of the red dots are the trainable parameters for a single input feature map. They are multiplied by the elements of the B-spline tensor product basis before influencing the kernel value.

Local support. Due to the local support property of B-splines, $B_p \ne 0$ only holds true for $s := (m + 1)^d$ of the $K$ different vectors $p \in \mathcal{P}$. Therefore, $g_l(u)$ only depends on $M_{in} \cdot s$ of the $M_{in} \cdot K$ trainable parameters for each neighbor $j$, where $s$, $d$ and $m$ are constant and usually small. In addition, for each pair of nodes $(i, j) \in \mathcal{E}$, the vectors $p \in \mathcal{P}$ with $B_p \ne 0$, which we denote as $\mathcal{P}(u(i, j))$, can be found in constant time, given constant $m$ and $d$.

This allows for an alternative representation of the inner sums of our convolution operation, cf. Equation 3, as

$$(f_l \star g_l)(i) = \sum_{\substack{j \in \mathcal{N}(i) \\ p \in \mathcal{P}(u(i, j))}} f_l(j) \cdot w_{p,l} \cdot B_p(u(i, j)), \tag{4}$$

and $K$ can be replaced by $s$ in the time complexity of the operation. Also, $B_p(u(i, j))$ does not depend on $l$ and can therefore be computed once for all input features. Figure 4 shows a scheme of the computation. The gradient flow for the backward pass can also be derived by following the solid arrows backwards.

Figure 4: Forward computation scheme of the proposed convolution operation: for each neighbor $j$, the $s$ weights $w_{p_1,l}, \ldots, w_{p_s,l}$ are selected from $\mathbf{W}$ based on $u(i, j)$, multiplied by the computed basis products $B_{p_1}, \ldots, B_{p_s}$ and summed to $g_l(u(i, j))$; the products $f_l(j) \, g_l(u(i, j))$ are then summed to $(f_l \star g_l)(i)$. During the backward step of the backpropagation algorithm, the gradient flows along the inverted solid arrows, reaching inputs from $\mathbf{W}$ and $f_l(i)$.

Closed B-splines. Depending on the type of coordinate in vectors $u$, we use closed B-spline approximation in some dimensions. One frequently occurring example of such a situation is when $u$ contains angle attributes of polar coordinates. Using closed B-spline approximation in the angle dimension naturally enforces the angle $0$ to be evaluated to the same weight as the angle $2\pi$ or, for higher $m$, the kernel function to be continuously differentiable at those points. The proposed kernels can easily be modified so that they use closed approximation in an arbitrary subset of the $d$ dimensions, by mapping different $p \in \mathcal{P}$ to the same trainable control value $w_{p,l}$. This leads to a reduction of trainable parameters and B-spline basis functions. Referring to Figure 3, this approach can be interpreted as periodic repetition of the function surface along the corresponding axis.
Root nodes. Up to now, we did not consider the node $i$ of neighborhood $\mathcal{N}(i)$ in our convolution operator. It is not aggregated together with all $j \in \mathcal{N}(i)$, as would be the case in traditional CNNs. If Cartesian coordinates are used, we can simply define $\mathcal{N}(i)$ to include $i$. However, when using polar/spherical pseudo-coordinates, problems arise since the point with zero radius is not well defined. Therefore, we introduce an additional trainable parameter for each feature of the root node and add the product of this parameter and the corresponding feature to the result.

Relation to traditional CNNs. Except for a normalization factor, our spline-based convolution operator is a generalization of the traditional convolutional layer in CNNs with odd filter size in each dimension. For example, if we assume a two-dimensional grid graph with diagonal, horizontal and vertical edges as input, B-spline degree $m = 1$, kernel size $(3, 3)$, and the vectors $u$ to contain Cartesian relations between adjacent nodes, then our convolution operator is equivalent to a discrete convolution of an image with a kernel of size $3 \times 3$. This also holds for larger discrete kernels if the neighborhoods of the grid graph are modified accordingly.

4. GPGPU algorithm
For the spline-based convolutional layer defined in the last section, we introduce a GPU algorithm which allows efficient training and inference with SplineCNNs. For simplicity, we use a tensor indexing notation with, e.g., $\mathbf{A}[x, y, z]$ describing the element at position $(x, y, z)$ of a tensor $\mathbf{A}$ with rank three. The forward operation of our convolution operator is outlined in Algorithm 1.

We achieve parallelization over the edges $\mathcal{E}$ by first gathering edge-wise input features $\mathbf{F}^E_{in} \in \mathbb{R}^{E \times M_{in}}$ from the input matrix $\mathbf{F}_{in} \in \mathbb{R}^{N \times M_{in}}$, using the target node of each edge as index. Then, we compute edge-wise output features $\mathbf{F}^E_{out} \in \mathbb{R}^{E \times M_{out}}$, as shown in Figure 4, before scatter-adding them back to node-wise features $\mathbf{F}_{out} \in \mathbb{R}^{N \times M_{out}}$, performing the actual neighborhood aggregation. Our algorithm has a parallel time complexity of $O(s \cdot M_{in})$, with small $s$, using $O(E \cdot M_{out})$ processors, assuming that scatter-add is a parallel operation with constant time complexity.

Algorithm 1 Geometric convolution with B-spline kernels
Input:
  $N$: Number of nodes
  $M_{in}$: Number of input features per node
  $M_{out}$: Number of output features per node
  $s = (m + 1)^d$: Number of non-zero $B_p$ for one edge
  $\mathbf{W} \in \mathbb{R}^{K \times M_{in} \times M_{out}}$: Trainable weights
  $\mathbf{B} \in \mathbb{R}^{E \times s}$: Basis products of $s$ weights for each edge
  $\mathbf{P} \in \mathbb{N}^{E \times s}$: Indices of $s$ weights in $\mathbf{W}$ for each edge
  $\mathbf{F}_{in} \in \mathbb{R}^{N \times M_{in}}$: Input features for each node
Output:
  $\mathbf{F}_{out} \in \mathbb{R}^{N \times M_{out}}$: Output features for each node

  Gather $\mathbf{F}^E_{in}$ from $\mathbf{F}_{in}$ based on target nodes of edges
  Parallelize over $e \in \{1, \ldots, E\}$, $o \in \{1, \ldots, M_{out}\}$:
    $r \leftarrow 0$
    for each $i \in \{1, \ldots, M_{in}\}$ do
      for each $p \in \{1, \ldots, s\}$ do
        $w \leftarrow \mathbf{W}[\mathbf{P}[e, p], i, o]$
        $r \leftarrow r + (\mathbf{F}^E_{in}[e, i] \cdot w \cdot \mathbf{B}[e, p])$
      end for
    end for
    $\mathbf{F}^E_{out}[e, o] \leftarrow r$
  Scatter-add $\mathbf{F}^E_{out}$ to $\mathbf{F}_{out}$ based on origin nodes of edges
  Return $\mathbf{F}_{out}$

Computing B-spline bases. We achieve independence from the number of trainable weights by computing matrices $\mathbf{P} \in \mathbb{N}^{E \times s}$ and $\mathbf{B} \in \mathbb{R}^{E \times s}$. $\mathbf{P}$ contains the indices of parameters with $B_p \ne 0$ while $\mathbf{B}$ contains the basis products $B_p$ for these parameters. $\mathbf{B}$ and $\mathbf{P}$ can be preprocessed for a given graph structure or can be computed directly in the kernel. For the GPU evaluation of the basis functions required for $\mathbf{B}$ we use explicit low-degree polynomial formulations of those functions for each $m$. For further details we refer to our PyTorch implementation, which is available on GitHub.

Mini-batch handling. For batch learning, parallelization over a mini-batch can be achieved by creating sparse block diagonal matrices out of all $\mathbf{U}$ of one batch and concatenating matrices $\mathbf{F}_{in}$ in the node dimension. For matrices $\mathbf{F}^E_{in}$, $\mathbf{B}$ and $\mathbf{P}$, this results in example-wise concatenation in the edge dimension. Note that this composition allows differing numbers of nodes and edges over examples in one batch without introducing redundant computational overhead.

5. Results
We perform experiments with different SplineCNN architectures on three distinct tasks from the fields of image graph classification (Section 5.1), graph node classification (Section 5.2) and shape correspondence on meshes (Section 5.3). For each of the tasks, we create a SplineCNN using the spline-based convolution operator, which we denote as SConv($k$, $M_{in}$, $M_{out}$) for a convolutional layer with kernel size $k$, $M_{in}$ input feature maps and $M_{out}$ output feature maps. In addition, we denote fully connected layers as FC($o$), with $o$ as number of output neurons.
5.1. Image graph classification
For validation on two-dimensional regular and irregular structured input data, we apply our method to the widely known MNIST dataset [14] of 60,000 training and 10,000 test images containing grayscale, handwritten digits from 10 different classes. We conduct two different experiments on MNIST. For both experiments, we strictly follow the experimental setup of Defferrard et al. and Monti et al. [6, 18] to provide comparability. For the first experiment, the MNIST images are represented as a set of equal grid graphs, where each node corresponds to one pixel in the original image, resulting in grids of size $28 \times 28$ with $N = 28^2 = 784$ nodes. For the second experiment, the MNIST superpixel dataset of Monti et al. [18] is used, where each image is represented as an embedded graph of 75 nodes defining the centroids of superpixels, cf. Figure 5a, with each graph having different node positions and connectivities. This experiment is an ideal choice to validate the capabilities of our approach on irregularly structured, image-based data.

Figure 5: MNIST 75 superpixels (a) example and (b) classification accuracy (in %) of SplineCNN using varying pseudo-coordinates (Cartesian, polar) and B-spline base degrees (linear, quadratic, cubic).

Pooling. Our SplineCNN architectures use a pooling operator based on the Graclus method [7, 6]. The pooling operation is able to obtain a coarsened graph by deriving a clustering on the graph nodes, aggregating nodes in one cluster and computing new pseudo-coordinates for each of those new nodes. We denote a max-pooling layer using this algorithm with MaxP($c$), with $c$ being the cluster size (and approximate downscaling factor).

Architectures and parameters. For the grid graph experiments, Cartesian coordinates and a B-spline basis degree of $m = 1$ are used to reach equivalence to the traditional convolution operator in CNNs, cf. Section 3.3. In contrast, we compare all configurations of $m$ and possible pseudo-coordinates against each other on the superpixel dataset.

For classification on the grid data, we make use of a LeNet5-like network architecture [14]: SConv((5, 5),1,32) → MaxP(4) → SConv((5, 5),32,64) → MaxP(4) → FC(512) → FC(10). The initial learning rate was chosen as $10^{-3}$ and dropout probability as 0.5. Note that we used neighborhoods of size $5 \times 5$ from the grid graph, to mirror the LeNet5 architecture with its $5 \times 5$ filters.

The superpixel dataset is evaluated using the SplineCNN architecture SConv(($k_1$, $k_2$),1,32) → MaxP(4) → SConv(($k_1$, $k_2$),32,64) → MaxP(4) → AvgP → FC(128) → FC(10), where AvgP denotes a layer that averages features in the node dimension. We use the Exponential Linear Unit (ELU) as non-linearity after each SConv layer and the first FC layer. For Cartesian coordinates, we choose the kernel size to be $k_1 = k_2 = 4 + m$ and for polar coordinates $k_1 = 1 + m$ and $k_2 = 8$. Training was done for 20 epochs with a batch size of 64, initial learning rate 0.01 and dropout probability 0.5. Both networks were trained for 30 epochs using the Adam method [11].

Dataset       LeNet5 [14]   MoNet [18]   SplineCNN
Grid          99.33%        99.19%       99.22%
Superpixels   –             91.11%       95.22%

Table 1: Classification accuracy on different representations of the MNIST dataset (grid and superpixel) for a classical CNN (LeNet5), MoNet and our SplineCNN approach.

Discussion. All results of the MNIST experiments are shown in Table 1 and Figure 5b. The grid graph experiment results in approximately the same accuracy as LeNet5 and the MoNet method. For the superpixel dataset, we improve previous results by 4.11 percentage points in accuracy. Since we are using a similar architecture and the same input data as MoNet, the better results are an indication that our operator is able to capture more relevant information in the structure of the input. This can be explained by the fact that, in contrast to the MoNet kernels, our kernel function has individual trainable weights for each combination of input and output feature maps, just like the filters in traditional CNNs.

Results for different configurations are shown in Figure 5b. We only notice small differences in accuracy for varying $m$ and pseudo-coordinates. However, lower $m$ and Cartesian coordinates perform slightly better than the other configurations.

In addition, we visualized the 32 learned kernels of the first SConv layers from the grid and superpixel experiments in Figure 6. It can be observed that edge-detecting patterns are learned in both approaches, whether trained on regular or irregular structured data.
Figure 6: Visualizations of the 32 kernels from the first spline-based convolutional layers, trained on the MNIST (a) grid and (b) superpixels datasets, with kernel size (5, 5) and B-spline base degree $m = 1$.

5.2. Graph node classification
As a second experiment, we address the problem of graph node classification using the Cora citation graph [21]. We validate that our method also performs strongly on datasets where no Euclidean relations are given. Cora consists of 2,708 nodes and 5,429 undirected unweighted edges, representing scientific publications and citation links respectively. Each document is represented individually by a 1,433-dimensional sparse binary bag-of-words feature vector and is labeled with exactly one of 7 classes. Similar to the experimental setup in Levie et al. [15], we split the dataset into 1,708 nodes for training and 500 nodes for testing, to simulate labeled and unlabeled information.

Architecture and parameters. We use a SplineCNN similar to the network architecture introduced in [15, 12, 18]: SConv((2),1433,16) → SConv((2),16,7), with ELU activation after the first SConv layer and $m = 1$. For pseudo-coordinates, we choose the globally normalized degree of the target nodes, $u(i, j) = (\deg(j) / \max_{v \in \mathcal{V}} \deg(v))$, leading to filtering based on the number of citations of neighboring publications. Training was done using the Adam optimization method [11] for 200 epochs with learning rate 0.01, dropout probability 0.5 and L2 regularization 0.005. As loss function, the cross entropy between the network's softmax output and a one-hot target distribution was used.

Discussion. Results of our and related methods are shown in Table 2, which reports the mean classification accuracy averaged over 100 experiments. It can be seen that SplineCNNs improve the state of the art in this experiment by approximately 1.58 percentage points. We attribute this improvement to the filtering based on $u$, which contains node degrees as additional information to learn more complex kernel functions. This indicates that SplineCNNs can be successfully applied to irregular but non-geometric data and that they are able to improve previous results in this domain.

ChebNet [6]     GCN [12]        CayleyNet [15]   SplineCNN
87.12 ± 0.60    87.17 ± 0.58    87.90 ± 0.66     89.48 ± 0.31

Table 2: Graph node classification on the Cora dataset for different learning methods (ChebNet, GCN, CayleyNet and SplineCNN). The presented accuracy means and standard deviations are computed over 100 experiments, where for each experiment the network was trained for 200 epochs.

5.3. Shape correspondence
As our last and largest experiment, we validate our method on a collection of three-dimensional meshes, solving the task of shape correspondence similar to [18, 2, 17, 16]. Shape correspondence refers to the task of labeling each node of a given shape with the corresponding node of a reference shape [17]. We use the FAUST dataset [1], containing 10 scanned human shapes in 10 different poses, resulting in a total of 100 non-watertight meshes with 6,890 nodes each. The first 80 subjects in FAUST were used for training and the remaining 20 subjects for testing, following the dataset splits introduced in [18]. Ground truth correspondence of FAUST meshes is given implicitly, since nodes are sorted in the exact same order for every example. Correspondence quality is measured according to the Princeton benchmark protocol [10], counting the percentage of derived correspondences that lie within a geodesic radius $r$ around the correct node.

In contrast to similar approaches, e.g. [18, 2, 17, 16], we go without handcrafted feature descriptors as inputs, like the local histogram of normal vectors known as SHOT descriptors [24], and force the network to learn from the geometry (i.e. spatial relations encoded in $\mathbf{U}$) itself. Therefore, input features are trivially given by $\mathbf{1} \in \mathbb{R}^{N \times 1}$. Also, we validate our method on three-dimensional meshes as inputs instead of generating two-dimensional geodesic patches for each node. These simplifications reduce the computation time and memory consumption required to preprocess the data by a wide margin, making training and inference completely end-to-end and very efficient.

Architecture and parameters. We apply a SplineCNN architecture with 6 convolutional layers: SConv(($k_1$, $k_2$, $k_3$),1,32) → SConv(($k_1$, $k_2$, $k_3$),32,64) → 4× SConv(($k_1$, $k_2$, $k_3$),64,64) → Lin(256) → Lin(6890), where Lin($o$) denotes a $1 \times 1$ convolutional layer to $o$ output features per node. As non-linear activation function, ELU is used after each SConv and the first Lin layer. For Cartesian coordinates we choose the kernel size to be $k_1 = k_2 = k_3 = 4 + m$ and for polar coordinates $k_1 = k_3 = 4 + m$ and $k_2 = 8$. We evaluate our method on multiple choices of $m \in \{1, 2, 3\}$. Training was done for
100 epochs with a batch size of 1, initial learning rate 0.01 and dropout probability 0.5,
using the Adam optimizer [11] and cross entropy loss.

[Figure 7, plots omitted. Panels: (a) Results of SplineCNN and other methods (GCNN, ACNN,
MoNet, FMNet, SplineCNN); (b) Results for different SplineCNNs (Cartesian or spherical
pseudo-coordinates, m = 1, 2, 3); (c) Geodesic error of test examples. Axes in (a) and (b):
geodesic error [% diameter] on the x-axis, correspondences [%] on the y-axis.]

Figure 7: Geodesic error plots of the shape correspondence experiments with (a) SplineCNN
and related approaches and (b) different SplineCNN experiments. The x-axis displays the
geodesic distance in % of diameter and the y-axis the percentage of correspondences that
lie within a given geodesic radius around the correct node. Our SplineCNN achieves the
highest accuracy for low geodesic error and significantly outperforms other general
approaches like MoNet, GCNN and ACNN. In (c), three examples of the FAUST test dataset
with geodesic errors of SplineCNN predictions for each node are presented. We show the
best (left), the median (middle) and worst (right) test example, sorted by average
geodesic error.

Discussion. Obtained accuracies for different geodesic errors are plotted in Figure 7.
The results for different SplineCNN parameters match the observations from before, where
only small differences could be seen but using Cartesian coordinates and small B-spline
degrees seemed to be slightly better. Our SplineCNN outperforms all other approaches with
99.20% of predictions on the test set having zero geodesic error. However, the global
behavior over larger geodesic error bounds is slightly worse in comparison to FMNet [16].
In Figure 7c it can be seen that most nodes are classified correctly but that the few
false classifications have a high geodesic error. We attribute these differences to the
varying loss formulations. While we train against a one-hot binary vector using the cross
entropy loss, FMNet trains using a specialized soft error loss, which is a more
geometrically meaningful criterion that punishes geodesically far-away predictions more
strongly than predictions near the correct node [16]. However, it is worth highlighting
that, in contrast to all other approaches we compare against, we do not use SHOT
descriptors as input features. Instead, we train only on the geometric structure of the
meshes.

Performance. We report an average forward step runtime of 0.043 seconds for a single FAUST
example processed by the suggested SplineCNN architecture (k1 = k2 = k3 = 5, m = 1) on a
single NVIDIA GTX 1080 Ti. We train this network in approximately 40 minutes. Regarding
scalability, we are able to stack up to 160 SConv((5, 5, 5),64,64) layers before running
out of memory on the mentioned GPU, while the runtime scales linearly with the number of
layers. However, for this task we do not observe significant improvement in accuracy when
using deeper networks.

6. Conclusion

We introduced SplineCNN, a spline-based convolutional neural network with a novel
trainable convolution operator, which learns on irregularly structured, geometric input
data. Our convolution filter operates in the spatial domain and aggregates local features,
applying a trainable continuous kernel function parametrized by trainable B-spline control
values. We showed that SplineCNN is able to improve state-of-the-art results in several
benchmark tasks, including image graph classification, graph node classification and shape
correspondence on meshes, while allowing very fast training and inference computation. To
conclude, SplineCNN is the first architecture that allows deep end-to-end learning directly
from geometric data while providing strong results. Since no preprocessing is required,
this allows for even faster processing of data.

In the future we plan to enhance SplineCNNs by concepts known from traditional CNNs,
namely recurrent neurons for geometric, spatio-temporal data or dynamic graphs, and
un-pooling layers to allow encoder-decoder or generative architectures.

Acknowledgments

This work has been supported by the German Research Association (DFG) within the
Collaborative Research Center SFB 876, Providing Information by Resource-Constrained
Analysis, projects B2 and A6. We also thank Pascal Libuschewski for proofreading and
helpful advice.
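
As a supplementary illustration of the evaluation metric described in the caption of Figure 7, the following minimal Python sketch computes, for each geodesic radius given in % of the shape diameter, the percentage of predicted correspondences that lie within that radius. It is not the evaluation code used for the experiments: the function name `correspondence_curve`, its parameters and the toy numbers are hypothetical, and the per-node geodesic errors are assumed to have been computed beforehand.

```python
from typing import List, Sequence


def correspondence_curve(
    geodesic_errors: Sequence[float],
    diameter: float,
    thresholds_pct: Sequence[float],
) -> List[float]:
    """Percentage of predictions within each geodesic radius.

    geodesic_errors: geodesic distance between predicted and correct node,
        one value per prediction (assumed to be precomputed).
    diameter: geodesic diameter of the shape, in the same unit as the errors.
    thresholds_pct: geodesic radii, expressed in % of the diameter.
    """
    if diameter <= 0:
        raise ValueError("diameter must be positive")
    if len(geodesic_errors) == 0:
        raise ValueError("at least one prediction is required")

    # Express every error in % of the diameter (the x-axis of the plot).
    errors_pct = [100.0 * e / diameter for e in geodesic_errors]
    n = len(errors_pct)

    # For each radius, count the predictions that fall inside it (the y-axis).
    return [
        100.0 * sum(1 for e in errors_pct if e <= t) / n
        for t in thresholds_pct
    ]


if __name__ == "__main__":
    # Hypothetical toy data: three exact matches and two misses.
    toy_errors = [0.0, 0.0, 0.0, 0.5, 2.0]
    curve = correspondence_curve(toy_errors, diameter=20.0,
                                 thresholds_pct=[0, 2, 4, 10])
    print(curve)  # [60.0, 60.0, 80.0, 100.0]
```

The value at radius 0 corresponds to the share of predictions with zero geodesic error, which is the quantity reported as 99.20% in the discussion above.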
References

[1] F. Bogo, J. Romero, M. Loper, and M. J. Black. FAUST: Dataset and evaluation for 3D mesh registration. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3794–3801, 2014.
[2] D. Boscaini, J. Masci, E. Rodolà, and M. Bronstein. Learning shape correspondence with anisotropic convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 3189–3197, 2016.
[3] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. Geometric deep learning: Going beyond euclidean data. IEEE Signal Processing Magazine, pages 18–42, 2017.
[4] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun. Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations (ICLR), 2014.
[5] F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
[6] M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems (NIPS), pages 3837–3845, 2016.
[7] I. S. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1944–1957, 2007.
[8] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning (ICML), pages 1263–1272, 2017.
[9] M. Henaff, J. Bruna, and Y. LeCun. Deep convolutional networks on graph-structured data. CoRR, abs/1506.05163, 2015.
[10] V. G. Kim, Y. Lipman, and T. Funkhouser. Blended intrinsic maps. ACM Trans. Graph., 30(4):79:1–79:12, July 2011.
[11] D. P. Kingma and J. L. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
[12] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), 2017.
[13] I. Kokkinos, M. M. Bronstein, R. Litman, and A. M. Bronstein. Intrinsic shape context descriptors for deformable shapes. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 159–166, 2012.
[14] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, pages 2278–2324, 1998.
[15] R. Levie, F. Monti, X. Bresson, and M. M. Bronstein. CayleyNets: Graph convolutional neural networks with complex rational spectral filters. CoRR, abs/1705.07664, 2017.
[16] O. Litany, T. Remez, E. Rodolà, A. M. Bronstein, and M. M. Bronstein. Deep functional maps: Structured prediction for dense shape correspondence. In IEEE International Conference on Computer Vision (ICCV), pages 5660–5668, 2017.
[17] J. Masci, D. Boscaini, M. M. Bronstein, and P. Vandergheynst. Geodesic convolutional neural networks on riemannian manifolds. In IEEE International Conference on Computer Vision Workshop (ICCV), pages 832–840, 2015.
[18] F. Monti, D. Boscaini, J. Masci, E. Rodolà, J. Svoboda, and M. M. Bronstein. Geometric deep learning on graphs and manifolds using mixture model CNNs. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5425–5434, 2017.
[19] L. Piegl and W. Tiller. The NURBS Book. Springer-Verlag New York, Inc, 1997.
[20] K. T. Schütt, P.-J. Kindermans, H. E. Sauceda, S. Chmiela, A. Tkatchenko, and K.-R. Müller. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. In Advances in Neural Information Processing Systems (NIPS), pages 992–1002, 2017.
[21] P. Sen, G. M. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad. Collective classification in network data. AI Magazine, 29(3):93–106, 2008.
[22] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98, 2013.
[23] M. Simonovsky and N. Komodakis. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 29–38, 2017.
[24] F. Tombari, S. Salti, and L. Di Stefano. Unique signatures of histograms for local surface description. In Proceedings of the 11th European Conference on Computer Vision (ECCV), pages 356–369, 2010.
[25] L. Yi, H. Su, X. Guo, and L. J. Guibas. SyncSpecCNN: Synchronized spectral CNN for 3D shape segmentation. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6584–6592, 2017.