StEik
Abstract
We present new insights and a novel paradigm (StEik) for learning implicit neural
representations (INR) of shapes. In particular, we shed light on the popular eikonal
loss used for imposing a signed distance function constraint in INR. We show
analytically that as the representation power of the network increases, the optimiza-
tion approaches a partial differential equation (PDE) in the continuum limit that is
unstable. We show that this instability can manifest in existing network optimiza-
tion, leading to irregularities in the reconstructed surface and/or convergence to
sub-optimal local minima, and thus fails to capture fine geometric and topological
structure. We show analytically how other terms added to the loss, currently used in
the literature for other purposes, can actually eliminate these instabilities. However,
such terms can over-regularize the surface, preventing the representation of fine
shape detail. Based on a similar PDE theory for the continuum limit, we introduce
a new regularization term that still counteracts the eikonal instability but without
over-regularizing. Furthermore, since stability is now guaranteed in the continuum
limit, this stabilization also allows for considering new network structures that
are able to represent finer shape detail. We introduce such a structure based on
quadratic layers. Experiments on multiple benchmark data sets show that our new
regularization and network are able to capture more precise shape details and more
accurate topology than existing state-of-the-art.
1 Introduction
Implicit neural representations (INR) [1]–[17], which are neural network parameterizations of implicit
shape representations, have recently become a powerful tool for modeling shape in learning-based
frameworks for surface reconstruction tasks [1]–[3], [10]–[17] in computer vision and graphics.
INRs typically represent a shape as the zero level set of its corresponding signed distance function
(SDF), which is represented with a neural network (e.g., a multi-layer perceptron - MLP). To learn
an INR, one minimizes a loss consisting of a data fidelity term (e.g., fidelity to known points on the
surface, i.e., a point cloud, for the point cloud to surface reconstruction task [1]–[3], [10]–[17]) and
regularization terms. A commonly used regularization term is the eikonal loss [11], which constrains the neural
representation to be an SDF. While existing methods have shown the ability to recover complex
scenes and objects, in many cases as datasets become more complex, finer scale geometric and
topological structures may not always be recovered.
In this paper, we show that the continuum limit of the optimization of neural SDFs as the network
representation power increases (to recover finer scale shape features) can be unstable due to the
eikonal loss. This limits recovery of fine shape details and/or can lead to convergence to sub-optimal local minima.
Code: https://github.com/sunyx523/StEik
2 Related Work
Shape implicit neural representations (INRs): Traditional approaches for representing 3D shapes
such as meshes pose difficulties in integrating them with deep learning methods. Deep learning ap-
proaches operating directly on traditional representations have limited quality and flexibility. Recently,
neural networks have been proposed to represent shapes and scenes using signed distance functions
(SDF) [1], [10]–[16] or occupancy functions [2], [3], [17]. These have proven more convenient and
accurate than traditional representations within deep learning-based solutions. DeepSDF [13] was
the first to introduce the use of SDFs in INRs, and was used to represent a collection of shapes. It
regresses on ground truth SDFs. In many applications, such SDF ground truth is difficult to obtain.
Thus, some methods learn INRs directly from raw data, e.g., point clouds (from range scanners) or
2D images (e.g., [3], [16]) in multi-view reconstruction applications.
SAL [1] aims to recover SDFs from point clouds. SALD [10] further improves SAL by incorporating
surface normal data. IGR [11] proposes a loss function based on the eikonal equation, which helps
regularize the learned function towards an SDF. FFN [17] and SIREN [12] introduce high frequencies
into their architecture to avoid bias towards low-frequency solutions in different ways. FFN uses a
ReLU MLP that is paired with a Fourier feature layer. SIREN uses the sine activation. Recently,
DiGS [14] improves the performance of SIREN on shape representation tasks by proposing a soft
constraint on the divergence of the gradient field and a new initialization method. DiGS
is motivated by avoiding the use of normal data, which is not available for many applications.
Approaches above that recover SDFs use the eikonal constraint in training, which we show limits to
an unstable PDE, causing artifacts that hinder recovery of fine details or lead to convergence to sub-optimal
local minima. We provide a theoretical framework to understand this instability and explain how
some existing approaches can unknowingly mitigate this instability, albeit at the cost of over-regularization. Our
framework enables us to design a new regularizer that stabilizes the eikonal term while recovering
finer geometric details over existing approaches.
Quadratic Deep Neural Networks: Our new regularization enables us to use new neural networks
for SDFs to represent finer shape details without suffering from destabilizing effects of the eikonal
term that become more prominent in higher capacity networks. We use quadratic layers in INRs,
which is novel, to illustrate this point. Quadratic Deep Neural Networks (QDNNs), proposed back in
the 1990s [21], [22], have recently been used to enhance the learning capability of Deep Neural Networks
(DNNs) [23]–[27]. Rather than linear functions used in conventional linear layers, a quadratic
function is used. Since compositions of quadratic functions can be higher-order polynomials, such
QDNNs can represent piece-wise polynomial functions. Thus, QDNNs have better model efficiency
because they can approximate polynomial decision boundaries using smaller network depth/width.
However, the improvement is limited when they are applied to Convolutional Neural Networks (CNNs)
[30]–[35]. We demonstrate that using quadratic neurons in MLPs for representing shapes as implicit
functions is highly effective.
Let u : Ω ⊂ Rn → R be the function that is evolving in the continuum (e.g., level set representation;
the hyper-surface of interest is the zero level set, i.e., {x ∈ Ω : u(x) = 0}). This is the continuum
limit of the typical neural SDF evolutions. Suppose the loss of interest (defined on u) is L. Then the
gradient descent is given by the PDE:
∂u/∂t = −∇L(u),    (1)
where t is the artificial parameter of the evolution (the continuum equivalent of the iteration index),
∇L satisfies the relation δL · δu = ⟨δu, ∇L(u)⟩_{L2}, and the latter expression is the L2 inner product
between the gradient and the (infinite dimensional) perturbation of u, δu. Note that while we analyze
stability of gradient descent, our analysis also applies to second order optimizers (e.g., Nesterov
momentum) as such optimizers do not change stability properties [20]. Suppose now that u is
parameterized by θ, denoted uθ , as in neural SDFs. We compute the projected gradient descent of the
loss with respect to the parameters θ. Note that if we wish to perturb u according to a perturbation
δθ, then δu = (∂u/∂θ) · δθ. Therefore,
δL · δθ = ∫_Ω ∇L(u)(x) (∂u/∂θ)(x) · δθ dx = [ ∫_Ω ∇L(u)(x) (∂u/∂θ)(x) dx ] · δθ.
Thus, the projected gradient descent in parameter-space is
dθ/dt = − ∫_Ω ∇L(u)(x) (∂u/∂θ)(x) dx.    (2)
The corresponding PDE evolution of the neural representation (in function space) is
∂u/∂t = (∂u/∂θ) · dθ/dt = − Σ_i ⟨∇L(u), ∂u/∂θ_i⟩_{L2} ∂u/∂θ_i,    (3)
where B = {∂u/∂θ_i}_i is a basis for the sub-space of the tangent space of function representations that
is spanned by the parameterization of the network (e.g., neural SDF). The evolution above is simply a
projection of the continuum gradient of the loss onto the basis of the tangent space formed by the
neural representation.
Note that as the neural network representation gains more representational power (more capacity to
represent finer scale and more diverse shapes), the basis B approaches spanning the entire tangent
space of functions on Ω ⊂ Rⁿ, and hence the projected PDE approaches the full PDE (1). Therefore,
analyzing the unconstrained PDE (1) gives insight into the neural representation. In the next sub-
sections, we will focus on the notion of stability of the PDEs, which impacts the accuracy of the
neural representation.
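As a concrete illustration of (2) and (3), the following minimal Python sketch (a toy example of ours, not part of the paper's method; the Gaussian basis and the quadratic fitting loss are illustrative assumptions) performs the parameter-space gradient descent of (2) and evaluates the induced projected function-space velocity of (3).

```python
import numpy as np

# Toy illustration of eqs. (2)-(3): u_theta(x) = sum_i theta_i * phi_i(x) on a 1D grid,
# with loss L(u) = 0.5 * int (u - f)^2 dx, whose continuum gradient is grad L(u) = u - f.
x = np.linspace(0.0, 1.0, 200)
dx = x[1] - x[0]
centers = np.linspace(0.1, 0.9, 8)
phi = np.exp(-((x[None, :] - centers[:, None]) ** 2) / (2 * 0.05 ** 2))  # rows are du/dtheta_i
f = np.sin(2 * np.pi * x)                                                # target function
theta = np.zeros(len(centers))

lr = 0.5
for _ in range(500):
    u = theta @ phi                                       # current function u_theta
    grad_L = u - f                                        # continuum gradient of the loss
    dtheta = -(phi * grad_L[None, :]).sum(axis=1) * dx    # eq. (2): -int grad_L * du/dtheta dx
    theta += lr * dtheta

# Function-space velocity implied by eq. (3): projection of -grad_L onto span{du/dtheta_i}.
u = theta @ phi
velocity = sum(((-(u - f) * p).sum() * dx) * p for p in phi)
print("max |u - f|:", np.abs(u - f).max(), " max |projected gradient|:", np.abs(velocity).max())
```

In a neural SDF, backpropagating the discretized loss through the network parameters computes a discretization of (2), with ∂u/∂θ_i supplied by automatic differentiation.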
3.2 PDE Stability Analysis: Theoretical Framework for Analysis of Neural SDF Optimization
Current approaches for learning a neural signed distance function minimize a loss that consists of a
data fidelity term and regularization. Regularization aims to keep the representation close to a signed
distance function, and can also include terms that regularize the underlying shape (e.g., to keep the
shape smooth). In this sub-section, we will focus on the eikonal loss that is part of the regularization.
A necessary condition for a signed distance function is that it satisfies the eikonal PDE and thus the
eikonal loss penalizes deviation from that constraint:
|∇u(x)| = 1, x ∈ Ω,  ⟹  Leik(u) = (1/2) ∫_Ω | |∇u(x)| − 1 |^p dx,    (4)

whose gradient descent PDE (for p = 1; see Section B.3) is

∂u/∂t = ∇ · [κe ∇u],  κe = sgn(|∇u| − 1)/|∇u|,    (5)

where sgn is the sign function. The local linearization of this equation is obtained by treating κe as
constant, which is true locally; this results in the linearization:
∂u/∂t = κe Δu,    (6)
where ∆ denotes Laplacian, and note κe can be positive or negative. When κe < 0, the process
is a backward diffusion, which is ill-posed and therefore fundamentally unstable, regardless of the
numerical implementation scheme to be used. To see this, we may compute the spatial Fourier
transform of the above equation, which yields:
(∂û/∂t)(t, ω) = −κe |ω|² û(t, ω)  ⟹  û(t, ω) ∝ e^{−κe |ω|² t},    (7)
where ω = (ω1 , . . . , ωn ) is the frequency variable, and û is the Fourier transform of u. Notice that
when κe < 0, the process diverges and so is unstable. Therefore, the projected gradient descent
PDE of the Eikonal loss when u is represented with a (parametric) neural representation can become
unstable as the representational power of the neural SDF increases (approaching the continuum limit).
One may wonder, if the optimization of the Eikonal loss is unstable, why the network optimization
seems to converge. There may be several reasons for this. Firstly, since κe can be positive or negative
at certain locations, the PDE could go from unstable to stable and even oscillate between these two
states without fully blowing up. However, this can cause irregularities in the evolution and recovered
shape (see Figure 1). Second, due to the finite parameterization of neural representations, networks
with less capacity may project to a flow that annihilates some of the unstable components. Lastly, as
we will see in the next sub-section through analysis, various regularization terms introduced (for other
purposes) can have a stabilizing effect. Nevertheless, these approaches can limit the representational
power of the network to represent fine-scale shape details. Our approaches in the next section, built
upon our theory, stabilize while allowing more complex networks to have finer shape representation.
[Figure: evolution of the reconstruction without (w.o. div) vs. with (w.t. div) the divergence term.]
We now use our theory to analyze existing methods. In [14], a regularization term is added to the loss
function for training neural SDFs; the loss (called the divergence loss) is as follows:
Ldiv(u) = ∫_{Ω∖Ω0} |Δu(x)|^p dx,    (8)
where Ω0 are points on the ground truth surface (e.g., points of a point cloud or the zero level set of
the ground truth). The authors observe empirically that the Laplacian of an SDF is close to zero and
thus this is added as a constraint. Although we show in the next section that this is at best only
partially true, we will now show that this term has another beneficial property, namely that it stabilizes
the instability of the eikonal loss gradient descent. The gradient descent PDE for the sum of the
above divergence loss and the eikonal loss (αe Leik + αd Ldiv , where αe , αd > 0 are weights) is
∂u/∂t = αe ∇ · [κe ∇u] − αd · { Δ[sgn(Δu)] for p = 1;  Δ[Δu] for p = 2 },    (9)
which is a fourth-order PDE. Note that in implementations, one would have to approximate the sign
function with a differentiable approximation. We will assume sgn(x) = 2σ(x) − 1, where σ is the
sigmoid function, i.e., the key property is that the approximation is positively sloped near the origin,
and close to a constant away from the origin on either side. Note that the stability of the PDE is
typically dominated by the highest-order terms, which in the above case is stable. To see this, we
linearize the first term as done previously (assuming κe is constant, and approximating sign as linear
near the origin and constant elsewhere). In this case,
Δ[sgn(Δu)](x) ≈ { κd Δ[Δu](x) if Δu(x) ≈ 0;  0 if |Δu(x)| ≫ 0 },
where κd > 0 is the slope of the sign approximation at zero. Therefore, in both p = 1 and p = 2, the
linearization of the PDE (near ∆u = 0 for p = 1 and everywhere for p = 2) is given by
∂u/∂t = αe κe Δu − αd κd Δ[Δu] = αe κe Σ_{j=1}^{n} ∂²u/∂x_j² − αd κd Σ_{j,k=1}^{n} ∂⁴u/(∂x_j² ∂x_k²).    (10)
Computing the spatial Fourier transform of the above linearized equation yields:
(∂û/∂t)(t, ω) = −( αe κe |ω|² + αd κd |ω|⁴ ) û(t, ω) = A(ω) û(t, ω)  ⟹  û(t, ω) ∝ e^{A(ω) t}.    (11)
Note that in any local approximation of κd with a constant, κd > 0. Thus, regardless of the sign
of κe , so long as αd is chosen large enough, the set in which A(ω) is positive can be minimized,
and so the process is stable. Thus, besides aiming to enforce the empirically observed property that
the Laplacian of the neural SDF is close to zero, that term also adds stability to the neural SDF
optimization, adding a regularizing effect.
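The stabilizing role of the fourth-order term in (11) can also be checked numerically; the short sketch below (with illustrative coefficients, not values used in the paper) evaluates the amplification factor e^{A(ω)t} for a locally negative κe with increasing weight αd.

```python
import numpy as np

# Amplification factor exp(A(w) * t) from eq. (11), with
# A(w) = -(a_e * k_e * |w|^2 + a_d * k_d * |w|^4).
# k_e < 0 models the locally backward (unstable) eikonal diffusion; all
# coefficients and the time horizon are illustrative assumptions.
w = np.linspace(0.0, 20.0, 201)        # frequency magnitudes |omega|
k_e, k_d, a_e, t = -1.0, 1.0, 1.0, 0.1

for a_d in (0.0, 0.05, 0.5):
    A = -(a_e * k_e * w**2 + a_d * k_d * w**4)
    band = w[A > 0].max() if (A > 0).any() else 0.0
    print(f"a_d = {a_d}: max amplification {np.exp(A * t).max():.3e}, "
          f"largest amplified |w| in range: {band:.2f}")
# With a_d = 0 the growth is unbounded as |w| increases (high frequencies blow up);
# increasing a_d confines A(w) > 0 to a narrow band of low frequencies.
```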
In several works, a term is included to penalize the deviation between the normal to the SDF and the
ground truth normal direction to the surface (or point cloud), which provides further constraints on
the recovered SDF. In some problems this ground truth data is available. In addition to serving as an
additional constraint, for particular forms of that constraint [11], we show that this term can stabilize the
eikonal term. The normal constraint is given by the loss:
Lnorm.(u) = ∫_{Ω0} |∇u(x) − Ngt|^p dx,    (12)

where Ω0 are points on the ground truth surface. The gradient descent of this term is given by
∂u/∂t = ∇ · [κn (∇u − Ngt)],  κn = { |∇u − Ngt|^{−1} for p = 1;  1 for p = 2 },    (13)
which includes a forward diffusion; if the weight on this term is chosen larger than −αe κe, it stabilizes
the (unstable) backward diffusion of the eikonal loss.
However, we note that there is a component of the Laplacian of an SDF that is zero. Indeed, if we
compute the gradient of both sides of the eikonal equation (|∇u(x)| = 1), we obtain that
0 = D²u(x) · (∇u(x)/|∇u(x)|) = D²u(x) · ∇u(x),    (15)

where D²u(x) indicates the Hessian of the SDF. Note that the above quantity dotted with ∇u is the
second derivative of u in the normal direction of the level sets, which is a component of the full
Laplacian of u. Hence, we introduce a new loss term as a replacement for the penalty on the full
Laplacian (we refer to this as Laplacian normal regularization or directional divergence):

LL.n.(u) = ∫_Ω |∇u(x)ᵀ D²u(x) ∇u(x)| dx.    (16)
This loss enforces the constraint in SDFs that the second derivative in the normal direction is zero,
without enforcing unwanted smoothness by penalizing the fine detail (points of high mean curvature)
of the level sets. This will lead to a fourth-order (non-linear) PDE for its gradient descent. The
gradient descent PDE includes a term that is −∆[∆u], an isotropic fourth order term, which from
the previous analysis would stabilize the lower order eikonal instability. Although the full flow only
regularizes in the normal direction, over the evolution it regularizes over other directions as the
normal vector changes direction, killing the eikonal instability.
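At the implementation level, the integrand of (16) can be computed with double backpropagation and without forming the Hessian explicitly, using the identity D²u ∇u = ∇(½|∇u|²); the following PyTorch sketch is our own illustration, not necessarily the authors' released code:

```python
import torch

def directional_divergence(u_net, x):
    """Integrand of eq. (16): |grad u(x)^T  D^2 u(x)  grad u(x)| at points x.

    Relies on D^2 u(x) grad u(x) = grad_x(0.5 * |grad u(x)|^2) for a pointwise
    network, so the Hessian is never built explicitly.
    x: (N, d) tensor with requires_grad=True.
    """
    u = u_net(x)                                                     # (N, 1) SDF values
    g = torch.autograd.grad(u.sum(), x, create_graph=True)[0]        # (N, d) gradients
    hvp = torch.autograd.grad(0.5 * (g ** 2).sum(), x,
                              create_graph=True)[0]                  # (N, d) rows: D^2 u @ grad u
    return (g * hvp).sum(dim=-1).abs()                               # (N,) |grad u^T D^2 u grad u|
```

The supplementary material also considers a variant normalized by ‖∇u‖²; dividing the returned value by the squared gradient norm (clamped away from zero) gives that form.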
New Training Loss: We combine the new stabilizing term with the loss function used in SIREN
without the normal constraint (for wider applicability) to form our proposed training loss:
L = αe Leik + αm Lmanifold + αn Lnon manifold + αl LL.n.,
Lmanifold = ∫_{Ω0} |u(x)| dx,  Lnon manifold = ∫_{Ω∖Ω0} exp(−α |u(x)|) dx,    (17)
where α, αe , αn , αm , αl > 0 are hyper-parameters, and Ω0 are known points on the surface of interest
(e.g., point cloud data). Lmanifold penalizes surface points away from the zero level set. Lnon manifold
penalizes points not on the surface of interest from being close to the zero level set. We use p = 1 for
the eikonal loss, the same as in SIREN [12] and DiGS [14]. For αl we use the annealing strategy as in
DiGS [14]. See the supplementary for details.
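A hedged sketch of assembling the full objective (17) with autograd is given below; the weight values and the sampling of Ω0 and Ω∖Ω0 are placeholders (not the paper's tuned configuration), and directional_divergence is the helper sketched above.

```python
import torch

def training_loss(u_net, surface_pts, offsurface_pts,
                  a_e=50.0, a_m=3000.0, a_n=100.0, a_l=10.0, alpha=100.0):
    """Illustrative assembly of eq. (17); the weights here are placeholders, not tuned values."""
    surf = surface_pts.clone().requires_grad_(True)     # samples of Omega_0 (point cloud)
    free = offsurface_pts.clone().requires_grad_(True)  # samples of Omega \ Omega_0

    u_surf = u_net(surf)
    u_free = u_net(free)
    g_free = torch.autograd.grad(u_free.sum(), free, create_graph=True)[0]

    l_manifold = u_surf.abs().mean()                          # pull surface points to the zero level set
    l_nonmanifold = torch.exp(-alpha * u_free.abs()).mean()   # push off-surface points away from it
    l_eik = (g_free.norm(dim=-1) - 1.0).abs().mean()          # p = 1 eikonal penalty
    l_ln = directional_divergence(u_net, free).mean()         # stabilizing term, eq. (16)

    return a_e * l_eik + a_m * l_manifold + a_n * l_nonmanifold + a_l * l_ln
```

In the paper, αl additionally follows the annealing schedule of DiGS, so in practice it would depend on the training iteration rather than being a constant.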
We now introduce a new neural network representation for SDFs, motivated by our result that allows
stabilizing the eikonal loss even when the representational power of the network increases. Note in
a ReLU MLP, the network represents a piecewise-linear function. Activations partition the domain
where various linear approximations are used. To capture finer details of shape (without resorting to
heavy linear networks), it is natural to leverage more general Taylor series (quadratic and beyond)
approximations to capture the curvature of the shape. Motivated by this observation, we propose
to use quadratic layers rather than linear layers. Notice that the composition of a quadratic with
a quadratic function is a quartic function, and thus composing quadratic layers many times can
approximate any desired order of a Taylor series, even without the use of activations. We still use
activations, however, to partition the domain into regions where different Taylor approximations
are used. Without stabilizing the eikonal term in the optimization, such finer-scale representations
become unstable; thus, our regularization plays a crucial role. Note quadratic layers have been
proposed for neural networks [26]; however, proposing them for shape representation in neural SDFs
is novel to the best of our knowledge. Note also that SIREN [12] uses a sinusoidal activation to
obtain a representation beyond the piecewise linear one of ReLU MLPs; in that representation, however, the
activation serves both to partition the domain into pieces and to represent each piece with more complex
functions. Quadratic layers allow more complex functions in the pieces, without overloading the
activation with both partitioning the domain and more complex function representation.
As in [26], we define a quadratic layer using the following representation:
a(x) = (W1 x + b1) ◦ (W2 x + b2) + W3 x² + b3,    (18)

where Wj ∈ R^{m1×m2}, x ∈ R^{m2} is the input vector, x² is the element-wise square, bj ∈ R^{m1} are
biases, and ◦ denotes the element-wise product. We replace the linear neurons in the SIREN [12]
network with quadratic neurons to obtain a high-order expression for the signed distance function.
For implementation, we use the combination of three linear layer modules in PyTorch.
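The paper does not spell out this implementation beyond the statement above, so the following PyTorch sketch of (18) is a plausible reading; the class name, the use of a SIREN-style sine activation, and the frequency scale omega_0 are our assumptions.

```python
import torch
import torch.nn as nn

class QuadraticSineLayer(nn.Module):
    """Quadratic layer of eq. (18): a(x) = (W1 x + b1) * (W2 x + b2) + W3 x^2 + b3,
    followed by a SIREN-style sine activation (activation choice assumed)."""

    def __init__(self, in_features, out_features, omega_0=30.0):
        super().__init__()
        self.lin1 = nn.Linear(in_features, out_features)
        self.lin2 = nn.Linear(in_features, out_features)
        self.lin3 = nn.Linear(in_features, out_features)  # applied to the element-wise square
        self.omega_0 = omega_0

    def forward(self, x):
        a = self.lin1(x) * self.lin2(x) + self.lin3(x ** 2)  # b3 is absorbed into lin3's bias
        return torch.sin(self.omega_0 * a)
```

A SIREN-like SDF network can then be formed by stacking such layers and ending with a linear output layer.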
5 Experiments
We now demonstrate the effectiveness of our method on the task of surface reconstruction from point
clouds. For all the experiments in this section, we follow the same mesh generation procedure and
evaluation setting as the state-of-the-art method DiGS [14]. We experiment on three benchmarks:
Surface Reconstruction Benchmark (SRB) [28], ShapeNet [29], and the Scene Reconstruction
Benchmark [12]. We use a network with 5 hidden layers and 128 hidden channels for SRB and
ShapeNet, and 8 hidden layers and 256 channels for scene reconstruction. The number
of training iterations is the same as in DiGS [14], 10k for SRB and ShapeNet, and 100k for scene
reconstruction. We provide all of the training details in the supplementary.
5.2 ShapeNet
We evaluated our method on a preprocessed subset [39], [40] of ShapeNet [29], which consists of
20 shapes in each of 13 categories with only surface point data. Note points are sampled from the
shapes (as in [39]) to simulate point clouds. We compare StEik against the current state-of-the-art
methods on this dataset without using normal data and report the results in Table 2. As criteria for
the benchmark, we consider the Intersection over Union (IoU) and Chamfer Distance between the
reconstructed shapes and the ground truth shapes. The Intersection over Union (IoU) captures the
accuracy of the predicted occupancy function, while the Chamfer Distance captures the accuracy of
the predicted surface. Under both metrics, StEik outperforms all other methods by a large margin.
This demonstrates that StEik is particularly effective for reconstructing thin structures. A visual
example is shown in Figure 3 (see supplementary for more).
Table 2: Quantitative results on ShapeNet [29] using only point data (no normals).
Ablation Study: Below the middle line in Table 2, we study the effectiveness of each of our novel
contributions (the Laplacian normal regularization and quadratic layers). On 4 out of 6 metrics, our
normal Laplacian regularization out-performs the standard Laplacian regularization when using quadratic
networks, again showing the utility of our new regularization on its own. Note that in both cases,
the metrics on which the Laplacian normal regularization performs worse are only slightly worse,
relative to the size of the improvement on the other metrics. On all 6 metrics, the quadratic network using the
same Laplacian regularization as DiGS out-performs DiGS, showing the utility of quadratic networks
alone. Note that each of our contributions, i.e., Laplacian normal regularization and quadratic layers,
separately shows increased performance over DiGS (except for one metric in the linear case), even though
the hyper-parameters were not optimized for these ablated variants. The results
suggest that quadratic layers mainly provide a better fit to the surface, while the L.n. loss mainly regularizes
towards the correct shape.
In Figure 4, we show the reconstruction of a room scene point cloud from roughly 10M points and
compare our method with DiGS [14], the current SoTA method without normals. This is the same
scene used in [12] and contains many thin features that are difficult to reconstruct. The surface
produced by DiGS is over-smoothed so that the thin structures like picture frames and sofa legs are
not recovered, while in StEik those fine details are recovered.
5.5 Limitations
Due to the lack of an efficient implementation of quadratic layers in deep learning libraries, the
increase in training time is not negligible. In addition, there is still room for improvement in the
reconstruction results, as in some cases the surface is not perfectly recovered.
6 Conclusion
We showed that stability is an important consideration in the design of neural SDF representations. We
showed that the eikonal loss can result in instabilities that can cause artifacts in both the optimization
and the recovered shape, or even lead to convergence to sub-optimal local minima. Our theory allows for
understanding the instability and existing methods for neural SDFs in a common framework. Our
framework enabled the construction of a new regularization term for neural SDFs that stabilizes the
instability while avoiding over-regularization. The regularization enabled us to consider finer shape
representations with neural SDFs that are piecewise polynomial while stabilizing the eikonal term.
Empirical results validated our theoretical findings. This work opens up the possibility of exploring
a broader range of geometric regularizations that naturally arise from PDEs, and the possibility of
exploring new finer-scale network representations.
References
[1] M. Atzmon and Y. Lipman, Sal: Sign agnostic learning of shapes from raw data, 2020. arXiv:
1911.10414 [cs.CV].
[2] Y. Lipman, Phase transitions, distance functions, and implicit neural representations, 2021.
arXiv: 2106.07689 [cs.LG].
[3] M. Niemeyer, L. Mescheder, M. Oechsle, and A. Geiger, Differentiable volumetric rendering:
Learning implicit 3d representations without 3d supervision, 2020. arXiv: 1912.07372 [cs.CV].
[4] L. Yariv, J. Gu, Y. Kasten, and Y. Lipman, Volume rendering of neural implicit surfaces, 2021.
arXiv: 2106.12052 [cs.CV].
[5] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, Nerf:
Representing scenes as neural radiance fields for view synthesis, 2020. arXiv: 2003.08934
[cs.CV].
[6] J. T. Barron, B. Mildenhall, M. Tancik, P. Hedman, R. Martin-Brualla, and P. P. Srinivasan,
Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields, 2021. arXiv:
2103.13415 [cs.CV].
[7] K. Zhang, G. Riegler, N. Snavely, and V. Koltun, Nerf++: Analyzing and improving neural
radiance fields, 2020. arXiv: 2010.07492 [cs.CV].
[8] J. T. Barron, B. Mildenhall, D. Verbin, P. P. Srinivasan, and P. Hedman, Mip-nerf 360: Un-
bounded anti-aliased neural radiance fields, 2022. arXiv: 2111.12077 [cs.CV].
[9] T. Müller, A. Evans, C. Schied, and A. Keller, “Instant neural graphics primitives with a
multiresolution hash encoding,” ACM Transactions on Graphics, vol. 41, no. 4, pp. 1–15,
Jul. 2022. DOI: 10.1145/3528223.3530127. [Online]. Available: https://doi.org/10.1145/3528223.3530127.
[10] M. Atzmon and Y. Lipman, Sald: Sign agnostic learning with derivatives, 2020. arXiv:
2006.05400 [cs.CV].
[11] A. Gropp, L. Yariv, N. Haim, M. Atzmon, and Y. Lipman, Implicit geometric regularization
for learning shapes, 2020. arXiv: 2002.10099 [cs.LG].
[12] V. Sitzmann, J. N. P. Martel, A. W. Bergman, D. B. Lindell, and G. Wetzstein, Implicit neural
representations with periodic activation functions, 2020. arXiv: 2006.09661 [cs.CV].
[13] J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove, Deepsdf: Learning continu-
ous signed distance functions for shape representation, 2019. arXiv: 1901.05103 [cs.CV].
[14] Y. Ben-Shabat, C. H. Koneputugodage, and S. Gould, DiGS: Divergence guided shape implicit
neural representation for unoriented point clouds, 2022. arXiv: 2106.10811 [cs.CV].
[15] P. Wang, L. Liu, Y. Liu, C. Theobalt, T. Komura, and W. Wang, Neus: Learning neural
implicit surfaces by volume rendering for multi-view reconstruction, 2023. arXiv: 2106.10689
[cs.CV].
[16] L. Yariv, Y. Kasten, D. Moran, et al., Multiview neural surface reconstruction by disentangling
geometry and appearance, 2020. arXiv: 2003.09852 [cs.CV].
[17] M. Tancik, P. P. Srinivasan, B. Mildenhall, et al., Fourier features let networks learn high
frequency functions in low dimensional domains, 2020. arXiv: 2006.10739 [cs.CV].
[18] Y. Sun, D. Lao, G. Sundaramoorthi, and A. Yezzi, “Accelerated PDEs for construction and
theoretical analysis of an SGD extension,” in The Symbiosis of Deep Learning and Differ-
ential Equations, 2021. [Online]. Available: https://openreview.net/forum?id=j3nedszy5Vc.
[19] Y. Sun, D. Lao, G. Sundaramoorthi, and A. Yezzi, “Surprising instabilities in training deep
networks and a theoretical analysis,” in Advances in Neural Information Processing Systems, S.
Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35, Curran Associates,
Inc., 2022, pp. 19567–19578. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2022/file/7b97adeafa1c51cf65263459ca9d0d7c-Paper-Conference.pdf.
[20] Y. Sun, D. Lao, G. Sundaramoorthi, and A. Yezzi, Surprising instabilities in training deep
networks and a theoretical analysis, 2023. arXiv: 2206.02001 [cs.LG].
[21] C.-C. Chiang and H.-C. Fu, “A variant of second-order multilayer perceptron and its application
to function approximations,” in [Proceedings 1992] IJCNN International Joint Conference on
Neural Networks, vol. 3, 1992, pp. 887–892. DOI: 10.1109/IJCNN.1992.227087.
[22] B.-L. Lu, Y. Bai, H. Kita, and Y. Nishikawa, “An efficient multilayer quadratic perceptron
for pattern classification and function approximation,” in Proceedings of 1993 International
Conference on Neural Networks (IJCNN-93-Nagoya, Japan), vol. 2, 1993, pp. 1385–1388.
DOI: 10.1109/IJCNN.1993.716802.
[23] G. Zoumpourlis, A. Doumanoglou, N. Vretos, and P. Daras, “Non-linear convolution filters
for cnn-based learning,” CoRR, vol. abs/1708.07038, 2017. arXiv: 1708.07038. [Online].
Available: http://arxiv.org/abs/1708.07038.
[24] P. Mantini and S. K. Shah, “Cqnn: Convolutional quadratic neural networks,” in 2020 25th
International Conference on Pattern Recognition (ICPR), 2021, pp. 9819–9826. DOI: 10.1109/ICPR48806.2021.9413207.
[25] G. G. Chrysos, S. Moschoglou, G. Bouritsas, Y. Panagakis, J. Deng, and S. Zafeiriou, Π−Nets:
Deep polynomial neural networks, 2020. arXiv: 2003.03828 [cs.LG].
[26] F. Fan, J. Xiong, and G. Wang, Universal approximation with quadratic deep networks, 2019.
arXiv: 1808.00098 [cs.LG].
[27] Z. Xu, F. Yu, J. Xiong, and X. Chen, Quadralib: A performant quadratic neural network library
for architecture optimization and design exploration, 2022. arXiv: 2204.01701 [cs.LG].
[28] M. Berger, J. A. Levine, L. G. Nonato, G. Taubin, and C. T. Silva, “A benchmark for surface
reconstruction,” ACM Trans. Graph., vol. 32, no. 2, Apr. 2013, ISSN: 0730-0301. DOI:
10.1145/2451236.2451246. [Online]. Available: https://doi.org/10.1145/2451236.2451246.
[29] A. X. Chang, T. Funkhouser, L. Guibas, et al., Shapenet: An information-rich 3d model
repository, 2015. arXiv: 1512.03012 [cs.GR].
[30] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
2016.
[31] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image
recognition,” arXiv preprint arXiv:1409.1556, 2014.
[32] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional
networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition,
2017, pp. 4700–4708.
[33] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proceedings
of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1251–1258.
[34] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted
residuals and linear bottlenecks,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, 2018, pp. 4510–4520.
[35] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, “Shufflenet v2: Practical guidelines for efficient cnn
architecture design,” in Proceedings of the European conference on computer vision (ECCV),
2018, pp. 116–131.
[36] L. N. Trefethen, “Finite difference and spectral methods for ordinary and partial differential
equations,” 1996.
[37] M. Spivak, A comprehensive introduction to differential geometry. Publish or Perish, Incorpo-
rated, 1970, vol. 4.
[38] L. Ambrosio and H. M. Soner, “Level set approach to mean curvature flow in arbitrary
codimension,” Journal of differential geometry, vol. 43, no. 4, pp. 693–737, 1996.
[39] F. Williams, M. Trager, J. Bruna, and D. Zorin, “Neural splines: Fitting 3d surfaces with
infinitely-wide neural networks,” in Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, 2021, pp. 9949–9958.
[40] L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger, Occupancy networks:
Learning 3d reconstruction in function space, 2019. arXiv: 1812.03828 [cs.CV].
[41] W. E. Lorensen and H. E. Cline, “Marching cubes: A high resolution 3d surface construction
algorithm,” SIGGRAPH Comput. Graph., vol. 21, no. 4, pp. 163–169, Aug. 1987, ISSN: 0097-8930.
DOI: 10.1145/37402.37422. [Online]. Available: https://doi.org/10.1145/37402.37422.
We use the marching cubes algorithm [41] to extract the zero level set of the shape INR. The resolution
is 512 and we use the same mesh generation procedure as in IGR [11].
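A minimal sketch of this extraction step is given below; the grid bounds, chunking, and device handling are our assumptions, and the actual pipeline follows IGR's mesh generation code.

```python
import numpy as np
import torch
from skimage import measure

def extract_mesh(u_net, resolution=512, bound=1.0, device="cuda", chunk=262144):
    """Evaluate the learned SDF on a dense grid and run marching cubes at level 0."""
    lin = np.linspace(-bound, bound, resolution, dtype=np.float32)
    grid = np.stack(np.meshgrid(lin, lin, lin, indexing="ij"), axis=-1).reshape(-1, 3)
    sdf = np.empty(len(grid), dtype=np.float32)
    with torch.no_grad():
        for i in range(0, len(grid), chunk):          # evaluate in chunks to bound GPU memory
            pts = torch.from_numpy(grid[i:i + chunk]).to(device)
            sdf[i:i + chunk] = u_net(pts).squeeze(-1).cpu().numpy()
    volume = sdf.reshape(resolution, resolution, resolution)
    verts, faces, normals, _ = measure.marching_cubes(volume, level=0.0,
                                                      spacing=(lin[1] - lin[0],) * 3)
    return verts - bound, faces                       # shift vertices back into [-bound, bound]^3
```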
Method | Ground Truth: dC dH | Scans: dC dH
Overall
IGR wo n 1.38 16.33 0.25 2.96
SIREN wo n 0.42 7.67 0.08 1.42
SAL 0.36 7.47 0.13 3.50
IGR+FF 0.96 11.06 0.32 4.75
PHASE+FF 0.22 4.96 0.07 1.56
DiGS 0.19 3.52 0.08 1.47
Our StEik 0.18 2.80 0.10 1.45
Anchor
IGR wo n 0.45 7.45 0.17 4.55
SIREN wo n 0.72 10.98 0.11 1.27
SAL 0.42 7.21 0.17 4.67
IGR+FF 0.72 9.48 0.24 8.89
PHASE+FF 0.29 7.43 0.09 1.49
DiGS 0.29 7.19 0.11 1.17
Our StEik 0.26 4.26 0.13 1.12
Daratech
IGR wo n 4.9 42.15 0.7 3.68
SIREN wo n 0.21 4.37 0.09 1.78
SAL 0.62 13.21 0.11 2.15
IGR+FF 2.48 19.6 0.74 4.23
PHASE+FF 0.35 7.24 0.08 1.21
DiGS 0.20 3.72 0.09 1.80
Our StEik 0.18 1.72 0.10 1.77
DC
IGR wo n 0.63 10.35 0.14 3.44
SIREN wo n 0.34 6.27 0.06 2.71
SAL 0.18 3.06 0.08 2.82
IGR+FF 0.86 10.32 0.28 3.98
PHASE+FF 0.19 4.65 0.05 2.78
DiGS 0.15 1.70 0.07 2.75
Our StEik 0.16 1.73 0.08 2.77
Gargoyle
IGR wo n 0.77 17.46 0.18 2.04
SIREN wo n 0.46 7.76 0.08 0.68
SAL 0.45 9.74 0.21 3.84
IGR+FF 0.26 5.24 0.18 2.93
PHASE+FF 0.17 4.79 0.07 1.58
DiGS 0.17 4.10 0.09 0.92
Our StEik 0.18 4.49 0.10 0.87
Lord Quas
IGR wo n 0.16 4.22 0.08 1.14
SIREN wo n 0.35 8.96 0.06 0.65
SAL 0.13 4.14 0.07 4.04
IGR+FF 0.49 10.71 0.14 3.71
PHASE+FF 0.11 0.71 0.05 0.74
DiGS 0.12 0.91 0.06 0.70
Our StEik 0.13 1.81 0.07 0.73
Table 4: Additional quantitative results on the Surface Reconstruction Benchmark [28] using only
point data (no normals).
In Figure 5 we provide visualization results for all shapes in SRB. The improvement over DiGS is
less dramatic here because SRB is a relatively easy benchmark without many thin or otherwise complex
structures, and DiGS already performs well on it. On the Anchor shape, which is the most difficult one,
the edges are much sharper and the hole is recovered much better in our reconstruction.
Figure 5: Visual results for all shapes in SRB (rows: DiGS, Our StEik; columns: (a) Anchor, (b) Daratech, (c) DC, (d) Gargoyle, (e) Lord Quas).
A.4 ShapeNet
Squared Chamfer
Overall airplane bench
Methods Mean Median Std Mean Median Std Mean Median Std
SIREN wo n 3.08e-4 2.58e-4 3.26e-4 2.42e-4 2.50e-4 5.92e-5 1.93e-4 1.67e-4 9.09e-5
SAL 1.14e-3 2.11e-4 3.63e-3 5.98e-4 2.38e-4 9.22e-4 3.55e-4 1.71e-4 4.26e-4
DiGS 1.32e-4 2.55e-5 4.73e-4 1.32e-5 1.01e-5 7.56e-6 7.26e-5 2.21e-5 1.74e-4
Our StEik 6.86e-5 6.33e-6 3.34e-4 3.33e-6 2.59e-6 1.78e-6 7.90e-6 5.27e-6 9.63e-6
telephone watercraft
Methods Mean Median Std Mean Median Std
SIREN wo n 2.10e-4 1.86e-4 6.60e-5 2.97e-4 2.43e-4 1.26e-4
SAL 1.04e-4 6.81e-5 7.99e-5 8.08e-4 2.06e-4 1.75e-3
DiGS 1.77e-5 1.74e-5 4.49e-6 6.10e-5 2.43e-5 9.03e-5
Our StEik 5.53e-6 4.63e-6 2.61e-6 6.13e-6 4.25e-6 6.53e-6
IoU
Overall airplane bench
Methods Mean Median Std Mean Median Std Mean Median Std
SIREN wo n 0.3085 0.2952 0.2014 0.2248 0.1735 0.1103 0.4020 0.4231 0.1953
SAL 0.4030 0.3944 0.2722 0.1908 0.1693 0.0955 0.2260 0.2311 0.1401
DiGS 0.9390 0.9754 0.1262 0.9613 0.9577 0.0164 0.9061 0.9536 0.1413
Our StEik 0.9671 0.9841 0.0878 0.9814 0.9827 0.0073 0.9607 0.9756 0.0493
telephone watercraft
Methods Mean Median Std Mean Median Std
SIREN wo n 0.3778 0.3806 0.2590 0.3190 0.3007 0.1877
SAL 0.6025 0.6704 0.2203 0.4170 0.4728 0.2367
DiGS 0.9854 0.9876 0.0071 0.9522 0.9735 0.0504
Our StEik 0.9866 0.9883 0.0051 0.9858 0.9894 0.0090
Table 5: Additional quantitative results on the ShapeNet dataset [29] using only point data (no normals).
Figure 6: Additional visual results on ShapeNet. (a) Ground Truth, (b) DiGS, (c) Linear + LL.n., (d) Quadratic + Ldiv, (e) Our StEik.
B.1 Gradients for Ldiv(u) = ∫_Ω |Δu(x)|^p dx
Figure 7: Visual results of scene reconstruction. (a) DiGS, (b) Our StEik.
B.2 Gradient for LL.n.(u) = ∫_Ω |∇u(x)ᵀ D²u(x) ∇u(x)| dx
In the implementation, we normalize the gradient of u to reduce the weight-tuning overhead. The loss then becomes

LL.n.(u) = ∫_Ω |∇u(x)ᵀ D²u(x) ∇u(x)| / ‖∇u(x)‖² dx.    (22)
However, we note that these two expressions are equivalent when the eikonal loss is minimized. We
use the unnormalized version to compute the gradient for simplicity. One may notice that the inner
part of equation (22) computes the second order derivative along the normal direction, which equals
the divergence minus the orthonormal tangential components:

Δu − Σ_{i=1}^{n−1} tᵢᵀ D²u tᵢ,

where tᵢ, i = 1, ..., n − 1, denote the orthonormal tangent vectors that span the tangent subspace.
Hence we can rewrite equation (22) (in its unnormalized form) as

LL.n.(u) = ∫_Ω | Δu − Σ_{i=1}^{n−1} tᵢᵀ D²u tᵢ | dx.    (23)
Only term 2 in equation (24) contains a fourth order term. Therefore, we expand term 2 in the following:

∇² · ∂L/∂(D²u) = ∇² · [ (Δu − Σ_{i=1}^{n−1} tᵢᵀ D²u tᵢ) / |Δu − Σ_{i=1}^{n−1} tᵢᵀ D²u tᵢ| ( ∂(Δu)/∂(D²u) − ∂(Σ_{i=1}^{n−1} tᵢᵀ D²u tᵢ)/∂(D²u) ) ]
= ∇² · [ (Δu − Σ_{i=1}^{n−1} tᵢᵀ D²u tᵢ) / |Δu − Σ_{i=1}^{n−1} tᵢᵀ D²u tᵢ| ( I − ∂(Σ_{i=1}^{n−1} tᵢᵀ D²u tᵢ)/∂(D²u) ) ]    (25)

From the Δu and I terms, we get ∇² · (Δu) = Δ[Δu] as mentioned in the paper, with the factor 1/|Δu − Σ_{i=1}^{n−1} tᵢᵀ D²u tᵢ|.
B.3 Gradients for Leik(u) = (1/2) ∫_Ω |‖∇u‖ − 1|^p dx
= u_ηη + (1 − 1/‖∇u‖) Σ_{i=1}^{n−1} u_{ξᵢξᵢ},    (27)
where the first equality is from the Euler–Lagrange equation. We omit the variable x for simplicity.
Equation (26) is shown in the main paper, and the remaining part decomposes the second order
derivatives into the normal direction η and the tangential directions ξᵢ. Equation (27) shows that
minimizing the squared eikonal loss comes down to a stable diffusion along the normal direction and
an unstable diffusion in all tangential directions.
Similarly, for p = 1, we have
−∇u Leik = −∂L/∂u + ∇ · (∂L/∂(∇u))
= ∇ · [ (‖∇u‖ − 1) / (|‖∇u‖ − 1| ‖∇u‖) ∇u ]
= ∇ · [ sgn(‖∇u‖ − 1) / ‖∇u‖ ∇u ]    (28)
= (‖∇u‖ − 1) / (|‖∇u‖ − 1| ‖∇u‖) Δu + ∇u · ∇[ (‖∇u‖ − 1) / (|‖∇u‖ − 1| ‖∇u‖) ]
= [ (‖∇u‖ − 1) Δu + ∇u · ( ∇‖∇u‖ − ((‖∇u‖ − 1)/‖∇u‖) ∇‖∇u‖ − ((‖∇u‖ − 1)²/|‖∇u‖ − 1|²) ∇‖∇u‖ ) ] / (|‖∇u‖ − 1| ‖∇u‖)
= [ (‖∇u‖ − 1) Δu − ((‖∇u‖ − 1)/‖∇u‖) ∇u · ∇‖∇u‖ ] / (|‖∇u‖ − 1| ‖∇u‖)
= (‖∇u‖ − 1) [ Δu − (∇u/‖∇u‖) · ∇‖∇u‖ ] / (|‖∇u‖ − 1| ‖∇u‖)
= ( sgn(‖∇u‖ − 1) / ‖∇u‖ ) [ Δu − (∇uᵀ/‖∇u‖) D²u (∇u/‖∇u‖) ]
= ( sgn(‖∇u‖ − 1) / ‖∇u‖ ) [ ( Σ_{i=1}^{n−1} u_{ξᵢξᵢ} + u_ηη ) − u_ηη ]
= ( sgn(‖∇u‖ − 1) / ‖∇u‖ ) Σ_{i=1}^{n−1} u_{ξᵢξᵢ}.    (29)
Compared with p = 2, the absolute-value eikonal loss (p = 1) leads to unstable diffusion
along all tangential directions and imposes no constraint in the normal direction.
B.4 Gradients for Lnorm.(u) = ∫_{Ω0} ‖∇u(x) − Ngt‖^p dx
For p = 2, we have

−∇u Lnorm. = −∂L/∂u + ∇ · (∂L/∂(∇u))
= 2∇ · (∇u − Ngt)
= 2Δu − 2 div(Ngt).    (30)
Note that the factor of 2 was omitted in the main paper for simplicity. For p = 1, we have
−∇u Lnorm. = −∂L/∂u + ∇ · (∂L/∂(∇u))
= ∇ · [ (∇u − Ngt) / ‖∇u − Ngt‖ ]
= (1/‖∇u − Ngt‖) (Δu − ∇ · Ngt)
= Δu/‖∇u − Ngt‖ + Ngtᵀ D²u ∇u / ‖∇u − Ngt‖³.    (31)
Given equations (29) and (27), both exhibit instability in the tangential directions. While the
coefficients of the diffusion terms are different, it is not straightforward to justify the effectiveness of
one over the other. However, we show empirically that p = 1 achieves better results on SRB, as presented
in the next subsection.
We investigate the effects of design choices made for regularization terms and report the averages
over all shapes in the dataset in Table 6. We find that choosing p = 1 for both the first-order
and second-order regularization achieves the best performance.
Leik LL.n. | GT: dC dH | Scans: dC⃗ dH⃗
L1 L1 0.180 2.800 0.096 1.454
L1 L2 0.205 4.389 0.105 1.486
L2 L1 0.194 3.917 0.469 1.486
L2 L2 0.217 4.844 0.093 1.483
Figure 8: Instability on linear networks. (left) The evolution is almost converged. (middle) However,
after several additional iterations, instability occurs. We show the intermediate results 50 steps after
the instability. (right) This instability drives the network to a sub-optimal local minimizer.
αl | GT: dC dH | Scans: dC⃗ dH⃗
10 0.264 6.089 0.099 1.513
50 0.191 3.799 0.096 1.485
100∗ 0.180 2.800 0.096 1.453
200 0.188 3.520 0.097 1.495
300 0.192 3.497 0.102 1.535
400 0.187 3.177 0.102 1.537
500 0.194 3.557 0.098 1.499
Table 7: Performance for varying αl. The relationship between αl and performance is not strong;
the weight needs to be tuned for different tasks, but a relatively large weight is preferred given the
annealing strategy.