Shape As Points: A Differentiable Poisson Solver
Abstract
In recent years, neural implicit representations have gained popularity in 3D reconstruction due to their expressiveness and flexibility. However, the implicit nature of these representations results in slow inference times and requires careful initialization. In this paper, we revisit the classic yet ubiquitous point cloud representation and introduce a differentiable point-to-mesh layer using a differentiable formulation of Poisson Surface Reconstruction (PSR) that allows for a fast, GPU-accelerated solution of the indicator function given an oriented point cloud. The differentiable PSR layer allows us to efficiently and differentiably bridge the explicit 3D point representation with the 3D mesh via the implicit indicator field, enabling end-to-end optimization of surface reconstruction metrics such as the Chamfer distance. This duality between points and meshes hence allows us to represent shapes as oriented point clouds, which are explicit, lightweight, and expressive. Compared to neural implicit representations, our Shape-As-Points (SAP) model is more interpretable, lightweight, and accelerates inference time by one order of magnitude. Compared to other explicit representations such as points, patches, and meshes, SAP produces topology-agnostic, watertight manifold surfaces. We demonstrate the effectiveness of SAP on the task of surface reconstruction from unoriented point clouds and learning-based reconstruction.
1 Introduction
Shape representations are central to many of the recent advancements in 3D computer vision and computer graphics, ranging from neural rendering [43, 47, 50, 58, 61] to shape reconstruction [10, 28, 42, 49, 52, 54, 73]. While conventional representations such as point clouds and meshes are efficient and well-studied, they also suffer from several limitations: Point clouds are lightweight and easy to obtain, but do not directly encode surface information. Meshes, on the other hand, are usually restricted to fixed topologies. More recently, neural implicit representations [10, 42, 52] have shown promising results for representing geometry due to their flexibility in encoding varied topologies and their easy integration with differentiable frameworks. However, since such representations encode surface information only implicitly, extracting the underlying surface is typically slow, as it requires numerous network evaluations in 3D space when extracting complete surfaces using marching cubes [10, 42, 52], or along rays for intersection detection in the context of volumetric rendering [47, 49, 51, 73].
In this work, we introduce a novel Poisson solver which performs fast GPU-accelerated Differentiable Poisson Surface Reconstruction (DPSR) and solves for an indicator function from an oriented point cloud in a few milliseconds. Thanks to the differentiability of our Poisson solver, gradients from a loss on the output mesh or a loss on the intermediate indicator grid can be efficiently backpropagated to update the oriented point cloud representation. This differentiable bridge between points, indicator
functions, and meshes allows us to represent shapes as oriented point clouds. We therefore call this shape representation Shape-As-Points (SAP). Compared to existing shape representations, Shape-As-Points has the following advantages (see also Table 1):

∗ Work done while at UC Berkeley.
† Corresponding authors.

Table 1: Comparison of different shape representations.

                                    Point Cloud  Voxels  Mesh  Patches  Neural Implicit  SAP (Ours)
Efficiency  Grid Eval Time (128³)   n/a          n/a     n/a   n/a      0.33 s           0.012 s
Priors      Easy Initialization     ✓            ✓       ✓     ✗        ✗                ✓
Quality     Watertight              ✗            ✓       ✓     ✗        ✓                ✓
            No Self-intersection    n/a          n/a     ✗     ✗        ✓                ✓
            Topology-Agnostic       ✓            ✓       ✗     ✓        ✓                ✓
Efficiency: SAP has a low memory footprint as it only requires storing a collection of oriented point samples at the surface, rather than volumetric quantities (voxels) or a large number of network parameters for neural implicit representations. Using spectral methods, the indicator field can be computed efficiently (12 ms at 128³ resolution³), compared to the typically slow query time of neural implicit networks (330 ms using [42] at the same resolution). Accuracy: The resulting mesh can be generated at high resolutions, is guaranteed to be watertight, free from self-intersections, and also topology-agnostic. Initialization: It is easy to initialize SAP with a given geometry such as template shapes or noisy observations. In contrast, neural implicit representations are harder to initialize, except for a few simple primitives like spheres [1]. See the supplementary material for further discussion.
To investigate the aforementioned properties, we perform a set of controlled experiments. Moreover,
we demonstrate state-of-the-art performance in reconstructing surface geometry from unoriented
point clouds in two settings: an optimization-based setting that does not require training and is
applicable to a wide range of shapes, and a learning-based setting for conditional shape reconstruction
that is robust to noisy point clouds and outliers. In summary, the main contributions of this work are:
2 Related Work
2.1 3D Shape Representations
3D shape representations are central to 3D computer vision and graphics. Shape representations
can be generally categorized as being either explicit or implicit. Explicit shape representations, and learning algorithms built on them, directly parameterize the surface of the geometry, either as a point cloud [17, 40, 55, 56, 67, 70], a parameterized mesh [22, 26, 29, 64], or
surface patches [2, 20, 39, 45, 68, 71, 72]. Explicit representations are usually lightweight and require
few parameters to represent the geometry, but suffer from discretization, the difficulty of representing watertight surfaces (point clouds, surface patches), or a restriction to a pre-defined topology (meshes).

³On average, our method requires 12 ms for computing a 128³ indicator grid from 15K points on a single NVIDIA GTX 1080Ti GPU. Computing a 256³ indicator grid requires 140 ms.
Implicit representations, in contrast, represent the shape as a level set of a continuous function
over a discretized voxel grid [14, 27, 35, 69] or more recently parameterized as a neural network,
typically referred to as neural implicit functions [10, 42, 52]. Neural implicit representations have
been successfully used to represent geometries of objects [10, 16, 18, 37, 42, 46, 49, 52, 60, 62, 65, 66]
and scenes [8, 28, 36, 48, 54, 60]. Additionally, neural implicit functions are able to represent radiance
fields which allow for high-fidelity appearance and novel view synthesis [41,47]. However, extracting
surface geometry from implicit representations typically requires dense evaluation of multi-layer
perceptrons, either on a volumetric grid or along rays, resulting in slow inference time. In contrast,
SAP efficiently solves the Poisson Equation during inference by representing the shape as an oriented
point cloud.
2.2 Optimization-based 3D Reconstruction from Point Clouds
Several works have addressed the problem of inferring continuous surfaces from a point cloud.
They tackle this task by utilizing basis functions, set properties of the points, or neural networks.
Early works in shape reconstruction from point clouds utilize the convex hull or alpha shapes for
reconstruction [15]. The ball pivoting algorithm [5] leverages the continuity property of spherical balls
of a given radius. One of the most popular techniques, Poisson Surface Reconstruction (PSR) [30,31],
solves the Poisson Equation and inherits smoothness properties from the basis functions used in
the Poisson Equation. However, PSR is sensitive to the normals of the input points which must be
inferred using a separate preprocessing step. In contrast, our method does not require any normal
estimation and is thus more robust to noise. More recent works take advantage of the continuous
nature of neural networks as function approximators to fit surfaces to point sets [19, 23, 44, 68].
However, these methods tend to be memory and computationally intensive, while our method yields
high-quality watertight meshes in a few milliseconds.
2.3 Learning-based 3D Reconstruction from Point Clouds
Learning-based approaches exploit a training set of 3D shapes to infer the parameters of a reconstruction model. Some approaches focus on local data priors [2, 28], which typically result in better generalization but suffer when large surfaces must be completed. Other approaches learn object-level [35, 42, 52] or scene-level priors [12, 13, 28, 54]. Most reconstruction approaches directly reconstruct a meshed surface geometry, though some works [3, 4, 21, 33] first predict point set normals and subsequently reconstruct the geometry via PSR [30, 31]. However, such methods fail under large levels of noise, since they are unable to move points or selectively ignore outliers. In contrast, our end-to-end approach addresses this issue by either moving outlier points onto the actual surface or by selectively muting outliers, either by forming paired point clusters that self-cancel or by reducing the magnitude of the predicted normals, which controls their influence on the reconstruction.
3 Method
At the core of the Shape-As-Points representation is a differentiable Poisson solver, which can be used
for both optimization-based and learning-based surface estimation. We first introduce the Poisson
solver in Section 3.1. Next, we investigate two applications using our solver: optimization-based 3D
reconstruction (Section 3.2) and learning-based 3D reconstruction (Section 3.3).
3.1 Differentiable Poisson Solver
The key step in Poisson Surface Reconstruction [30, 31] involves solving the Poisson Equation. Let
x ∈ R3 denote a spatial coordinate and n ∈ R3 denote its corresponding normal. The Poisson
Equation arises from the insight that a set consisting of point coordinates and normals {p = (c, n)}
can be viewed as samples of the gradient of the underlying implicit indicator function χ(x) that
describes the solid geometry. We define the normal vector field as a superposition of pulse functions, v(x) = Σ_{(ci,ni)∈{p}} δ(x − ci, ni), where δ(x, n) = n if x = 0 and 0 otherwise. By applying
the divergence operator, the variational problem transforms into the standard Poisson equation:
∇2 χ := ∇ · ∇χ = ∇ · v (1)
In order to solve this set of linear Partial Differential Equations (PDEs), we discretize the function
values and differential operators. Without loss of generality, we assume that the normal vector field
v and the indicator function χ are sampled at r uniformly spaced locations along each dimension.
Denote the spatial dimensionality of the problem as d. Without loss of generality, we consider the three-dimensional case where n := r × r × r for d = 3. We have the indicator function χ ∈ Rn ,
the point normal field v ∈ Rn×d , the gradient operator ∇ : Rn ↦ Rn×d , the divergence operator (∇·) : Rn×d ↦ Rn , and the derived Laplacian operator ∇2 := ∇ · ∇ : Rn ↦ Rn . Under such a
discretization scheme, solving for the indicator function amounts to solving a linear system: we invert the Laplacian subject to the boundary conditions that surface points lie on the zero level set. Following [30], we fix the overall scale to m = 0.5 at x = 0:

χ = (∇2 )−1 ∇ · v   s.t.   χ|x∈{c} = 0  and  abs(χ|x=0 ) = m    (2)
Point Rasterization: We obtain the uniformly discretized point normal field v by rasterizing the
point normals onto a uniformly sampled voxel grid. We can differentiably perform point rasterization
via inverse trilinear interpolation, similar to the approach in [30, 31]. We scatter the point normal
values to the voxel grid vertices, weighted by the trilinear interpolation weights. The point rasterization process has O(n) space complexity, linear in the number of grid cells, and O(N) time complexity, linear in the number of points. See the supplementary material for details.
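As an illustration, the following is a minimal PyTorch sketch of differentiable point rasterization via inverse trilinear interpolation; the function name and the assumptions that point coordinates lie in [0, 1) and that grid indices wrap periodically are ours, not taken from the released implementation.

```python
import torch

def rasterize_normals(points, normals, r):
    # Scatter point normals onto an r x r x r grid via inverse trilinear
    # interpolation: each point distributes its normal to the 8 surrounding
    # grid corners, weighted by trilinear weights. O(N) time, O(r^3) space.
    grid = torch.zeros(r, r, r, 3)
    p = points * r                    # continuous voxel coordinates
    base = p.floor().long()           # lower corner of the enclosing cell
    frac = p - base.float()           # fractional position inside the cell
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # trilinear weight of this cell corner for each point
                w = ((frac[:, 0] if dx else 1 - frac[:, 0]) *
                     (frac[:, 1] if dy else 1 - frac[:, 1]) *
                     (frac[:, 2] if dz else 1 - frac[:, 2]))
                idx = (base + torch.tensor([dx, dy, dz])) % r   # periodic wrap (assumption)
                flat = (idx[:, 0] * r + idx[:, 1]) * r + idx[:, 2]
                grid.view(-1, 3).index_add_(0, flat, normals * w[:, None])
    return grid
```

Since the weights w depend smoothly on the point positions and index_add_ is linear in the normals, gradients flow to both the point coordinates and the point normals.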
Spectral Methods for Solving PSR: In contrast to the finite-element approach taken in [30, 31], we
solve the PDEs using spectral methods [7]. While spectral methods are commonly used in scientific
computing for solving PDEs and in some cases applied to computer vision problems [34], we are the
first to apply them in the context of Poisson Surface Reconstruction. Unlike finite-element approaches that depend on irregular data structures such as octrees or tetrahedral meshes for discretizing space, spectral methods can be solved efficiently over a uniform grid, as they leverage highly optimized Fast Fourier Transform (FFT) operations that are well supported on GPUs, TPUs, and mainstream deep learning frameworks.
learning frameworks. Spectral methods decompose the original signal into a linear sum of functions
represented using the sine / cosine basis functions whose derivatives can be computed analytically.
This allows us to easily approximate differential operators in spectral space. We denote the spectral
domain signals with a tilde symbol, i.e., ṽ = FFT(v). We first solve for the unnormalized indicator function χ′, not accounting for boundary conditions:

χ′ = IFFT(χ̃),   χ̃ = g̃σ,r (u) ⊙ (iu · ṽ) / (−2π‖u‖²),   g̃σ,r (u) = exp(−2σ²‖u‖²/r²)    (3)

where the spectral frequencies are denoted as u := (u, v, w) ∈ Rn×d corresponding to the x, y, z spatial dimensions, and IFFT(χ̃) represents the inverse fast Fourier transform of χ̃. g̃σ,r (u) is a Gaussian smoothing kernel of bandwidth σ at grid resolution r in the spectral domain. The Gaussian kernel is used to mitigate ringing artifacts resulting from the Gibbs phenomenon when rasterizing the point normals. We denote the element-wise product as ⊙ : Rn × Rn ↦ Rn , the L2-norm as ‖·‖2 : Rn×d ↦ Rn , and the dot product as (·) : Rn×d × Rn×d ↦ Rn . Finally, we subtract the mean of the indicator function over the point set and rescale the indicator function to obtain the solution
to the PSR problem in Eqn. 2:
χ = ( m / abs(χ′|x=0 ) ) · ( χ′ − (1/|{c}|) Σc∈{c} χ′|x=c )    (4)

where the first factor fixes the overall scale and the second term subtracts the mean of χ′ over the point set.
A detailed derivation of our differentiable PSR solver is provided in the supplementary material.
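To make the solver concrete, the following is a minimal PyTorch sketch of the spectral solve of Eq. (3); the function name and the exact frequency normalization are our assumptions, and both the rasterization step and the normalization of Eq. (4) are omitted.

```python
import math
import torch

def dpsr(v, sigma=3.0):
    # v: (r, r, r, 3) rasterized point-normal field on a uniform grid.
    # Returns the unnormalized indicator grid chi' by solving the Poisson
    # equation in the spectral domain, following Eq. (3).
    r = v.shape[0]
    v_tilde = torch.fft.fftn(v, dim=(0, 1, 2))             # spectral normal field
    freqs = torch.fft.fftfreq(r)                           # frequency convention is an assumption
    u = torch.stack(torch.meshgrid(freqs, freqs, freqs, indexing='ij'), dim=-1)
    u_sq = (u ** 2).sum(-1)                                # ||u||^2
    g = torch.exp(-2 * sigma ** 2 * u_sq / r ** 2)         # Gaussian kernel g~_{sigma,r}(u)
    div_v = (1j * u * v_tilde).sum(-1)                     # i u . v~ (spectral divergence)
    denom = -2 * math.pi * u_sq
    denom[0, 0, 0] = 1.0                                   # avoid 0/0 at the DC component
    chi_tilde = g * div_v / denom
    chi_tilde[0, 0, 0] = 0.0                               # mean of chi' is fixed by Eq. (4)
    return torch.fft.ifftn(chi_tilde, dim=(0, 1, 2)).real
```

Every operation in this sketch is differentiable, so gradients with respect to the rasterized field v (and hence the oriented points) are obtained directly via autograd.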
3.2 SAP for Optimization-based 3D Reconstruction
We can use the proposed differentiable Poisson solver for various applications. First, we consider the
classical task of surface reconstruction from unoriented point clouds. The overall pipeline for this
setting is illustrated in Fig. 1 (top). We now provide details about each component.
Forward pass: It is natural to initialize the oriented 3D point cloud serving as the 3D shape representation using the noisy 3D input points and corresponding (estimated) normals. However, to demonstrate
the flexibility and robustness of our model, we purposefully initialize our model using a generic 3D
sphere with radius r in our experiments. Given the oriented point cloud, we apply our Poisson solver to obtain an indicator function grid, which can be converted to a mesh using Marching Cubes [38].
Figure 1: Model Overview. Top: Pipeline for optimization-based single object reconstruction. The Chamfer loss on the target point cloud is backpropagated to the source point cloud with normals for optimization. Bottom: Pipeline for learning-based surface reconstruction. Unlike the optimization-based setting, here we provide supervision at the indicator grid level, since we assume access to watertight meshes for supervision, as is common practice in learning-based single object reconstruction.

Backward pass: For every point pmesh sampled from the mesh M, we calculate a bi-directional L2 Chamfer Distance LCD with respect to the input point cloud. To backpropagate the loss LCD through pmesh to point p in our source oriented point cloud, we decompose the gradient using the chain rule:

∂LCD/∂p = (∂LCD/∂pmesh) · (∂pmesh/∂χ) · (∂χ/∂p)    (5)
All terms in (5) are differentiable except for the middle one, ∂pmesh/∂χ, which involves Marching Cubes. However, this gradient can be effectively approximated by the inverse surface normal [59]:

∂pmesh/∂χ = −nmesh    (6)
where nmesh is the normal of the point pmesh . Different from MeshSDF [59] that uses the gradients to
update the latent code of a pretrained implicit shape representation, our method updates the source
point cloud using the proposed differentiable Poisson solver.
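One way to realize this gradient in practice is through a surrogate loss in the spirit of MeshSDF [59]: sampling χ differentiably at the (detached) mesh points and weighting it by the (detached) Chamfer gradients injects exactly the gradient implied by Eqs. (5)-(6) into χ. The sketch below is our illustration of this idea, not code from the released implementation; note that grid_sample orders the last grid dimension as (x, y, z) against the input dimensions (D, H, W), which must match the indexing of χ.

```python
import torch
import torch.nn.functional as F

def surrogate_mesh_loss(chi, mesh_pts, mesh_normals, grad_pts):
    # chi:          (r, r, r) indicator grid produced by the differentiable solver
    # mesh_pts:     (M, 3) points sampled from the extracted mesh, in [0, 1) (detached)
    # mesh_normals: (M, 3) their surface normals (detached)
    # grad_pts:     (M, 3) dL_CD/dp_mesh from the Chamfer loss (detached)
    g = mesh_pts.view(1, 1, 1, -1, 3) * 2 - 1              # grid_sample expects [-1, 1]
    chi_at_pts = F.grid_sample(chi.view(1, 1, *chi.shape), g,
                               align_corners=True).view(-1)
    # dL/dp_mesh . dp_mesh/dchi, with dp_mesh/dchi = -n_mesh (Eq. 6)
    weight = (grad_pts * (-mesh_normals)).sum(-1)
    # backpropagating this scalar gives chi the chain-rule gradient of Eq. (5)
    return (weight.detach() * chi_at_pts).sum()
```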
Resampling: To increase the robustness of the optimization process, we uniformly resample points
and normals from the largest mesh component every 200 iterations, and replace all points in the original point cloud with the resampled ones. This resampling strategy eliminates outlier points that drift away during optimization and enforces a more uniform distribution of points. We provide an ablation study in the supplementary material.
3.3 SAP for Learning-based 3D Reconstruction

In the learning-based setting, we assume a training set of watertight meshes and pre-compute ground truth indicator grids from these meshes. Fig. 1 (bottom) illustrates the pipeline of our architecture for the learning-based surface reconstruction task.
Architecture: We first encode the unoriented input point cloud coordinates {c} into a feature φ.
The resulting feature should encapsulate both local and global information about the input point
cloud. We utilize the convolutional point encoder proposed in [54] for this purpose. Note that in the
following, we will use φθ (c) to denote the features at point c, dropping the dependency of φ on the
remaining points {c} for clarity. Also, we use θ to refer to network parameters in general.
Given their features, we aim to estimate both offsets and normals for every input point c in the point
cloud {c}. We use a shallow Multi-Layer Perceptron (MLP) fθ to predict the offset for c:
∆c = fθ (c, φθ (c)) (7)
where φθ (c) is obtained from the feature volume using trilinear interpolation. We predict k offsets
per input point, where k ≥ 1. We add the offsets ∆c to the input point position c and call the
updated point position ĉ. Additional offsets allow us to densify the point cloud, leading to enhanced
reconstruction quality. We choose k = 7 for all learning-based reconstruction experiments (see
ablation study in Table 4). For each updated point ĉ, we use a second MLP gθ to predict its normal:
n̂ = gθ (ĉ, φθ (ĉ)) (8)
We use the same decoder architecture as in [54] for both fθ and gθ . The network comprises 5 layers
of ResNet blocks with a hidden dimension of 32. These two networks fθ and gθ do not share weights.
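A minimal sketch of these two heads is shown below. The layer sizes and the reuse of φθ(c) in place of re-interpolated features φθ(ĉ) are simplifying assumptions made here for brevity; the actual decoders are the 5-layer ResNet blocks of [54].

```python
import torch
import torch.nn as nn

class OffsetNormalHeads(nn.Module):
    # Sketch of the decoder heads of Eqs. (7)-(8): f predicts k offsets per
    # input point, g predicts one normal per updated point.
    def __init__(self, feat_dim=32, k=7):
        super().__init__()
        self.k = k
        self.f = nn.Sequential(nn.Linear(3 + feat_dim, 32), nn.ReLU(),
                               nn.Linear(32, 3 * k))
        self.g = nn.Sequential(nn.Linear(3 + feat_dim, 32), nn.ReLU(),
                               nn.Linear(32, 3))

    def forward(self, c, phi):
        # c: (N, 3) input points; phi: (N, feat_dim) interpolated features
        offsets = self.f(torch.cat([c, phi], -1)).view(-1, self.k, 3)
        c_hat = (c[:, None] + offsets).reshape(-1, 3)        # c^ = c + delta c, densified k-fold
        phi_hat = phi[:, None].expand(-1, self.k, -1).reshape(-1, phi.shape[-1])
        n_hat = self.g(torch.cat([c_hat, phi_hat], -1))      # Eq. (8)
        return c_hat, n_hat
```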
Training and Inference: During training, we obtain the estimated indicator grid χ̂ from the
predicted point clouds (ĉ, n̂) using our differentiable Poisson solver. Since we assume watertight
and noise-free meshes for supervision, we acquire the ground truth indicator grid by running PSR
on densely sampled point clouds of the ground truth meshes with the corresponding ground truth
normals. This avoids running Marching Cubes at every iteration and accelerates training. We use the
Mean Square Error (MSE) loss on the predicted and ground truth indicator grid:
LDPSR = ‖χ̂ − χ‖²    (9)
We implement all models in PyTorch [53] and use the Adam optimizer [32] with a learning rate of
5e-4. During inference, we use our trained model to predict normals and offsets, use DPSR to solve
for the indicator grid, and run Marching Cubes [38] to extract meshes.
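Putting the pieces together, a hypothetical training step could look as follows; rasterize_normals and dpsr refer to the sketches above, model is assumed to wrap the point encoder of [54] together with the decoder heads, and gt_chi is the pre-computed PSR indicator grid.

```python
import torch

def train_step(model, optimizer, points_in, gt_chi, r=128):
    # One training iteration: predict points and normals (Eqs. 7-8),
    # rasterize, solve the Poisson equation, and apply the MSE of Eq. (9).
    c_hat, n_hat = model(points_in)
    v = rasterize_normals(c_hat, n_hat, r)     # differentiable point rasterization
    chi_hat = dpsr(v)                          # differentiable Poisson solve
    loss = ((chi_hat - gt_chi) ** 2).mean()    # L_DPSR, Eq. (9)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```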
4 Experiments
Following the exposition in the previous section, we conduct two types of experiments to evaluate
our method. First, we perform single object reconstruction from unoriented point clouds. Next, we
apply our method to learning-based surface reconstruction on ShapeNet [9], using noisy point clouds
with or without outliers as inputs.
Datasets: We use the following datasets for optimization-based reconstruction: 1) Thingi10K [74],
2) Surface reconstruction benchmark (SRB) [68] and 3) D-FAUST [6]. Similar to prior works, we
use 5 objects per dataset [19, 23, 68]. For learning-based object-level reconstruction, we consider all
13 classes of the ShapeNet [9] subset, using the train/val/test split from [11].
Figure 2: Optimization-based 3D Reconstruction. Input point clouds are downsampled for visualization. Note that the ground truth of SRB is provided as point clouds.
Table 2: Optimization-based 3D Reconstruction. Quantitative comparison on Thingi10K, SRB, and D-FAUST.

Dataset     Method            Chamfer-L1 (↓)  F-Score (↑)  Normal C. (↑)  Time (s)
Thingi10K   IGR [19]          0.440           0.505        0.692          1842.3
            Point2Mesh [23]   0.109           0.656        0.806          3714.7
            SPSR [31]         0.223           0.787        0.896          9.3
            Ours              0.054           0.940        0.947          370.1
SRB         IGR [19]          0.178           0.755        –              1847.6
            Point2Mesh [23]   0.116           0.648        –              4707.9
            SPSR [31]         0.232           0.735        –              9.2
            Ours              0.076           0.830        –              326.0
D-FAUST     IGR [19]          0.235           0.805        0.911          1857.2
            Point2Mesh [23]   0.071           0.855        0.905          3678.7
            SPSR [31]         0.044           0.966        0.965          4.3
            Ours              0.043           0.966        0.959          379.9
For the baseline Ours (w/o LDPSR), instead of calculating LDPSR on the indicator grid, we directly supervise the point positions with a bi-directional Chamfer distance and an L1 loss on the normals, as done in [39]. During inference, we also feed the predicted points and normals to our PSR solver and run Marching Cubes to obtain meshes.
Metrics: We consider Chamfer Distance, Normal Consistency and F-Score with the default threshold
of 1% for evaluation, and also report optimization & inference time.
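For reference, a minimal sketch of these metrics on sampled point sets is given below; the brute-force pairwise distance computation is an assumption made for brevity, whereas practical implementations use nearest-neighbor structures.

```python
import torch

def chamfer_and_fscore(pred, gt, tau=0.01):
    # pred, gt: (P, 3) and (G, 3) points sampled from the two surfaces;
    # tau is the F-Score threshold (1% of the unit cube, the default used here).
    d = torch.cdist(pred, gt)                      # (P, G) pairwise distances
    d_pg, d_gp = d.min(1).values, d.min(0).values  # nearest-neighbor distances
    chamfer = 0.5 * (d_pg.mean() + d_gp.mean())    # bi-directional Chamfer-L1
    precision = (d_pg < tau).float().mean()
    recall = (d_gp < tau).float().mean()
    fscore = 2 * precision * recall / (precision + recall + 1e-8)
    return chamfer, fscore
```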
4.1 Optimization-based 3D Reconstruction
In this part, we investigate whether our method can be used for the single-object surface reconstruction
task from unoriented point clouds or scans. We consider three different types of 3D inputs: point
clouds sampled from synthetic meshes [74] with Gaussian noise, real-world scans [68], and high-
resolution raw scans of humans with comparably little noise [6].
Fig. 2 and Table 2 show that our method achieves superior performance compared to both classical
methods and network-based approaches. Note that the objects considered in this task are challenging
due to their complex geometry, thin structures, and noisy and incomplete observations. While some of the baseline methods fail completely on these challenging objects, our method achieves robust performance across all datasets.

[Figure: Qualitative comparison on ShapeNet under low noise, high noise, and outliers. Columns: Input, SPSR [31], 3D-R2N2 [11], AtlasNet [20], ConvONet [54], Ours, GT mesh.]
In particular, Fig. 2 shows that IGR occasionally creates meshes in free space, as this is not penalized
by its optimization objective when point clouds are unoriented. Both Point2Mesh and our method alleviate this problem by optimizing for the Chamfer distance between the estimated mesh and the
input point clouds. However, Point2Mesh requires an initial mesh as input, whose topology cannot be changed during optimization. Thus, it relies on SPSR to provide an initial mesh for objects with genus larger than 0 and suffers from inaccurate initialization [23]. Furthermore, compared to
both IGR and Point2Mesh, our method converges faster.
While SPSR is even more efficient, it suffers from incorrect normal estimation on noisy input point
clouds, which is a non-trivial task on its own. In contrast, our method demonstrates more robust
behavior as we optimize points and normals guided by the Chamfer distance. Note that in this single
object reconstruction task, our method is not able to complete large unobserved regions (e.g., the
bottom of the person’s feet in Fig. 2 is unobserved and hence not completed). This limitation can be
addressed using learning-based object-level reconstruction as discussed next.
4.2 Learning-based 3D Reconstruction

To analyze whether our proposed differentiable Poisson solver is also beneficial for learning-based reconstruction, we evaluate our method on the single object reconstruction task using noise- and outlier-augmented point clouds from ShapeNet as input. We investigate the performance for three different noise levels: (a) Gaussian noise with zero mean and standard deviation 0.005, (b) Gaussian noise with zero mean and standard deviation 0.025, (c) 50% of the points carry the same noise as in (a) while the other 50% are outliers sampled uniformly inside the unit cube.
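A small sketch of these three corruptions is given below; the outlier bounds assume a unit cube centered at the origin, which is our assumption.

```python
import torch

def corrupt(points, mode):
    # points: (N, 3) clean surface samples
    if mode == 'low':                                    # (a) sigma = 0.005
        return points + 0.005 * torch.randn_like(points)
    if mode == 'high':                                   # (b) sigma = 0.025
        return points + 0.025 * torch.randn_like(points)
    if mode == 'outliers':                               # (c) 50% low noise, 50% outliers
        n = len(points) // 2
        noisy = points[:n] + 0.005 * torch.randn_like(points[:n])
        outliers = torch.rand(len(points) - n, 3) - 0.5  # uniform in the unit cube (assumption)
        return torch.cat([noisy, outliers])
```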
Table 4: Ablation Study. Left: Runtime breakdown (encoding, grid evaluation, marching cubes) for ConvONet vs. ours in seconds. Right: Ablation over number of offsets and 2D vs. 3D encoders.

Left: runtime breakdown (s)
            128³                           256³
            Enc.   Grid   MC     Total    Enc.   Grid   MC     Total
ConvONet    0.010  0.280  0.037  0.327    0.010  3.798  0.299  4.107
Ours        0.013  0.012  0.039  0.064    0.019  0.140  0.374  0.533

Right: offsets and encoder ablation
            Chamfer  F-Score  Normal C.
Offset 1×   0.041    0.952    0.928
Offset 3×   0.039    0.958    0.934
Offset 5×   0.039    0.957    0.934
Offset 7×   0.038    0.959    0.936
2D Enc.     0.043    0.939    0.928
3D Enc.     0.038    0.959    0.936
Number of Offsets: From Table 4 we notice that predicting more offsets per input point leads to
better performance. This can be explained by the fact that with more points near the object surface,
geometric details can be better preserved.
Point Cloud Encoder: Here we compare two different point encoder architectures proposed in [54]: a 2D encoder using 3 canonical planes at a resolution of 64² pixels and a 3D encoder using a feature volume with a resolution of 32³ voxels. We find that the 3D encoder works best in this setting and
hypothesize that this is due to the representational alignment with the 3D indicator grid.
5 Conclusion
We introduce Shape-As-Points, a novel shape representation which is lightweight, interpretable and
produces watertight meshes efficiently. We demonstrate its effectiveness for the task of surface
reconstruction from unoriented point clouds in both optimization-based and learning-based settings.
Our method is currently limited to small scenes due to the cubic memory requirements with respect
to the indicator grid resolution. We believe that processing scenes in a sliding-window manner and
space-adaptive data structures (e.g., octrees) will enable extending our method to larger scenes. Point
cloud-based methods are broadly used in real-world applications ranging from household robots to
self-driving cars, and hence share the same societal opportunities and risks as other learning-based
3D reconstruction techniques.
⁵To be consistent, we use the Marching Cubes implementation from [63] for both ConvONet and ours.
Acknowledgement: Andreas Geiger was supported by the ERC Starting Grant LEGO-3D (850533)
and the DFG EXC number 2064/1 - project number 390727645. The authors thank the Max Planck
ETH Center for Learning Systems (CLS) for supporting Songyou Peng and the International Max
Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Michael Niemeyer. This
work was supported by an NVIDIA research gift. We thank Matthias Niessner, Thomas Funkhouser, Hugues Hoppe, and Yue Wang for helpful discussions in early stages of this project. We also thank Xu Chen, Christian Reiser, and Rémi Pautrat for proofreading.
References
[1] M. Atzmon and Y. Lipman. Sal: Sign agnostic learning of shapes from raw data. In Proc. IEEE
Conf. on Computer Vision and Pattern Recognition (CVPR), 2020. 2, 17
[2] A. Badki, O. Gallo, J. Kautz, and P. Sen. Meshlet priors for 3d mesh reconstruction. In Proc.
IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020. 2, 3
[3] Y. Ben-Shabat and S. Gould. Deepfit: 3d surface fitting via neural network weighted least
squares. In Proc. of the European Conf. on Computer Vision (ECCV), 2020. 3
[4] Y. Ben-Shabat, M. Lindenbaum, and A. Fischer. Nesti-net: Normal estimation for unstructured
3d point clouds using convolutional neural networks. In Proc. IEEE Conf. on Computer Vision
and Pattern Recognition (CVPR), 2019. 3
[5] F. Bernardini, J. Mittleman, H. Rushmeier, C. Silva, and G. Taubin. The ball-pivoting algorithm
for surface reconstruction. IEEE Trans. on Visualization and Computer Graphics (VCG), 1999. 3
[6] F. Bogo, J. Romero, G. Pons-Moll, and M. J. Black. Dynamic FAUST: registering human bodies
in motion. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017. 6,
7, 15, 18
[7] C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang. Spectral methods: fundamentals in
single domains. Springer Science & Business Media, 2007. 4, 14
[8] R. Chabra, J. E. Lenssen, E. Ilg, T. Schmidt, J. Straub, S. Lovegrove, and R. Newcombe. Deep
local shapes: Learning local sdf priors for detailed 3d reconstruction. In Proc. of the European
Conf. on Computer Vision (ECCV), 2020. 3
[9] A. X. Chang, T. A. Funkhouser, L. J. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese,
M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu. Shapenet: An information-rich 3d model
repository. arXiv:1512.03012, 2015. 6
[10] Z. Chen and H. Zhang. Learning implicit fields for generative shape modeling. In Proc. IEEE
Conf. on Computer Vision and Pattern Recognition (CVPR), 2019. 1, 3
[11] C. B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese. 3d-r2n2: A unified approach for single
and multi-view 3d object reconstruction. In Proc. of the European Conf. on Computer Vision
(ECCV), 2016. 2, 6, 8, 9, 17, 20, 22
[12] A. Dai, C. Diller, and M. Nießner. SG-NN: sparse generative neural networks for self-supervised
scene completion of RGB-D scans. In Proc. IEEE Conf. on Computer Vision and Pattern
Recognition (CVPR), 2020. 3
[13] A. Dai, C. Diller, and M. Nießner. Sg-nn: Sparse generative neural networks for self-supervised
scene completion of rgb-d scans. In Proc. IEEE Conf. on Computer Vision and Pattern
Recognition (CVPR), 2020. 3
[14] A. Dai, D. Ritchie, M. Bokeloh, S. Reed, J. Sturm, and M. Nießner. Scancomplete: Large-scale
scene completion and semantic segmentation for 3d scans. In Proc. IEEE Conf. on Computer
Vision and Pattern Recognition (CVPR), 2018. 3
[15] H. Edelsbrunner and E. P. Mücke. Three-dimensional alpha shapes. ACM Trans. on Graphics,
1994. 3
[16] P. Erler, P. Guerrero, S. Ohrhallinger, M. Wimmer, and N. J. Mitra. Points2surf: Learning
implicit surfaces from point cloud patches. In Proc. of the European Conf. on Computer Vision
(ECCV), 2020. 3, 21, 22
[17] H. Fan, H. Su, and L. J. Guibas. A point set generation network for 3d object reconstruction
from a single image. Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR),
2017. 2, 6, 8, 9, 17
[18] K. Genova, F. Cole, A. Sud, A. Sarna, and T. A. Funkhouser. Local deep implicit functions for
3d shape. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020. 3
[19] A. Gropp, L. Yariv, N. Haim, M. Atzmon, and Y. Lipman. Implicit geometric regularization for
learning shapes. In Proc. of the International Conf. on Machine learning (ICML), 2020. 3, 6, 7,
15, 16
[20] T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, and M. Aubry. AtlasNet: A papier-mâché
approach to learning 3d surface generation. In Proc. IEEE Conf. on Computer Vision and
Pattern Recognition (CVPR), 2018. 2, 6, 8, 9, 17, 22
[21] P. Guerrero, Y. Kleiman, M. Ovsjanikov, and N. J. Mitra. Pcpnet learning local shape properties
from raw point clouds. In Computer Graphics Forum, 2018. 3
[22] K. Gupta and M. Chandraker. Neural mesh flow: 3d manifold mesh generation via diffeomorphic
flows. In Advances in Neural Information Processing Systems (NeurIPS), 2020. 2
[23] R. Hanocka, G. Metzer, R. Giryes, and D. Cohen-Or. Point2mesh: a self-prior for deformable
meshes. In ACM Trans. on Graphics, 2020. 3, 6, 7, 8, 15, 16
[24] J. Huang, H. Su, and L. J. Guibas. Robust watertight manifold surface generation method for
shapenet models. arXiv:1802.01698, 2018. 15
[25] R. R. Jensen, A. L. Dahl, G. Vogiatzis, E. Tola, and H. Aanæs. Large scale multi-view stereopsis
evaluation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2014.
19, 20
[26] C. Jiang, J. Huang, A. Tagliasacchi, and L. J. Guibas. Shapeflow: Learnable deformation flows
among 3d shapes. In Advances in Neural Information Processing Systems (NeurIPS), 2020. 2
[27] C. Jiang, P. Marcus, et al. Hierarchical detail enhancing mesh-based shape generation with 3d
generative adversarial network. arXiv:1709.07581, 2017. 3
[28] C. Jiang, A. Sud, A. Makadia, J. Huang, M. Nießner, and T. Funkhouser. Local implicit grid
representations for 3d scenes. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition
(CVPR), 2020. 1, 3
[29] C. M. Jiang, J. Huang, K. Kashinath, Prabhat, P. Marcus, and M. Nießner. Spherical cnns on
unstructured grids. In Proc. of the International Conf. on Learning Representations (ICLR),
2019. 2
[30] M. M. Kazhdan, M. Bolitho, and H. Hoppe. Poisson surface reconstruction. In Proceedings of
the Fourth Eurographics Symposium on Geometry Processing, 2006. 3, 4
[31] M. M. Kazhdan and H. Hoppe. Screened poisson surface reconstruction. ACM Trans. on
Graphics, 32(3):29, 2013. 3, 4, 6, 7, 8, 16, 22
[32] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In Proc. of the
International Conf. on Machine learning (ICML), 2015. 6, 15, 17
[33] J. E. Lenssen, C. Osendorfer, and J. Masci. Deep iterative surface normal estimation. In Proc.
IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020. 3
[34] J. Li and A. O. Hero. A spectral method for solving elliptic equations for surface reconstruction
and 3d active contours. In Proc. IEEE International Conf. on Image Processing (ICIP), 2001. 4
[35] Y. Liao, S. Donne, and A. Geiger. Deep marching cubes: Learning explicit surface representa-
tions. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018. 3
[36] S. Lionar, D. Emtsev, D. Svilarkovic, and S. Peng. Dynamic plane convolutional occupancy
networks. In Proc. of the IEEE Winter Conference on Applications of Computer Vision (WACV),
2021. 3
[37] S. Liu, Y. Zhang, S. Peng, B. Shi, M. Pollefeys, and Z. Cui. Dist: Rendering deep implicit
signed distance function with differentiable sphere tracing. In Proc. IEEE Conf. on Computer
Vision and Pattern Recognition (CVPR), 2020. 3
[38] W. E. Lorensen and H. E. Cline. Marching cubes: A high resolution 3d surface construction
algorithm. In ACM Trans. on Graphics, 1987. 4, 6
[39] Q. Ma, S. Saito, J. Yang, S. Tang, and M. J. Black. Scale: Modeling clothed humans with
a surface codec of articulated local elements. In Proc. IEEE Conf. on Computer Vision and
Pattern Recognition (CVPR), 2021. 2, 7
[40] Q. Ma, J. Yang, S. Tang, and M. J. Black. The power of points for modeling humans in clothing.
In Proc. of the IEEE International Conf. on Computer Vision (ICCV), 2021. 2
[41] R. Martin-Brualla, N. Radwan, M. S. M. Sajjadi, J. T. Barron, A. Dosovitskiy, and D. Duckworth.
NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections. In Proc. IEEE
Conf. on Computer Vision and Pattern Recognition (CVPR), 2021. 3
[42] L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger. Occupancy networks:
Learning 3d reconstruction in function space. In Proc. IEEE Conf. on Computer Vision and
Pattern Recognition (CVPR), 2019. 1, 2, 3
[43] M. Meshry, D. B. Goldman, S. Khamis, H. Hoppe, R. Pandey, N. Snavely, and R. Martin-
Brualla. Neural rerendering in the wild. In Proc. IEEE Conf. on Computer Vision and Pattern
Recognition (CVPR), 2019. 1
[44] G. Metzer, R. Hanocka, D. Zorin, R. Giryes, D. Panozzo, and D. Cohen-Or. Orienting point
clouds with dipole propagation. ACM Trans. on Graphics, 2021. 3
[45] M. Mihajlovic, S. Weder, M. Pollefeys, and M. R. Oswald. Deepsurfels: Learning online
appearance fusion. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR),
2021. 2
[46] M. Mihajlovic, Y. Zhang, M. J. Black, and S. Tang. LEAP: Learning articulated occupancy of
people. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021. 3
[47] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng. NeRF:
Representing scenes as neural radiance fields for view synthesis. In Proc. of the European Conf.
on Computer Vision (ECCV), 2020. 1, 3
[48] M. Niemeyer and A. Geiger. Giraffe: Representing scenes as compositional generative neural
feature fields. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021. 3
[49] M. Niemeyer, L. Mescheder, M. Oechsle, and A. Geiger. Differentiable volumetric rendering:
Learning implicit 3d representations without 3d supervision. In Proc. IEEE Conf. on Computer
Vision and Pattern Recognition (CVPR), 2020. 1, 3, 19
[50] M. Oechsle, L. Mescheder, M. Niemeyer, T. Strauss, and A. Geiger. Texture fields: Learning
texture representations in function space. In Proc. of the IEEE International Conf. on Computer
Vision (ICCV), 2019. 1
[51] M. Oechsle, S. Peng, and A. Geiger. Unisurf: Unifying neural implicit surfaces and radiance
fields for multi-view reconstruction. In Proc. of the IEEE International Conf. on Computer
Vision (ICCV), 2021. 1, 19
[52] J. J. Park, P. Florence, J. Straub, R. A. Newcombe, and S. Lovegrove. Deepsdf: Learning
continuous signed distance functions for shape representation. In Proc. IEEE Conf. on Computer
Vision and Pattern Recognition (CVPR), 2019. 1, 3
[53] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin,
N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani,
S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. Pytorch: An imperative style,
high-performance deep learning library. In Advances in Neural Information Processing Systems
(NeurIPS), 2019. 6
[54] S. Peng, M. Niemeyer, L. Mescheder, M. Pollefeys, and A. Geiger. Convolutional occupancy
networks. In Proc. of the European Conf. on Computer Vision (ECCV), 2020. 1, 3, 6, 8, 9, 17,
18, 22
[55] C. R. Qi, H. Su, K. Mo, and L. J. Guibas. Pointnet: Deep learning on point sets for 3d classifi-
cation and segmentation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition
(CVPR), 2017. 2
[56] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. Pointnet++: Deep hierarchical feature learning on point
sets in a metric space. In Advances in Neural Information Processing Systems (NeurIPS), 2017. 2
[57] N. Ravi, J. Reizenstein, D. Novotny, T. Gordon, W.-Y. Lo, J. Johnson, and G. Gkioxari.
Accelerating 3d deep learning with pytorch3d. arXiv:2007.08501, 2020. 19
[58] C. Reiser, S. Peng, Y. Liao, and A. Geiger. Kilonerf: Speeding up neural radiance fields with
thousands of tiny mlps. In Proc. of the IEEE International Conf. on Computer Vision (ICCV),
2021. 1
[59] E. Remelli, A. Lukoianov, S. R. Richter, B. Guillard, T. Bagautdinov, P. Baque, and P. Fua.
Meshsdf: Differentiable iso-surface extraction. In Advances in Neural Information Processing
Systems (NeurIPS), 2020. 5
[60] V. Sitzmann, J. N. Martel, A. W. Bergman, D. B. Lindell, and G. Wetzstein. Implicit neural
representations with periodic activation functions. In Advances in Neural Information Processing
Systems (NeurIPS), 2020. 3
[61] V. Sitzmann, J. Thies, F. Heide, M. Nießner, G. Wetzstein, and M. Zollhöfer. Deepvoxels:
Learning persistent 3d feature embeddings. In Proc. IEEE Conf. on Computer Vision and
Pattern Recognition (CVPR), 2019. 1
[62] M. Tancik, P. Srinivasan, B. Mildenhall, S. Fridovich-Keil, N. Raghavan, U. Singhal, R. Ra-
mamoorthi, J. Barron, and R. Ng. Fourier features let networks learn high frequency functions
in low dimensional domains. In Advances in Neural Information Processing Systems (NeurIPS),
2020. 3
[63] S. Van der Walt, J. L. Schönberger, J. Nunez-Iglesias, F. Boulogne, J. D. Warner, N. Yager,
E. Gouillart, and T. Yu. scikit-image: image processing in python. PeerJ, 2014. 9
[64] N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y.-G. Jiang. Pixel2mesh: Generating 3d mesh
models from single rgb images. In Proc. of the European Conf. on Computer Vision (ECCV),
2018. 2
[65] S. Wang, M. Mihajlovic, Q. Ma, A. Geiger, and S. Tang. Metaavatar: Learning animatable
clothed human models from few depth images. In Advances in Neural Information Processing
Systems (NeurIPS), 2021. 3
[66] W. Wang, Q. Xu, D. Ceylan, R. Mech, and U. Neumann. Disn: Deep implicit surface network
for high-quality single-view 3d reconstruction. In Advances in Neural Information Processing
Systems (NeurIPS), 2019. 3
[67] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon. Dynamic graph
cnn for learning on point clouds. ACM Trans. on Graphics, 2019. 2
[68] F. Williams, T. Schneider, C. Silva, D. Zorin, J. Bruna, and D. Panozzo. Deep geometric prior
for surface reconstruction. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition
(CVPR), 2019. 2, 3, 6, 7, 15, 18
[69] J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum. Learning a probabilistic latent space
of object shapes via 3d generative-adversarial modeling. In Advances in Neural Information
Processing Systems (NeurIPS), 2016. 3
[70] G. Yang, X. Huang, Z. Hao, M. Liu, S. J. Belongie, and B. Hariharan. Pointflow: 3d point cloud
generation with continuous normalizing flows. In Proc. of the IEEE International Conf. on
Computer Vision (ICCV), 2019. 2
[71] Y. Yang, C. Feng, Y. Shen, and D. Tian. Foldingnet: Point cloud auto-encoder via deep grid
deformation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018. 2
[72] Z. Yang, Y. Chai, D. Anguelov, Y. Zhou, P. Sun, D. Erhan, S. Rafferty, and H. Kretzschmar.
Surfelgan: Synthesizing realistic sensor data for autonomous driving. In Proc. IEEE Conf. on
Computer Vision and Pattern Recognition (CVPR), 2020. 2
[73] L. Yariv, Y. Kasten, D. Moran, M. Galun, M. Atzmon, B. Ronen, and Y. Lipman. Multiview
neural surface reconstruction by disentangling geometry and appearance. In Advances in Neural
Information Processing Systems (NeurIPS), 2020. 1, 19
[74] Q. Zhou and A. Jacobson. Thingi10k: A dataset of 10,000 3d-printing models. arXiv:1605.04797, 2016. 6, 7, 15, 18
[75] Q.-Y. Zhou, J. Park, and V. Koltun. Open3D: A modern library for 3D data processing.
arXiv:1801.09847, 2018. 6
Supplementary Material for
Shape As Points: A Differentiable Poisson Solver
In this supplementary document, we first provide derivation details for our Differentiable Poisson
Solver in Section A. In Section B, we provide implementation details for our optimization-based and
learning-based methods. Next, we supply discussions on the easy initialization property of our Shape-
As-Points representation in Section C. Additional results and ablations for the optimization-based
and learning-based reconstruction can be found in Section D and Section E, respectively.
where g̃σ,r (u) is a Gaussian smoothing kernel of bandwidth σ at grid resolution r in the spectral domain, used to mitigate the ringing artifacts caused by the Gibbs phenomenon when rasterizing the point normals. Please refer to Section D.3 for a more in-depth discussion motivating the use of the smoothing parameter, as well as related ablation studies on our choice of σ.
The unnormalized indicator function in the physical domain, χ′, can be obtained via the inverse Fourier Transform:

χ′ = IFFT(χ̃)    (17)

We further normalize the indicator field to incorporate the boundary conditions that the indicator field is zero at the point locations and valued ±0.5 inside and outside the shape:

χ = ( m / abs(χ′|x=0 ) ) · ( χ′ − (1/|{c}|) Σc∈{c} χ′|x=c )    (18)

where the first factor fixes the overall scale and the second term subtracts the mean of χ′ over the point set.
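As a sketch, the normalization of Eq. (18) can be written in a few lines; the sampling of χ′ at the input point locations (e.g., via trilinear interpolation) is assumed to be done elsewhere.

```python
import torch

def normalize_indicator(chi_raw, chi_at_points, m=0.5):
    # chi_raw: (r, r, r) unnormalized grid chi'
    # chi_at_points: (N,) chi' sampled at the input point locations
    chi = chi_raw - chi_at_points.mean()        # zero level set at the point locations
    return chi * (m / chi_raw[0, 0, 0].abs())   # fix the scale: |chi| = m at x = 0
```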
B Implementation Details
In this section, we provide implementation details for the baselines and our method in both settings, optimization-based and learning-based reconstruction.
⁶https://github.com/amosgropp/IGR
⁷https://github.com/ranahanocka/point2mesh
Figure 4: Optimization-based 3D Reconstruction. Here we show the other 11 of the 15 objects used for comparison (4 objects are already shown in Fig. 2 of the main paper). Input point clouds are downsampled for visualization. Note that the ground truth of SRB is provided as point clouds.
Figure 5: Different Geometric Initialization under Optimization Setting. We compare the reconstructions of SAP initialized from a sphere and from a coarse geometry, shown after 0 s, 50 s, 100 s, 150 s, and 300 s of optimization. The number below each image indicates the Chamfer Distance to the GT mesh.
Table 5: Training Progress. We show the Chamfer distance at different training iterations, evaluated on the ShapeNet test set with 3K input points (noise level 0.005). Our method uses geometric initialization and converges much faster than ConvONet.
Learning-based 3D reconstruction: For AtlasNet [20], we use the official implementation⁸ with 25 parameterizations. We change the number of input points from 2500 (default) to 3000 for our setting. Depending on the experiment, we add different noise levels or outlier points (see Section 4.2 in the main paper). We train ConvONet [54], PSGN [17], and 3D-R2N2 [11] for at least 300,000 iterations, and use the Adam optimizer [32] with a learning rate of 10⁻⁴ for all methods.

We train our method as well as Ours (w/o LDPSR) on all 3 noise levels for 300,000 iterations (roughly 2 days on 2 GTX 1080Ti GPUs) and use the Adam optimizer with a learning rate of 5 × 10⁻⁴. We use a batch size of 32. To generate the ground truth PSR indicator field χ in Eq. (9) of the main paper, we first sample 100,000 points with the corresponding point normals from the ground truth mesh and input them to our DPSR at a grid resolution of 128³.
Dataset     Method                  Chamfer-L1 (↓)  F-Score (↑)  Normal C. (↑)
Thingi10K   Ours (w/o resampling)   0.061           0.897        0.902
            Ours                    0.053           0.941        0.947
SRB         Ours (w/o resampling)   0.077           0.813        –
            Ours                    0.067           0.848        –
D-FAUST     Ours (w/o resampling)   0.044           0.964        0.952
            Ours                    0.043           0.965        0.959
Table 6: Ablation Study of Resampling Strategy. On all datasets, our resampling strategy leads to
improved results. For D-FAUST, the improvement is smallest because the supervision point clouds are noise-free. Note that normal consistency cannot be evaluated on SRB as this dataset provides only
unoriented point clouds.
Figure 6: Ablation Study of Resampling Strategy. We show the optimized point cloud and the
reconstructed mesh without and with the resampling strategy. Using the point resampling strategy
leads to a more uniformly distributed point cloud and better shape reconstruction.
initialization. In Table 5, we show the Chamfer distance at different training iterations, evaluated on the ShapeNet test set with 3K input points (noise level 0.005). As can be seen, our method enables much faster training convergence than the neural implicit method ConvONet [54], which illustrates the effectiveness of the geometric initialization directly from input point clouds.
D Optimization-based 3D Reconstruction
D.3 Analysis on the Gaussian Smoothing Parameter σ
Why we need a Gaussian: The Gaussian serves as a regularizer for the smoothness of the solved implicit function. Not using a Gaussian is equivalent to using a Gaussian kernel with σ = 0. Fig. 7 below motivates the use of our σ parameter.
Why use a Gaussian in the spectral domain: First, the FFT of a Gaussian remains a Gaussian. Second, convolution with a Gaussian in the physical domain is equivalent to an element-wise product with a Gaussian in the spectral domain, and this product is more efficient than convolution in the physical domain: O(N log N) vs. O(N²), where n is the resolution of a regular grid and N = n³.
Ablation study for the Gaussian smoothing parameter σ: We study the effect of the Gaussian smoothing parameter σ at a resolution of 256³. As visualized in Fig. 7, we obtain faithful reconstructions for a range of σ values. A lower σ preserves details better but is more prone to noise, while a higher σ results in smoother shapes but can also lead to a loss of detail. In practice, σ can be chosen according to the noise level of the target point cloud. For the results depicted in Fig. 4 and Table 2 of the main paper, we choose σ = 3 for the SRB and D-FAUST datasets and σ = 5 for the Thingi10K dataset.
Figure 7: Ablation Study of the Gaussian Smoothing Parameter σ. A low σ preserves details better but is prone to noise, while a high σ results in smooth shapes but also leads to a loss of detail.
Figure 8: SAP Multi-view Reconstruction with Differentiable Rendering. We show the recon-
struction results on a synthetic dataset and a real-world dataset [25].
ray marching step. In contrast, in each iteration our method outputs an explicit mesh. Therefore, no
expensive ray marching is required to find the surface, leading to faster inference.
E Learning-based 3D Reconstruction
E.1 Visualization of How SAP Handles Noise and Outliers
In this section, we visualize how our trained models handle noise and outliers during inference.
Noise Handling: We can see from the top row of Fig. 9 that, compared to the input point cloud, the
updated SAP points are densified because we predict k = 7 offsets per input point. More importantly,
all SAP points are located roughly on the surface, which leads to enhanced reconstruction quality.
Outlier Handling: We also visualize how SAP handles outlier points at the bottom row of Fig. 9.
The arrows’ length represents the magnitude of the predicted normals. There are two interesting
observations: a) A large number of outlier points in the input are moved close to the surface. b) Some outlier points still remain outliers. For these points, the network learns to predict normals with a very
small magnitude/norm as shown in the zoom-in view (we do not normalize the point normals to unit
length). In this way, those outlier points are “muted” when being passed to the DPSR layer such that
they do not contribute to the final reconstruction.
Figure 9: Visualization of SAP Handling Noise and Outliers. The length of arrows represents the
magnitude of normals. SAP point clouds are downsampled for better visualization.
Table 7: Per-category quantitative comparison on ShapeNet. Within each metric group (Chamfer-L1 ↓, F-Score ↑, Normal Consistency ↑), the columns are, in order: SPSR, PSGN, 3D-R2N2, AtlasNet, ConvONet, Ours (w/o LDPSR), and Ours. The three blocks correspond, from top to bottom, to noise level 0.005, noise level 0.025, and the outlier setting.
Noise level 0.005:
airplane 0.437 0.102 0.151 0.064 0.034 0.040 0.027 0.551 0.476 0.382 0.827 0.965 0.940 0.981 0.747 - 0.669 0.854 0.931 0.919 0.931
bench 0.544 0.129 0.153 0.073 0.035 0.041 0.032 0.430 0.266 0.431 0.786 0.965 0.949 0.979 0.649 - 0.691 0.820 0.921 0.915 0.920
cabinet 0.154 0.164 0.167 0.112 0.047 0.044 0.037 0.728 0.137 0.412 0.603 0.955 0.952 0.975 0.835 - 0.786 0.875 0.956 0.951 0.957
car 0.180 0.132 0.197 0.099 0.075 0.061 0.045 0.729 0.211 0.348 0.642 0.849 0.875 0.928 0.783 - 0.719 0.827 0.893 0.886 0.897
chair 0.369 0.168 0.181 0.114 0.046 0.047 0.036 0.473 0.152 0.393 0.629 0.939 0.939 0.979 0.715 - 0.673 0.829 0.943 0.940 0.952
display 0.280 0.160 0.170 0.089 0.036 0.036 0.030 0.544 0.175 0.401 0.727 0.971 0.975 0.990 0.749 - 0.747 0.905 0.968 0.967 0.972
lamp 0.278 0.207 0.243 0.137 0.059 0.069 0.047 0.586 0.204 0.333 0.562 0.892 0.897 0.959 0.765 - 0.598 0.759 0.900 0.899 0.921
loudspeaker 0.148 0.205 0.199 0.142 0.063 0.058 0.041 0.731 0.107 0.405 0.516 0.892 0.900 0.957 0.843 - 0.735 0.867 0.938 0.935 0.950
rifle 0.409 0.091 0.147 0.051 0.028 0.027 0.023 0.590 0.615 0.381 0.877 0.980 0.982 0.990 0.788 - 0.700 0.837 0.929 0.935 0.937
sofa 0.227 0.144 0.160 0.091 0.041 0.039 0.032 0.712 0.184 0.427 0.717 0.953 0.960 0.982 0.826 - 0.754 0.888 0.958 0.957 0.963
table 0.393 0.166 0.177 0.102 0.038 0.043 0.033 0.442 0.158 0.404 0.692 0.967 0.958 0.986 0.706 - 0.734 0.867 0.959 0.954 0.962
telephone 0.281 0.110 0.130 0.054 0.027 0.026 0.023 0.674 0.317 0.484 0.867 0.989 0.992 0.997 0.805 - 0.847 0.957 0.983 0.982 0.984
vessel 0.181 0.130 0.169 0.078 0.043 0.043 0.030 0.771 0.363 0.394 0.757 0.931 0.930 0.974 0.820 - 0.641 0.837 0.918 0.917 0.930
mean 0.299 0.147 0.173 0.093 0.044 0.044 0.034 0.612 0.259 0.400 0.708 0.942 0.942 0.975 0.772 - 0.715 0.855 0.938 0.935 0.944
Noise level 0.025:
airplane 0.716 0.107 0.147 0.103 0.052 0.059 0.045 0.268 0.457 0.413 0.558 0.883 0.857 0.915 0.550 - 0.665 0.787 0.904 0.897 0.905
bench 0.661 0.133 0.154 0.101 0.056 0.060 0.050 0.296 0.255 0.446 0.587 0.872 0.862 0.905 0.551 - 0.683 0.797 0.887 0.879 0.885
cabinet 0.323 0.166 0.165 0.118 0.065 0.067 0.051 0.383 0.138 0.435 0.554 0.883 0.863 0.920 0.671 - 0.784 0.855 0.937 0.927 0.938
car 0.338 0.137 0.188 0.115 0.104 0.091 0.071 0.415 0.200 0.388 0.528 0.739 0.749 0.817 0.632 - 0.714 0.792 0.875 0.862 0.871
chair 0.524 0.176 0.191 0.126 0.071 0.073 0.058 0.263 0.141 0.387 0.527 0.818 0.799 0.882 0.585 - 0.666 0.811 0.915 0.905 0.920
display 0.409 0.166 0.167 0.111 0.057 0.057 0.047 0.321 0.164 0.431 0.554 0.889 0.881 0.925 0.600 - 0.743 0.884 0.946 0.944 0.951
lamp 0.457 0.210 0.261 0.146 0.090 0.101 0.076 0.319 0.195 0.329 0.455 0.754 0.734 0.841 0.617 - 0.588 0.737 0.866 0.859 0.881
loudspeaker 0.320 0.205 0.203 0.144 0.090 0.092 0.065 0.369 0.109 0.407 0.471 0.793 0.778 0.853 0.675 - 0.734 0.850 0.920 0.910 0.925
rifle 0.848 0.097 0.144 0.119 0.047 0.044 0.042 0.218 0.575 0.403 0.439 0.905 0.917 0.928 0.541 - 0.691 0.746 0.888 0.895 0.896
sofa 0.452 0.152 0.153 0.109 0.065 0.061 0.051 0.337 0.166 0.457 0.572 0.857 0.865 0.908 0.631 - 0.744 0.860 0.936 0.934 0.941
table 0.514 0.169 0.177 0.115 0.057 0.061 0.049 0.293 0.158 0.431 0.564 0.885 0.869 0.923 0.597 - 0.729 0.847 0.936 0.929 0.940
telephone 0.521 0.112 0.128 0.105 0.038 0.038 0.033 0.329 0.311 0.508 0.520 0.959 0.964 0.976 0.591 - 0.847 0.917 0.975 0.975 0.976
vessel 0.403 0.135 0.173 0.111 0.073 0.072 0.058 0.399 0.341 0.397 0.527 0.796 0.794 0.856 0.612 - 0.639 0.788 0.881 0.875 0.886
mean 0.499 0.151 0.173 0.117 0.066 0.067 0.054 0.324 0.247 0.418 0.527 0.849 0.841 0.896 0.604 - 0.710 0.821 0.913 0.907 0.917
Outliers:
airplane 1.573 0.745 0.164 2.113 0.041 0.213 0.031 0.093 0.011 0.405 0.027 0.938 0.667 0.970 0.621 - 0.650 0.561 0.920 0.864 0.923
bench 1.499 0.573 0.166 1.856 0.041 0.076 0.036 0.126 0.007 0.431 0.053 0.945 0.829 0.965 0.575 - 0.695 0.630 0.910 0.876 0.909
cabinet 1.060 0.712 0.175 1.472 0.052 0.065 0.042 0.248 0.004 0.399 0.083 0.938 0.868 0.960 0.659 - 0.778 0.639 0.950 0.925 0.950
car 1.262 0.536 0.200 1.844 0.087 0.092 0.057 0.177 0.009 0.351 0.059 0.812 0.764 0.893 0.634 - 0.711 0.620 0.884 0.863 0.888
chair 0.984 0.689 0.228 1.478 0.055 0.087 0.041 0.186 0.005 0.362 0.076 0.903 0.783 0.959 0.628 - 0.672 0.644 0.930 0.898 0.941
display 1.312 0.965 0.201 1.685 0.041 0.060 0.034 0.188 0.004 0.374 0.062 0.956 0.878 0.980 0.627 - 0.747 0.593 0.962 0.945 0.967
lamp 1.402 0.958 0.399 2.080 0.073 0.110 0.047 0.123 0.004 0.283 0.037 0.838 0.699 0.941 0.630 - 0.587 0.569 0.885 0.844 0.910
loudspeaker 0.930 0.905 0.224 1.392 0.075 0.093 0.050 0.264 0.003 0.376 0.093 0.861 0.792 0.927 0.673 - 0.732 0.652 0.930 0.904 0.940
rifle 1.689 0.479 0.163 2.442 0.037 0.055 0.026 0.066 0.022 0.386 0.012 0.953 0.888 0.985 0.627 - 0.679 0.519 0.916 0.907 0.929
sofa 1.267 0.607 0.172 1.656 0.047 0.064 0.037 0.211 0.006 0.412 0.080 0.934 0.866 0.967 0.655 - 0.756 0.666 0.950 0.933 0.956
table 1.159 0.913 0.202 1.581 0.044 0.082 0.037 0.166 0.004 0.405 0.081 0.950 0.810 0.972 0.618 - 0.737 0.672 0.951 0.915 0.954
telephone 1.458 0.851 0.146 1.890 0.030 0.036 0.025 0.173 0.005 0.461 0.047 0.983 0.971 0.994 0.668 - 0.839 0.589 0.980 0.975 0.982
vessel 1.530 0.639 0.189 2.200 0.054 0.072 0.036 0.108 0.009 0.387 0.032 0.890 0.814 0.956 0.652 - 0.635 0.568 0.907 0.886 0.921
mean 1.317 0.736 0.202 1.822 0.052 0.085 0.038 0.164 0.007 0.387 0.057 0.916 0.818 0.959 0.636 - 0.709 0.609 0.929 0.903 0.936
⁹https://github.com/ErlerPhilipp/points2surf
Figure 10: 3D Reconstruction from Point Clouds on ShapeNet. Comparison of SAP to baselines
on 2 different setups. Compared to SPSR, our method can robustly estimate normals under noise and
selectively ignore outliers. Compared to 3D-R2N2 and AtlasNet, SAP has greater representational
expressivity that allows it to reconstruct fine geometric details. Compared to ConvONet, SAP
produces a higher quality reconstruction at a fraction of its inference time.
Table 8: Quantitative Comparison to Points2Surf [16]. We train and test on the ShapeNet ‘lamp’
class. Our model is trained for only 30K iterations but outperforms Points2Surf on all metrics with a
much shorter inference time.