
Shape As Points: A Differentiable Poisson Solver

Songyou Peng 1,2   Chiyu “Max” Jiang *†   Yiyi Liao 2,3†   Michael Niemeyer 2,3   Marc Pollefeys 1,4   Andreas Geiger 2,3

1 ETH Zurich   2 Max Planck Institute for Intelligent Systems, Tübingen   3 University of Tübingen   4 Microsoft

* Work done while at UC Berkeley. † Corresponding authors.

arXiv:2106.03452v2 [cs.CV] 8 Nov 2021
35th Conference on Neural Information Processing Systems (NeurIPS 2021).

Abstract
In recent years, neural implicit representations have gained popularity in 3D reconstruction
due to their expressiveness and flexibility. However, their implicit nature results in slow
inference times and requires careful initialization. In this paper, we revisit the classic yet ubiquitous point cloud
representation and introduce a differentiable point-to-mesh layer using a differen-
tiable formulation of Poisson Surface Reconstruction (PSR) that allows for a GPU-
accelerated fast solution of the indicator function given an oriented point cloud.
The differentiable PSR layer allows us to efficiently and differentiably bridge the
explicit 3D point representation with the 3D mesh via the implicit indicator field,
enabling end-to-end optimization of surface reconstruction metrics such as Cham-
fer distance. This duality between points and meshes hence allows us to represent
shapes as oriented point clouds, which are explicit, lightweight and expressive.
Compared to neural implicit representations, our Shape-As-Points (SAP) model
is more interpretable, lightweight, and accelerates inference time by one order of
magnitude. Compared to other explicit representations such as points, patches,
and meshes, SAP produces topology-agnostic, watertight manifold surfaces. We
demonstrate the effectiveness of SAP on the task of surface reconstruction from
unoriented point clouds and learning-based reconstruction.

1 Introduction
Shape representations are central to many of the recent advancements in 3D computer vision and
computer graphics, ranging from neural rendering [43,47,50,58,61] to shape reconstruction [10,28,42,
49, 52, 54, 73]. While conventional representations such as point clouds and meshes are efficient and
well-studied, they also suffer from several limitations: Point clouds are lightweight and easy to obtain,
but do not directly encode surface information. Meshes, on the other hand, are usually restricted to
fixed topologies. More recently, neural implicit representations [10, 42, 52] have shown promising
results for representing geometry due to their flexibility in encoding varied topologies, and their
easy integration with differentiable frameworks. However, as such representations implicitly encode
surface information, extracting the underlying surface is typically slow as they require numerous
network evaluations in 3D space for extracting complete surfaces using marching cubes [10, 42, 52],
or along rays for intersection detection in the context of volumetric rendering [47, 49, 51, 73].
In this work, we introduce a novel Poisson solver which performs fast GPU-accelerated Differentiable
Poisson Surface Reconstruction (DPSR) and solves for an indicator function from an oriented point
cloud in a few milliseconds. Thanks to the differentiability of our Poisson solver, gradients from a
loss on the output mesh or a loss on the intermediate indicator grid can be efficiently backpropagated
to update the oriented point cloud representation.


                                    Points  Voxels  Meshes  Patches  Implicits   SAP
Representations                      [17]    [11]    [64]    [20]      [42]     (Ours)

Efficiency   Grid Eval Time (128³)   n/a     n/a     n/a     n/a      0.33s     0.012s
Priors       Easy Initialization      ✓       ✓       ✓       ✗        ✗          ✓
Quality      Watertight               ✗       ✓       ✓       ✗        ✓          ✓
             No Self-intersection    n/a     n/a      ✗       ✗        ✓          ✓
             Topology-Agnostic        ✓       ✓       ✗       ✓        ✓          ✓

Table 1: Overview of Different Shape Representations. Shape-As-Points produces higher quality
geometry compared to other explicit representations [11, 17, 20, 64] and requires significantly less
inference time for extracting geometry compared to neural implicit representations [42].

This differentiable bridge between points, indicator functions, and meshes allows us to represent
shapes as oriented point clouds. We therefore call this shape representation Shape-As-Points (SAP).
Compared to existing shape representations, Shape-As-Points has the following advantages (see also Table 1):
Efficiency: SAP has a low memory footprint as it only requires storing a collection of oriented
point samples at the surface, rather than volumetric quantities (voxels) or a large number of network
parameters for neural implicit representations. Using spectral methods, the indicator field can be
computed efficiently³ (12 ms at 128³ resolution), compared to the typically slow query time
of neural implicit networks (330 ms using [42] at the same resolution). Accuracy: The resulting
mesh can be generated at high resolution, is guaranteed to be watertight, free from self-intersections,
and is topology-agnostic. Initialization: It is easy to initialize SAP with a given geometry such
as template shapes or noisy observations. In contrast, neural implicit representations are harder to
initialize, except for a few simple primitives like spheres [1]. See supplementary for more discussion.

³ On average, our method requires 12 ms for computing a 128³ indicator grid from 15K points on a single NVIDIA GTX 1080Ti GPU. Computing a 256³ indicator grid requires 140 ms.
To investigate the aforementioned properties, we perform a set of controlled experiments. Moreover,
we demonstrate state-of-the-art performance in reconstructing surface geometry from unoriented
point clouds in two settings: an optimization-based setting that does not require training and is
applicable to a wide range of shapes, and a learning-based setting for conditional shape reconstruction
that is robust to noisy point clouds and outliers. In summary, the main contributions of this work are:

• We present Shape-As-Points, a novel shape representation that is interpretable, lightweight, and
yields high-quality watertight meshes at low inference times.
• The core of the Shape-As-Points representation is a versatile, differentiable and generalizable
Poisson solver that can be used for a range of applications.
• We study various properties inherent to the Shape-As-Points representation, including inference
time, sensitivity to initialization and topology-agnostic representation capacity.
• We demonstrate state-of-the-art reconstruction results from noisy unoriented point clouds at a
significantly reduced computational budget compared to existing methods.

Code is available at https://github.com/autonomousvision/shape_as_points.

2 Related Work
2.1 3D Shape Representations
3D shape representations are central to 3D computer vision and graphics. Shape representations
can be generally categorized as being either explicit or implicit. Explicit shape representations
and learning algorithms depending on such representations directly parameterize the surface of the
geometry, either as a point cloud [17, 40, 55, 56, 67, 70], parameterized mesh [22, 26, 29, 64] or
surface patches [2, 20, 39, 45, 68, 71, 72]. Explicit representations are usually lightweight and require
few parameters to represent the geometry, but suffer from discretization, the difficulty of representing
watertight surfaces (point clouds, surface patches), or are restricted to a pre-defined topology (mesh).
Implicit representations, in contrast, represent the shape as a level set of a continuous function
over a discretized voxel grid [14, 27, 35, 69] or more recently parameterized as a neural network,
typically referred to as neural implicit functions [10, 42, 52]. Neural implicit representations have
been successfully used to represent geometries of objects [10, 16, 18, 37, 42, 46, 49, 52, 60, 62, 65, 66]
and scenes [8, 28, 36, 48, 54, 60]. Additionally, neural implicit functions are able to represent radiance
fields which allow for high-fidelity appearance and novel view synthesis [41,47]. However, extracting
surface geometry from implicit representations typically requires dense evaluation of multi-layer
perceptrons, either on a volumetric grid or along rays, resulting in slow inference time. In contrast,
SAP efficiently solves the Poisson Equation during inference by representing the shape as an oriented
point cloud.
2.2 Optimization-based 3D Reconstruction from Point Clouds
Several works have addressed the problem of inferring continuous surfaces from a point cloud.
They tackle this task by utilizing basis functions, set properties of the points, or neural networks.
Early works in shape reconstruction from point clouds utilize the convex hull or alpha shapes for
reconstruction [15]. The ball pivoting algorithm [5] leverages the continuity property of spherical balls
of a given radius. One of the most popular techniques, Poisson Surface Reconstruction (PSR) [30,31],
solves the Poisson Equation and inherits smoothness properties from the basis functions used in
the Poisson Equation. However, PSR is sensitive to the normals of the input points which must be
inferred using a separate preprocessing step. In contrast, our method does not require any normal
estimation and is thus more robust to noise. More recent works take advantage of the continuous
nature of neural networks as function approximators to fit surfaces to point sets [19, 23, 44, 68].
However, these methods tend to be memory and computationally intensive, while our method yields
high-quality watertight meshes in a few milliseconds.
2.3 Learning-based 3D Reconstruction from Point Clouds
Learning-based approaches exploit a training set of 3D shapes to infer the parameters of a recon-
struction model. Some approaches focus on local data priors [2, 28] which typically result in better
generalization, but suffer when large surfaces must be completed. Other approaches learn object-
level [35, 42, 52] or scene-level priors [12, 13, 28, 54]. Most reconstruction approaches directly recon-
struct a meshed surface geometry, though some works [3, 4, 21, 33] first predict point set normals to
subsequently reconstruct the geometry via PSR [30, 31]. However, such methods fail to handle large
levels of noise, since they are unable to move points or selectively ignore outliers. In contrast, our
end-to-end approach addresses this issue by moving outlier points to the actual surface, or by
selectively muting outliers, either through paired point clusters that cancel each other out or through
reducing the magnitude of the predicted normals, which controls their influence on the reconstruction.

3 Method
At the core of the Shape-As-Points representation is a differentiable Poisson solver, which can be used
for both optimization-based and learning-based surface estimation. We first introduce the Poisson
solver in Section 3.1. Next, we investigate two applications using our solver: optimization-based 3D
reconstruction (Section 3.2) and learning-based 3D reconstruction (Section 3.3).
3.1 Differentiable Poisson Solver
The key step in Poisson Surface Reconstruction [30, 31] involves solving the Poisson Equation. Let
x ∈ ℝ³ denote a spatial coordinate and n ∈ ℝ³ its corresponding normal. The Poisson
Equation arises from the insight that a set consisting of point coordinates and normals {p = (c, n)}
can be viewed as samples of the gradient of the underlying implicit indicator function χ(x) that
describes the solid geometry. We define the normal vector field as a superposition of pulse functions,
v(x) = Σ_{(cᵢ,nᵢ)∈{p}} δ(x − cᵢ, nᵢ), where δ(x, n) = n if x = 0 and 0 otherwise. By applying
the divergence operator, the variational problem transforms into the standard Poisson equation:

∇²χ := ∇ · ∇χ = ∇ · v    (1)
In order to solve this set of linear Partial Differential Equations (PDEs), we discretize the function
values and differential operators. Without loss of generality, we assume that the normal vector field
v and the indicator function χ are sampled at r uniformly spaced locations along each dimension.
Denote the spatial dimensionality of the problem as d. Without loss of generality, we consider
the three-dimensional case where n := r × r × r for d = 3. We have the indicator function χ ∈ ℝ^n,
the point normal field v ∈ ℝ^{n×d}, the gradient operator ∇ : ℝ^n → ℝ^{n×d}, the divergence operator
(∇·) : ℝ^{n×d} → ℝ^n, and the derived Laplacian operator ∇² := ∇ · ∇ : ℝ^n → ℝ^n. Under such a
discretization scheme, solving for the indicator function amounts to solving the linear system by
inverting the divergence operator, subject to the boundary condition that surface points lie on the zero
level set. Following [30], we fix the overall scale to m = 0.5 at x = 0:

χ = (∇²)⁻¹ ∇ · v   s.t.   χ|_{x∈{c}} = 0   and   abs(χ|_{x=0}) = m    (2)

Point Rasterization: We obtain the uniformly discretized point normal field v by rasterizing the
point normals onto a uniformly sampled voxel grid. We can differentiably perform point rasterization
via inverse trilinear interpolation, similar to the approach in [30, 31]. We scatter the point normal
values to the voxel grid vertices, weighted by the trilinear interpolation weights. The point rasterization
process has O(n) space complexity, linear with respect to the number of grid cells, and O(N ) time
complexity, linear with respect to the number of points. See supplementary for details.
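A minimal sketch of this scatter operation in PyTorch (our own illustration, not the authors' released code): each point's normal is distributed to the eight surrounding grid vertices with trilinear weights, keeping the operation differentiable with respect to both point positions and normals. Indices are wrapped periodically, matching the periodic domain assumed by the spectral solver below.

```python
import torch

def rasterize_normals(points, normals, res):
    """points: (N, 3) in [0, 1); normals: (N, 3). Returns a (res, res, res, 3) grid."""
    grid = torch.zeros(res ** 3, 3, dtype=normals.dtype, device=points.device)
    x = points * res                 # continuous grid coordinates
    x0 = torch.floor(x)              # lower corner of the enclosing cell
    frac = x - x0                    # fractional position inside the cell
    for corner in range(8):          # enumerate the 8 cell corners
        offset = torch.tensor([(corner >> 2) & 1, (corner >> 1) & 1, corner & 1],
                              dtype=points.dtype, device=points.device)
        # inverse trilinear weight: per-axis product of frac or (1 - frac)
        w = torch.prod(torch.where(offset.bool(), frac, 1.0 - frac), dim=-1)
        idx = ((x0 + offset) % res).long()                    # periodic wrap
        flat = (idx[:, 0] * res + idx[:, 1]) * res + idx[:, 2]
        grid.index_add_(0, flat, w.unsqueeze(-1) * normals)   # differentiable scatter-add
    return grid.view(res, res, res, 3)
```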

Spectral Methods for Solving PSR: In contrast to the finite-element approach taken in [30, 31], we
solve the PDEs using spectral methods [7]. While spectral methods are commonly used in scientific
computing for solving PDEs and in some cases applied to computer vision problems [34], we are the
first to apply them in the context of Poisson Surface Reconstruction. Unlike finite-element approaches
that depend on irregular data structures such as octrees or tetrahedral meshes for discretizing space,
spectral methods can be solved efficiently over a uniform grid as they leverage highly optimized Fast
Fourier Transform (FFT) operations that are well supported on GPUs, TPUs, and mainstream deep
learning frameworks. Spectral methods decompose the original signal into a linear sum of sine / cosine
basis functions whose derivatives can be computed analytically. This allows us to easily approximate
differential operators in spectral space. We denote spectral domain signals with a tilde, i.e.,
ṽ = FFT(v). We first solve for the unnormalized indicator function χ′, not accounting for boundary
conditions:

χ′ = IFFT(χ̃),   χ̃ = g̃_{σ,r}(u) ⊙ (i u · ṽ) / (−2π‖u‖²),   g̃_{σ,r}(u) = exp(−2 σ²‖u‖² / r²)    (3)
where the spectral frequencies are denoted as u := (u, v, w) ∈ ℝ^{n×d}, corresponding to the x, y, z
spatial dimensions, and IFFT(χ̃) is the inverse fast Fourier transform of χ̃. g̃_{σ,r}(u) is a
Gaussian smoothing kernel of bandwidth σ at grid resolution r in the spectral domain. The Gaussian
kernel is used to mitigate ringing effects resulting from the Gibbs phenomenon when rasterizing
the point normals. We denote the element-wise product as ⊙ : ℝ^n × ℝ^n → ℝ^n, the L2-norm as
‖·‖₂ : ℝ^{n×d} → ℝ^n, and the dot product as (·) : ℝ^{n×d} × ℝ^{n×d} → ℝ^n. Finally, we subtract
the mean of the indicator function at the point set and rescale the indicator function to obtain the
solution to the PSR problem in Eqn. 2:

χ = m / abs(χ′|_{x=0}) · ( χ′ − 1/|{c}| Σ_{c∈{c}} χ′|_{x=c} )    (4)

where the prefactor rescales the indicator function and the sum subtracts its mean over the point set.

A detailed derivation of our differentiable PSR solver is provided in the supplementary material.
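To make the solver concrete, here is a minimal sketch of the full spectral solve in PyTorch (our own illustration, assuming a periodic unit domain and the rasterizer sketched above; for brevity the boundary-condition lookup uses nearest-neighbor indexing where a full implementation would interpolate):

```python
import math
import torch

def dpsr(v_grid, points, sigma=2.0):
    """v_grid: (r, r, r, 3) rasterized point normals; points: (N, 3) in [0, 1)."""
    r = v_grid.shape[0]
    f = torch.fft.fftfreq(r, d=1.0 / r)          # integer spectral frequencies
    u = torch.stack(torch.meshgrid(f, f, f, indexing='ij'), dim=-1)  # (r, r, r, 3)
    v_tilde = torch.fft.fftn(v_grid, dim=(0, 1, 2))                  # FFT per component
    u_sq = (u ** 2).sum(-1)                                          # ||u||^2
    g = torch.exp(-2.0 * sigma ** 2 * u_sq / r ** 2)                 # Gaussian kernel of Eq. (3)
    div = (1j * u * v_tilde).sum(-1)                                 # i u . v_tilde
    denom = -2.0 * math.pi * u_sq
    denom[0, 0, 0] = 1.0                         # avoid 0/0 at the DC component
    chi_tilde = g * div / denom
    chi_tilde[0, 0, 0] = 0.0                     # the mean is fixed by Eq. (4) instead
    chi = torch.fft.ifftn(chi_tilde, dim=(0, 1, 2)).real             # unnormalized chi'
    # Eq. (4): subtract the mean at the input points, scale so that abs(chi(0)) = 0.5
    idx = (points * r).long().clamp(0, r - 1)
    mean_at_pts = chi[idx[:, 0], idx[:, 1], idx[:, 2]].mean()
    return 0.5 / chi[0, 0, 0].abs() * (chi - mean_at_pts)
```

Since every step is composed of FFTs and pointwise operations, gradients with respect to the input points and normals come for free from automatic differentiation.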
3.2 SAP for Optimization-based 3D Reconstruction
We can use the proposed differentiable Poisson solver for various applications. First, we consider the
classical task of surface reconstruction from unoriented point clouds. The overall pipeline for this
setting is illustrated in Fig. 1 (top). We now provide details about each component.

Figure 1: Model Overview. Top: Pipeline for optimization-based single object reconstruction. The
Chamfer loss on the target point cloud is backpropagated to the source point cloud with normals for
optimization. Bottom: Pipeline for learning-based surface reconstruction. Unlike the optimization-
based setting, here we provide supervision at the indicator grid level, since we assume access to
watertight meshes for supervision, as is common practice in learning-based single object reconstruction.

Forward pass: It is natural to initialize the oriented 3D point cloud serving as 3D shape representa-
tion using the noisy 3D input points and corresponding (estimated) normals. However, to demonstrate
the flexibility and robustness of our model, we purposefully initialize our model using a generic 3D
sphere with radius r in our experiments. Given the oriented point cloud, we apply our Poisson solver
to obtain an indicator function grid, which can be converted to a mesh using Marching Cubes [38].

Backward pass: For every point p_mesh sampled from the mesh M, we calculate a bi-directional L2
Chamfer Distance L_CD with respect to the input point cloud. To backpropagate the loss L_CD through

p_mesh to the point p in our source oriented point cloud, we decompose the gradient using the chain rule:

∂L_CD / ∂p = (∂L_CD / ∂p_mesh) · (∂p_mesh / ∂χ) · (∂χ / ∂p)    (5)

All terms in (5) are differentiable except the middle one, ∂p_mesh / ∂χ, which involves Marching Cubes.
However, this gradient can be effectively approximated by the inverse surface normal [59]:

∂p_mesh / ∂χ = −n_mesh    (6)
where n_mesh is the normal of the point p_mesh. Different from MeshSDF [59], which uses the gradients to
update the latent code of a pretrained implicit shape representation, our method updates the source
point cloud using the proposed differentiable Poisson solver.
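A minimal sketch of how Eq. (6) can be realized in practice (our illustration of the MeshSDF-style re-attachment, not the authors' exact code): the surface samples from Marching Cubes enter detached, and the indicator grid is re-attached through a trilinear lookup so that gradients on the surface points flow into χ as −n_mesh prescribes.

```python
import torch
import torch.nn.functional as F

def differentiable_surface_points(chi, p_mesh, n_mesh):
    """chi: (1, 1, r, r, r) indicator grid carrying gradients; p_mesh, n_mesh: (M, 3)
    detached points / normals sampled from the Marching Cubes mesh, in [0, 1)."""
    # grid_sample expects coordinates in [-1, 1], ordered (z, y, x) w.r.t. chi's axes
    coords = (p_mesh.flip(-1) * 2.0 - 1.0).view(1, 1, 1, -1, 3)
    chi_at_p = F.grid_sample(chi, coords, mode='bilinear',
                             align_corners=True).view(-1, 1)
    # Forward: chi_at_p ~ 0 on the zero level set, so p_surf equals p_mesh.
    # Backward: dL/dchi receives (dL/dp_surf) . (-n_mesh), i.e., Eq. (6).
    return p_mesh - n_mesh * chi_at_p
```

The bi-directional Chamfer loss L_CD is then computed between these re-attached surface points and the input point cloud, and its gradients propagate through χ into the source points and normals.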

Resampling: To increase the robustness of the optimization process, we uniformly resample points
and normals from the largest mesh component every 200 iterations, and replace all points in the
original point clouds with the resampled ones. This resampling strategy eliminates outlier points that
drift away during the optimization, and enforces a more uniform distribution of points. We provide
an ablation study in supplementary.

Coarse-to-fine: To further decrease run-time, we consider a coarse-to-fine strategy during optimiza-
tion. More specifically, we start optimizing at an indicator grid resolution of 32³ for 1000 iterations,
from which we obtain a coarse shape. Next, we sample from this coarse mesh and continue opti-
mization at a resolution of 64³ for 1000 iterations. We repeat this process until we reach the target
resolution (256³), at which we acquire the final output mesh. See also supplementary.
3.3 SAP for Learning-based 3D Reconstruction
We now consider the learning-based 3D reconstruction setting in which we train a conditional model
that takes a noisy, unoriented point cloud as input and outputs a 3D shape. More specifically, we train
the model to predict a clean oriented point cloud, from which we obtain a watertight mesh using our
Poisson solver and Marching Cubes. We leverage the differentiability of our Poisson solver to learn
the parameters of this conditional model. Following common practice, we assume watertight meshes
as ground truth and consequently supervise directly with the ground truth indicator grid obtained
from these meshes. Fig. 1 (bottom) illustrates the pipeline of our architecture for the learning-based
surface reconstruction task.

Architecture: We first encode the unoriented input point cloud coordinates {c} into a feature φ.
The resulting feature should encapsulate both local and global information about the input point
cloud. We utilize the convolutional point encoder proposed in [54] for this purpose. Note that in the
following, we will use φθ (c) to denote the features at point c, dropping the dependency of φ on the
remaining points {c} for clarity. Also, we use θ to refer to network parameters in general.
Given their features, we aim to estimate both offsets and normals for every input point c in the point
cloud {c}. We use a shallow Multi-Layer Perceptron (MLP) fθ to predict the offset for c:
∆c = fθ (c, φθ (c)) (7)
where φ(c) is obtained from the feature volume using trilinear interpolation. We predict k offsets
per input point, where k ≥ 1. We add the offsets ∆c to the input point position c and call the
updated point position ĉ. Additional offsets allow us to densify the point cloud, leading to enhanced
reconstruction quality. We choose k = 7 for all learning-based reconstruction experiments (see
ablation study in Table 4). For each updated point ĉ, we use a second MLP gθ to predict its normal:
n̂ = gθ (ĉ, φθ (ĉ)) (8)
We use the same decoder architecture as in [54] for both fθ and gθ . The network comprises 5 layers
of ResNet blocks with a hidden dimension of 32. These two networks fθ and gθ do not share weights.
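A minimal sketch of these two prediction heads (simplified for illustration: plain two-layer MLPs in place of the 5-block ResNet decoders of [54], and a generic feature volume standing in for the convolutional point encoder):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAPDecoder(nn.Module):
    """Offset head f_theta (Eq. 7) and normal head g_theta (Eq. 8), simplified."""

    def __init__(self, feat_dim=32, hidden=32, k=7):
        super().__init__()
        self.k = k
        self.f = nn.Sequential(nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, 3 * k))   # k offsets per input point
        self.g = nn.Sequential(nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, 3))       # one normal per updated point

    def forward(self, c, feat_volume):
        """c: (B, N, 3) input points in [-1, 1]; feat_volume: (B, F, r, r, r)."""
        phi_c = self.interpolate(feat_volume, c)                        # (B, N, F)
        offsets = self.f(torch.cat([c, phi_c], dim=-1))                 # (B, N, 3k)
        c_hat = (c.unsqueeze(2) + offsets.view(*c.shape[:2], self.k, 3)).flatten(1, 2)
        phi_hat = self.interpolate(feat_volume, c_hat)                  # features at c_hat
        n_hat = self.g(torch.cat([c_hat, phi_hat], dim=-1))             # (B, kN, 3)
        return c_hat, n_hat

    @staticmethod
    def interpolate(vol, p):
        # Trilinear feature lookup (grid_sample wants coordinates ordered z, y, x)
        g = p.flip(-1).view(p.shape[0], 1, 1, -1, 3)
        out = F.grid_sample(vol, g, mode='bilinear', align_corners=True)
        return out.view(vol.shape[0], vol.shape[1], -1).transpose(1, 2)  # (B, N, F)
```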

Training and Inference: During training, we obtain the estimated indicator grid χ̂ from the
predicted point cloud (ĉ, n̂) using our differentiable Poisson solver. Since we assume watertight
and noise-free meshes for supervision, we acquire the ground truth indicator grid by running PSR
on densely sampled point clouds of the ground truth meshes with the corresponding ground truth
normals. This avoids running Marching Cubes at every iteration and accelerates training. We use a
Mean Square Error (MSE) loss on the predicted and ground truth indicator grids:

L_DPSR = ‖χ̂ − χ‖²    (9)

We implement all models in PyTorch [53] and use the Adam optimizer [32] with a learning rate of
5e-4. During inference, we use our trained model to predict normals and offsets, use DPSR to solve
for the indicator grid, and run Marching Cubes [38] to extract meshes.
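Putting the pieces together, a minimal sketch of one training step with the L_DPSR loss (encoder stands for any network producing a feature volume; dpsr, rasterize_normals and SAPDecoder refer to the sketches above; chi_gt is the precomputed ground-truth indicator grid):

```python
import torch
import torch.nn.functional as F

def train_step(encoder, decoder, optimizer, noisy_points, chi_gt, res=128):
    """noisy_points: (B, N, 3) in [-1, 1]; chi_gt: (B, res, res, res)."""
    feat_volume = encoder(noisy_points)                   # (B, F, r, r, r)
    c_hat, n_hat = decoder(noisy_points, feat_volume)     # (B, kN, 3) each
    chi_pred = torch.stack([                              # per-sample Poisson solve,
        dpsr(rasterize_normals((c + 1) / 2, n, res), (c + 1) / 2)
        for c, n in zip(c_hat, n_hat)])                   # rescaling [-1, 1] -> [0, 1]
    loss = F.mse_loss(chi_pred, chi_gt)                   # Eq. (9)
    optimizer.zero_grad()
    loss.backward()                                       # gradients flow through DPSR
    optimizer.step()
    return loss.item()
```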

4 Experiments
Following the exposition in the previous section, we conduct two types of experiments to evaluate
our method. First, we perform single object reconstruction from unoriented point clouds. Next, we
apply our method to learning-based surface reconstruction on ShapeNet [9], using noisy point clouds
with or without outliers as inputs.

Datasets: We use the following datasets for optimization-based reconstruction: 1) Thingi10K [74],
2) Surface reconstruction benchmark (SRB) [68] and 3) D-FAUST [6]. Similar to prior works, we
use 5 objects per dataset [19, 23, 68]. For learning-based object-level reconstruction, we consider all
13 classes of the ShapeNet [9] subset, using the train/val/test split from [11].

Baselines: In the optimization-based reconstruction setting, we compare to the network-based methods
IGR [19] and Point2Mesh [23], as well as Screened Poisson Surface Reconstruction⁴ (SPSR) [31] on
plane-fitted normals. To ensure that the predicted normals are consistently oriented for SPSR, we
propagate the normal orientation using the minimum spanning tree [75]. For learning-based surface
reconstruction, we compare against point-based Point Set Generation Networks (PSGN) [17], patch-
based AtlasNet [20], voxel-based 3D-R2N2 [11], and ConvONet [54], which has recently reported
state-of-the-art results on this task. We use ConvOnet in their best-performing setting (3-plane
encoders). SPSR is also used as a baseline. In addition, to evaluate the importance of our differentiable
PSR optimization, we design another point-based baseline. This baseline uses the same network
architecture to predict points and normals. However, instead of passing them to our Poisson solver and
4
We use the official implementation https://github.com/mkazhdan/PoissonRecon.

6
Figure 2: Optimization-based 3D Reconstruction. Qualitative results (rows: two Thingi10K objects,
SRB, D-FAUST; columns: Input, IGR [19], Point2Mesh [23], SPSR [31], Ours, GT mesh). Input
point clouds are downsampled for visualization. Note that the ground truth of SRB is provided as
point clouds.

Dataset    Method            Chamfer-L1 (↓)  F-Score (↑)  Normal C. (↑)  Time (s)
Thingi10K  IGR [19]          0.440           0.505        0.692          1842.3
           Point2Mesh [23]   0.109           0.656        0.806          3714.7
           SPSR [31]         0.223           0.787        0.896          9.3
           Ours              0.054           0.940        0.947          370.1
SRB        IGR [19]          0.178           0.755        –              1847.6
           Point2Mesh [23]   0.116           0.648        –              4707.9
           SPSR [31]         0.232           0.735        –              9.2
           Ours              0.076           0.830        –              326.0
D-FAUST    IGR [19]          0.235           0.805        0.911          1857.2
           Point2Mesh [23]   0.071           0.855        0.905          3678.7
           SPSR [31]         0.044           0.966        0.965          4.3
           Ours              0.043           0.966        0.959          379.9

Table 2: Optimization-based 3D Reconstruction. Quantitative comparison on 3 datasets. Normal
Consistency cannot be evaluated on SRB as this dataset provides only unoriented point clouds.
Optimization time is evaluated on a single GTX 1080Ti GPU for IGR, Point2Mesh and our method.


Metrics: We consider Chamfer Distance, Normal Consistency and F-Score with the default threshold
of 1% for evaluation, and also report optimization & inference time.
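For reference, a minimal sketch of these metrics on uniformly sampled surface points (a brute-force O(NM) nearest-neighbor version; τ = 0.01 corresponds to the default 1% threshold):

```python
import torch

def chamfer_l1_and_fscore(pred, gt, tau=0.01):
    """pred: (N, 3) samples from the predicted surface; gt: (M, 3) from the GT surface."""
    d = torch.cdist(pred, gt)            # (N, M) pairwise Euclidean distances
    d_pred2gt = d.min(dim=1).values      # accuracy: predicted point -> nearest GT point
    d_gt2pred = d.min(dim=0).values      # completeness: GT point -> nearest predicted point
    chamfer = 0.5 * (d_pred2gt.mean() + d_gt2pred.mean())
    precision = (d_pred2gt < tau).float().mean()
    recall = (d_gt2pred < tau).float().mean()
    fscore = 2 * precision * recall / (precision + recall + 1e-8)
    return chamfer, fscore
```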
4.1 Optimization-based 3D Reconstruction
In this part, we investigate whether our method can be used for the single-object surface reconstruction
task from unoriented point clouds or scans. We consider three different types of 3D inputs: point
clouds sampled from synthetic meshes [74] with Gaussian noise, real-world scans [68], and high-
resolution raw scans of humans with comparably little noise [6].
Fig. 2 and Table 2 show that our method achieves superior performance compared to both classical
methods and network-based approaches. Note that the objects considered in this task are challenging

Figure 3: 3D Reconstruction from Point Clouds on ShapeNet. Comparison of SAP to baselines on
3 different setups (rows: low noise, high noise, outliers; columns: Input, SPSR [31], 3D-R2N2 [11],
AtlasNet [20], ConvONet [54], Ours, GT mesh). More results can be found in the supplementary.

                   (a) Noise=0.005               (b) Noise=0.025               (c) Noise=0.005, Outliers=50%
                   Chamfer-L1 F-Score Normal C.  Chamfer-L1 F-Score Normal C.  Chamfer-L1 F-Score Normal C.  Runtime
SPSR [31]          0.298      0.612   0.772      0.499      0.324   0.604      1.317      0.164   0.636      –
PSGN [17]          0.147      0.259   –          0.151      0.247   –          0.736      0.007   –          0.010 s
3D-R2N2 [11]       0.172      0.400   0.715      0.173      0.418   0.710      0.202      0.387   0.709      0.015 s
AtlasNet [20]      0.093      0.708   0.855      0.117      0.527   0.821      1.822      0.057   0.609      0.025 s
ConvONet [54]      0.044      0.942   0.938      0.066      0.849   0.913      0.052      0.916   0.929      0.327 s
Ours (w/o L_DPSR)  0.044      0.942   0.935      0.067      0.841   0.907      0.085      0.819   0.903      0.064 s
Ours               0.034      0.975   0.944      0.054      0.896   0.917      0.038      0.959   0.936      0.064 s

Table 3: 3D Reconstruction from Point Clouds on ShapeNet. Quantitative comparison between
our method and baselines on the ShapeNet dataset (mean over 13 classes).

due to their complex geometry, thin structures, noisy and incomplete observations. While some
of the baseline methods fail completely on these challenging objects, our method achieves robust
performance across all datasets.
In particular, Fig. 2 shows that IGR occasionally creates meshes in free space, as this is not penalized
by its optimization objective when point clouds are unoriented. Both, Point2Mesh and our method
alleviate this problem by optimizing for the Chamfer distance between the estimated mesh and the
input point clouds. However, Point2Mesh requires an initial mesh as input, whose topology
cannot be changed during optimization. Thus, it relies on SPSR to provide an initial mesh for objects
with genus larger than 0 and suffers from inaccurate initialization [23]. Furthermore, compared to
both IGR and Point2Mesh, our method converges faster.
While SPSR is even more efficient, it suffers from incorrect normal estimation on noisy input point
clouds, which is a non-trivial task on its own. In contrast, our method demonstrates more robust
behavior as we optimize points and normals guided by the Chamfer distance. Note that in this single
object reconstruction task, our method is not able to complete large unobserved regions (e.g., the
bottom of the person’s feet in Fig. 2 is unobserved and hence not completed). This limitation can be
addressed using learning-based object-level reconstruction as discussed next.
To analyze whether our proposed differentiable Poisson solver is also beneficial for learning-based
reconstruction, we evaluate our method on the single object reconstruction task using noise and outlier-
augmented point clouds from ShapeNet as input to our method. We investigate the performance for
three different noise levels: (a) Gaussian noise with zero mean and standard deviation 0.005, (b)
Gaussian noise with zero mean and standard deviation 0.025, (c) 50% of the points carry the same
noise as in (a), while the other 50% are outliers sampled uniformly inside the unit cube.
Runtime (s)   128³                          256³
              Enc.   Grid   MC     Total    Enc.   Grid   MC     Total
ConvONet      0.010  0.280  0.037  0.327    0.010  3.798  0.299  4.107
Ours          0.013  0.012  0.039  0.064    0.019  0.140  0.374  0.533

              Chamfer  F-Score  Normal C.
Offset 1×     0.041    0.952    0.928
Offset 3×     0.039    0.958    0.934
Offset 5×     0.039    0.957    0.934
Offset 7×     0.038    0.959    0.936
2D Enc.       0.043    0.939    0.928
3D Enc.       0.038    0.959    0.936

Table 4: Ablation Study. Top: Runtime breakdown (encoding, grid evaluation, marching cubes) in
seconds for ConvONet vs. ours. Bottom: Ablation over the number of offsets and 2D vs. 3D encoders.

4.2 Learning-based Reconstruction on ShapeNet


Fig. 3 and Table 3 show our results. Compared to the baselines, our method achieves similar or better
results on all three metrics. The results show that, in comparison to directly using Chamfer loss on
point positions and L1 loss on point normals, our DPSR loss can produce better reconstructions in all
settings as it directly supervises the indicator grid which implicitly determines the surface through the
Poisson equation. SPSR fails when the noise level is high or when there are outliers in the input point
cloud. We achieve significantly better performance than other representations such as point clouds,
meshes, voxel grids and patches. Moreover, we find that our method is robust to strong outliers. We
refer to the supplementary for more detailed visualizations on how SAP handles outliers.
Table 3 also reports the runtime for setting (a) for all GPU-accelerated methods using a single NVIDIA
GTX 1080Ti GPU, averaged over all objects of the ShapeNet test set. The baselines [11, 17, 20]
demonstrate fast inference but suffer in terms of reconstruction quality, while the neural implicit
model [54] attains high quality reconstructions but suffers from slow inference. In contrast, our
method produces competitive reconstruction results at reasonably fast inference time. In addition,
since ConvONet and our method share a similar reconstruction pipeline, we provide a more detailed
breakdown of the runtime at resolutions of 128³ and 256³ voxels in Table 4. We use the default setup
from ConvONet⁵. As we can see from Table 4, the difference in terms of point encoding and Marching
Cubes is marginal, but we gain more than 20× speed-up over ConvONet in evaluating the indicator
grid. In total, we are roughly 5× and 8× faster in total inference time at resolutions of 128³ and 256³
voxels, respectively.
4.3 Ablation Study
In this section, we investigate different architecture choices in the context of learning-based recon-
struction. We conduct our ablation experiments on ShapeNet for the third setup (most challenging).

Number of Offsets: From Table 4 we notice that predicting more offsets per input point leads to
better performance. This can be explained by the fact that with more points near the object surface,
geometric details can be better preserved.

Point Cloud Encoder: Here we compare two different point encoder architectures proposed in [54]:
a 2D encoder using 3 canonical planes at a resolution of 64² pixels and a 3D encoder using a feature
volume at a resolution of 32³ voxels. We find that the 3D encoder works best in this setting and
hypothesize that this is due to the representational alignment with the 3D indicator grid.

5 Conclusion
We introduce Shape-As-Points, a novel shape representation which is lightweight, interpretable and
produces watertight meshes efficiently. We demonstrate its effectiveness for the task of surface
reconstruction from unoriented point clouds in both optimization-based and learning-based settings.
Our method is currently limited to small scenes due to the cubic memory requirements with respect
to the indicator grid resolution. We believe that processing scenes in a sliding-window manner and
space-adaptive data structures (e.g., octrees) will enable extending our method to larger scenes. Point
cloud-based methods are broadly used in real-world applications ranging from household robots to
self-driving cars, and hence share the same societal opportunities and risks as other learning-based
3D reconstruction techniques.
⁵ To be consistent, we use the Marching Cubes implementation from [63] for both ConvONet and ours.
Acknowledgement: Andreas Geiger was supported by the ERC Starting Grant LEGO-3D (850533)
and the DFG EXC number 2064/1 - project number 390727645. The authors thank the Max Planck
ETH Center for Learning Systems (CLS) for supporting Songyou Peng and the International Max
Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Michael Niemeyer. This
work was supported by an NVIDIA research gift. We thank Matthias Nießner, Thomas Funkhouser,
Hugues Hoppe, and Yue Wang for helpful discussions in early stages of this project. We also thank Xu
Chen, Christian Reiser, and Rémi Pautrat for proofreading.

References
[1] M. Atzmon and Y. Lipman. Sal: Sign agnostic learning of shapes from raw data. In Proc. IEEE
Conf. on Computer Vision and Pattern Recognition (CVPR), 2020. 2, 17
[2] A. Badki, O. Gallo, J. Kautz, and P. Sen. Meshlet priors for 3d mesh reconstruction. In Proc.
IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020. 2, 3
[3] Y. Ben-Shabat and S. Gould. Deepfit: 3d surface fitting via neural network weighted least
squares. In Proc. of the European Conf. on Computer Vision (ECCV), 2020. 3
[4] Y. Ben-Shabat, M. Lindenbaum, and A. Fischer. Nesti-net: Normal estimation for unstructured
3d point clouds using convolutional neural networks. In Proc. IEEE Conf. on Computer Vision
and Pattern Recognition (CVPR), 2019. 3
[5] F. Bernardini, J. Mittleman, H. Rushmeier, C. Silva, and G. Taubin. The ball-pivoting algorithm
for surface reconstruction. IEEE Trans. on Visualization and Computer Graphics (VCG), 1999. 3
[6] F. Bogo, J. Romero, G. Pons-Moll, and M. J. Black. Dynamic FAUST: registering human bodies
in motion. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017. 6,
7, 15, 18
[7] C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang. Spectral methods: fundamentals in
single domains. Springer Science & Business Media, 2007. 4, 14
[8] R. Chabra, J. E. Lenssen, E. Ilg, T. Schmidt, J. Straub, S. Lovegrove, and R. Newcombe. Deep
local shapes: Learning local sdf priors for detailed 3d reconstruction. In Proc. of the European
Conf. on Computer Vision (ECCV), 2020. 3
[9] A. X. Chang, T. A. Funkhouser, L. J. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese,
M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu. Shapenet: An information-rich 3d model
repository. arXiv.org, 1512.03012, 2015. 6
[10] Z. Chen and H. Zhang. Learning implicit fields for generative shape modeling. In Proc. IEEE
Conf. on Computer Vision and Pattern Recognition (CVPR), 2019. 1, 3
[11] C. B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese. 3d-r2n2: A unified approach for single
and multi-view 3d object reconstruction. In Proc. of the European Conf. on Computer Vision
(ECCV), 2016. 2, 6, 8, 9, 17, 20, 22
[12] A. Dai, C. Diller, and M. Nießner. SG-NN: sparse generative neural networks for self-supervised
scene completion of RGB-D scans. In Proc. IEEE Conf. on Computer Vision and Pattern
Recognition (CVPR), 2020. 3
[13] A. Dai, C. Diller, and M. Nießner. Sg-nn: Sparse generative neural networks for self-supervised
scene completion of rgb-d scans. In Proc. IEEE Conf. on Computer Vision and Pattern
Recognition (CVPR), 2020. 3
[14] A. Dai, D. Ritchie, M. Bokeloh, S. Reed, J. Sturm, and M. Nießner. Scancomplete: Large-scale
scene completion and semantic segmentation for 3d scans. In Proc. IEEE Conf. on Computer
Vision and Pattern Recognition (CVPR), 2018. 3
[15] H. Edelsbrunner and E. P. Mücke. Three-dimensional alpha shapes. ACM Trans. on Graphics,
1994. 3
[16] P. Erler, P. Guerrero, S. Ohrhallinger, M. Wimmer, and N. J. Mitra. Points2surf: Learning
implicit surfaces from point cloud patches. In Proc. of the European Conf. on Computer Vision
(ECCV), 2020. 3, 21, 22
[17] H. Fan, H. Su, and L. J. Guibas. A point set generation network for 3d object reconstruction
from a single image. Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR),
2017. 2, 6, 8, 9, 17
[18] K. Genova, F. Cole, A. Sud, A. Sarna, and T. A. Funkhouser. Local deep implicit functions for
3d shape. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020. 3

[19] A. Gropp, L. Yariv, N. Haim, M. Atzmon, and Y. Lipman. Implicit geometric regularization for
learning shapes. In Proc. of the International Conf. on Machine learning (ICML), 2020. 3, 6, 7,
15, 16
[20] T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, and M. Aubry. AtlasNet: A papier-mâché
approach to learning 3d surface generation. In Proc. IEEE Conf. on Computer Vision and
Pattern Recognition (CVPR), 2018. 2, 6, 8, 9, 17, 22
[21] P. Guerrero, Y. Kleiman, M. Ovsjanikov, and N. J. Mitra. Pcpnet learning local shape properties
from raw point clouds. In Computer Graphics Forum, 2018. 3
[22] K. Gupta and M. Chandraker. Neural mesh flow: 3d manifold mesh generation via diffeomorphic
flows. In Advances in Neural Information Processing Systems (NeurIPS), 2020. 2
[23] R. Hanocka, G. Metzer, R. Giryes, and D. Cohen-Or. Point2mesh: a self-prior for deformable
meshes. In ACM Trans. on Graphics, 2020. 3, 6, 7, 8, 15, 16
[24] J. Huang, H. Su, and L. J. Guibas. Robust watertight manifold surface generation method for
shapenet models. arXiv.org, 1802.01698, 2018. 15
[25] R. R. Jensen, A. L. Dahl, G. Vogiatzis, E. Tola, and H. Aanæs. Large scale multi-view stereopsis
evaluation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2014.
19, 20
[26] C. Jiang, J. Huang, A. Tagliasacchi, and L. J. Guibas. Shapeflow: Learnable deformation flows
among 3d shapes. In Advances in Neural Information Processing Systems (NeurIPS), 2020. 2
[27] C. Jiang, P. Marcus, et al. Hierarchical detail enhancing mesh-based shape generation with 3d
generative adversarial network. arXiv preprint arXiv:1709.07581, 2017. 3
[28] C. Jiang, A. Sud, A. Makadia, J. Huang, M. Nießner, and T. Funkhouser. Local implicit grid
representations for 3d scenes. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition
(CVPR), 2020. 1, 3
[29] C. M. Jiang, J. Huang, K. Kashinath, Prabhat, P. Marcus, and M. Nießner. Spherical cnns on
unstructured grids. In Proc. of the International Conf. on Learning Representations (ICLR),
2019. 2
[30] M. M. Kazhdan, M. Bolitho, and H. Hoppe. Poisson surface reconstruction. In Proceedings of
the Fourth Eurographics Symposium on Geometry Processing, 2006. 3, 4
[31] M. M. Kazhdan and H. Hoppe. Screened poisson surface reconstruction. ACM Trans. on
Graphics, 32(3):29, 2013. 3, 4, 6, 7, 8, 16, 22
[32] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In Proc. of the
International Conf. on Machine learning (ICML), 2015. 6, 15, 17
[33] J. E. Lenssen, C. Osendorfer, and J. Masci. Deep iterative surface normal estimation. In Proc.
IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020. 3
[34] J. Li and A. O. Hero. A spectral method for solving elliptic equations for surface reconstruction
and 3d active contours. In Proc. IEEE International Conf. on Image Processing (ICIP), 2001. 4
[35] Y. Liao, S. Donne, and A. Geiger. Deep marching cubes: Learning explicit surface representa-
tions. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018. 3
[36] S. Lionar, D. Emtsev, D. Svilarkovic, and S. Peng. Dynamic plane convolutional occupancy
networks. In Proc. of the IEEE Winter Conference on Applications of Computer Vision (WACV),
2021. 3
[37] S. Liu, Y. Zhang, S. Peng, B. Shi, M. Pollefeys, and Z. Cui. Dist: Rendering deep implicit
signed distance function with differentiable sphere tracing. In Proc. IEEE Conf. on Computer
Vision and Pattern Recognition (CVPR), 2020. 3
[38] W. E. Lorensen and H. E. Cline. Marching cubes: A high resolution 3d surface construction
algorithm. In ACM Trans. on Graphics, 1987. 4, 6
[39] Q. Ma, S. Saito, J. Yang, S. Tang, and M. J. Black. Scale: Modeling clothed humans with
a surface codec of articulated local elements. In Proc. IEEE Conf. on Computer Vision and
Pattern Recognition (CVPR), 2021. 2, 7
[40] Q. Ma, J. Yang, S. Tang, and M. J. Black. The power of points for modeling humans in clothing.
In Proc. of the IEEE International Conf. on Computer Vision (ICCV), 2021. 2
[41] R. Martin-Brualla, N. Radwan, M. S. M. Sajjadi, J. T. Barron, A. Dosovitskiy, and D. Duckworth.
NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections. In Proc. IEEE
Conf. on Computer Vision and Pattern Recognition (CVPR), 2021. 3
[42] L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger. Occupancy networks:
Learning 3d reconstruction in function space. In Proc. IEEE Conf. on Computer Vision and
Pattern Recognition (CVPR), 2019. 1, 2, 3

[43] M. Meshry, D. B. Goldman, S. Khamis, H. Hoppe, R. Pandey, N. Snavely, and R. Martin-
Brualla. Neural rerendering in the wild. In Proc. IEEE Conf. on Computer Vision and Pattern
Recognition (CVPR), 2019. 1
[44] G. Metzer, R. Hanocka, D. Zorin, R. Giryes, D. Panozzo, and D. Cohen-Or. Orienting point
clouds with dipole propagation. ACM Trans. on Graphics, 2021. 3
[45] M. Mihajlovic, S. Weder, M. Pollefeys, and M. R. Oswald. Deepsurfels: Learning online
appearance fusion. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR),
2021. 2
[46] M. Mihajlovic, Y. Zhang, M. J. Black, and S. Tang. LEAP: Learning articulated occupancy of
people. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021. 3
[47] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng. NeRF:
Representing scenes as neural radiance fields for view synthesis. In Proc. of the European Conf.
on Computer Vision (ECCV), 2020. 1, 3
[48] M. Niemeyer and A. Geiger. Giraffe: Representing scenes as compositional generative neural
feature fields. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021. 3
[49] M. Niemeyer, L. Mescheder, M. Oechsle, and A. Geiger. Differentiable volumetric rendering:
Learning implicit 3d representations without 3d supervision. In Proc. IEEE Conf. on Computer
Vision and Pattern Recognition (CVPR), 2020. 1, 3, 19
[50] M. Oechsle, L. Mescheder, M. Niemeyer, T. Strauss, and A. Geiger. Texture fields: Learning
texture representations in function space. In Proc. of the IEEE International Conf. on Computer
Vision (ICCV), 2019. 1
[51] M. Oechsle, S. Peng, and A. Geiger. Unisurf: Unifying neural implicit surfaces and radiance
fields for multi-view reconstruction. In Proc. of the IEEE International Conf. on Computer
Vision (ICCV), 2021. 1, 19
[52] J. J. Park, P. Florence, J. Straub, R. A. Newcombe, and S. Lovegrove. Deepsdf: Learning
continuous signed distance functions for shape representation. In Proc. IEEE Conf. on Computer
Vision and Pattern Recognition (CVPR), 2019. 1, 3
[53] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin,
N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani,
S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. Pytorch: An imperative style,
high-performance deep learning library. In Advances in Neural Information Processing Systems
(NeurIPS), 2019. 6
[54] S. Peng, M. Niemeyer, L. Mescheder, M. Pollefeys, and A. Geiger. Convolutional occupancy
networks. In Proc. of the European Conf. on Computer Vision (ECCV), 2020. 1, 3, 6, 8, 9, 17,
18, 22
[55] C. R. Qi, H. Su, K. Mo, and L. J. Guibas. Pointnet: Deep learning on point sets for 3d classifi-
cation and segmentation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition
(CVPR), 2017. 2
[56] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. Pointnet++: Deep hierarchical feature learning on point
sets in a metric space. In Advances in Neural Information Processing Systems (NeurIPS), 2017. 2
[57] N. Ravi, J. Reizenstein, D. Novotny, T. Gordon, W.-Y. Lo, J. Johnson, and G. Gkioxari.
Accelerating 3d deep learning with pytorch3d. arXiv:2007.08501, 2020. 19
[58] C. Reiser, S. Peng, Y. Liao, and A. Geiger. Kilonerf: Speeding up neural radiance fields with
thousands of tiny mlps. In Proc. of the IEEE International Conf. on Computer Vision (ICCV),
2021. 1
[59] E. Remelli, A. Lukoianov, S. R. Richter, B. Guillard, T. Bagautdinov, P. Baque, and P. Fua.
Meshsdf: Differentiable iso-surface extraction. In Advances in Neural Information Processing
Systems (NeurIPS), 2020. 5
[60] V. Sitzmann, J. N. Martel, A. W. Bergman, D. B. Lindell, and G. Wetzstein. Implicit neural
representations with periodic activation functions. In Advances in Neural Information Processing
Systems (NeurIPS), 2020. 3
[61] V. Sitzmann, J. Thies, F. Heide, M. Nießner, G. Wetzstein, and M. Zollhöfer. Deepvoxels:
Learning persistent 3d feature embeddings. In Proc. IEEE Conf. on Computer Vision and
Pattern Recognition (CVPR), 2019. 1
[62] M. Tancik, P. Srinivasan, B. Mildenhall, S. Fridovich-Keil, N. Raghavan, U. Singhal, R. Ra-
mamoorthi, J. Barron, and R. Ng. Fourier features let networks learn high frequency functions
in low dimensional domains. In Advances in Neural Information Processing Systems (NeurIPS),
2020. 3

[63] S. Van der Walt, J. L. Schönberger, J. Nunez-Iglesias, F. Boulogne, J. D. Warner, N. Yager,
E. Gouillart, and T. Yu. scikit-image: image processing in python. PeerJ, 2014. 9
[64] N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y.-G. Jiang. Pixel2mesh: Generating 3d mesh
models from single rgb images. In Proc. of the European Conf. on Computer Vision (ECCV),
2018. 2
[65] S. Wang, M. Mihajlovic, Q. Ma, A. Geiger, and S. Tang. Metaavatar: Learning animatable
clothed human models from few depth images. In Advances in Neural Information Processing
Systems (NeurIPS), 2021. 3
[66] W. Wang, Q. Xu, D. Ceylan, R. Mech, and U. Neumann. Disn: Deep implicit surface network
for high-quality single-view 3d reconstruction. In Advances in Neural Information Processing
Systems (NeurIPS), 2019. 3
[67] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon. Dynamic graph
cnn for learning on point clouds. ACM Trans. on Graphics, 2019. 2
[68] F. Williams, T. Schneider, C. Silva, D. Zorin, J. Bruna, and D. Panozzo. Deep geometric prior
for surface reconstruction. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition
(CVPR), 2019. 2, 3, 6, 7, 15, 18
[69] J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum. Learning a probabilistic latent space
of object shapes via 3d generative-adversarial modeling. In Advances in Neural Information
Processing Systems (NeurIPS), 2016. 3
[70] G. Yang, X. Huang, Z. Hao, M. Liu, S. J. Belongie, and B. Hariharan. Pointflow: 3d point cloud
generation with continuous normalizing flows. In Proc. of the IEEE International Conf. on
Computer Vision (ICCV), 2019. 2
[71] Y. Yang, C. Feng, Y. Shen, and D. Tian. Foldingnet: Point cloud auto-encoder via deep grid
deformation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018. 2
[72] Z. Yang, Y. Chai, D. Anguelov, Y. Zhou, P. Sun, D. Erhan, S. Rafferty, and H. Kretzschmar.
Surfelgan: Synthesizing realistic sensor data for autonomous driving. In Proc. IEEE Conf. on
Computer Vision and Pattern Recognition (CVPR), 2020. 2
[73] L. Yariv, Y. Kasten, D. Moran, M. Galun, M. Atzmon, B. Ronen, and Y. Lipman. Multiview
neural surface reconstruction by disentangling geometry and appearance. In Advances in Neural
Information Processing Systems (NeurIPS), 2020. 1, 19
[74] Q. Zhou and A. Jacobson. Thingi10k: A dataset of 10,000 3d-printing models. arXiv preprint
arXiv:1605.04797, 2016. 6, 7, 15, 18
[75] Q.-Y. Zhou, J. Park, and V. Koltun. Open3D: A modern library for 3D data processing.
arXiv:1801.09847, 2018. 6

Supplementary Material for
Shape As Points: A Differentiable Poisson Solver

In this supplementary document, we first provide derivation details for our Differentiable Poisson
Solver in Section A. In Section B, we provide implementation details for our optimization-based and
learning-based methods. Next, we supply discussions on the easy initialization property of our Shape-
As-Points representation in Section C. Additional results and ablations for the optimization-based
and learning-based reconstruction can be found in Section D and Section E, respectively.

A Derivations for Differentiable Poisson Solver


A.1 Point Rasterization
Given the origin of the voxel grid c₀ = (x₀, y₀, z₀) and the size of each voxel s = (s_x, s_y, s_z),
we scatter the point normal values to the voxel grid vertices, weighted by the trilinear interpolation
weights. For a given point pᵢ := (cᵢ, nᵢ) ∈ {pᵢ, i = 1, 2, · · · , N}, with point location cᵢ =
(xᵢ, yᵢ, zᵢ) and point normal nᵢ = (x̂ᵢ, ŷᵢ, ẑᵢ), we can compute the neighbor indices as {j}, where
j = (j_x, j_y, j_z) ∈ {⌊(xᵢ−x₀)/s_x⌋, ⌈(xᵢ−x₀)/s_x⌉} × {⌊(yᵢ−y₀)/s_y⌋, ⌈(yᵢ−y₀)/s_y⌉} × {⌊(zᵢ−z₀)/s_z⌋, ⌈(zᵢ−z₀)/s_z⌉}.
Here ⌊·⌋ and ⌈·⌉ denote the floor and ceil operators for rounding to integers. We denote the trilinear
sampling weight function as T(c_p, c_v, s), where c_p and c_v denote the locations of the point and the
grid vertex. The contribution from point pᵢ to voxel grid vertex j can be computed as:

v_{j←i} = T(cᵢ, s ⊙ j + c₀, s) nᵢ    (10)

Hence the value at grid index j ∈ r × r × r can be computed by summing over all neighboring points:

v_j = Σ_{i∈N_j} T(cᵢ, s ⊙ j + c₀, s) nᵢ    (11)

where N_j denotes the set of point indices in the neighborhood of vertex j.


A.2 Spectral Methods for Solving PSR
We solve the PDEs using spectral methods [7]. In three dimensions, the multidimensional Fourier
Transform and Inverse Fourier Transform are defined as:

f̃(u) := FFT(f(x)) = ∭ f(x) e^{−2πi x·u} dx    (12)

f(x) := IFFT(f̃(u)) = ∭ f̃(u) e^{2πi x·u} du    (13)

where x := (x, y, z) are the spatial coordinates, and u := (u, v, w) represent the frequencies
corresponding to x, y and z. Derivatives in the spectral space can be computed analytically:

∂f(x)/∂x_j = ∭ 2πi u_j f̃(u) e^{2πi x·u} du = IFFT(2πi u_j f̃(u))
In discrete form, we have the rasterized point normals v := (v_x, v_y, v_z), where v_x, v_y, v_z ∈ ℝ^n.
Hence, in the spectral domain, the divergence of the rasterized point normals can be written as:

FFT(∇ · v) = 2πi (u · ṽ)    (14)

The Laplacian operator can simply be written as:

FFT(∇²) = −4π² ‖u‖²    (15)

Therefore, the unnormalized solution to the Poisson Equation χ̃, not accounting for boundary
conditions, can be written as:

χ̃ = g̃_{σ,r}(u) ⊙ (i u · ṽ) / (−2π‖u‖²),   g̃_{σ,r}(u) = exp(−2 σ²‖u‖² / r²)    (16)
where g̃_{σ,r}(u) is a Gaussian smoothing kernel of bandwidth σ for grid resolution r in the spectral
domain, used to mitigate the ringing effects resulting from the Gibbs phenomenon when rasterizing the
point normals. Please refer to Section D.3 for a more in-depth discussion motivating the use of the
smoothing parameter, as well as related ablation studies on our choice of σ.

The unnormalized indicator function in the physical domain, χ′, is obtained via the inverse Fourier
Transform:

χ′ = IFFT(χ̃)    (17)

We further normalize the indicator field to incorporate the boundary conditions that the indicator field
is zero at the point locations and valued ±0.5 inside and outside the shape:

χ = m / abs(χ′|_{x=0}) · ( χ′ − 1/|{c}| Σ_{c∈{c}} χ′|_{x=c} )    (18)

B Implementation Details

In this section, we provide implementation details for the baselines and our method in both settings,
optimization-based and learning-based reconstruction.

Optimization-based 3D reconstruction: We use the official implementations of IGR⁶ [19] and
Point2Mesh⁷ [23]. We optimize IGR for 15000 iterations on each object until convergence. For
Point2Mesh, we follow the official implementation and use 6000 iterations for each object. We
generate the initial mesh required by Point2Mesh following the description of the original paper.
Specifically, the initial mesh is provided as the convex hull of the input point cloud for objects with a
genus of zero. If the genus is larger than zero, we apply the watertight manifold algorithm [24] using
a low-resolution octree reconstruction on the output mesh of SPSR to obtain a coarse initial mesh.

For our method, we follow the coarse-to-fine and resampling strategy described in the main paper
(Section 3.2). To smooth the output mesh as well as to stabilize the optimization process, we gradually
increase the Gaussian smoothing parameter σ in (16) when increasing the grid resolution: σ = 2 for
grid resolutions of 32³ and 64³, and σ = 3 at a grid resolution of 128³. At the final resolution of
256³, we use σ = 3 for objects with more details (e.g. objects in SRB [68] and D-FAUST [6]), and
σ = 5 for input points with noise (Thingi10K [74]). We use the Adam optimizer [32] with learning
rate decay. The learning rate is set to 2 × 10⁻³ at the initial resolution of 32³, with a decay of 0.7
after every increase of the grid resolution. Moreover, we run 1000 iterations at each grid resolution
of 32³, 64³ and 128³, and 200 iterations at 256³. 20000 source points and normals are used by our
method to represent the final shapes for all objects.

⁶ https://github.com/amosgropp/IGR
⁷ https://github.com/ranahanocka/point2mesh

Figure 4: Optimization-based 3D Reconstruction. Here we show the remaining 11 of the 15 objects
used for comparison (4 objects are already shown in Fig. 2 of the main paper). Rows: Thingi10K,
SRB, D-FAUST; columns: Input, IGR [19], Point2Mesh [23], SPSR [31], Ours, GT mesh. Input point
clouds are downsampled for visualization. Note that the ground truth of SRB is provided as point clouds.
Time           0 s     50 s    100 s   150 s   300 s
Sphere         0.502   0.118   0.074   0.058   0.047
Coarse shape   0.136   0.059   0.051   0.048

Figure 5: Different Geometric Initializations under the Optimization Setting. We compare SAP
reconstructions initialized from a sphere and from a coarse shape; the numbers (Chamfer Distance to
the GT mesh, shown below each image in the original figure) track the optimization over time.

Iterations      10K     50K     100K    200K    Best
ConvONet [54]   0.082   0.058   0.055   0.050   0.044
Ours            0.041   0.036   0.035   0.034   0.034

Table 5: Training Progress. We show the Chamfer distance at different training iterations, evaluated
on the ShapeNet test set with 3K input points (noise level = 0.005). Our method uses geometric
initialization and converges much faster than ConvONet.

Learning-based 3D reconstruction: For AtlasNet [20], we use the official implementation⁸ with
25 parameterizations. We change the number of input points from 2500 (default) to 3000 for our
setting. Depending on the experiment, we add different noise levels or outlier points (see Section
4.2 in the main paper). We train ConvONet [54], PSGN [17], and 3D-R2N2 [11] for at least 300000
iterations, and use the Adam optimizer [32] with a learning rate of 10⁻⁴ for all methods.

We train our method as well as Ours (w/o L_DPSR) on all 3 noise levels for 300000 iterations (roughly
2 days on 2 GTX 1080Ti GPUs) and use the Adam optimizer with a learning rate of 5 × 10⁻⁴ and a
batch size of 32. To generate the ground truth PSR indicator field χ in Eq. (9) of the main paper, we
first sample 100000 points with the corresponding point normals from the ground truth mesh and
input them to our DPSR at a grid resolution of 128³.

C Discussions on the “Easy Initialization” Property of SAP


As mentioned in the main paper, it is easy to initialize SAP with a given geometry such as template shapes or noisy observations. Here we provide further discussion, with experiments under both the optimization and the learning setting of SAP.
In all results shown so far under the optimization setting, we chose to start from a sphere: our intention was to demonstrate that if our method can produce decent 3D reconstructions even when starting from a sphere, it can also faithfully reconstruct from a coarse or noisy shape, since that is a simpler task and should converge faster. In Fig. 5, we show a comparison between initialization from a sphere and from a coarse shape. As can be observed, when starting from points and normals sampled from a coarse shape, our method indeed converges faster. In terms of accuracy, both results are equivalent, i.e., no better local minimum is attained by the optimization process.
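For concreteness, the sphere initialization used throughout the optimization setting amounts to nothing more than sampling oriented points on a sphere (a minimal sketch; the point count and radius here are illustrative, not the exact values of our implementation):

```python
import torch

def init_sphere(n_points: int = 20_000, radius: float = 0.3):
    """Sample points uniformly on a sphere; normals point outward."""
    d = torch.randn(n_points, 3)
    d = d / d.norm(dim=-1, keepdim=True)  # uniformly distributed directions
    return radius * d, d  # positions on the sphere and outward normals
```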
In the learning setting, what we want to emphasize is geometric initialization rather than network initialization. For neural-implicit methods, weight initializations can only be derived analytically for simple shapes such as spheres [1]. In contrast, since our networks only need to predict the offset and normal fields for the input point cloud, this input point cloud is directly treated as the geometric initialization.
⁸ https://github.com/ThibaultGROUEIX/AtlasNet
Dataset     Method                  Chamfer-L1 (↓)  F-Score (↑)  Normal C. (↑)

Thingi10K   Ours (w/o resampling)   0.061           0.897        0.902
            Ours                    0.053           0.941        0.947
SRB         Ours (w/o resampling)   0.077           0.813        –
            Ours                    0.067           0.848        –
D-FAUST     Ours (w/o resampling)   0.044           0.964        0.952
            Ours                    0.043           0.965        0.959

Table 6: Ablation Study of the Resampling Strategy. On all datasets, our resampling strategy leads to improved results. For D-FAUST, the increase is the smallest because the supervision point clouds are noise-free. Note that normal consistency cannot be evaluated on SRB as this dataset provides only unoriented point clouds.
[Figure 6: for Thingi10K and SRB, the optimized point cloud and the reconstructed mesh are shown for Ours w/o resampling, Ours, and the GT.]
Figure 6: Ablation Study of the Resampling Strategy. We show the optimized point cloud and the reconstructed mesh without and with the resampling strategy. Using the point resampling strategy leads to a more uniformly distributed point cloud and better shape reconstruction.

In Table 5, we show the Chamfer distance at different training iterations, evaluated on the ShapeNet test set with 3K input points (noise level = 0.005). As can be seen, our method enables much faster training convergence than the neural-implicit method ConvONet [54], which illustrates the effectiveness of geometric initialization taken directly from the input point clouds.

D Optimization-based 3D Reconstruction

D.1 Qualitative Comparison of All Objects in 3 Datasets


Complementing Fig. 2 in the main paper, Fig. 4 shows qualitative comparisons on the remaining 11 objects in the optimization-based setting, including 3 objects from Thingi10K [74], 4 from SRB [68] and 4 from D-FAUST [6].
D.2 Ablation Study of Point Resampling Strategy
In Table 6 and Fig. 6, we compare the reconstructed shapes with and without the proposed resampling
strategy. Our method is able to produce reasonable reconstructions even without the resampling
strategy, but the shapes are much noisier. Since we directly optimize the source point positions and
normals without any additional constraints, the optimized point clouds can be unevenly distributed
as shown in Fig. 6. This limits the representational expressivity of the point clouds given the same
number of points. The resampling strategy acts as a regularization to enforce a uniformly distributed
point cloud, which leads to better surface reconstruction.
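A sketch of one such resampling step, assuming PyTorch3D's `sample_points_from_meshes` is used to draw the fresh point set (the actual implementation may differ in details):

```python
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.structures import Meshes

def resample(mesh: Meshes, n_points: int = 20_000):
    # Replace the optimized source points by a fresh, uniformly distributed
    # sample (with normals) drawn from the current reconstructed mesh.
    points, normals = sample_points_from_meshes(
        mesh, num_samples=n_points, return_normals=True)
    return points, normals
```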

D.3 Analysis on the Gaussian Smoothing Parameter σ

Why we need a Gaussian: The Gaussian serves as a regularizer controlling the smoothness of the solved implicit function. Not using a Gaussian is equivalent to using a Gaussian kernel with σ = 0. Fig. 7 below motivates the use of the σ parameter.

Why use a Gaussian in the spectral domain: First, the FFT of a Gaussian remains a Gaussian. Second, convolution with a Gaussian in the physical domain is equivalent to a pointwise product with a Gaussian in the spectral domain, and this product is more efficient than convolution in the physical domain: O(N log N) (dominated by the FFT) vs. O(N²), where n is the resolution of the regular grid and N = n³.
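Concretely, the spectral smoothing amounts to a pointwise multiplication with a Gaussian transfer function (a NumPy sketch of the general technique; σ is expressed in grid units here, and this is not the exact implementation):

```python
import numpy as np

def gaussian_smooth_spectral(grid: np.ndarray, sigma: float) -> np.ndarray:
    """Smooth a cubic n^3 grid by multiplying its FFT with a Gaussian."""
    n = grid.shape[0]
    freqs = np.fft.fftfreq(n)  # per-axis frequencies in cycles per voxel
    fx, fy, fz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    # The FFT of a Gaussian is again a Gaussian, so smoothing reduces to a
    # pointwise product with exp(-2 (pi sigma)^2 |f|^2) in the spectral domain.
    g = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx**2 + fy**2 + fz**2))
    return np.real(np.fft.ifftn(np.fft.fftn(grid) * g))
```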

Ablation study for the Gaussian smoothing parameter σ: We study the effect of the Gaussian smoothing parameter σ at a resolution of 256³. As visualized in Fig. 7, we obtain faithful reconstructions for a range of σ values. Nevertheless, a lower σ preserves details better but is also more prone to noise, while a higher σ yields smoother shapes but can also lead to a loss of detail. In practice, σ can be chosen according to the noise level of the target point cloud. For the results in Fig. 4 and Table 2 of the main paper, we choose σ = 3 for the SRB and D-FAUST datasets and σ = 5 for the Thingi10K dataset.
[Figure 7: reconstructions of Thingi10K and D-FAUST objects for σ = 1, 3, 5 and 7, next to the GT.]
Figure 7: Ablation Study of the Gaussian Smoothing Parameter σ. A low σ preserves details better but is prone to noise, while a high σ results in smooth shapes but also leads to a loss of detail.

D.4 Preliminary Results on Multi-view Reconstruction with Differentiable Rendering


We have validated the effectiveness of our Shape-As-Points representation with experiments on 3D reconstruction from 3D point clouds. To further show the flexibility of SAP, here we also provide a proof-of-concept experiment on 3D reconstruction from multi-view images. Indeed, our differentiable point-to-mesh layer in the optimization-based setting can be naturally combined with a differentiable renderer.
We proceed as follows. After obtaining the mesh in each iteration, for each 2D image we rasterize the 3D mesh using rasterize_meshes from PyTorch3D [57] to find the corresponding surface point for each pixel, and use an MLP to predict its RGB color. An L2 loss is applied between the predicted and input RGB images. In addition, we apply a silhouette loss using a differentiable silhouette renderer from PyTorch3D.
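A condensed sketch of this loop with PyTorch3D is shown below. The camera setup, raster settings, and `rgb_mlp` (the color network) are assumptions made for illustration; the two losses follow the description above:

```python
import torch
from pytorch3d.structures import Meshes
from pytorch3d.ops import interpolate_face_attributes
from pytorch3d.renderer import (FoVPerspectiveCameras, RasterizationSettings,
                                MeshRasterizer, MeshRenderer, SoftSilhouetteShader)

cameras = FoVPerspectiveCameras(device="cuda")           # assumed known poses
raster_settings = RasterizationSettings(image_size=256)  # simplified settings
rasterizer = MeshRasterizer(cameras=cameras, raster_settings=raster_settings)
silhouette_renderer = MeshRenderer(rasterizer, SoftSilhouetteShader())

def render_losses(mesh: Meshes, rgb_gt, mask_gt, rgb_mlp):
    # Rasterize: for every pixel, find the visible face and its barycentric
    # coordinates, then interpolate the corresponding 3D surface point.
    fragments = rasterizer(mesh)
    face_verts = mesh.verts_packed()[mesh.faces_packed()]             # (F, 3, 3)
    surface_pts = interpolate_face_attributes(
        fragments.pix_to_face, fragments.bary_coords, face_verts)[:, :, :, 0]
    rgb_pred = rgb_mlp(surface_pts)  # per-pixel colors from an MLP (assumption)
    loss_rgb = ((rgb_pred - rgb_gt) ** 2).mean()
    # Silhouette loss via PyTorch3D's differentiable silhouette renderer
    # (alpha channel of the rendered image).
    sil = silhouette_renderer(mesh)[..., 3]
    loss_sil = ((sil - mask_gt) ** 2).mean()
    return loss_rgb + loss_sil
```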
In Fig. 8 we show some preliminary results, where we test our implementation on 1 synthetic object (cow) with 20 images and on 2 real-world objects from the DTU dataset [25] with around 50 images. As can be observed, our method is indeed able to reconstruct detailed geometry using only 2D supervision, and we consider this an exciting extension for future work.
We further remark that recent methods based on neural implicit representations [49, 51, 73] can attain high-quality 3D reconstructions, but their optimization process is slow due to dense evaluations in the ray marching step.
[Figure 8: reconstructions of the synthetic object and of scenes from DTU [25].]
Figure 8: SAP Multi-view Reconstruction with Differentiable Rendering. We show reconstruction results on a synthetic dataset and a real-world dataset [25].
In contrast, our method outputs an explicit mesh in each iteration, so no expensive ray marching is required to find the surface, leading to faster inference.

E Learning-based 3D Reconstruction
E.1 Visualization of How SAP Handles Noise and Outliers
In this section, we visualize how our trained models handle noise and outliers during inference.

Noise Handling: We can see from the top row of Fig. 9 that, compared to the input point cloud, the
updated SAP points are densified because we predict k = 7 offsets per input point. More importantly,
all SAP points are located roughly on the surface, which leads to enhanced reconstruction quality.

Outlier Handling: We also visualize how SAP handles outlier points in the bottom row of Fig. 9. The arrows' length represents the magnitude of the predicted normals. There are two interesting observations: a) A large number of outlier points in the input are moved close to the surface. b) Some outlier points remain outliers. For these points, the network learns to predict normals with a very small magnitude/norm, as shown in the zoom-in view (we do not normalize the point normals to unit length). In this way, those outlier points are “muted” when passed to the DPSR layer, such that they do not contribute to the final reconstruction.
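This muting mechanism is easy to see in isolation: in the point rasterization step preceding the Poisson solve, every point adds its unnormalized normal to the grid, so a near-zero normal adds almost nothing (a toy sketch under our own naming; the actual layer uses a differentiable, interpolating scattering rather than this nearest-voxel version):

```python
import torch

def splat_normals(points: torch.Tensor, normals: torch.Tensor, res: int = 64):
    """Toy nearest-voxel splatting of normals onto a res^3 vector grid.

    Points are assumed to lie in [-1, 1]^3. A point whose normal has near-zero
    magnitude leaves the grid essentially untouched, which is how outliers are
    "muted" before the Poisson solve.
    """
    grid = torch.zeros(res, res, res, 3)
    idx = ((points * 0.5 + 0.5) * (res - 1)).long().clamp(0, res - 1)
    for p, n in zip(idx, normals):
        grid[p[0], p[1], p[2]] += n  # contribution scales with |n|
    return grid
```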

[Figure 9 columns: Input, Ours – point clouds, Ours – mesh, GT mesh.]
Figure 9: Visualization of SAP Handling Noise and Outliers. The length of the arrows represents the magnitude of the normals. SAP point clouds are downsampled for better visualization.

E.2 Additional Results for Reconstruction from Noisy Point Clouds


In Table 7 we provide quantitative results for all 13 classes of the ShapeNet subset of Choy et al. [11]. In Fig. 10, we show additional qualitative comparisons on ShapeNet.

category — for each of the three metric blocks, Chamfer-L1 (↓), F-Score (↑) and Normal Consistency (↑), the seven columns are: SPSR, PSGN, 3D-R2N2, AtlasNet, ConvONet, Ours (w/o LDPSR), Ours

airplane 0.437 0.102 0.151 0.064 0.034 0.040 0.027 0.551 0.476 0.382 0.827 0.965 0.940 0.981 0.747 - 0.669 0.854 0.931 0.919 0.931
bench 0.544 0.129 0.153 0.073 0.035 0.041 0.032 0.430 0.266 0.431 0.786 0.965 0.949 0.979 0.649 - 0.691 0.820 0.921 0.915 0.920
cabinet 0.154 0.164 0.167 0.112 0.047 0.044 0.037 0.728 0.137 0.412 0.603 0.955 0.952 0.975 0.835 - 0.786 0.875 0.956 0.951 0.957
car 0.180 0.132 0.197 0.099 0.075 0.061 0.045 0.729 0.211 0.348 0.642 0.849 0.875 0.928 0.783 - 0.719 0.827 0.893 0.886 0.897
chair 0.369 0.168 0.181 0.114 0.046 0.047 0.036 0.473 0.152 0.393 0.629 0.939 0.939 0.979 0.715 - 0.673 0.829 0.943 0.940 0.952
display 0.280 0.160 0.170 0.089 0.036 0.036 0.030 0.544 0.175 0.401 0.727 0.971 0.975 0.990 0.749 - 0.747 0.905 0.968 0.967 0.972
lamp 0.278 0.207 0.243 0.137 0.059 0.069 0.047 0.586 0.204 0.333 0.562 0.892 0.897 0.959 0.765 - 0.598 0.759 0.900 0.899 0.921
loudspeaker 0.148 0.205 0.199 0.142 0.063 0.058 0.041 0.731 0.107 0.405 0.516 0.892 0.900 0.957 0.843 - 0.735 0.867 0.938 0.935 0.950
rifle 0.409 0.091 0.147 0.051 0.028 0.027 0.023 0.590 0.615 0.381 0.877 0.980 0.982 0.990 0.788 - 0.700 0.837 0.929 0.935 0.937
sofa 0.227 0.144 0.160 0.091 0.041 0.039 0.032 0.712 0.184 0.427 0.717 0.953 0.960 0.982 0.826 - 0.754 0.888 0.958 0.957 0.963
table 0.393 0.166 0.177 0.102 0.038 0.043 0.033 0.442 0.158 0.404 0.692 0.967 0.958 0.986 0.706 - 0.734 0.867 0.959 0.954 0.962
telephone 0.281 0.110 0.130 0.054 0.027 0.026 0.023 0.674 0.317 0.484 0.867 0.989 0.992 0.997 0.805 - 0.847 0.957 0.983 0.982 0.984
vessel 0.181 0.130 0.169 0.078 0.043 0.043 0.030 0.771 0.363 0.394 0.757 0.931 0.930 0.974 0.820 - 0.641 0.837 0.918 0.917 0.930
mean 0.299 0.147 0.173 0.093 0.044 0.044 0.034 0.612 0.259 0.400 0.708 0.942 0.942 0.975 0.772 - 0.715 0.855 0.938 0.935 0.944

(a) Noise = 0.005


category — same column layout as in (a): three metric blocks (Chamfer-L1 ↓, F-Score ↑, Normal Consistency ↑), each with columns SPSR, PSGN, 3D-R2N2, AtlasNet, ConvONet, Ours (w/o LDPSR), Ours

airplane 0.716 0.107 0.147 0.103 0.052 0.059 0.045 0.268 0.457 0.413 0.558 0.883 0.857 0.915 0.550 - 0.665 0.787 0.904 0.897 0.905
bench 0.661 0.133 0.154 0.101 0.056 0.060 0.050 0.296 0.255 0.446 0.587 0.872 0.862 0.905 0.551 - 0.683 0.797 0.887 0.879 0.885
cabinet 0.323 0.166 0.165 0.118 0.065 0.067 0.051 0.383 0.138 0.435 0.554 0.883 0.863 0.920 0.671 - 0.784 0.855 0.937 0.927 0.938
car 0.338 0.137 0.188 0.115 0.104 0.091 0.071 0.415 0.200 0.388 0.528 0.739 0.749 0.817 0.632 - 0.714 0.792 0.875 0.862 0.871
chair 0.524 0.176 0.191 0.126 0.071 0.073 0.058 0.263 0.141 0.387 0.527 0.818 0.799 0.882 0.585 - 0.666 0.811 0.915 0.905 0.920
display 0.409 0.166 0.167 0.111 0.057 0.057 0.047 0.321 0.164 0.431 0.554 0.889 0.881 0.925 0.600 - 0.743 0.884 0.946 0.944 0.951
lamp 0.457 0.210 0.261 0.146 0.090 0.101 0.076 0.319 0.195 0.329 0.455 0.754 0.734 0.841 0.617 - 0.588 0.737 0.866 0.859 0.881
loudspeaker 0.320 0.205 0.203 0.144 0.090 0.092 0.065 0.369 0.109 0.407 0.471 0.793 0.778 0.853 0.675 - 0.734 0.850 0.920 0.910 0.925
rifle 0.848 0.097 0.144 0.119 0.047 0.044 0.042 0.218 0.575 0.403 0.439 0.905 0.917 0.928 0.541 - 0.691 0.746 0.888 0.895 0.896
sofa 0.452 0.152 0.153 0.109 0.065 0.061 0.051 0.337 0.166 0.457 0.572 0.857 0.865 0.908 0.631 - 0.744 0.860 0.936 0.934 0.941
table 0.514 0.169 0.177 0.115 0.057 0.061 0.049 0.293 0.158 0.431 0.564 0.885 0.869 0.923 0.597 - 0.729 0.847 0.936 0.929 0.940
telephone 0.521 0.112 0.128 0.105 0.038 0.038 0.033 0.329 0.311 0.508 0.520 0.959 0.964 0.976 0.591 - 0.847 0.917 0.975 0.975 0.976
vessel 0.403 0.135 0.173 0.111 0.073 0.072 0.058 0.399 0.341 0.397 0.527 0.796 0.794 0.856 0.612 - 0.639 0.788 0.881 0.875 0.886
mean 0.499 0.151 0.173 0.117 0.066 0.067 0.054 0.324 0.247 0.418 0.527 0.849 0.841 0.896 0.604 - 0.710 0.821 0.913 0.907 0.917

(b) Noise = 0.025


category — same column layout as in (a): three metric blocks (Chamfer-L1 ↓, F-Score ↑, Normal Consistency ↑), each with columns SPSR, PSGN, 3D-R2N2, AtlasNet, ConvONet, Ours (w/o LDPSR), Ours

airplane 1.573 0.745 0.164 2.113 0.041 0.213 0.031 0.093 0.011 0.405 0.027 0.938 0.667 0.970 0.621 - 0.650 0.561 0.920 0.864 0.923
bench 1.499 0.573 0.166 1.856 0.041 0.076 0.036 0.126 0.007 0.431 0.053 0.945 0.829 0.965 0.575 - 0.695 0.630 0.910 0.876 0.909
cabinet 1.060 0.712 0.175 1.472 0.052 0.065 0.042 0.248 0.004 0.399 0.083 0.938 0.868 0.960 0.659 - 0.778 0.639 0.950 0.925 0.950
car 1.262 0.536 0.200 1.844 0.087 0.092 0.057 0.177 0.009 0.351 0.059 0.812 0.764 0.893 0.634 - 0.711 0.620 0.884 0.863 0.888
chair 0.984 0.689 0.228 1.478 0.055 0.087 0.041 0.186 0.005 0.362 0.076 0.903 0.783 0.959 0.628 - 0.672 0.644 0.930 0.898 0.941
display 1.312 0.965 0.201 1.685 0.041 0.060 0.034 0.188 0.004 0.374 0.062 0.956 0.878 0.980 0.627 - 0.747 0.593 0.962 0.945 0.967
lamp 1.402 0.958 0.399 2.080 0.073 0.110 0.047 0.123 0.004 0.283 0.037 0.838 0.699 0.941 0.630 - 0.587 0.569 0.885 0.844 0.910
loudspeaker 0.930 0.905 0.224 1.392 0.075 0.093 0.050 0.264 0.003 0.376 0.093 0.861 0.792 0.927 0.673 - 0.732 0.652 0.930 0.904 0.940
rifle 1.689 0.479 0.163 2.442 0.037 0.055 0.026 0.066 0.022 0.386 0.012 0.953 0.888 0.985 0.627 - 0.679 0.519 0.916 0.907 0.929
sofa 1.267 0.607 0.172 1.656 0.047 0.064 0.037 0.211 0.006 0.412 0.080 0.934 0.866 0.967 0.655 - 0.756 0.666 0.950 0.933 0.956
table 1.159 0.913 0.202 1.581 0.044 0.082 0.037 0.166 0.004 0.405 0.081 0.950 0.810 0.972 0.618 - 0.737 0.672 0.951 0.915 0.954
telephone 1.458 0.851 0.146 1.890 0.030 0.036 0.025 0.173 0.005 0.461 0.047 0.983 0.971 0.994 0.668 - 0.839 0.589 0.980 0.975 0.982
vessel 1.530 0.639 0.189 2.200 0.054 0.072 0.036 0.108 0.009 0.387 0.032 0.890 0.814 0.956 0.652 - 0.635 0.568 0.907 0.886 0.921
mean 1.317 0.736 0.202 1.822 0.052 0.085 0.038 0.164 0.007 0.387 0.057 0.916 0.818 0.959 0.636 - 0.709 0.609 0.929 0.903 0.936

(c) Noise = 0.005, Outliers = 50%


Table 7: 3D Reconstruction from Point Clouds on ShapeNet. This table shows a per-category
comparison of baselines and different variants of our approach. We train all methods on all 13 classes.

E.3 Comparison to Points2Surf [16]


Here we also provide a quantitative comparison to Points2Surf [16], a learning-based 3D reconstruction method that likewise takes noisy point clouds as input. We train and test on the lamp object class of ShapeNet with a noise level of 0.005. We do not train on the entire ShapeNet dataset or test on all 3 settings because of the very long training and inference time of Points2Surf [16]. Due to time and resource constraints, we chose to compare only on ‘lamp’, a challenging class in ShapeNet with roughly 2000 objects for training and 450 objects for testing.
We use the official implementation⁹ and 8 GTX 1080Ti GPUs; it takes over 50 hours to reach 150 epochs (the convergence time reported in the paper). In contrast, we train our method for only 30K iterations, which takes less than 4 hours on 2 GTX 1080Ti GPUs. Our results at 30K iterations are close to those of our best model at 300K iterations. As shown in Table 8, our method outperforms Points2Surf on all metrics, and its inference speed is 3 orders of magnitude faster.

⁹ https://github.com/ErlerPhilipp/points2surf
[Figure 10: two setups are shown, high noise and outliers; columns: Input, SPSR [31], 3D-R2N2 [11], AtlasNet [20], ConvONet [54], Ours, GT mesh.]
Figure 10: 3D Reconstruction from Point Clouds on ShapeNet. Comparison of SAP to the baselines in two different setups. Compared to SPSR, our method can robustly estimate normals under noise and selectively ignore outliers. Compared to 3D-R2N2 and AtlasNet, SAP has greater representational expressivity, which allows it to reconstruct fine geometric details. Compared to ConvONet, SAP produces a higher-quality reconstruction at a fraction of its inference time.

                   Chamfer-L1 (↓)  F-Score (↑)  Normal C. (↑)  Inference time

Points2Surf [16]   0.078           0.838        0.844          70 s
Ours               0.048           0.950        0.918          0.064 s

Table 8: Quantitative Comparison to Points2Surf [16]. We train and test on the ShapeNet ‘lamp’ class. Our model is trained for only 30K iterations but outperforms Points2Surf on all metrics with a much shorter inference time.
