TopNet: Structural Point Cloud Decoder
http://completion3d.stanford.edu
Lyne P. Tchapmi¹, Vineet Kosaraju¹, S. Hamid Rezatofighi¹,², Ian Reid², Silvio Savarese¹
¹Stanford University  ²The University of Adelaide, Australia
Recent approaches applicable to shape completion focus on point-cloud based shape completion [38, 13, 37]. The dominant paradigm in these frameworks is to use a point cloud encoder to embed the partial point cloud input [21], and to design a decoder that generates a completed point cloud from the embedding of the encoder.

The key focus in most existing point cloud generation approaches is on the representation of 2D or 3D objects, and on the design of a relevant point cloud based decoder for the proposed representation. The earliest method uses a parallel multilayer perceptron network and a deconvolution network to decode the latent feature generated by a 2D encoder [9], while using a permutation invariant loss such as the earth mover's distance [26, 9] or Chamfer loss [9] to deal with the orderless nature of point clouds. However, this framework does not explicitly consider any topology or structure that naturally exists in real-world 3D objects. More recent successful approaches concentrate on the design of decoders which generate structured point clouds [38, 13, 37]. For example, in [13] it is assumed that the point clouds of a 3D object lie on a 2-manifold, formally defined as a topological space that locally resembles the Euclidean plane. The proposed decoder is therefore forced to learn a set of mappings or deformations from the Euclidean plane to the object point cloud. Imposing these structures on the learning process may result in superior performance in generating 3D object point clouds compared to approaches that ignore structure. However, what is usually ignored is the potential impact that a specific representation for grouping the point clouds may have on a learning process that uses an unstructured (i.e. permutation invariant) loss. Enforcing a single specific structure during learning may not be optimal for training, as the space of possible solutions is constrained.

To address this issue, we propose a more general decoder that can generate structured point clouds by implicitly modeling the point cloud structure in a rooted tree architecture. We show that, given enough capacity and allowing for redundancies, a rooted tree allows us to model structure, including any topology on the point set, making our decoder a more general structured decoder. Since structure is only implicitly modeled in our decoder, our model is not bound to a pre-determined structure, and can therefore embed arbitrary structures and/or topologies on the point set, as in Fig. 1.

More specifically, for the shape completion task, we embed the partial input point cloud as a feature vector or code which is used by our proposed decoder to generate a completed point cloud. Our decoder has a rooted tree structure in which the root node embeds the entire point set as the encoder feature vector, the leaf nodes are individual points in the point set, and each internal node embeds a subset of the entire point cloud made of all its descendant leaves. The set of point cloud subsets represented by the tree nodes defines the generated point cloud structure or topology. This model choice is inspired by the formal definition of topology on finite discrete point sets, as detailed in Section 4.

We evaluate our proposed decoder on the ShapeNet dataset and show a 34% relative improvement over the next-best performing methods for the task of point cloud shape completion¹. Visualizations of the nodes of our tree decoder reveal various non-identical patterns learned by our decoder.

¹The code for this project and an associated object point cloud completion benchmark with all evaluated methods are available at http://completion3d.stanford.edu.

The main contributions of the paper are summarized as follows:

• We propose a novel way to model arbitrary structure/topology on a point cloud using a rooted tree in which each node embeds a defining element of the structure.

• We design a novel tree decoder network for point cloud generation and completion which generates arbitrarily structured point clouds without explicitly enforcing a specific structure.

• We show an intuitive visualization of the structure learned by our decoder by visualizing a node in the tree decoder as the set of all its descendants.

• Finally, our network sets a new state-of-the-art on object point cloud completion, with more than 30% improvement over the next-best performing approach.

2. Related Works

Our work follows a long line of frameworks on shape completion which leverage various representations. We review a subset of these approaches among those leveraging neural networks.

Volumetric 3D shape completion: Earlier and some recent works on shape completion with neural networks favored voxel grid and distance field representations [6, 15, 29, 27, 18], which are well suited for processing with 3D convolutional neural networks. These works have shown great success in the tasks of 3D reconstruction [5, 11], shape completion [6, 14, 36, 19], and shape generation from embeddings [3]. However, voxel grids require large memory footprints, and early works operate on low-resolution grids. This issue has been addressed using sparse representations [30, 32, 33, 17, 12, 25], which make processing voxels more efficient. However, the process of voxelization still introduces a quantization effect which discards some details of the data [34] and is not suitable for representing fine-grained information. To avoid this limitation, recent works generate 3D shapes in point cloud space, which is naturally able to represent fine-grained details, and
several of these works show superior performance to voxel-based methods [38, 9, 10].

Multiresolution Point Cloud Generation with Neural Networks: A few works on point cloud generation consider or introduce a multi-resolution structure in the process of point cloud generation. Yuan et al. [38] generate point clouds in two stages, where the first stage produces a lower-resolution point cloud and the second stage produces the final output. Gadelha et al. [10] represent a point cloud as a 1D list of spatially ordered points, and generate a point cloud through a tree network in which each branch aims at representing a different resolution of the point cloud, with resolutions connected through multiresolution convolutions. These works, while highly performant, constrain the network to focus on the multiresolution structure of the point cloud.

Unstructured Point Cloud Generation with Neural Networks: Fan et al. [9] proposed one of the earliest works in the literature addressing the task of generating 3D point clouds from an input image. They proposed an architecture made of an encoder which encodes the input into an embedding, and a decoder which generates the point cloud from the embedding. The decoder they propose is a two-branch architecture, with one multilayer perceptron (MLP) branch and one deconvolutional branch. They also introduce the Chamfer loss, a differentiable loss for point cloud comparison which we leverage in our work. However, the output generated by their method is an unstructured point cloud, while real-world object point clouds are structured and can be represented as a collection of subsets, e.g. surfaces and parts. Recent approaches based on imposing one of these structured representations have been shown to be superior in generating point clouds of real-world objects. We review them next.

Manifold-based Point Cloud Generation with Neural Networks: Several state-of-the-art methods on point cloud generation and completion generate structured point clouds by assuming and enforcing a 2-manifold topology on the point cloud. Groueix et al. [13] design a decoder that learns a manifold by learning a mapping from the Euclidean plane to the ground-truth point cloud. Yang et al. [37] and Yuan et al. [38] also learn to generate a point cloud structured as a manifold through a series of deformation (folding) operations on the Euclidean plane. While several point clouds are indeed derived from sampling manifolds, they exhibit several other structures or topological representations that can be leveraged during training. Therefore, enforcing a specific structured representation may constrain the learning process by limiting the solution search space. To avoid this limitation, we propose a decoder which is able to represent arbitrary structures and topologies on a point set. The decoder has a rooted tree structure in which each node of the tree represents and generates a subset of the point cloud, defining the point cloud structure. Unlike [13, 37, 38], which specifically enforce their decoder to generate a manifold, we do not enforce our decoder to generate any specific topology, which increases the space of potential topologies that can be generated by the decoder. Visualization of structural patterns learned by our decoder suggests that the decoder learns patterns which are not necessarily traditional 2-manifolds but occur across several objects.

3. Background and Motivation

We first provide some background on concepts relating to object structure and topology.

2-manifolds and surface topology: Object structure is commonly modeled as a surface or 2-manifold, which formally is a topological space that locally resembles the Euclidean plane. This implies that 2-manifolds have a local smoothness property. Previous works have attempted to explicitly enforce this property by learning mappings from smooth 2D grids to local surfaces of 3D objects [13, 37, 38]. While enforcing local smoothness may be helpful for learning explicitly smooth representations such as meshes, this assumption may be less relevant for point clouds due to their discrete nature, which allows for various potentially more suitable non-smooth representations.

Topology on discrete point sets: Unlike 2-manifold representations, we do not leverage the local smoothness assumption, due to the discrete nature of point clouds. Instead we leverage one of the multiple more general equivalent definitions of a topological space on finite discrete point sets, which for a set S is defined as a nonempty collection of subsets of S [2] (see Section 4). Note that this definition of topology is significantly less constrained than 2-manifolds and does not impose restrictions on smoothness or point neighborhoods within the topology. Leveraging this definition allows us to design a point cloud decoder which is less constrained than previous topological point cloud decoders. Indeed, we simply design a decoder which is able to group the point cloud into subsets defining the point cloud structure. The intuition behind designing a decoder that groups a point cloud into subsets is that if a topology (defined as a collection of nonempty subsets) is adequate for the learning process, then the decoder has the ability to generate the point cloud according to that topology by adequately grouping points.

Designing Topological Decoders: Given the more general definition of topology on discrete finite point sets as a collection of subsets, how can we design a decoder capable of generating collections of subsets for a set S? One straightforward approach is to train N multilayer perceptrons to generate separate point cloud subsets and merge them into the final point cloud. This method scales poorly as the size of the topology or structure defined on the point set grows. We build and improve on this basic idea and instead propose a decoder modeled as a hierarchical rooted tree in which each node of the tree represents a
subset of the point cloud and the root of the tree represents the entire point cloud (Fig. 1). This rooted tree architecture has several appealing properties, including its ability to represent arbitrary topologies and structures on discrete point sets. The hierarchical nature of the decoder also yields a more efficient and compact representation, since parts of the neural network are shared in generating overlapping subsets of the point cloud (see Section 5). Next, we present the theoretical foundation of our work, starting with a formal definition of topology on discrete point sets, and we show that our proposed rooted tree structure is adequate to represent several point cloud structures, including any arbitrary topology.

4. Topology Representation

In this work, we leverage a general definition of topology on point sets and propose a decoder that generates a structured point cloud. To this end, we first provide a general definition of a topological space on point sets. There exist several equivalent definitions of topological spaces, and for our purposes we use the following [2]:

Definition: Assume S = {s1, s2, · · · , sn} is a set of points, where si ∈ Rᵈ is a point in d-dimensional space, and T = {S1, S2, · · · , Sk} is a collection of open subsets of S. Then T is a topology on S if:

1. The empty set ∅ and S are open;
2. The intersection of any finite number of subsets of T is in T, i.e. ∩∀i Si ∈ T;
3. The union of an arbitrary number of subsets of T is in T, i.e. ∪∀i Si ∈ T.

Any finite point set can be defined as open, and throughout this work we use that assumption, such that ∅, S, and all subsets of S are open. From the above definition, it becomes apparent that a topology on a finite discrete point set can be generally conceived as a collection of subsets of the point set. Therefore a decoder modeling topology must be able to generate or model groups of points in order to represent a topology on the point set. Among several possibilities for accomplishing the task of generating a point cloud as a group of points, we choose to use a rooted tree topology for two main reasons. The first reason is that, assuming enough capacity and allowing for redundancies, any topology T on a point set S can be represented as a rooted tree, as shown in Proposition 1. The second reason is that any rooted tree with at least three non-leaf nodes can embed at least two topologies (Proposition 2), meaning this representation, regardless of capacity (as long as there are at least three non-leaf nodes), can encode more topologies than previous works in which a single pre-determined topology is assumed. We now go into the details and proofs of the two propositions above.

Proposition 1: Any topology T on a point set S can be modeled as a rooted tree structure G in which:

– Every leaf of G represents a singleton {s}, where s ∈ S is an individual point (P1a)
– Each non-leaf node G(Si) in the tree represents a non-empty element Si ∈ T (P1b)
– For any pair of nodes G(Si), G(Sj) in G, if G(Sj) is a child of G(Si), then Sj ⊆ Si (P1c)

Proof by existence: We want to show that for each topology T on a set S, there exists at least one rooted tree structure G satisfying the conditions of Proposition 1. Let us define T* = T ∪ {{s} : s ∈ S}. We create a graph G as follows:

– For each individual point s ∈ S and each non-empty set Si ∈ T, we create a representative node G(s), G(Si) in G. (C1a)
– For each pair of non-empty Si, Sj ∈ T*, if Si ⊊ Sj, we add a directed edge E(Si, Sj) from Si to Sj in G. (C1b)
– Since ∀Si ∈ T* \ {S}, Si ⊊ S, E(Si, S) is an edge in G, and we designate G(S) as the root of our graph G. (C1c)

By C1a, all non-empty subsets of T are represented by a node in G, and we now show that G satisfies the conditions of Proposition 1.

P1a proof: Let Gl be a leaf in G representing a subset Si ⊆ S. By C1a, Si is non-empty, which means ∃s ∈ S such that s ∈ Si, and therefore {s} ⊆ Si. By contradiction, let us assume Si is not a singleton. Then {s} ⊊ Si, so by C1b, E({s}, Si) is a directed edge in G, which means Gl is not a leaf, a contradiction. Si must therefore be a singleton.

P1b proof: This is a direct consequence of the definition in C1a.

P1c proof: This follows from the construction in C1b: a directed edge E(Sj, Si), which makes G(Sj) a child of G(Si), is added only when Sj ⊊ Si.

Corollary 1: The set of all leaf descendants of a given node G(Si) in G is equal to Si. Based on this corollary, we can visualize individual nodes in a point set's topological tree by visualizing all their descendant leaves.

Proposition 2: Given a structured point cloud representing a set of points S, and a rooted tree graph G with at least 3 non-leaf nodes and the properties defined in Proposition 1, there exists more than one topology that can be represented by G.

Proof: This proposition can be proven by example. There are two trivial topologies that can be represented in a rooted tree G with at least 3 non-leaf nodes. The first is T = {∅, S}, which can be represented in G by choosing the root node and every other node in the tree to represent S (our representation allows for duplicate nodes); the empty set is represented implicitly. The second trivial topology that can be represented by G is T = {∅, S, S1, S2}.
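To make the construction in the proof of Proposition 1 concrete, the following minimal Python sketch (our own illustration, with a toy point set and topology we chose for the example, not code from the paper) builds the tree for a set S by attaching every element of T* = T ∪ {{s} : s ∈ S} to a smallest strict superset, and checks Corollary 1 by recovering each node's subset from its descendant leaves:

```python
from itertools import chain

# Toy point set and a topology on it (the empty set is kept implicit,
# and S itself belongs to T, as required by the definition above).
S = frozenset({0, 1, 2, 3})
T = [frozenset({0, 1}), frozenset({2, 3}), S]

# T* = T together with all singletons, as in the proof of Proposition 1.
T_star = set(T) | {frozenset({s}) for s in S}

def parent(node):
    """A smallest strict superset of `node` in T* (ties broken arbitrarily,
    which is acceptable since the representation allows redundancy)."""
    supersets = [t for t in T_star if node < t]  # '<' is proper-subset on sets
    return min(supersets, key=len) if supersets else None  # root S has no parent

children = {t: [] for t in T_star}
for t in T_star:
    p = parent(t)
    if p is not None:
        children[p].append(t)

def leaf_descendants(node):
    """Corollary 1: the leaf descendants of G(Si) recover exactly Si."""
    if not children[node]:  # leaves are singletons {s} (P1a)
        return node
    return frozenset(chain.from_iterable(leaf_descendants(c) for c in children[node]))

assert all(leaf_descendants(t) == t for t in T_star)
print("children of root:", [sorted(c) for c in children[S]])
```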
Figure 2: Decoder architecture. The full model includes an encoder as a first stage. The decoder has a rooted tree structure in which the root node embeds and processes the global feature from the encoder, and each subsequent MLP processes its parent node embedding concatenated with the global embedding to produce the embeddings of its children.
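As an illustration of this architecture, the sketch below is a minimal PyTorch reconstruction rather than the authors' reference code: the class name, the global code dimension of 1024, and the exact concatenation scheme are our assumptions, while the layer sizes follow the implementation details in Section 6 (node features F = 8; MLPs with 256, 64, and C channels; a root with 4 children and 8 children per subsequent internal node, giving 4 × 8³ = 2048 points).

```python
import torch
import torch.nn as nn

class TreeDecoder(nn.Module):
    """Sketch of a rooted-tree point cloud decoder (our reconstruction,
    not the authors' reference code). Root: 4 children; inner nodes: 8
    children; leaves are 3D points, giving 4 * 8**3 = 2048 points."""

    def __init__(self, code_dim=1024, feat_dim=8, arity=(4, 8, 8, 8)):
        super().__init__()
        self.arity = arity
        mlps = []
        in_dim = code_dim                      # the root consumes the global code alone
        for level, k in enumerate(arity):
            is_leaf = level == len(arity) - 1
            out_per_child = 3 if is_leaf else feat_dim
            mlps.append(nn.Sequential(         # shared MLP per level: 256 -> 64 -> k * C
                nn.Linear(in_dim, 256), nn.ReLU(),
                nn.Linear(256, 64), nn.ReLU(),
                nn.Linear(64, k * out_per_child)))
            in_dim = feat_dim + code_dim       # children see parent feature + global code
        self.mlps = nn.ModuleList(mlps)

    def forward(self, code):                   # code: (B, code_dim)
        B = code.size(0)
        feats = code.unsqueeze(1)              # (B, 1, code_dim): the root "node"
        for level, (mlp, k) in enumerate(zip(self.mlps, self.arity)):
            out = mlp(feats)                   # (B, n_nodes, k * C)
            C = out.size(-1) // k
            out = out.view(B, -1, C)           # split into k children per node
            if level < len(self.arity) - 1:    # internal: append global code to each child
                feats = torch.cat([out, code.unsqueeze(1).expand(-1, out.size(1), -1)], dim=-1)
            else:
                return out                     # (B, 2048, 3): the completed point cloud

points = TreeDecoder()(torch.randn(2, 1024))
print(points.shape)  # torch.Size([2, 2048, 3])
```

Sharing one MLP per level keeps the parameter count independent of the number of nodes at that level, which is one way to realize the compactness argument made in Section 4.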
A suitable loss for comparing generated and ground-truth point clouds must be permutation invariant and differentiable; the Chamfer distance proposed by Fan et al. [9] meets these two requirements and is defined as:
d_CD(S, S_G) = Σ_{x∈S} min_{y∈S_G} ||x − y||² + Σ_{y∈S_G} min_{x∈S} ||x − y||²    (1)

For each point, the Chamfer distance finds the nearest neighbor in the other set and computes their squared distance; these squared distances are summed over both target and ground-truth sets.
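For reference, a brute-force PyTorch sketch of Eq. (1) is given below. This is our illustration, assuming the squared-distance form described above; normalization and batching conventions vary across implementations.

```python
import torch

def chamfer_distance(S, SG):
    """Eq. (1): sums of squared nearest-neighbor distances in both directions.
    S: (N, 3) generated points; SG: (M, 3) ground-truth points."""
    d2 = torch.cdist(S, SG).pow(2)  # (N, M) pairwise squared distances
    return d2.min(dim=1).values.sum() + d2.min(dim=0).values.sum()

print(chamfer_distance(torch.rand(2048, 3), torch.rand(2048, 3)))
```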
6. Experiments
We now evaluate our proposed decoder both quantitatively and qualitatively on the task of 3D point cloud shape completion. A given partial input point cloud is processed through an existing point cloud encoder, and the resulting embedding is processed by our decoder to generate a point cloud. The encoder used in our final model is the 2-stage PointNet-based encoder from Yuan et al. [38].

Dataset: We evaluate our method on a subset of the ShapeNet dataset [4] derived from the dataset in Yuan et al. [38]. The ground-truth in [38] was generated by uniformly sampling 16384 points on the mesh surfaces, and the partial point clouds were generated by back-projecting 2.5D depth images into 3D. In our experiments we use N = 2048 for both input and ground-truth point clouds, which are obtained by random subsampling/oversampling of the clouds in [38]. We keep the same train/test split as [38].

Implementation Details: Our final decoder has L = 6 levels, and each MLP in the decoder tree generates a small node feature embedding of size F = 8. When generating N = 2048 points, the root node has 4 children and all internal nodes in subsequent levels generate 8 children, yielding a total of N = 4 × 8³ = 2048 points generated by the decoder. Each MLP in the decoder has 3 stages with 256, 64, and C channels respectively, where C = 8 for inner nodes and C = 3 for leaf nodes.

Training Setup: We train all models for 300 epochs, with a batch size of 32, a learning rate of 1e-2 or 5e-3 depending on stability, and the Adagrad optimizer. The best model is chosen based on the validation set.

Evaluation: We evaluate our model across 8 classes from the ShapeNet dataset, against state-of-the-art methods on point cloud completion. For each class, the Chamfer distance is computed (averaged over the number of class instances). Our final metric is the mean Chamfer distance averaged across classes. In addition, we train a fully connected decoder baseline with 4 layers of output dimensions 256, 512, 1024, and 3 × N. For N = 2048, the results of the evaluation of our method against state-of-the-art methods are shown in Table 1, with qualitative results in Figure 4. Our method significantly outperforms existing methods across all classes, and shows a 33.9% relative improvement over the next best method.

Figure 3: Number of Parameters: We analyze the performance of our network instantiations as a function of their number of parameters. Across all instantiations, our network outperforms previous works. A local minimum seems to emerge from this plot. Note the Chamfer distance is reported multiplied by 10^4.

6.1. Encoder analysis

Methods proposed in previous works use a variety of encoders. The generated point cloud depends not only on the decoder but also on the feature produced by the encoder. In this experiment we analyze the effect of encoder choice on the performance of our decoder and those of previous decoders. The results of this analysis are tabulated in Table 2. The first encoder (A), used in previous works [13] and for our fully connected baseline, is a PointNet [13]. The second encoder is proposed in [38], where it demonstrates better performance than PointNet++ [24]. We show results with these two encoders and compare against the methods using each respectively. Regardless of the encoder used, our method outperforms existing methods. There is a noticeable gap in performance depending on the encoder chosen, but comparing across methods using the same encoder, our network still shows a significant performance improvement.

6.2. Ablation studies

Design choices involved in our decoder include the number of features F generated for each node embedding and the number of tree levels L. We analyze the effect of these parameters for an output cloud size of N = 2048 by varying F in {8, 16, 32, 64} and L in {2, 4, 6, 8}. This ablation study was used to pick the model's final number of layers and number of features. One important thing to note is that since the number of output points is fixed at 2048 in this experiment, increasing the number of levels requires decreasing the number of children per level. This operation is therefore not similar to adding a new layer in conventional networks, and a deeper tree may not necessarily improve performance.
Method | Plane | Cabinet | Car | Chair | Lamp | Couch | Table | Watercraft | Average
AtlasNet [13] | 10.37 | 23.40 | 13.41 | 24.16 | 20.24 | 20.82 | 17.52 | 11.62 | 17.69
PointNetFCAE [base.] | 10.31 | 19.07 | 11.83 | 24.69 | 20.31 | 20.09 | 17.57 | 10.50 | 16.80
Folding [37] | 11.18 | 20.15 | 13.25 | 21.48 | 18.19 | 19.09 | 17.80 | 10.69 | 16.48
PCN [38] | 8.09 | 18.32 | 10.53 | 19.33 | 18.52 | 16.44 | 16.34 | 10.21 | 14.72
Ours | 5.50 | 12.02 | 8.90 | 12.56 | 9.54 | 12.20 | 9.57 | 7.51 | 9.72

Table 1: Point Cloud Completion Results on ShapeNet: Per-class Chamfer distance (CD) of our approach against previous works at resolution N ≈ 2048. The Chamfer distance is reported multiplied by 10^4; lower is better.
In Figure 5a, we plot the Chamfer distance as a function of the number of levels L for different values of F. For F ∈ {8, 32, 64} the graphs exhibit different local minima, but for F = 16 the performance oscillates around an average value. In Figure 5b, we plot the Chamfer distance as a function of the number of features F for different values of L. The graphs for L ∈ {4, 6, 8} exhibit broadly similar trends, though the pattern is non-convex. The graph for L = 2 exhibits a very different pattern compared to the others. In all experiments, regardless of the values of L and F above, our method outperforms all previous methods.

6.3. Resolution analysis

To analyze behavior across resolutions, when generating more output points we keep the ground-truth at 2048 points and only increase the output size; quantitative results are shown in Table 3. We notice that our network's performance increases with the number of output points, albeit with an absolute improvement smaller than that of PCN [38]. As a percentage, the improvement is still significant, ranging from 9.16% to 18.46% for our method, while PCN's improvement ranges from -2.61% to 20.26%. At all resolutions, our method still shows superior absolute performance.
Resolution (# points) | AtlasNet [13] | PointNetFCAE [base.] | Folding [37] | PCN [38] | Ours
2048 | 17.69 | 16.80 | 16.48 | 14.72 | 9.72
4096 | 16.12 | 14.79 | 13.19 | 11.88 | 8.83
8192 | 15.32 | 12.57 | 12.95 | 12.19 | 7.20
16384 | 14.85 | 10.61 | 12.26 | 9.72 | 6.50

Table 3: Resolution analysis: Comparison of our approach against previous works, including the best performing method (PCN), with varying point cloud resolution. We train each method to generate N = 2048, 4096, 8192, and 16384 points. The Chamfer distance is reported, multiplied by 10^4. Our method outperforms PCN at all resolutions, but the performance gap between the methods is larger for low-resolution point clouds.
Figure 4: Point Cloud Completion Results. A partial point cloud is given as input and our method generates a completed point cloud.
Figure 5: Ablation Experiments: We analyze the effect of varying different parameters in our network. We vary the number of node features F ∈ {8, 16, 32, 64} and the number of tree levels L ∈ {2, 4, 6, 8}, while keeping the number of output points constant. For instance, when the number of levels L = 2, the number of children per level is 32 and 64; when L = 4, the number of children per level is 4, 4, 4, 8. All instantiations of our method outperform previous works. The number of levels seems to suggest a local minimum, but the number of features does not show a noticeable pattern. The Chamfer distance is reported multiplied by 10^4.
A direction for future work would be to explore whether the learned structure embeddings have value as representations for classification. Another future improvement of this work could be to propose a way to generate structured point clouds without redundancies.

Multiple structures: While our proposed model can embed arbitrary topologies, it is only able to embed a single topology at test time. One avenue of exploration would be to combine several such decoders and evaluate whether they all converge to the same topology for S or whether each decoder learns to embed disjoint subsets of S.
References

[1] P. Achlioptas, O. Diamanti, I. Mitliagkas, and L. Guibas. Learning representations and generative models for 3d point clouds, 2018.
[2] M. Armstrong. Basic Topology. Undergraduate Texts in Mathematics. Springer, 1990.
[3] A. Brock, T. Lim, J. M. Ritchie, and N. Weston. Generative and discriminative voxel modeling with convolutional neural networks. CoRR, abs/1608.04236, 2016.
[4] A. X. Chang, T. A. Funkhouser, L. J. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu. ShapeNet: An information-rich 3d model repository. CoRR, abs/1512.03012, 2015.
[5] C. B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese. 3D-R2N2: A unified approach for single and multi-view 3d object reconstruction. In Proceedings of the European Conference on Computer Vision (ECCV), 2016.
[6] A. Dai, C. R. Qi, and M. Nießner. Shape completion using 3D-encoder-predictor CNNs and shape synthesis. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[7] A. Dai, D. Ritchie, M. Bokeloh, S. Reed, J. Sturm, and M. Nießner. ScanComplete: Large-scale scene completion and semantic segmentation for 3d scans. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2018.
[8] R. F. de Figueiredo, P. Moreno, and A. Bernardino. Automatic object shape completion from 3d point clouds for object manipulation. In VISIGRAPP, 2017.
[9] H. Fan, H. Su, and L. Guibas. A Point Set Generation Network for 3D Object Reconstruction from a Single Image. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[10] M. Gadelha, R. Wang, and S. Maji. Multiresolution tree networks for 3d point cloud processing. In The European Conference on Computer Vision (ECCV), September 2018.
[11] R. Girdhar, D. Fouhey, M. Rodriguez, and A. Gupta. Learning a predictable and generative vector representation for objects. In ECCV, 2016.
[12] B. Graham, M. Engelcke, and L. van der Maaten. 3d semantic segmentation with submanifold sparse convolutional networks. CVPR, 2018.
[13] T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, and M. Aubry. AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[14] X. Han, Z. Li, H. Huang, E. Kalogerakis, and Y. Yu. High-resolution shape completion using deep neural networks for global structure and local geometry inference. In IEEE International Conference on Computer Vision (ICCV), October 2017.
[15] X. Han, Z. Li, H. Huang, E. Kalogerakis, and Y. Yu. High-Resolution Shape Completion Using Deep Neural Networks for Global Structure and Local Geometry Inference. IEEE International Conference on Computer Vision (ICCV), 2017.
[16] C. Häne, S. Tulsiani, and J. Malik. Hierarchical surface prediction for 3d object reconstruction. In 3DV, 2017.
[17] R. Klokov and V. Lempitsky. Escape from cells: Deep kd-networks for the recognition of 3d point cloud models. ICCV, 2017.
[18] T. Le and Y. Duan. PointGrid: A deep network for 3d shape understanding. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[19] D. Li, T. Shao, H. Wu, and K. Zhou. Shape completion from a single rgbd image. IEEE Transactions on Visualization and Computer Graphics, 23(7):1809–1822, July 2017.
[20] J. Li, B. M. Chen, and G. H. Lee. SO-Net: Self-organizing network for point cloud analysis. arXiv preprint arXiv:1803.04249, 2018.
[21] Y. Li, R. Bu, M. Sun, and B. Chen. PointCNN. CoRR, abs/1801.07791, 2018.
[22] O. Litany, A. Bronstein, M. Bronstein, and A. Makadia. Deformable shape completion with graph convolutional autoencoders. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[23] C. R. Qi, H. Su, K. Mo, and L. J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[24] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5099–5108. Curran Associates, Inc., 2017.
[25] G. Riegler, A. O. Ulusoy, H. Bischof, and A. Geiger. OctNetFusion: Learning depth fusion from data. In 3DV, pages 57–66. IEEE Computer Society, 2017.
[26] Y. Rubner, C. Tomasi, and L. J. Guibas. The earth mover's distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99–121, 2000.
[27] A. Sharma, O. Grau, and M. Fritz. VConv-DAE: Deep volumetric shape learning without object labels. In G. Hua and H. Jégou, editors, Computer Vision – ECCV 2016 Workshops, pages 236–250, Cham, 2016. Springer International Publishing.
[28] A. Sinha, A. Unmesh, Q. Huang, and K. Ramani. SurfNet: Generating 3d shape surfaces using deep residual networks. In CVPR, pages 791–800. IEEE Computer Society, 2017.
[29] D. Stutz and A. Geiger. Learning 3D Shape Completion from Laser Scan Data with Weak Supervision. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[30] H. Su, V. Jampani, D. Sun, S. Maji, E. Kalogerakis, M.-H. Yang, and J. Kautz. SPLATNet: Sparse lattice networks for point cloud processing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2530–2539, 2018.
[31] J. Varley, C. DeChant, A. Richardson, A. Nair, J. Ruales, and P. Allen. Shape completion enabled robotic grasping. In Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on. IEEE, 2017.
[32] P.-S. Wang, Y. Liu, Y.-X. Guo, C.-Y. Sun, and X. Tong. O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis. ACM Transactions on Graphics (SIGGRAPH), 36(4), 2017.
[33] P.-S. Wang, C.-Y. Sun, Y. Liu, and X. Tong. Adaptive O-CNN: A Patch-based Deep Representation of 3D Shapes. ACM Transactions on Graphics (SIGGRAPH Asia), 37(6), 2018.
[34] Z. Wang and F. Lu. VoxSegNet: Volumetric CNNs for semantic part segmentation of 3d shapes. CoRR, abs/1809.00226, 2018.
[35] J. Wu, C. Zhang, X. Zhang, Z. Zhang, W. T. Freeman, and J. B. Tenenbaum. Learning shape priors for single-view 3d completion and reconstruction. European Conference on Computer Vision (ECCV), 2018.
[36] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1912–1920, June 2015.
[37] Y. Yang, C. Feng, Y. Shen, and D. Tian. FoldingNet: Interpretable Unsupervised Learning on 3D Point Clouds. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[38] W. Yuan and D. Held. PCN: Point Completion Network. International Conference on 3D Vision (3DV), 2018.