
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

LAKe-Net: Topology-Aware Point Cloud Completion by Localizing Aligned Keypoints

Junshu Tang1, Zhijun Gong1, Ran Yi1*, Yuan Xie2, Lizhuang Ma1,2*
1Shanghai Jiao Tong University, 2East China Normal University
{tangjs, gongzhijun, ranyi}@sjtu.edu.cn, [email protected], [email protected]

978-1-6654-6946-3/22/$31.00 ©2022 IEEE | DOI: 10.1109/CVPR52688.2022.00177

Abstract

Point cloud completion aims at completing geometric and topological shapes from a partial observation. However, when some topology of the original shape is missing, existing methods directly predict the locations of the complete points without predicting structured and topological information of the complete shape, which leads to inferior performance. To better recover the missing topology, we propose LAKe-Net, a novel topology-aware point cloud completion model that localizes aligned keypoints and follows a novel Keypoints-Skeleton-Shape prediction manner. Specifically, our method completes the missing topology in three steps: 1) Aligned Keypoint Localization. An asymmetric keypoint locator, including an unsupervised multi-scale keypoint detector and a complete keypoint generator, is proposed for localizing aligned keypoints from complete and partial point clouds. We theoretically prove that the detector can capture aligned keypoints for objects within a sub-category. 2) Surface-skeleton Generation. A new type of skeleton, named Surface-skeleton, is generated from keypoints based on geometric priors to fully represent the topological information captured by the keypoints and to better recover local details. 3) Shape Refinement. We design a refinement subnet in which multi-scale surface-skeletons are fed into each recursive skeleton-assisted refinement module to assist the completion process. Experimental results show that our method achieves state-of-the-art performance on point cloud completion.

Figure 1. Illustration of (a) visual comparison results of current completion methods, (b) the completion process of our LAKe-Net (Partial Input → localized Keypoints → generated Surface-skeleton → skeleton-assisted Fine Output), and (c) aligned keypoints. Compared with GRNet [34], PMP-Net [29], PoinTr [40] and Snowflake [32], LAKe-Net can effectively recover the missing topology (see red ellipses).

* Corresponding authors.

1. Introduction

The geometry and vision community has put huge effort into point cloud processing, which is challenging due to the unordered, unstructured characteristics and complex semantics of point clouds. In real applications, occlusions and insufficient lighting lead to partial scans of real shapes and degrade the performance of subsequent processing. Point cloud completion focuses on predicting missing regions from partial observations, and shows its unique significance in many fundamental applications.

Recent works [7, 18, 26, 29, 32, 34, 35, 40, 41] for point cloud completion successfully utilize deep-learning methods and achieve more plausible and flexible results compared with traditional geometry-based methods [8, 19, 23] and alignment-based methods [11, 12, 16]. However, most existing methods directly predict the locations of the complete points without predicting structured and topological information of the complete shape, which leads to coarse results in missing regions (see Figure 1(a)).

Inspired by the typical geometric modeling theory that a complete 3D object includes both geometry and topology, e.g., coordinates and connectivity, we aim to predict both geometric and structured topological information for point cloud completion, including keypoints and a generated skeleton. To this end, we propose a novel Keypoints-Skeleton-Shape prediction manner with three steps: keypoint localization, skeleton generation, and shape refinement.

We first introduce the keypoint localization. Different from down-sampled points, keypoints are evenly distributed across the semantic parts of a shape and are considered a crucial representation of geometric structure that is widely used in many vision applications [17, 27, 37]. Therefore, we hold the belief that once the keypoints and their connectivity are correctly localized, the entire geometry is determined. To this end, we wish to localize complete keypoints from partial inputs under the supervision of ground-truth keypoints. However, obtaining keypoint annotations for a large amount of 3D data is difficult and expensive, so we propose an asymmetric keypoint locator including an unsupervised multi-scale keypoint detector (UMKD) and a complete keypoint generator (CKG) for complete and partial point clouds, respectively. With UMKD, we extract Aligned Keypoints, meaning that the order of keypoints is the same across different objects within a certain category (Figure 1(c)); this represents more stable and richer information and provides stronger supervision for predicting complete keypoints from partial inputs with CKG.

Since discrete and sparse keypoints are not enough to represent whole objects, we leverage a skeleton to better represent topological details. Inspired by existing skeleton extraction methods [1, 5, 14], we propose a novel Surface-skeleton, which is generated from keypoints based on geometric priors. Compared with other types of skeletons, our surface-skeleton is a mixture of curves and triangle surfaces and can represent more complex shape information. We integrate surface-skeletons of different fineness, generated by multi-scale keypoints, into the shape refinement step to recover finer results. Specifically, we propose a folding-based refinement subnet including three recursive skeleton-assisted refinement (RSR) modules, following other completion methods [28, 32].

In general, we propose LAKe-Net, a novel topology-aware point cloud completion model by localizing aligned keypoints. The whole pipeline includes four parts: auto-encoder, asymmetric keypoint locator, surface-skeleton generator, and refinement subnet. We leverage pairs of complete and partial point clouds during training. In detail, the input point clouds (either complete or partial) are first fed into an auto-encoder to learn a feature embedding space and generate coarse complete results. Then, we localize multi-scale keypoints using the asymmetric keypoint locator and generate the corresponding surface-skeletons. The multi-scale structures are fed into the refinement subnet to generate the fine output. The training process includes two stages, for point cloud reconstruction and completion, respectively.

Overall, we summarize our main contributions as follows: (1) We propose LAKe-Net, a novel topology-aware point cloud completion model that utilizes a structured representation of the surface as assistance, including Aligned Keypoints and the Surface-skeleton, with a new Keypoints-Skeleton-Shape prediction manner. (2) We introduce an asymmetric keypoint locator including an unsupervised multi-scale keypoint detector and a complete keypoint generator, which can capture accurate keypoints for complete and partial objects in multiple categories, respectively. We theoretically prove that our detector detects aligned keypoints within each sub-category. (3) We conduct point cloud completion experiments on two datasets, PCN and ShapeNet55. Experimental results show that our LAKe-Net achieves state-of-the-art performance on both datasets.

2. Related Works

Point Cloud Completion. Point cloud completion focuses on predicting missing shapes from a partial point cloud input. Recently, inspired by point cloud analysis approaches [20, 21], PCN [41] first adopted an encoder-decoder architecture and a coarse-to-fine manner to generate the complete shape. Several works [15, 26, 32, 43] follow this practice and modify the network structure to obtain better performance. SA-Net [28] further extends the decoding process into multiple stages by introducing hierarchical folding. More recently, PoinTr [40] reformulates point cloud completion as a set-to-set translation problem and designs a new transformer-based encoder-decoder. However, these methods mostly predict the locations of the complete points without predicting structured and topological information, which leads to coarse results in missing regions. SK-PCN [18] is the work most relevant to ours; it pre-processes the dataset and uses meso-skeletons as supervision. However, SK-PCN does not predict the structured and topological information of the original shape. Our proposed LAKe-Net utilizes aligned keypoints and the corresponding surface-skeleton, which capture the shared topological information, as an assistant for completion, and obtains better performance.

Skeleton Representation. Skeleton representations are widely used in motion recognition [38, 39], human pose estimation [2, 22] and human reconstruction [4, 42]. Jiang et al. [10] propose to incorporate skeleton awareness into deep learning-based regression for 3D human shape reconstruction from point clouds. Tang et al. [25] utilize the topology-preservation property of skeletons to perform 3D surface reconstruction from a single RGB image. P2P-Net [36] learns bidirectional geometric transformations between point-based shape representations from two domains, surface-skeletons and surfaces. Our method designs surface-skeleton representations generated by multi-scale keypoints in a more fine-grained manner to progressively aid point cloud completion.

Unsupervised Keypoint Detection. While most hand-crafted 3D keypoint detectors fail to detect accurate and well-aligned keypoints on complex objects, Li et al. [13]
Figure 2. The overall architecture of LAKe-Net, which consists of two parts: Point Cloud Reconstruction (blue) and Point Cloud Completion (red). We show the detailed structure of (a) Auto-encoder E, (b) Complete Keypoint Generator G and (c) Recursive Skeleton-assisted Refinement module R on the right side. The PCN encoder was first proposed in [41]. UMKD denotes the unsupervised multi-scale keypoint detector. Surface-skeletons Si and Ŝi are generated by Pi and P̂i, respectively.

propose the first learning-based 3D keypoint detector, USIP. However, the detected keypoints are neither ordered nor semantically salient. Fernandez et al. [6] utilize a symmetry prior in point clouds to capture keypoints in an unsupervised manner. Recently, Jakab et al. [9] further explore the application of unsupervised keypoints to the shape deformation task. SkeletonMerger [24] proposes a novel keypoint detector based on an autoencoder architecture; however, different models need to be trained for different categories. We propose UMKD, an unsupervised multi-scale keypoint detector that can capture keypoints for objects in multiple categories. We find that it reaches the best performance on categories with shared topology on KeypointNet [37] and produces more salient and semantically richer keypoints.

Figure 3. The detailed structure of our proposed UMKD, consisting of a category-invariant convex encoder and category-specific offset predictors. Taking P3 as an example, we show the calculation process and the dimensions of the relevant tensors in blue.

3. Proposed Method

We propose a novel topology-aware point cloud completion network by localizing aligned keypoints (LAKe-Net), whose overall architecture is shown in Fig. 2. The pipeline includes four parts: auto-encoder, asymmetric keypoint locator, surface-skeleton generation, and shape refinement. The training includes two stages: point cloud reconstruction and completion. The training data consists of pairs of complete and partial point clouds (X, 𝒳), where X ∈ R^{Nc×3} and 𝒳 ∈ R^{Np×3} denote the coordinates of the complete and partial point clouds, and Nc and Np denote their numbers of points, respectively.

Firstly (upper part of Fig. 2), we utilize the complete data X as input to train an unsupervised multi-scale keypoint detector (UMKD) D, which extracts the multi-scale keypoints {P_i}_{i=1}^{3}, and an auto-encoder E1, which maps the inputs into a global feature space c and obtains coarse results Xc. Then, a surface-skeleton generation process leverages the topology information in the keypoints to construct a finer representation. Finally, a refinement subnet takes the topology information S and the coarse results Xc to generate high-resolution results Xf. The reconstruction stage trains the UMKD unsupervisedly to learn aligned keypoints for the later completion stage, and also provides a good initialization of the network for predicting complete shapes.

In the second stage, we fix the weights of the keypoint detector D and auto-encoder E1, then feed the partial data 𝒳 into a new auto-encoder E2 to predict complete coarse results 𝒳c. It is noteworthy that the auto-encoders E1 and E2 have the same architecture, which includes a PCN encoder [41] and a coarse point generator. The encoder of E2 embeds the input points into point-wise local features f̂ ∈ R^{Np×d}, where d denotes the dimension of the feature embedding. We take the maximum value ĉ = max_{Np}(f̂) ∈ R^{1×d} as the global feature. Then we fuse the local and global features and feed the fused feature into a complete keypoint generator (CKG) G to predict multi-scale keypoints {P̂_i}_{i=1}^{3}, supervised by the keypoints {P_i}_{i=1}^{3} detected in the first stage. At last, we send 𝒳c and the interpolated surface-skeleton Ŝ to a skeleton-assisted refinement subnet R to generate the fine output 𝒳f. We describe the technical details of our proposed modules and training losses in the following sections.

3.1. Unsupervised Multi-scale Keypoint Detector

Given an input complete point cloud of Nc points X = {x_j | j = 1, ..., Nc} ∈ R^{Nc×3}, our aim is to predict {K_i}_{i=1}^{3} numbers of keypoints {P_i}_{i=1}^{3} = {p_k | k = 1, ..., K_i} ∈ R^{K_i×3}. Specifically, we predict convex combination weights W_i = {w_{ij}} ∈ R^{Nc×K_i} over the point cloud instead of predicting the coordinates of the keypoints directly, to avoid deviating from the original shape. The predicted keypoints P_i are derived by:

\mathbf{P}_i = \mathbf{W}_i^{T}\mathbf{X} = \sum_{j=1}^{N_c} w_{ij}\, x_{j}, \quad \text{s.t.}\ w_{ij} > 0,\ \sum_{j=1}^{N_c} w_{ij} = 1.   (1)

To predict the convex weights W_i with a single model for all categories, which suits our pipeline, we decompose W_i = W_i^t + W_i^o, where W_i^t denotes a category-invariant template weight and W_i^o denotes a category-specific weight offset. Besides, we expect to predict multi-scale keypoints for subsequent tasks. To this end, we propose an Unsupervised Multi-scale Keypoint Detector (UMKD), which consists of a category-invariant convex encoder and category-specific offset predictors. The detailed structure of UMKD is shown in Figure 3. In detail, we apply PointNet++ [21] as a backbone encoder to extract local and global features. It includes four set abstraction layers to group and downsample the input points. The global and local features are then propagated back to each input point. We then feed the point-wise features into three fully connected blocks and progressively extract multi-scale convex features W_i^t ∈ R^{K_i×Nc}. To predict the category-specific offset W_i^o, we first predict the category label ω for the input shape and send W_i^t to the corresponding offset predictor O_i^ω. We add two fully connected layers after the last set abstraction layer in PointNet++ as a classification head to predict the category label of every input geometry. O_i^j ∈ R^{K_i×K_i} is a learnable matrix, with j ∈ [1, Q], where Q denotes the number of categories. Taking P3 as an example (Figure 3), the input points X ∈ R^{Nc×3} are sent to the convex encoder, which outputs the template convex weights W_3^t ∈ R^{K_3×Nc} and a predicted label, e.g., ω = 2. W_3^t is then input to the selected offset predictor O_3^2, which outputs W_3^o ∈ R^{K_3×Nc}. At last, we form the final convex weight W_3 = W_3^t + W_3^o, normalize W_3 with a softmax function, and predict the keypoints P_3.

The keypoints extracted by UMKD D satisfy the property that the coordinates of the detected keypoints P are irrelevant to the order of the original points X, that is, P = D(X) = D(R(X)), where R(·) denotes a random permutation. The theoretical proof is given in the Supplementary Materials. Therefore, the detected keypoints are aligned among objects with shared topology within a sub-category (shown in Figure 1(c)).

3.2. Surface-skeleton Generation

After extracting the keypoints of the original point cloud, our aim is to reconstruct the point cloud according to the extracted keypoints. We consider the skeleton an intermediate representation between keypoints and the original point cloud. Previous methods like SkeletonMerger [24] used skeletons connecting every pair of keypoints, which leads to high computational complexity and many invalid points. Moreover, given a surface, previous skeletons either lie near the medial axis of the surface [5] or mostly lie outside the surface [24]. We aim to extract a skeleton that lies near the surface, so that it can better assist the later refinement process. In order to represent both the topological and geometric information of a complex shape, we design a surface-skeleton structure that is generated from keypoints and consists of a mixture of curves and triangle surfaces adapted to the local 3D geometry. Specifically, given the predicted keypoints P ∈ R^{K×3} and reference points X, we follow the two shape priors proposed in [14] to generate the skeletal graph: (1) the topology prior, that each node links to its top-2 nearest nodes; and (2) the recovery prior, that two keypoints are linked if they are the two nearest keypoints of a reference point.

After skeletal graph generation, we obtain an adjacency matrix A ∈ R^{K×K}. We propose a surface interpolation strategy based on Delaunay-style triangulated regions. The whole interpolation process consists of two steps: edge interpolation and triangle surface interpolation. Given a skeletal graph, we first interpolate points along each connected edge. Next, we detect every triangle surface according to the skeletal graph and insert points into each triangle region, with the number of interpolated points proportional to the triangle's area. The overall skeletal graph generation and surface interpolation process is shown in Fig. 4.

Figure 4. The surface-skeleton generation operation: (a) topology prior, (b) recovery prior, (c) edge interpolation, (d) surface interpolation.
It is noteworthy that we utilize the complete point cloud X as the reference shape for reconstruction, while using the coarse results 𝒳c for completion.

Overall, we obtain the surface-skeleton S. Increasing the number of keypoints leads to more complex surface-skeletons that represent finer shape details. We utilize this structure as a topology representation of the geometric shape, which is crucial for reconstruction and completion.

3.3. Complete Keypoint Generator

In the second stage, given the partial input 𝒳 ∈ R^{Np×3}, the aim of our proposed Complete Keypoint Generator (CKG) module is to predict multi-scale complete keypoints from the partial feature embedding. To this end, we take the local and global features f̂ and ĉ extracted by the encoder in E2 as input. Similar to the Farthest Point Sampling (FPS) strategy for point clouds, in order to downsample the point features we use a Farthest Feature Sampling (FFS) strategy, which replaces the point coordinates in FPS by feature embeddings, to downsample the point-wise local features f̂ to sparse features f̂*. The number of samples equals the number of predicted keypoints. We then use a de-convolution layer to upsample the global feature ĉ, fuse the two in a residual block, and predict the final keypoints. We train three similar blocks for multi-scale keypoint prediction, and then generate the corresponding surface-skeleton Ŝ by the surface interpolation introduced in Sec. 3.2.

3.4. Recursive Skeleton-assisted Refinement

Our proposed shape refinement subnet R includes three recursive skeleton-assisted refinement (RSR) modules that integrate the multi-scale surface-skeletons and the coarse output of the preceding auto-encoder to predict finer geometric details in a recursive way. The detailed design of the module is shown in Fig. 2. Following existing methods [28, 32, 35], it uses a coarse-to-fine strategy to learn the offsets of the integrated points. Specifically, we progressively concatenate the coarse point cloud obtained from the previous step with the surface-skeleton generated by the corresponding keypoints as described in Sec. 3.2. We denote the input coarse points as X_{i−1} = {x_j}_{j=1}^{N_{i−1}} and the surface-skeleton as S_{i−1} = {p_j}_{j=1}^{S_{i−1}}. The integrated points at the i-th step are X̂_{i−1} = concat(X_{i−1}, S_{i−1}), where concat(·) denotes the concatenation operation, and N_{i−1} and S_{i−1} denote the numbers of coarse points and surface-skeleton points, respectively. In this paper, we set N_i = S_i at each step. The updated points output by the i-th RSR module, X_i = X̂_{i−1} + R(X̂_{i−1}), are then sent to the next step.

3.5. Training and Losses

Point Cloud Reconstruction. In the first stage, the UMKD D, auto-encoder E1 and refinement subnet R are trained together. The training losses are divided into two parts: one constrains the keypoint detection, the other the data reconstruction. Firstly, in order to encourage the detected keypoints P to be well distributed and not deviate from the global shape, we calculate the Chamfer Distance (CD) loss between the predicted keypoints and a sparse point cloud X* downsampled from the input data using the FPS strategy. Since we train one detector across several categories, we also train a classification head; denoting the predicted output by ω and the ground-truth category label by σ, we train it with a cross-entropy criterion L_cls. Besides, as mentioned in Sec. 3.2, we expect the surface-skeleton to reconstruct the geometric shape of the ground truth, so we calculate the CD between the multi-scale surface-skeletons {S_i}_{i=1}^{3} and the ground truth X. The overall loss for training the keypoint detector is therefore:

\mathcal{L}_{CD}(X, Y) = \frac{1}{|X|}\sum_{x\in X}\min_{y\in Y}\|x-y\|^{2} + \frac{1}{|Y|}\sum_{y\in Y}\min_{x\in X}\|y-x\|^{2},   (2)

\mathcal{L}_{cls} = -\sum_{i=1}^{Q}\big(\sigma_{i}\log\omega_{i} + (1-\sigma_{i})\log(1-\omega_{i})\big),   (3)

\mathcal{L}_{kp} = \mathcal{L}_{CD}(\mathbf{P}, \mathbf{X}^{*}) + \sum_{i=1}^{3}\mathcal{L}_{CD}(\mathcal{S}_i, \mathbf{X}) + \mathcal{L}_{cls}.   (4)

At last, we calculate the CD between the ground truth X and the sparse output Xc and the dense output Xf, respectively:

\mathcal{L}_{rec} = \mathcal{L}_{CD}(\mathbf{X}_c, \mathbf{X}) + \mathcal{L}_{CD}(\mathbf{X}_f, \mathbf{X}).   (5)

In general, the overall training loss in the first stage is:

\mathcal{L}_{1} = \mathcal{L}_{rec} + \lambda_{kp}^{1}\mathcal{L}_{kp},   (6)

where λ¹_kp is a hyper-parameter that balances the two terms.

Point Cloud Completion. In the second stage, we fix the weights of the UMKD D and auto-encoder E1, and train a new auto-encoder E2 and the CKG G. The refinement subnet R pre-trained in the first stage continues to be optimized. We constrain the keypoint prediction using the absolute distance between the predicted keypoints P̂ and the ground-truth keypoints P:

\mathcal{L}^{c}_{kp} = \sum_{i=1}^{3}\sum_{j=1}^{K_i}\|p_{ij} - \hat{p}_{ij}\|^{2}.   (7)

Following a concurrent network [31], we align the global features c and ĉ ∈ R^{1×d} encoded by the auto-encoders in the two stages for hidden feature space learning:

\mathcal{L}_{feat} = \frac{1}{d}\sum_{i=1}^{d}\|c_{i} - \hat{c}_{i}\|^{2}.   (8)

For the completion task itself, we follow the coarse-to-fine process of the first stage. The coarse output 𝒳c and fine output 𝒳f are optimized using the CD loss:

\mathcal{L}_{com} = \mathcal{L}_{CD}(\mathcal{X}_c, \mathbf{X}) + \mathcal{L}_{CD}(\mathcal{X}_f, \mathbf{X}).   (9)

In summary, the full objective of point cloud completion in the second stage is:

\mathcal{L}_{2} = \mathcal{L}_{com} + \lambda_{kp}^{2}\mathcal{L}_{kp}^{c} + \lambda_{feat}\mathcal{L}_{feat},   (10)

where (λ²_kp, λ_feat) denote hyper-parameters.
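As a concrete reading of Eqs. (2) and (4)-(6), the sketch below implements the symmetric Chamfer Distance and composes the first-stage objective in plain Python with brute-force nearest neighbors. It is only an illustration under our own assumptions: the function names are ours, the classification term L_cls and all learned components are omitted, and a real implementation would use batched GPU tensors.

```python
import math

def chamfer(X, Y):
    """Symmetric Chamfer Distance, Eq. (2): mean squared distance to the
    nearest neighbor, accumulated in both directions."""
    d_xy = sum(min(math.dist(x, y) ** 2 for y in Y) for x in X) / len(X)
    d_yx = sum(min(math.dist(x, y) ** 2 for x in X) for y in Y) / len(Y)
    return d_xy + d_yx

def stage1_loss(keypoints, X_star, skeletons, X, X_coarse, X_fine, lam_kp=10.0):
    """First-stage objective, Eqs. (4)-(6), without the classification term:
    L1 = L_rec + lambda_kp * L_kp. L_kp ties the keypoints to the FPS-sampled
    shape X* and the multi-scale surface-skeletons to the ground truth X;
    L_rec ties the coarse and fine outputs to X."""
    L_kp = chamfer(keypoints, X_star) + sum(chamfer(S, X) for S in skeletons)
    L_rec = chamfer(X_coarse, X) + chamfer(X_fine, X)
    return L_rec + lam_kp * L_kp
```

As a sanity check, when every predicted set coincides with its target, each CD term vanishes and `stage1_loss` returns 0.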
4. Experiments
Input
4.1. Dataset Setting and Evaluation metric
PCN: The PCN dataset is a widely-used benchmark for GRNet
point cloud completion, which is created by [41], includ-
ing different objects from 8 categories: plane, cabinet, car,
chair, lamp, sofa, table, and vessel. The training set contains Spare
28,974 objects, while validation and test set contains 800
and 1,200 objects, respectively. The complete point cloud
PMP
consists of 16,384 points which are uniformly sampled on
the original CAD model. Partial point cloud, consisting of
2,048 points, is created by back-projecting 2.5D depth im- PoinTr
ages into 3D from 8 random viewpoints.
ShapeNet55: To explore the performance of our method
Snow
on a large number of categories, we evaluate our method on
all 55 categories of ShapeNet [3], named ShapeNet55. The
ShapeNet55 dataset was first created by PoinTr [40]. The Ours
training set contains 41,952 objects, while test set contains
10,518 objects. We randomly sample 80% objects in each Ground
category to form training set and use the rest 20% to form Truth
validation set.
Evaluation Metrics: We utilize two evaluation metrics
between output point cloud and the ground truth, Cham- Figure 5. Visualization of point cloud completion comparison re-
sults on PCN dataset with other recent methods.
fer Distance (CD) using L2 norm and Earth Mover’s Dis-
tance (EMD), following most of the methods on PCN and
plement other methods using their open source code and
ShapeNet55 test set. CD is introduced in Equation 2 and
hyper-parameters for fair comparison. Table 2 and 3 show
EMD is defined as:
the quantitative comparison results of our method and other
\small \begin {aligned} EMD(X, Y) = \min _{\phi : X\to Y}\frac {1}{|X|}\sum _{x\in X} ||x-\phi (x)||_{2}, \end {aligned} (11) point cloud completion methods on PCN datasets, from
which we can see that our method achieves the best perfor-
where ϕ is a bijection. It is noteworthy that we compute mance over all counterparts on both CD and EMD metrics.
these metrics using 16,384 and 8,192 points for PCN and ShapeNet55, respectively.

4.2. Implementation Details

The training of LAKe-Net is a two-stage process: point cloud reconstruction and point cloud completion. The input of the first stage (reconstruction) is a set of complete point clouds with coordinates and object category labels from the training sets of all datasets. We train the keypoint detector for 60 epochs and progressively extract 256, 128 and 64 keypoints. The refinement subnet includes three RSR modules, and the up factors of the deconvolutions are [1, 1, 2]. For the second stage (the bottom completion branch of Fig. 2), we input only partial point clouds from the training set with their coordinate information. We use Adam optimization to train the whole point cloud completion architecture for 100 epochs with a batch size of 64 and a learning rate of 0.001. The hyper-parameters are λ_kp^1 = λ_kp^2 = 10 and λ_feat = 1000. The inference time of our method is 34.5 ms per sample.

4.3. Results on PCN dataset

We compare the performance of our proposed LAKe-Net with other state-of-the-art completion methods. We im-

Specifically, compared with the second-ranked SnowflakeNet, which also proposed progressive decoding modules, our method performs better with the help of aligned keypoints and surface-skeletons. Moreover, according to the experimental results, our proposed LAKe-Net is more powerful at predicting symmetrical geometries and their topology information than SnowflakeNet.

We also show a qualitative comparison with recent methods in Figure 5, which shows that our method performs better at completing missing topology. Specifically, methods that also use progressive coarse-to-fine decoding, such as PMP-Net and SnowflakeNet, tend to predict coarse missing shapes and generate scattered points, especially for geometry with a plane or surface. Other methods, such as GRNet, SpareNet and PoinTr, are weak at recovering local details and some missing topology such as table legs. Our method predicts geometries with a clearer topological structure and fewer noise points.

4.4. Results on ShapeNet55 dataset

Moreover, to evaluate the generalization ability and power of our method on a large number of categories of data


Authorized licensed use limited to: Tianjin University of Technology. Downloaded on April 26,2023 at 02:38:08 UTC from IEEE Xplore. Restrictions apply.
Category      Bed         Bench       Bookshelf   FileCabinet Faucet      Telephone   Can         Flowerpot   Tower       Pillow      Average
Metrics       CD   EMD    CD   EMD    CD   EMD    CD   EMD    CD   EMD    CD   EMD    CD   EMD    CD   EMD    CD   EMD    CD   EMD    CD   EMD
Folding [35]  3.17 73.6   1.45 50.1   2.48 64.4   1.94 65.3   3.19 66.2   0.69 39.1   1.76 60.2   4.11 82.9   1.83 59.5   1.64 63.2   2.06 60.2
PCN [41]      2.50 49.4   0.96 28.9   2.39 44.7   1.49 37.2   1.96 40.3   0.54 24.0   1.30 30.9   2.58 48.7   1.34 33.7   1.09 31.5   1.36 34.0
GRNet [34]    0.93 29.7   0.86 25.8   0.93 29.6   1.57 24.9   0.83 27.6   0.87 26.2   1.15 32.3   1.24 33.5   0.87 25.3   1.06 28.8   1.15 28.2
PoinTr [40]   2.18 37.6   0.93 21.4   1.86 37.1   3.23 42.7   1.75 42.4   0.55 20.8   2.13 31.2   2.68 42.7   1.73 35.9   1.40 31.8   1.70 31.7
Ours          0.72 28.4   0.71 18.2   0.89 29.7   0.97 16.4   0.34 20.5   0.48 20.8   0.63 29.0   1.19 35.9   0.60 21.9   0.97 29.8   0.89 31.0

Table 1. Quantitative comparison results with other completion methods on the ShapeNet55 dataset using CD-l2 (×10^3) and EMD (×10^3) metrics. We report detailed results for each method on 10 sampled categories and the overall average over all 55 categories.
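Throughout these tables, CD-l2 denotes the squared Chamfer Distance between the predicted and ground-truth point clouds. As a concrete reference for how this metric is typically computed, here is a minimal NumPy sketch; it follows one common convention (mean squared nearest-neighbor distance, summed over both directions), and the exact scaling varies between papers — this is not the authors' code.

```python
import numpy as np

def chamfer_l2(p, q):
    """Squared Chamfer Distance (CD-l2) between point sets p (N,3) and q (M,3).

    One common convention: the mean squared nearest-neighbor distance,
    summed over both directions. Some papers halve or average the two terms.
    """
    # Pairwise squared distances via broadcasting, shape (N, M).
    d = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    # For each point, distance to its nearest neighbor in the other set.
    return d.min(axis=1).mean() + d.min(axis=0).mean()

rng = np.random.default_rng(0)
a = rng.random((256, 3))
print(chamfer_l2(a, a))  # identical clouds -> 0.0
```

The brute-force (N, M) distance matrix is fine at the evaluation sizes used here (a few thousand points); for the 16,384-point PCN setting, a KD-tree nearest-neighbor query is the usual memory-friendly alternative.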
CD-l2 (×10^4)   Airplane  Cabinet  Car    Chair  Lamp   Sofa   Table  Vessel  Average
Folding [35]    3.151     7.943    4.676  9.225  9.234  8.895  6.691  7.325   7.142
PCN [41]        1.400     4.450    2.445  4.838  6.238  5.129  3.569  4.062   4.016
AtlasNet [7]    1.753     5.101    3.237  5.226  6.342  5.990  4.359  4.177   4.523
MSN [15]        1.543     7.249    4.711  4.539  6.479  5.894  3.797  3.853   4.758
GRNet [34]      1.531     3.620    2.752  2.945  2.649  3.613  2.552  2.122   2.723
PMP-Net [29]    1.205     4.189    2.878  3.495  2.178  4.267  2.921  1.894   2.878
SpareNet [33]   1.756     6.635    3.614  6.163  6.313  7.893  4.987  3.835   5.149
PoinTr [40]     0.993     4.809    2.529  3.683  3.077  6.535  3.103  2.029   3.345
Snowflake [32]  0.913     3.322    2.246  2.642  1.898  3.966  2.011  1.692   2.336
Ours            0.646     2.594    1.743  2.149  2.759  2.186  1.876  1.602   1.944

Table 2. Quantitative comparison results with other methods of point cloud completion on PCN using CD-l2 (lower is better).
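Several steps in this evaluation rely on point-cloud down-sampling; in particular, the ablations in Sec. 5 use the FPS (farthest point sampling) strategy to draw multi-scale point sets from the ground truth. A minimal NumPy sketch of FPS, assuming an (N, 3) array and a deterministic start index (not the authors' implementation):

```python
import numpy as np

def farthest_point_sampling(points, k, start=0):
    """Greedy farthest point sampling over an (N, 3) array.

    Returns the indices of k points chosen so that each new point is the
    one farthest from the already-selected set, which spreads the samples
    over the whole cloud.
    """
    n = points.shape[0]
    chosen = [start]
    nearest = np.full(n, np.inf)  # squared distance to nearest chosen point
    for _ in range(k - 1):
        d = np.sum((points - points[chosen[-1]]) ** 2, axis=1)
        nearest = np.minimum(nearest, d)
        chosen.append(int(nearest.argmax()))  # farthest from all chosen points
    return np.asarray(chosen)

rng = np.random.default_rng(0)
cloud = rng.random((1024, 3))
idx = farthest_point_sampling(cloud, 16)
```

As Figure 7(c) illustrates, such samples spread evenly over the shape but carry no semantic alignment, which is the property the learned keypoints add.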
to adapt to real-world scenarios, we conduct experiments on the ShapeNet55 dataset and compare with other completion methods. We drop 75% of each complete point cloud and resample the remaining partial point cloud to 2,048 points as input for all methods. Table 1 shows the quantitative comparison results on 10 sampled categories; the last column shows the overall average over all 55 categories. Our method achieves the best results on the CD metric and competitive results on the EMD metric. Specifically, the results on Bed, Bench, Bookshelf, FileCabinet and Faucet, which are similar to samples in the PCN dataset or share topology within a sub-category, show that our method recovers geometries more efficiently using the topology assistance. Moreover, the results on the other categories show that our method is more powerful at completing geometries with regular and symmetrical contours, similar to Vessel in the PCN dataset. We visualize the completion process of our method on samples from ShapeNet55 in Figure 6. It can be seen that our method localizes effective keypoints and recovers missing topological and geometric information with the help of surface-skeletons.

EMD (×10^2)     Airplane  Cabinet  Car    Chair  Lamp   Sofa   Table  Vessel  Average
Folding [35]    1.682     2.576    2.183  2.847  3.062  3.003  2.500  2.357   2.526
PCN [41]        2.426     1.888    2.744  2.200  2.383  2.062  1.242  2.208   2.144
AtlasNet [7]    1.324     2.582    2.085  2.442  2.718  2.829  2.160  2.114   2.282
MSN [15]        1.334     2.251    2.062  2.346  2.449  2.712  1.977  2.001   2.142
GRNet [34]      1.376     2.128    1.918  2.127  2.150  2.468  1.852  1.876   1.987
PMP-Net [29]    1.259     2.058    2.520  1.798  1.280  2.579  1.651  1.760   1.863
SpareNet [33]   1.131     2.014    1.783  2.050  2.063  2.333  1.729  1.790   1.862
PoinTr [40]     0.938     1.986    1.851  1.892  1.740  2.242  1.931  1.532   1.764
Snowflake [32]  1.375     2.633    2.591  2.086  1.599  3.070  1.616  1.957   2.116
Ours            0.958     1.830    1.564  1.667  1.782  1.755  1.499  1.402   1.557

Table 3. Quantitative comparison results with other methods of point cloud completion on PCN using EMD (lower is better).

Figure 6. Visualization of completion on the ShapeNet55 dataset by our proposed method (columns: Input, Keypoints, Surface-Skeleton, Output, GroundTruth). We also show the predicted keypoints and generated surface-skeletons in the second and third columns.

5. Method Analysis

In this section, we examine the effectiveness of our motivations in LAKe-Net. We conduct several ablation studies from different points of view. For a fair comparison, all methods are trained and tested on the PCN dataset for completion and on KeypointNet for keypoint detection.

Unsupervised Multi-scale Keypoint Detector. To demonstrate the effectiveness and accuracy of our proposed UMKD, we compare our extracted keypoints with those of two recent unsupervised keypoint detectors, Fernandez et al. [6] and SkeletonMerger [24]. All methods are trained and tested on KeypointNet [37], which provides keypoint annotations with semantic correspondence labels. We evaluate these methods on five categories: airplane, car, chair, table and vessel. Specifically, we detect 16, 32 and 64 keypoints for all categories. For the other methods, we train five models and detect 16 keypoints for each category. We first down-sample the predicted keypoints to the same number as the annotated keypoints using the nearest-neighbor strategy.

For evaluation, we follow SkeletonMerger and use the mean Intersection over Union (mIoU) metric to evaluate keypoint saliency and accuracy, calculated with a threshold of 0.1 on Euclidean distance. The quantitative results are shown in Table 4, which illustrates that our proposed keypoint detector, trained on multiple categories, achieves competitive or even better performance than other unsupervised methods trained on a single category. The results on airplane, chair and table also show that our method performs better on geometries with obvious topological structures. The visualization is shown in Figure 7(a): our detector produces more salient and semantically richer keypoints. We also show our detected keypoints at multiple scales, which represent finer geometric details.

mIoU              Airplane  Car    Chair  Table  Vessel
Fernandez et al.  69.7      50.5   51.2   49.3   53.5
SkeletonMerger    72.7      64.6   63.2   59.6   62.0
Ours-16           73.2      58.1   69.2   62.5   61.3
Ours-32           73.7      60.2   70.5   63.2   62.5
Ours-64           74.0      62.9   71.3   65.4   64.0
Ours-32 w/o csop  35.4      13.2   27.8   23.9   11.0

Table 4. Quantitative comparison results with other unsupervised keypoint detectors on KeypointNet using mIoU (higher is better).

Figure 7. Visualization of (a) multi-scale keypoints detected by ours and SkeletonMerger on the KeypointNet dataset (panels: GroundTruth, SkeletonMerger, Ours-16, Ours-32, Ours-64); (b) a failure case of our method without csop (Ours-32 without csop); (c) drawbacks of 16 FPS points. csop denotes category-specific offset predictors.

Besides, to evaluate the effectiveness of our proposed category-specific offset predictors (csop) in UMKD, we replace all offset predictors with a single category-invariant offset predictor. The qualitative results are visualized in Figure 7(b) and the quantitative results are shown in Table 4. It is obvious that a single offset predictor cannot handle geometries from multiple categories: the predicted keypoints tend to aggregate together to reduce the loss.

Keypoints and Surface-skeletons. To evaluate the necessity of using aligned keypoints and surface-skeletons for point cloud completion, we conduct several ablation experiments. We take the auto-encoder E2 and the refinement subnet R in the second stage as the baseline. In particular, we replace the multi-scale complete keypoints detected in the first stage with multi-scale points down-sampled from the ground truth using the FPS strategy, and use a CD loss between the down-sampled points and the predicted keypoints in the second stage. We also remove the assistance of the generated surface-skeletons and change the up factors to [2, 2, 4] for a fair comparison. The quantitative results are reported in Table 5. We can see that the down-sampled points cannot represent efficient topology information (as shown in Figure 7(c)), especially at some joint parts, and are not helpful for completing missing geometries in our pipeline. Besides, a CD loss between two unordered and sparse point clouds is harder to optimize to convergence.

We also visualize different types of skeletons compared with our multi-scale surface-skeleton in Figure 8. It can be seen that our surface-skeleton focuses on representing the surfaces of the original shape, achieves performance competitive with the meso-skeletons detected by a typical method [30], and is better than the curve skeleton from [24].

EMD (×10^2)   Airplane  Cabinet  Car    Chair  Lamp   Sofa   Table  Vessel  Average
Ours          0.958     1.830    1.564  1.667  1.782  1.755  1.499  1.402   1.557
-use FPS      1.117     2.295    1.978  2.157  1.916  2.607  1.810  1.823   1.963
-w/o S-sk     1.469     2.638    2.386  2.380  2.221  2.989  1.906  2.020   2.251
PointDisturb  1.031     1.902    1.554  2.012  1.945  2.037  1.684  1.437   1.700
ClassDisturb  0.963     1.846    1.576  1.786  1.831  1.780  1.545  1.397   1.590

Table 5. Ablation studies and robustness test on the PCN dataset using the EMD metric. "w/o S-sk" denotes without surface-skeleton.

Figure 8. Visualization of different types of skeletons (panels: GroundTruth, Curve-skeleton, Meso-skeleton, Surface-skeleton in multi-scale), including the curve skeleton generated by [24], the meso-skeleton from [30], and our multi-scale surface-skeleton.

Robustness Test. We also conduct ablation studies to investigate the robustness of our method in some extreme cases. We first randomly disturb 5% of the detected GT keypoints with a threshold of 0.1 after the first stage. Second, we deliberately misclassify geometries with similar shapes: table, chair and sofa. The results are reported in Table 5. The experimental results show that our method is robust to errors in keypoint detection in the first stage.

6. Conclusion

In this paper, we propose a novel topology-aware point cloud completion method, named LAKe-Net, which focuses on completing missing topology by localizing aligned keypoints, with a novel Keypoints-Skeleton-Shape prediction manner, including aligned keypoint localization, surface-skeleton generation and shape refinement. Experimental results show that our LAKe-Net achieves state-of-the-art performance on point cloud completion.

7. Acknowledgements

This work was supported by the National Key Research and Development Program of China (2019YFC1521104), National Natural Science Foundation of China (72192821, 61972157, 62176092), Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), Shanghai Science and Technology Commission (21511101200, 22YF1420300, 21511100700), CAAI-Huawei MindSpore Open Fund, and the Art Major Project of the National Social Science Fund (I8ZD22).

References

[1] J. Cao, A. Tagliasacchi, M. Olson, H. Zhang, and Z. Su. Point cloud skeletons via laplacian based contraction. In 2010 Shape Modeling International Conference, pages 187–197. IEEE, 2010. 2
[2] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh. Realtime multi-person 2d pose estimation using part affinity fields. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7291–7299, 2017. 2
[3] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, et al. Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012, 2015. 6
[4] K.-L. Cheng, R.-F. Tong, M. Tang, J.-Y. Qian, and M. Sarkis. Parametric human body reconstruction based on sparse key points. IEEE transactions on visualization and computer graphics, 22(11):2467–2479, 2015. 2
[5] N. D. Cornea, D. Silver, and P. Min. Curve-skeleton properties, applications, and algorithms. IEEE Transactions on visualization and computer graphics, 13(3):530, 2007. 2, 4
[6] C. Fernandez-Labrador, A. Chhatkuli, D. P. Paudel, J. J. Guerrero, C. Demonceaux, and L. V. Gool. Unsupervised learning of category-specific symmetric 3d keypoints from point sets. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV 16, pages 546–563. Springer, 2020. 3, 7
[7] T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, and M. Aubry. A papier-mâché approach to learning 3d surface generation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 216–224, 2018. 1, 7
[8] F. Han and S.-C. Zhu. Bottom-up/top-down image parsing with attribute grammar. IEEE transactions on pattern analysis and machine intelligence, 31(1):59–73, 2008. 1
[9] T. Jakab, R. Tucker, A. Makadia, J. Wu, N. Snavely, and A. Kanazawa. Keypointdeformer: Unsupervised 3d keypoint discovery for shape control. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12783–12792, 2021. 3
[10] H. Jiang, J. Cai, and J. Zheng. Skeleton-aware 3d human shape reconstruction from point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5431–5441, 2019. 2
[11] E. Kalogerakis, S. Chaudhuri, D. Koller, and V. Koltun. A probabilistic model for component-based shape synthesis. Acm Transactions on Graphics (TOG), 31(4):1–11, 2012. 1
[12] V. G. Kim, W. Li, N. J. Mitra, S. Chaudhuri, S. DiVerdi, and T. Funkhouser. Learning part-based templates from large collections of 3d shapes. ACM Transactions on Graphics (TOG), 32(4):1–12, 2013. 1
[13] J. Li and G. H. Lee. Usip: Unsupervised stable interest point detection from 3d point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 361–370, 2019. 2
[14] C. Lin, C. Li, Y. Liu, N. Chen, Y.-K. Choi, and W. Wang. Point2skeleton: Learning skeletal representations from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4277–4286, 2021. 2, 4
[15] M. Liu, L. Sheng, S. Yang, J. Shao, and S.-M. Hu. Morphing and sampling network for dense point cloud completion. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 11596–11603, 2020. 2, 7
[16] A. Martinovic and L. Van Gool. Bayesian grammar learning for inverse procedural modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 201–208, 2013. 1
[17] A. S. Mian, M. Bennamoun, and R. Owens. Three-dimensional model-based object recognition and segmentation in cluttered scenes. IEEE transactions on pattern analysis and machine intelligence, 28(10):1584–1601, 2006. 2
[18] Y. Nie, Y. Lin, X. Han, S. Guo, J. Chang, S. Cui, and J. Zhang. Skeleton-bridged point completion: From global inference to local adjustment. In Advances in Neural Information Processing Systems, pages 16119–16130, 2020. 1, 2
[19] M. Pauly, N. J. Mitra, J. Giesen, M. H. Gross, and L. J. Guibas. Example-based 3d scan completion. In Symposium on Geometry Processing, number CONF, pages 23–32, 2005. 1
[20] C. R. Qi, H. Su, K. Mo, and L. J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660, 2017. 2
[21] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. arXiv preprint arXiv:1706.02413, 2017. 2, 4
[22] H. Rhodin, M. Salzmann, and P. Fua. Unsupervised geometry-aware representation for 3d human pose estimation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 750–767, 2018. 2
[23] T. Shao, W. Xu, K. Zhou, J. Wang, D. Li, and B. Guo. An interactive approach to semantic modeling of indoor scenes with an rgbd camera. ACM Transactions on Graphics (TOG), 31(6):1–11, 2012. 1
[24] R. Shi, Z. Xue, Y. You, and C. Lu. Skeleton merger: an unsupervised aligned keypoint detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 43–52, 2021. 3, 4, 7, 8
[25] J. Tang, X. Han, J. Pan, K. Jia, and X. Tong. A skeleton-bridged deep learning approach for generating meshes of complex topologies from single rgb images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4541–4550, 2019. 2
[26] L. P. Tchapmi, V. Kosaraju, H. Rezatofighi, I. Reid, and S. Savarese. Topnet: Structural point cloud decoder. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 383–392, 2019. 1, 2
[27] H. Wang, J. Guo, D.-M. Yan, W. Quan, and X. Zhang. Learning 3d keypoint descriptors for non-rigid shape matching. In Proceedings of the European Conference on Computer Vision (ECCV), pages 3–19, 2018. 2
[28] X. Wen, T. Li, Z. Han, and Y.-S. Liu. Point cloud completion by skip-attention network with hierarchical folding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1939–1948, 2020. 2, 5
[29] X. Wen, P. Xiang, Z. Han, Y.-P. Cao, P. Wan, W. Zheng, and Y.-S. Liu. Pmp-net: Point cloud completion by learning multi-step point moving paths. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7443–7452, 2021. 1, 7
[30] S. Wu, H. Huang, M. Gong, M. Zwicker, and D. Cohen-Or. Deep points consolidation. ACM Transactions on Graphics (ToG), 34(6):1–13, 2015. 8
[31] Y. Xia, Y. Xia, W. Li, R. Song, K. Cao, and U. Stilla. Asfm-net: Asymmetrical siamese feature matching network for point completion. arXiv preprint arXiv:2104.09587, 2021. 5
[32] P. Xiang, X. Wen, Y.-S. Liu, Y.-P. Cao, P. Wan, W. Zheng, and Z. Han. Snowflakenet: Point cloud completion by snowflake point deconvolution with skip-transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5499–5509, 2021. 1, 2, 5, 7
[33] C. Xie, C. Wang, B. Zhang, H. Yang, D. Chen, and F. Wen. Style-based point generator with adversarial rendering for point cloud completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4619–4628, 2021. 7
[34] H. Xie, H. Yao, S. Zhou, J. Mao, S. Zhang, and W. Sun. Grnet: Gridding residual network for dense point cloud completion. In European Conference on Computer Vision, pages 365–381. Springer, 2020. 1, 7
[35] Y. Yang, C. Feng, Y. Shen, and D. Tian. Foldingnet: Point cloud auto-encoder via deep grid deformation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 206–215, 2018. 1, 5, 7
[36] K. Yin, H. Huang, D. Cohen-Or, and H. Zhang. P2p-net: Bidirectional point displacement net for shape transform. ACM Transactions on Graphics (TOG), 37(4):1–13, 2018. 2
[37] Y. You, Y. Lou, C. Li, Z. Cheng, L. Li, L. Ma, C. Lu, and W. Wang. Keypointnet: A large-scale 3d keypoint dataset aggregated from numerous human annotations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13647–13656, 2020. 2, 3, 7
[38] T. Yu, K. Guo, F. Xu, Y. Dong, Z. Su, J. Zhao, J. Li, Q. Dai, and Y. Liu. Bodyfusion: Real-time capture of human motion and surface geometry using a single depth camera. In Proceedings of the IEEE International Conference on Computer Vision, pages 910–919, 2017. 2
[39] T. Yu, Z. Zheng, K. Guo, J. Zhao, Q. Dai, H. Li, G. Pons-Moll, and Y. Liu. Doublefusion: Real-time capture of human performances with inner body shapes from a single depth sensor. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7287–7296, 2018. 2
[40] X. Yu, Y. Rao, Z. Wang, Z. Liu, J. Lu, and J. Zhou. Pointr: Diverse point cloud completion with geometry-aware transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12498–12507, 2021. 1, 2, 6, 7
[41] W. Yuan, T. Khot, D. Held, C. Mertz, and M. Hebert. Pcn: Point completion network. In 2018 International Conference on 3D Vision (3DV), pages 728–737. IEEE, 2018. 1, 2, 3, 6, 7
[42] A. Zanfir, E. Marinoiu, M. Zanfir, A.-I. Popa, and C. Sminchisescu. Deep network for the integrated 3d sensing of multiple people in natural images. Advances in Neural Information Processing Systems, 31:8410–8419, 2018. 2
[43] W. Zhang, Q. Yan, and C. Xiao. Detail preserved point cloud completion via separated feature aggregation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV 16, pages 512–528. Springer, 2020. 2
