LAKe-Net: Topology-Aware Point Cloud Completion by Localizing Aligned Keypoints
Junshu Tang1 , Zhijun Gong1 , Ran Yi1 *, Yuan Xie2 , Lizhuang Ma1,2 *
1 Shanghai Jiao Tong University, 2 East China Normal University
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 978-1-6654-6946-3/22/$31.00 ©2022 IEEE | DOI: 10.1109/CVPR52688.2022.00177
Abstract
Authorized licensed use limited to: Tianjin University of Technology. Downloaded on April 26,2023 at 02:38:08 UTC from IEEE Xplore. Restrictions apply.
localization, skeleton generation, and shape refinement.

We firstly introduce the keypoint localization. Different from down-sampled points, keypoints are evenly distributed across semantic parts of the shape and are considered a crucial representation of geometric structure, widely used in many vision applications [17, 27, 37]. We therefore hold the belief that once the keypoints and their connectivity are correctly localized, the entire geometry is determined. To this end, we wish to localize complete keypoints according to partial inputs under the supervision of ground-truth keypoints. However, obtaining keypoint annotations for a large amount of 3D data is difficult and expensive, so we propose an asymmetric keypoint locator including an unsupervised multi-scale keypoint detector (UMKD) and a complete keypoint generator (CKG) for complete and partial point clouds, respectively. With the UMKD, we extract Aligned Keypoints, meaning the order of keypoints is the same across different objects within a certain category (Figure 1(c)); this representation is more stable, carries richer information, and provides stronger supervision for the CKG when predicting complete keypoints from partial inputs.

Since discrete and sparse keypoints are not enough to represent whole objects, we leverage a skeleton to better represent topological details. Inspired by existing skeleton extraction methods [1, 5, 14], we propose a novel Surface-skeleton, which is generated from keypoints based on geometric priors. Compared with other types of skeletons, our surface-skeleton is a mixture of curves and triangle surfaces and can represent more complex shape information. We integrate surface-skeletons of different fineness, generated by multi-scale keypoints, into the shape refinement step to recover finer results. Specifically, we propose a folding-based refinement subnet including three recursive skeleton-assisted refinement (RSR) modules, following some other completion methods [28, 32].

In general, we propose LAKe-Net, a novel topology-aware point cloud completion model based on localizing aligned keypoints. The whole pipeline includes four parts: an auto-encoder, an asymmetric keypoint locator, a surface-skeleton generator, and the refinement subnet. We leverage pairs of complete and partial point clouds during training. In detail, the input point clouds (either complete or partial) are first fed into an auto-encoder to learn a feature embedding space and generate coarse but complete results. Then, we localize multi-scale keypoints using the asymmetric keypoint locator and generate the corresponding surface-skeletons. The multi-scale structures are fed into the refinement subnet to generate the fine output. The training process includes two stages, for point cloud reconstruction and completion, respectively.

Overall, we summarize our main contributions as follows: (1) We propose LAKe-Net, a novel topology-aware point cloud completion model that utilizes a structured representation of the surface as assistance, including Aligned Keypoints and the Surface-skeleton, with a new Keypoints-Skeleton-Shape prediction manner. (2) We introduce an asymmetric keypoint locator including an unsupervised multi-scale keypoint detector and a complete keypoint generator, which can capture accurate keypoints for complete and partial objects in multiple categories, respectively. We theoretically prove that our detector detects aligned keypoints within each sub-category. (3) We conduct point cloud completion experiments on two datasets, PCN and ShapeNet55. Experimental results show that our LAKe-Net achieves state-of-the-art performance on both datasets.

2. Related Works

Point Cloud Completion. Point cloud completion focuses on predicting missing shapes from a partial point cloud input. Recently, inspired by point cloud analysis approaches [20, 21], PCN [41] first adopted an encoder-decoder architecture and a coarse-to-fine manner to generate the complete shape. Several works [15, 26, 32, 43] follow this practice and modify the network structure to obtain better performance. SA-Net [28] further extends the decoding process into multiple stages by introducing hierarchical folding. More recently, PoinTr [40] reformulates point cloud completion as a set-to-set translation problem and designs a new transformer-based encoder-decoder for point cloud completion. However, these methods mostly predict the locations of complete points without predicting structured and topological information, which leads to coarse results in missing regions. SK-PCN [18] is the work most relevant to ours; it pre-processes the dataset and uses meso-skeletons as supervision. However, SK-PCN doesn't predict the structured and topological information of the original shape. Our proposed LAKe-Net utilizes aligned keypoints and the corresponding surface-skeleton, which capture the shared topological information, as an assistant for completion, and obtains better performance.

Skeleton Representation. Skeleton representations are widely used in motion recognition [38, 39], human pose estimation [2, 22] and human reconstruction [4, 42]. Jiang et al. [10] propose to incorporate skeleton awareness into deep learning-based regression for 3D human shape reconstruction from point clouds. Tang et al. [25] utilize the topology preservation property of the skeleton to perform 3D surface reconstruction from a single RGB image. P2P-Net [36] learns bidirectional geometric transformations between point-based shape representations from two domains, surface-skeletons and surfaces. Our method designs surface-skeleton representations generated by multi-scale keypoints in a more fine-grained manner to progressively aid point cloud completion.

Unsupervised Keypoint Detection. While most hand-crafted 3D keypoint detectors fail to detect accurate and well-aligned keypoints in complex objects, Li et al. [13]
Figure 2. The overall architecture of LAKe-Net, which consists of two parts: Point Cloud Reconstruction (blue) and Point Cloud Completion (red). We show the detailed structure of (a) Auto-encoder E, (b) Complete Keypoint Generator G and (c) Recursive Skeleton-assisted Refinement module R on the right side. The PCN encoder was first proposed in [41]. UMKD denotes the unsupervised multi-scale keypoint detector. Surface-skeletons S_i and Ŝ_i are generated by P_i and P̂_i, respectively.
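The surface-skeletons S_i in Figure 2 are generated from the keypoints P_i. As a minimal sketch of how the curve part of such a skeleton can be sampled, the following interpolates points along connected keypoint pairs; the adjacency matrix, sample count, and function name are illustrative assumptions, not the paper's exact geometric priors (which also include triangle surfaces):

```python
import numpy as np

def edge_skeleton(keypoints, adjacency, samples_per_edge=8):
    """Sample points along each connected keypoint pair (the curve part of
    a surface-skeleton). `adjacency` is a symmetric 0/1 matrix; triangle
    surfaces would be sampled analogously with barycentric coordinates."""
    k = keypoints.shape[0]
    points = []
    for i in range(k):
        for j in range(i + 1, k):
            if adjacency[i, j]:
                t = np.linspace(0.0, 1.0, samples_per_edge)[:, None]
                # linear interpolation between the two endpoints
                points.append((1 - t) * keypoints[i] + t * keypoints[j])
    return np.concatenate(points, axis=0)

# toy example: 3 keypoints connected in a chain
kp = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
skel = edge_skeleton(kp, adj, samples_per_edge=5)  # (10, 3) skeleton points
```

Denser keypoint sets with richer adjacency produce finer skeletons, matching the paper's use of multi-scale keypoints to control skeleton fineness.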
propose the first learning-based 3D keypoint detector, USIP. However, the detected keypoints are neither ordered nor semantically salient. Fernandez et al. [6] utilize a symmetry prior in point clouds to capture keypoints in an unsupervised manner. Recently, Jakab et al. [9] further explore the

[Figure: Unsupervised Multi-scale Keypoints Detector D — a category-invariant convex encoder (PointNet++) followed by category-specific offset predictors producing multi-scale keypoint outputs P1, P2, ...]
put points into point-wise local features f̂ ∈ R^{N_p×d}, where d denotes the dimension of the feature embedding. We consider

[Figure: (a) Topology Prior (adjacency matrix A); (c) Edge Interpolation.]
reference shape for reconstruction while using the coarse results X_c for completion.

Overall, we get the surface-skeleton S. Increasing the number of keypoints leads to more complex surface-skeletons, which can represent finer shape details. We utilize this structure as a topology representation of the geometric shape, which is crucial for reconstruction and completion.

3.3. Complete Keypoint Generator

In the second stage, given the partial input X ∈ R^{N_p×3}, the aim of our proposed Complete Keypoint Generator (CKG) module is to predict multi-scale complete keypoints from the partial feature embedding. To this end, we utilize the local and global features f̂ and ĉ extracted by the encoder in E2 as input. Similar to the Farthest Point Sampling (FPS) strategy for point clouds, in order to downsample the point features we use a Farthest Feature Sampling (FFS) strategy, which downsamples the point-wise local features f̂ to sparse features f̂*, replacing the point coordinates in FPS by feature embeddings. The number of samples is the same as the number of predicted keypoints. We then utilize a de-convolution layer to upsample the global feature ĉ, fuse both into a residual block, and predict the final keypoints. We train three similar blocks for multi-scale keypoint prediction, then generate the corresponding surface-skeletons Ŝ by the surface interpolation introduced in Sec. 3.2.

3.4. Recursive Skeleton-assisted Refinement

reconstruction. Firstly, in order to encourage the detected keypoints P to be well-distributed and not deviate from the global shape, we calculate the Chamfer Distance (CD) loss between the predicted keypoints and sparse point clouds X* downsampled from the input data using the FPS strategy. As we train one detector for several categories, we also train a classification head; we denote the predicted output by ω and the category label by σ, and train with a classification loss L_cls. Besides, as mentioned in Sec. 3.2, we expect the surface-skeleton to reconstruct the geometric shape of the ground truth, so we calculate the CD between the multi-scale surface-skeletons {S_i}_{i=1}^{3} and the ground truth X. The overall loss for training the keypoint detector is therefore:

\mathcal{L}_{CD}(X, Y) = \frac{1}{|X|}\sum_{x\in X}\min_{y\in Y}\|x-y\|^{2} + \frac{1}{|Y|}\sum_{y\in Y}\min_{x\in X}\|y-x\|^{2},  (2)

\mathcal{L}_{cls} = -\sum_{i=1}^{\mathcal{Q}}\big(\sigma_{i}\log\omega_i + (1-\sigma_{i})\log(1-\omega_{i})\big),  (3)

\mathcal{L}_{kp} = \mathcal{L}_{CD}(\mathbf{P}, \mathbf{X}^{*}) + \sum_{i=1}^{3}\mathcal{L}_{CD}(\mathcal{S}_i, \mathbf{X}) + \mathcal{L}_{cls}.  (4)

At last, we calculate the CD between the ground truth X and the sparse output X_c and the dense output X_f, respectively:

\mathcal{L}_{rec} = \mathcal{L}_{CD}(\mathbf{X}_c, \mathbf{X}) + \mathcal{L}_{CD}(\mathbf{X}_f, \mathbf{X}).  (5)

In general, the overall training loss in the first stage is:

\mathcal{L}_{1} = \mathcal{L}_{rec} + \lambda_{kp}^{1}\mathcal{L}_{kp},  (6)
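The Chamfer Distance of Eq. (2) is straightforward to implement; a minimal NumPy version (unbatched and CPU-only, for illustration only) is:

```python
import numpy as np

def chamfer_distance(X, Y):
    """Symmetric Chamfer Distance of Eq. (2): mean squared distance from
    each point to its nearest neighbor in the other set."""
    # pairwise squared distances, shape (|X|, |Y|)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
Y = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(chamfer_distance(X, Y))  # 1.0
```

Production implementations replace the dense pairwise matrix with nearest-neighbor queries or GPU kernels, since the O(|X|·|Y|) memory cost is prohibitive at 16,384 points.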
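The Farthest Feature Sampling (FFS) used by the CKG in Sec. 3.3 is FPS with coordinate distances replaced by feature-space distances. A greedy sketch (the seed index and function name are assumptions for illustration):

```python
import numpy as np

def farthest_feature_sampling(features, k, seed=0):
    """Greedy farthest-first selection over per-point feature vectors.
    Identical to FPS except distances are computed between feature
    embeddings instead of xyz coordinates."""
    chosen = [seed]
    # squared feature distance from every point to the nearest chosen one
    dist = ((features - features[seed]) ** 2).sum(-1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())  # farthest point from the current set
        chosen.append(nxt)
        dist = np.minimum(dist, ((features - features[nxt]) ** 2).sum(-1))
    return np.array(chosen)

# toy 1-D features forming three clusters: {0.0, 0.1}, {5.0, 5.1}, {10.0}
feats = np.array([[0.0], [0.1], [5.0], [5.1], [10.0]])
idx = farthest_feature_sampling(feats, k=3)  # one index per cluster
```

Because selection is farthest-first, the retained features are well spread in embedding space, mirroring how FPS spreads retained points in coordinate space.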
4. Experiments

4.1. Dataset Setting and Evaluation Metrics

PCN: The PCN dataset is a widely-used benchmark for point cloud completion, created by [41], including objects from 8 categories: plane, cabinet, car, chair, lamp, sofa, table, and vessel. The training set contains 28,974 objects, while the validation and test sets contain 800 and 1,200 objects, respectively. Each complete point cloud consists of 16,384 points uniformly sampled on the original CAD model. Each partial point cloud, consisting of 2,048 points, is created by back-projecting 2.5D depth images into 3D from 8 random viewpoints.

ShapeNet55: To explore the performance of our method on a large number of categories, we evaluate it on all 55 categories of ShapeNet [3], named ShapeNet55. The ShapeNet55 dataset was first created by PoinTr [40]. The training set contains 41,952 objects, while the test set contains 10,518 objects. We randomly sample 80% of the objects in each category to form the training set and use the remaining 20% to form the validation set.

Evaluation Metrics: We utilize two evaluation metrics between the output point cloud and the ground truth, Chamfer Distance (CD) using the L2 norm and Earth Mover's Distance (EMD), following most methods on the PCN and ShapeNet55 test sets. CD is introduced in Equation 2 and EMD is defined as:

EMD(X, Y) = \min_{\phi: X\to Y}\frac{1}{|X|}\sum_{x\in X} \|x-\phi(x)\|_{2},  (11)

where φ is a bijection. It is noteworthy that we compute these metrics using 16,384 and 8,192 points for PCN and ShapeNet55, respectively.

4.2. Implementation Details

The whole training of LAKe-Net is a two-stage process: point cloud reconstruction and point cloud completion. The input of the first stage (reconstruction) is a set of complete point clouds with coordinates and object category labels from the training sets of all datasets. We train the keypoint detector for 60 epochs and progressively extract 256, 128, and 64 keypoints. The refinement subnet includes three RSR modules, and the upsampling factors of the de-convolutions are [1, 1, 2]. For the second stage (the bottom completion branch of Fig. 2), we only input partial point clouds from the training set with their coordinate information. We utilize the Adam optimizer to train the whole point cloud completion architecture for 100 epochs with batch size 64 and learning rate 0.001. The hyper-parameters are λ¹_kp = λ²_kp = 10 and λ_feat = 1000. The inference time of our method is 34.5 ms per sample.

4.3. Results on PCN dataset

We compare the performance of our proposed LAKe-Net against other state-of-the-art completion methods. We implement the other methods using their open-source code and hyper-parameters for a fair comparison. Tables 2 and 3 show the quantitative comparison between our method and other point cloud completion methods on the PCN dataset, from which we can see that our method achieves the best performance over all counterparts on both the CD and EMD metrics. Specifically, compared with the second-ranked SnowflakeNet, which also proposed progressive decoding modules, our method performs better with the help of aligned keypoints and surface-skeletons. Besides, according to the experimental results, our proposed LAKe-Net is more powerful at predicting symmetrical geometries and their topology information than SnowflakeNet.

Moreover, we show a visualization of qualitative comparison results against some recent methods in Figure 5, which shows that our method performs better at completing missing topology. Specifically, methods which also utilize progressive coarse-to-fine decoding, like PMP-Net and SnowflakeNet, tend to predict coarse missing shapes and generate scattered points, especially for geometry with a plane or surface. Other methods like GRNet, SpareNet and PoinTr are weak at recovering local details and some missing topology such as table legs. Our method can predict geometries with a clearer topology structure and less noise.

Figure 5. Visualization of point cloud completion comparison results on the PCN dataset with other recent methods (rows: Input, GRNet, SpareNet, PMP-Net, PoinTr, SnowflakeNet, Ours, Ground Truth).

4.4. Results on ShapeNet55 dataset

Moreover, to evaluate the generalization ability and power of our method on a large number of categories of data
Category Bed Bench Bookshelf FileCabinet Faucet Telephone Can Flowerpot Tower Pillow Average
Metrics CD EMD CD EMD CD EMD CD EMD CD EMD CD EMD CD EMD CD EMD CD EMD CD EMD CD EMD
Folding [35] 3.17 73.6 1.45 50.1 2.48 64.4 1.94 65.3 3.19 66.2 0.69 39.1 1.76 60.2 4.11 82.9 1.83 59.5 1.64 63.2 2.06 60.2
PCN [41] 2.50 49.4 0.96 28.9 2.39 44.7 1.49 37.2 1.96 40.3 0.54 24.0 1.30 30.9 2.58 48.7 1.34 33.7 1.09 31.5 1.36 34.0
GRNet [34] 0.93 29.7 0.86 25.8 0.93 29.6 1.57 24.9 0.83 27.6 0.87 26.2 1.15 32.3 1.24 33.5 0.87 25.3 1.06 28.8 1.15 28.2
PoinTr [40] 2.18 37.6 0.93 21.4 1.86 37.1 3.23 42.7 1.75 42.4 0.55 20.8 2.13 31.2 2.68 42.7 1.73 35.9 1.40 31.8 1.70 31.7
Ours 0.72 28.4 0.71 18.2 0.89 29.7 0.97 16.4 0.34 20.5 0.48 20.8 0.63 29.0 1.19 35.9 0.60 21.9 0.97 29.8 0.89 31.0
Table 1. Quantitative comparison results with other completion methods on the ShapeNet55 dataset using CD-l2 (×10³) and EMD (×10³) metrics. We report detailed results for each method on 10 sampled categories and the overall average over all 55 categories.
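The EMD metric reported above (Eq. 11) can, for intuition, be evaluated exactly on tiny equal-size sets by brute-forcing the bijection φ; practical implementations use the Hungarian algorithm or approximate solvers at realistic point counts, so the permutation search below is purely illustrative:

```python
import numpy as np
from itertools import permutations

def emd_bruteforce(X, Y):
    """Earth Mover's Distance of Eq. (11): the bijection X -> Y minimizing
    the mean L2 matching cost. Exponential time; toy sizes only."""
    assert len(X) == len(Y), "this sketch assumes equal-size point sets"
    best = float("inf")
    for perm in permutations(range(len(Y))):
        cost = np.mean([np.linalg.norm(X[i] - Y[p]) for i, p in enumerate(perm)])
        best = min(best, cost)
    return best

X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = np.array([[1.0, 0.0], [0.0, 0.0]])
print(emd_bruteforce(X, Y))  # 0.0
```

Unlike CD, which matches each point to its nearest neighbor independently, EMD enforces a one-to-one matching, so it penalizes uneven point density as well as geometric error.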
[Figure: qualitative results — Input, Keypoints, Surface-Skeleton, Output, Ground Truth.]
CD-l2 (×10⁴) Airplane Cabinet Car Chair Lamp Sofa Table Vessel Average
Folding [35] 3.151 7.943 4.676 9.225 9.234 8.895 6.691 7.325 7.142
PCN [41] 1.400 4.450 2.445 4.838 6.238 5.129 3.569 4.062 4.016
AtlasNet [7] 1.753 5.101 3.237 5.226 6.342 5.990 4.359 4.177 4.523
MSN [15] 1.543 7.249 4.711 4.539 6.479 5.894 3.797 3.853 4.758
GRNet [34] 1.531 3.620 2.752 2.945 2.649 3.613 2.552 2.122 2.723
PMP-Net [29] 1.205 4.189 2.878 3.495 2.178 4.267 2.921 1.894 2.878
SpareNet [33] 1.756 6.635 3.614 6.163 6.313 7.893 4.987 3.835 5.149
PoinTr [40] 0.993 4.809 2.529 3.683 3.077 6.535 3.103 2.029 3.345
Snowflake [32] 0.913 3.322 2.246 2.642 1.898 3.966 2.011 1.692 2.336
Ours 0.646 2.594 1.743 2.149 2.759 2.186 1.876 1.602 1.944
[Figure: (a) keypoint comparison — Ground Truth, Skeleton Merger, Ours-16, Ours-32, Ours-64.]
EMD (×10²) Airplane Cabinet Car Chair Lamp Sofa Table Vessel Average
Ours 0.958 1.830 1.564 1.667 1.782 1.755 1.499 1.402 1.557
-use FPS 1.117 2.295 1.978 2.157 1.916 2.607 1.810 1.823 1.963
-w/o S-sk 1.469 2.638 2.386 2.380 2.221 2.989 1.906 2.020 2.251
PointDisturb 1.031 1.902 1.554 2.012 1.945 2.037 1.684 1.437 1.700
ClassDisturb 0.963 1.846 1.576 1.786 1.831 1.780 1.545 1.397 1.590
References

[1] J. Cao, A. Tagliasacchi, M. Olson, H. Zhang, and Z. Su. Point cloud skeletons via laplacian based contraction. In 2010 Shape Modeling International Conference, pages 187–197. IEEE, 2010.
[2] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh. Realtime multi-person 2d pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7291–7299, 2017.
[3] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, et al. Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012, 2015.
[4] K.-L. Cheng, R.-F. Tong, M. Tang, J.-Y. Qian, and M. Sarkis. Parametric human body reconstruction based on sparse key points. IEEE Transactions on Visualization and Computer Graphics, 22(11):2467–2479, 2015.
[5] N. D. Cornea, D. Silver, and P. Min. Curve-skeleton properties, applications, and algorithms. IEEE Transactions on Visualization and Computer Graphics, 13(3):530, 2007.
[6] C. Fernandez-Labrador, A. Chhatkuli, D. P. Paudel, J. J. Guerrero, C. Demonceaux, and L. V. Gool. Unsupervised learning of category-specific symmetric 3d keypoints from point sets. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV 16, pages 546–563. Springer, 2020.
[7] T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, and M. Aubry. A papier-mâché approach to learning 3d surface generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 216–224, 2018.
[8] F. Han and S.-C. Zhu. Bottom-up/top-down image parsing with attribute grammar. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1):59–73, 2008.
[9] T. Jakab, R. Tucker, A. Makadia, J. Wu, N. Snavely, and A. Kanazawa. Keypointdeformer: Unsupervised 3d keypoint discovery for shape control. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12783–12792, 2021.
[10] H. Jiang, J. Cai, and J. Zheng. Skeleton-aware 3d human shape reconstruction from point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5431–5441, 2019.
[11] E. Kalogerakis, S. Chaudhuri, D. Koller, and V. Koltun. A probabilistic model for component-based shape synthesis. ACM Transactions on Graphics (TOG), 31(4):1–11, 2012.
[12] V. G. Kim, W. Li, N. J. Mitra, S. Chaudhuri, S. DiVerdi, and T. Funkhouser. Learning part-based templates from large collections of 3d shapes. ACM Transactions on Graphics (TOG), 32(4):1–12, 2013.
[13] J. Li and G. H. Lee. Usip: Unsupervised stable interest point detection from 3d point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 361–370, 2019.
[14] C. Lin, C. Li, Y. Liu, N. Chen, Y.-K. Choi, and W. Wang. Point2skeleton: Learning skeletal representations from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4277–4286, 2021.
[15] M. Liu, L. Sheng, S. Yang, J. Shao, and S.-M. Hu. Morphing and sampling network for dense point cloud completion. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 11596–11603, 2020.
[16] A. Martinovic and L. Van Gool. Bayesian grammar learning for inverse procedural modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 201–208, 2013.
[17] A. S. Mian, M. Bennamoun, and R. Owens. Three-dimensional model-based object recognition and segmentation in cluttered scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1584–1601, 2006.
[18] Y. Nie, Y. Lin, X. Han, S. Guo, J. Chang, S. Cui, and J. Zhang. Skeleton-bridged point completion: From global inference to local adjustment. In Advances in Neural Information Processing Systems, pages 16119–16130, 2020.
[19] M. Pauly, N. J. Mitra, J. Giesen, M. H. Gross, and L. J. Guibas. Example-based 3d scan completion. In Symposium on Geometry Processing, pages 23–32, 2005.
[20] C. R. Qi, H. Su, K. Mo, and L. J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 652–660, 2017.
[21] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. arXiv preprint arXiv:1706.02413, 2017.
[22] H. Rhodin, M. Salzmann, and P. Fua. Unsupervised geometry-aware representation for 3d human pose estimation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 750–767, 2018.
[23] T. Shao, W. Xu, K. Zhou, J. Wang, D. Li, and B. Guo. An interactive approach to semantic modeling of indoor scenes with an rgbd camera. ACM Transactions on Graphics (TOG), 31(6):1–11, 2012.
[24] R. Shi, Z. Xue, Y. You, and C. Lu. Skeleton merger: an unsupervised aligned keypoint detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 43–52, 2021.
[25] J. Tang, X. Han, J. Pan, K. Jia, and X. Tong. A skeleton-bridged deep learning approach for generating meshes of complex topologies from single rgb images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4541–4550, 2019.
[26] L. P. Tchapmi, V. Kosaraju, H. Rezatofighi, I. Reid, and S. Savarese. Topnet: Structural point cloud decoder. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 383–392, 2019.
[27] H. Wang, J. Guo, D.-M. Yan, W. Quan, and X. Zhang. Learning 3d keypoint descriptors for non-rigid shape matching. In Proceedings of the European Conference on Computer Vision (ECCV), pages 3–19, 2018.
[28] X. Wen, T. Li, Z. Han, and Y.-S. Liu. Point cloud completion by skip-attention network with hierarchical folding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1939–1948, 2020.
[29] X. Wen, P. Xiang, Z. Han, Y.-P. Cao, P. Wan, W. Zheng, and Y.-S. Liu. Pmp-net: Point cloud completion by learning multi-step point moving paths. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7443–7452, 2021.
[30] S. Wu, H. Huang, M. Gong, M. Zwicker, and D. Cohen-Or. Deep points consolidation. ACM Transactions on Graphics (TOG), 34(6):1–13, 2015.
[31] Y. Xia, Y. Xia, W. Li, R. Song, K. Cao, and U. Stilla. Asfm-net: Asymmetrical siamese feature matching network for point completion. arXiv preprint arXiv:2104.09587, 2021.
[32] P. Xiang, X. Wen, Y.-S. Liu, Y.-P. Cao, P. Wan, W. Zheng, and Z. Han. Snowflakenet: Point cloud completion by snowflake point deconvolution with skip-transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5499–5509, 2021.
[33] C. Xie, C. Wang, B. Zhang, H. Yang, D. Chen, and F. Wen. Style-based point generator with adversarial rendering for point cloud completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4619–4628, 2021.
[34] H. Xie, H. Yao, S. Zhou, J. Mao, S. Zhang, and W. Sun. Grnet: Gridding residual network for dense point cloud completion. In European Conference on Computer Vision, pages 365–381. Springer, 2020.
[35] Y. Yang, C. Feng, Y. Shen, and D. Tian. Foldingnet: Point cloud auto-encoder via deep grid deformation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 206–215, 2018.
[36] K. Yin, H. Huang, D. Cohen-Or, and H. Zhang. P2p-net: Bidirectional point displacement net for shape transform. ACM Transactions on Graphics (TOG), 37(4):1–13, 2018.
[37] Y. You, Y. Lou, C. Li, Z. Cheng, L. Li, L. Ma, C. Lu, and W. Wang. Keypointnet: A large-scale 3d keypoint dataset aggregated from numerous human annotations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13647–13656, 2020.
[38] T. Yu, K. Guo, F. Xu, Y. Dong, Z. Su, J. Zhao, J. Li, Q. Dai, and Y. Liu. Bodyfusion: Real-time capture of human motion and surface geometry using a single depth camera. In Proceedings of the IEEE International Conference on Computer Vision, pages 910–919, 2017.
[39] T. Yu, Z. Zheng, K. Guo, J. Zhao, Q. Dai, H. Li, G. Pons-Moll, and Y. Liu. Doublefusion: Real-time capture of human performances with inner body shapes from a single depth sensor. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7287–7296, 2018.
[40] X. Yu, Y. Rao, Z. Wang, Z. Liu, J. Lu, and J. Zhou. Pointr: Diverse point cloud completion with geometry-aware transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12498–12507, 2021.
[41] W. Yuan, T. Khot, D. Held, C. Mertz, and M. Hebert. Pcn: Point completion network. In 2018 International Conference on 3D Vision (3DV), pages 728–737. IEEE, 2018.
[42] A. Zanfir, E. Marinoiu, M. Zanfir, A.-I. Popa, and C. Sminchisescu. Deep network for the integrated 3d sensing of multiple people in natural images. Advances in Neural Information Processing Systems, 31:8410–8419, 2018.
[43] W. Zhang, Q. Yan, and C. Xiao. Detail preserved point cloud completion via separated feature aggregation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV 16, pages 512–528. Springer, 2020.