Density-preserving Deep Point Cloud Compression
Yun He1* Xinlin Ren1* Danhang Tang2 Yinda Zhang2 Xiangyang Xue1 Yanwei Fu1
1Fudan University  2Google
Abstract
sion [12, 27, 28, 37, 38]. While this allows leveraging conventional methods [7, 22], it obviously loses the local density, and has a precision capped by the voxel size. Recent methods [17, 41] utilize PointNet [24] or PointNet++ [25] to ignore the cardinality and permutation with max pooling, and preserve density to some extent. However, the decompressed point clouds always lose local details and suffer from the clustered points issue, since most of the local geometry has been discarded by max pooling. Depoco [40] adopts KPConv [35] to capture more local spatial information than pooling, but the clustered points artifact still exists due to feature replication, see Fig 1. Alternatively, Zhao et al. [46] introduce an attention mechanism to handle different cardinalities and permutations, though it is not designed for compression purposes.

In this paper, we propose a novel density-preserving deep point cloud compression method which yields a superior rate-distortion trade-off to prior arts and, more importantly, preserves the local density. Our method has an auto-encoder architecture, trained end-to-end with an entropy encoder. The contributions of our paper are summarized as follows. On the encoder side, three types of feature embeddings are designed to capture the local geometry distribution and density. On the decoder side, to mitigate the clustered points issue, we propose 1) the sub-point convolution to promote feature diversity during upsampling, and 2) a learnable number of upsampling points and a learnable scale for their offsets in different regions. We conduct extensive experiments and ablation studies to justify these contributions. Additionally, we demonstrate that our method can be easily extended to jointly compress attributes such as normals.

2. Related Work

Point Cloud Analysis. Point clouds are typically unstructured, irregular and unordered, and therefore cannot be immediately processed by conventional convolution. To tackle this issue, many works [21, 30] first voxelize points and then apply 3D convolution, which however can be computationally expensive. Another type of approach directly operates on point clouds, hence termed point-based. For example, PointNet [24] and PointNet++ [25] use max pooling to ignore the order of points. DGCNN [39] proposes dynamic graph convolution for non-local feature aggregation. Point Transformer [46] introduces a purely self-attention [36] based network.

Point Cloud Compression. Traditional point cloud compression algorithms [10–12, 23, 31, 32] usually rely on octree [22] or KD-tree [5] structures for storage efficiency. Inspired by the great success of deep learning in point cloud analysis [24, 25, 39, 46] and image compression [2, 3], the community has begun to focus on learning-based point cloud compression. Similarly, lossy methods can also be categorized into voxel-based [27, 28, 37, 38] and point-based [17, 40, 41]. While sharing the pros and cons discussed above for point cloud analysis, point-based methods enable preserving local density since they take the raw 3D points as inputs. Specifically, Yan et al. [41] integrate PointNet [24] into an auto-encoder framework, while Huang et al. [17] use PointNet++ [25] instead. Architecture-wise, Wiesmann et al. [40] propose to downsample the point cloud during encoding and upsample during decoding. Moreover, research on deep entropy models [6, 16, 29] is also active, though it is nearly lossless since its loss comes only from quantization. In this paper we focus on the more lossy compression in favor of a higher compression ratio.

Point Cloud Upsampling. Point cloud upsampling aims to upsample a sparse point cloud to a dense and uniform one, and previous methods typically design various feature expansion modules to achieve it. In particular, Yu et al. [44] replicate features and transform them by multi-branch MLPs. Some other methods [19, 20, 43] employ folding-based [42] upsampling, which also duplicates features first. Specifically, Wang et al. [43] assign each duplicated feature a 1D code. Li et al. [19] and Li et al. [20] concatenate each replicated feature with a point sampled from a 2D grid. However, the upsampled features generated by these methods can be too similar to each other due to replication, which inevitably results in clustered points.

3. Methodology

The proposed density-preserving deep point cloud compression framework is based on a symmetric auto-encoder architecture, where the encoder has S downsampling stages indexed by 0, 1, ..., S − 1, and the decoder has S upsampling stages indexed reversely by S − 1, S − 2, ..., 0. For stage s of the encoder, the input point cloud is denoted Ps and the output Ps+1. Reversely, on the decoder side, the input and output of stage s are P̂s+1 and P̂s respectively, as shown in Fig 2. Note that to distinguish from encoding, the hat symbol is used for reconstructed point clouds and associated features.

The input point cloud P0 is first partitioned into smaller blocks which are compressed individually. For simplicity, we use the same notation P0 for a block. Specifically, on the encoder side, the input Ps is downsampled to Ps+1 by a factor fs at each stage s, while the local geometry and density are also encoded into features Fs+1. At the bottleneck, the features FS are fed into an end-to-end trained entropy encoder for further compression. When decompressing, we recover the downsampled point cloud P̂S, along with the features F̂S extracted by the entropy decoder. Our upsampling module then utilizes F̂S to upsample P̂S back to the reconstructed point cloud P̂0 stage by stage.
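As a concrete illustration of the block partitioning described above, the following is a minimal numpy sketch; the uniform grid binning and the 12 m block size are assumptions borrowed from the experiment setup in Sec 4.1, not the paper's released implementation.

import numpy as np

def partition_into_blocks(points, block_size=12.0):
    """Split a point cloud into non-overlapping cubic blocks and normalize
    each block to [-1, 1] before per-block compression (cf. Sec 4.1)."""
    # assign every point to a grid cell of edge length `block_size`
    cell = np.floor(points / block_size).astype(np.int64)
    blocks = {}
    for key in np.unique(cell, axis=0):
        mask = np.all(cell == key, axis=1)
        block = points[mask]
        center = (key + 0.5) * block_size            # cell center in world coordinates
        block = (block - center) / (block_size / 2)  # normalize the block to [-1, 1]
        blocks[tuple(key)] = block
    return blocks

pts = np.random.rand(10000, 3) * 100.0               # toy cloud inside a 100 m cube
blocks = partition_into_blocks(pts)
print(len(blocks), "blocks; first block has", next(iter(blocks.values())).shape[0], "points")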
Figure 2. Our pipeline first partitions the point cloud into small blocks. Each block is then downsampled three times while the local density
and geometry patterns of collapsed points are encoded into features. At the bottleneck, downsampled features are further compressed by an
entropy encoder. The decoder can then use the features to adaptively upsample the downsampled point cloud back to the original geometry
and density. The details of downsampling (DS) block and upsampling (US) block are shown in Fig 3 and Fig 5 respectively.
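The caption notes that the downsampled features are further compressed by a learned entropy encoder. During training, the quantization step is replaced by additive uniform noise (Sec 3.3, following Ballé et al. [2, 3]); below is a minimal numpy sketch of that proxy, with a factorized Gaussian prior assumed purely for the toy rate estimate.

import numpy as np

def quantize_proxy(features, training=True):
    """Differentiable stand-in for quantization: uniform noise while training,
    hard rounding at inference (Ballé et al. style)."""
    if training:
        return features + np.random.uniform(-0.5, 0.5, size=features.shape)
    return np.round(features)

def estimated_bits(features_noisy, sigma=1.0):
    """Toy rate estimate: -log2 likelihood under an assumed factorized Gaussian
    prior. The actual model learns this density jointly with the auto-encoder."""
    nll = 0.5 * (features_noisy / sigma) ** 2 + 0.5 * np.log(2 * np.pi * sigma ** 2)
    return nll.sum() / np.log(2)   # nats -> bits

F = np.random.randn(256, 8)        # toy bottleneck features
print("train-time rate estimate:", estimated_bits(quantize_proxy(F)), "bits")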
3.1. Density-preserving Encoder

Downsampling. At each stage s of the encoder, an input point cloud block Ps is downsampled to Ps+1 by a factor of fs using farthest point sampling (FPS), which encourages the sampled points to have a good coverage of the point cloud Ps. Please refer to the supplementary section for an ablation study of different sampling techniques.

Feature embedding. Ps+1 itself does not preserve the distribution of the discarded points of Ps, so simply upsampling Ps+1 by 1/fs would end up with a reconstruction of poor accuracy and uniform density. To address this, for each point p ∈ Ps+1, we calculate three different embeddings: a density embedding, a local position embedding and an ancestor embedding, which capture the geometry and density of the discarded points Ps − Ps+1 in a compact form with low entropy.

First we define the concept of a collapsed points set C(p). After the downsampled points set is decided, each discarded point is deemed to collapse into its nearest downsampled point exclusively. Thus all the points that collapse into a downsampled point p form a collapsed points set C(p), and we term u = |C(p)| the downsampling factor of point p.

The density embedding FD captures the cardinality of C(p) by mapping the downsampling factor u to a d-dimensional embedding via MLPs. Secondly, the local position embedding captures the distribution of C(p). Specifically, for each pk ∈ C(p), the direction and distance of the offset pk − p are calculated as below:

\left( \frac{p_k - p}{\|p_k - p\|_2},\; \|p_k - p\|_2 \right), \quad p \in P_{s+1},\; p_k \in C(p), \qquad (1)

where the direction (3D) and distance (scalar) are represented by this 4D vector. Consequently, the local point distribution centered at p can be represented by a u × 4 feature, which is mapped to a higher-dimensional (u × d) space with MLPs, before an attention mechanism [36] is applied to aggregate them into a d-dimensional embedding FP.

While the density and position embeddings capture the local density and geometry at stage s, it is necessary to pass along this information from previous stages without adding much rate cost. To this end, we employ the point transformer layer [46], due to its simplicity and effectiveness, to aggregate the previous-stage features of the collapsed points set C(p) into the representative sampled point p. We term this d-dimensional vector FA the ancestor embedding.

At last, an MLP fuses these three embeddings (FP, FD, FA) into a new d-dimensional feature Fs+1 for the next stage. This process is illustrated in Fig 3.

Entropy encoding. At the bottleneck, we have a downsampled point cloud PS and per-point features FS. For PS, we use a half-float representation to reduce bitrate, and FS are further compressed by an entropy encoder. Following recent success in deep image compression [2, 3], we integrate an arithmetic encoder into the training process to jointly optimize the entropy of the features. This process is accompanied by a rate loss function that will be introduced later in Sec 3.3.

3.2. Density-recovering Decoder

Overview. During decoding, symmetrically, we have S upsampling stages. At the bottleneck, we have the downsampled point cloud P̂S and the decoded features F̂S extracted by the entropy decoder. Recall that during encoding, for each
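To make the collapsed points set and the per-neighbour 4D vectors of Eq. (1) concrete, here is a small numpy sketch of one encoder stage from Sec 3.1 above: farthest point sampling, assignment of each discarded point to its nearest kept point, and the raw inputs from which the density and local position embeddings would be computed. The learned MLPs, the attention aggregation and the ancestor embedding are omitted; helper names are illustrative only.

import numpy as np

def farthest_point_sampling(P, m):
    """Greedy FPS: pick m points that cover P well."""
    idx = [np.random.randint(len(P))]
    dist = np.full(len(P), np.inf)
    for _ in range(m - 1):
        dist = np.minimum(dist, np.linalg.norm(P - P[idx[-1]], axis=1))
        idx.append(int(dist.argmax()))
    return np.array(idx)

def downsample_stage(P, f=4):
    """One encoder stage: keep |P|/f points, collapse the rest onto them."""
    keep = farthest_point_sampling(P, max(1, len(P) // f))
    P_next = P[keep]
    discarded = np.setdiff1d(np.arange(len(P)), keep)
    owner = np.linalg.norm(P[discarded][:, None] - P_next[None], axis=-1).argmin(axis=1)
    stage = []
    for j, p in enumerate(P_next):
        C = P[discarded[owner == j]]              # collapsed points set C(p)
        u = len(C)                                # downsampling factor, input of the density embedding
        offsets = C - p
        dist = np.linalg.norm(offsets, axis=1, keepdims=True)
        dirs = offsets / np.maximum(dist, 1e-9)
        local = np.concatenate([dirs, dist], axis=1)   # u x 4 vectors of Eq. (1)
        stage.append((p, u, local))
    return P_next, stage

P = np.random.rand(256, 3)
P1, info = downsample_stage(P)
print(len(P1), "points kept; the first kept point collapses", info[0][1], "points")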
[Figure: upsampling (US) block diagram — labels: Input Point Features, Transformer, Duplicate, Sub-point Convolution; Input Features N × d, Upsampled Features ûN × d, Directions ûN × 3, Weights ûN × M, Offsets ûN × 3, Scales ûN × 1, Upsampled Points.]

Sub-point convolution. At upsampling stage s, guided by the features F̂s+1, we aim to upsample each point p̂ ∈ P̂s+1 by the predicted upsampling factor û. Additionally, F̂s+1 also needs to be expanded into û features F̂s for the next stage. To achieve this, prior upsampling methods either use multi-branch MLPs for feature expansion [40, 44] or apply folding-based [42] upsampling modules [19, 20, 43]. Despite efforts of regularization and refinement, they still suffer from the aforementioned clustered points artifact due to feature replication.
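As a rough, assumption-laden sketch of a duplication-plus-offset upsampling block of the kind suggested by the figure labels above (duplicated features, predicted offsets ûN × 3 and scales ûN × 1), consider the following torch-style code. It is an illustrative stand-in, not the paper's sub-point convolution, and it uses a fixed upsampling factor rather than the learned û.

import torch
import torch.nn as nn

class UpsampleBlock(nn.Module):
    """Illustrative upsampling block: give each point u_hat distinct child
    features, predict a per-child 3D offset and a positive scale, and emit
    p_hat + scale * offset as the new points."""
    def __init__(self, dim=64, u_hat=4):
        super().__init__()
        self.u_hat = u_hat
        self.expand = nn.Linear(dim, dim * u_hat)            # distinct feature per child
        self.offset = nn.Linear(dim, 3)
        self.scale = nn.Sequential(nn.Linear(dim, 1), nn.Softplus())

    def forward(self, points, feats):                        # points: (N, 3), feats: (N, d)
        N, d = feats.shape
        child = self.expand(feats).reshape(N * self.u_hat, d)
        offsets = torch.tanh(self.offset(child))             # bounded directions
        scales = self.scale(child)                            # per-region step size
        parents = points.repeat_interleave(self.u_hat, dim=0)
        return parents + scales * offsets, child

pts, feats = torch.rand(128, 3), torch.rand(128, 64)
up_pts, up_feats = UpsampleBlock()(pts, feats)
print(up_pts.shape, up_feats.shape)                           # (512, 3), (512, 64)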
points and features, a refinement layer is added to finetune the upsampled points and features. It is essentially an upsampling block with upsampling factor û = 1.

3.3. Loss Function

We employ the standard rate-distortion loss function during training for a better trade-off:

L = D + \lambda R, \qquad (3)

where D penalizes distortion and R penalizes bitrate.

Distortion loss. For distortion (reconstruction error), we utilize the symmetric point-to-point Chamfer Distance [16] to measure the difference between the reconstructed point cloud P̂s and the ground truth Ps. Since the decoder has S stages, to avoid error accumulation, we compute the distortion loss at each stage and aggregate them as Dcha.

A density term is also designed to encourage recovering local density. At stage s of the decoder, a point p̂ is upsampled to a new chosen points set Ĉ(p̂) (see Sec 3.2). We then find its nearest counterpart point p on the encoder side, which is collapsed from a set C(p) (see Sec 3.1). Hence we can define the density loss Dden as:

D_{den} = \sum_{s=0}^{S-1} \sum_{\hat{p} \in \hat{P}_{s+1}} \frac{\left| |C(p)| - |\hat{C}(\hat{p})| \right| + \gamma \left| \overline{C(p)} - \overline{\hat{C}(\hat{p})} \right|}{|\hat{P}_{s+1}|}, \qquad (4)

where the first term of the numerator calculates the cardinality difference between the two sets, the second calculates the difference between the mean distances of all points in the sets to the center points p or p̂, and γ is the weight.

To further facilitate the density estimation, for each stage s, we utilize another loss to measure the cardinality difference between the ground truth Ps and the reconstructed point cloud P̂s:

D_{num} = \sum_{s=0}^{S-1} \left| |P_s| - |\hat{P}_s| \right|. \qquad (5)

Finally, the overall distortion loss is as follows:

D = D_{cha} + \alpha D_{den} + \beta D_{num}, \qquad (6)

where α and β are the weights of the respective terms.

Rate loss. Since entropy encoding is non-differentiable, a differentiable proxy is applied during training. Following [2, 3], we replace the quantization step with additive uniform noise, and estimate the number of bits as the rate loss R. During inference, features are properly quantized and compressed by a range encoder.

3.4. Attribute Compression

Our framework can also compress point cloud attributes such as color, normal, etc. As an example, we incorporate normal compression into our framework. To avoid an extra cost in bitrate, we keep the same network architecture and hyperparameters. The only difference is that the input/output dimension changes from 3D to 6D (position + normal). To facilitate this, we employ a simple L2 loss to minimize the normal reconstruction error.

4. Evaluation

In this section, we evaluate our method by comparing to state-of-the-art methods on compression rate, reconstruction accuracy and local density recovery. We then provide ablation studies to justify the design choices. Lastly, we demonstrate that additional attributes like normals can also be compressed. Please refer to the supplementary section for implementation details and parameter settings.

4.1. Experiment Setup

Datasets. We conduct our main experiments on SemanticKITTI [4] and ShapeNet [9]. For SemanticKITTI, we follow the official training/testing split [4]. For ShapeNet, we follow [17] to split the training/testing sets and sample points from meshes based on [15]. All point clouds are first normalized to 100 m³ cubes and divided into non-overlapping blocks of 12 m³ and 22 m³ for SemanticKITTI and ShapeNet respectively, while each block is further normalized to [-1, 1]. For the downstream surface reconstruction task, we use the RenderPeople [1] dataset.

Baselines. We compare to both state-of-the-art non-learning based methods, G-PCC [12], Google Draco [11] and MPEG Anchor [23], and learning-based methods, Depoco [40] and PCGC [38]. Note that all learning-based methods have been retrained on the same datasets as our method.

Evaluation metrics. Following [6, 16], we adopt the symmetric point-to-point Chamfer Distance (CD) and point-to-plane PSNR for geometry accuracy, and Bits per Point (Bpp) for compression rate. Moreover, we design a new metric to measure local density differences. All these metrics are evaluated on each block. Specifically, for each point p, we denote its neighbor points within radius r = 0.15 as K(p). Since the cardinalities of the ground truth P0 and the reconstructed point cloud P̂0 are not necessarily the same, we define a symmetric density metric DM as:

DM(P_0, \hat{P}_0) = \frac{1}{|P_0|} \sum_{p \in P_0} \delta(p, \hat{p}) + \frac{1}{|\hat{P}_0|} \sum_{\hat{p} \in \hat{P}_0} \delta(\hat{p}, p),
\quad \text{where } \delta(a, b) = \frac{\left| |K(a)| - |K(b)| \right|}{|K(a)|} + \mu \frac{\left| \overline{K(a)} - \overline{K(b)} \right|}{\overline{K(a)}}, \qquad (7)

where b is the nearest counterpart point of a, µ is the weight, |K(a)| denotes the cardinality of K(a), and K̄(a) denotes the mean distance of all points in K(a) to a.
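To make the evaluation protocol concrete, below is a small numpy sketch of a symmetric point-to-point Chamfer Distance and of the density metric DM of Eq. (7). The radius r = 0.15 follows the text, while the weight µ = 0.5 and the divide-by-zero guards are assumptions made only for illustration.

import numpy as np

def chamfer(P, Q):
    """Symmetric point-to-point Chamfer Distance between two clouds."""
    d = np.linalg.norm(P[:, None] - Q[None], axis=-1)        # pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def density_metric(P, Q, r=0.15, mu=0.5):
    """Density metric DM of Eq. (7): compare the cardinality and mean distance
    of each point's radius-r neighborhood with those of its nearest counterpart."""
    def neighborhood(cloud, a):
        d = np.linalg.norm(cloud - a, axis=1)
        d = d[(d > 0) & (d <= r)]
        return len(d), (d.mean() if len(d) else 0.0)

    def one_sided(A, B):
        dAB = np.linalg.norm(A[:, None] - B[None], axis=-1)
        total = 0.0
        for i, a in enumerate(A):
            b = B[dAB[i].argmin()]                            # nearest counterpart of a
            na, ma = neighborhood(A, a)
            nb, mb = neighborhood(B, b)
            total += abs(na - nb) / max(na, 1) + mu * abs(ma - mb) / max(ma, 1e-9)
        return total / len(A)

    return one_sided(P, Q) + one_sided(Q, P)

P = np.random.rand(500, 3)
Q = P + 0.01 * np.random.randn(500, 3)                        # toy noisy reconstruction
print("CD:", chamfer(P, Q), "DM:", density_metric(P, Q))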
[Figure 6 panels. Per-example Bpp / PSNR annotations, left to right (two SemanticKITTI examples, two ShapeNet examples): Ours 1.94/44.73, 4.23/47.98, 1.67/39.65, 4.06/44.00; G-PCC 1.95/39.77, 4.52/45.29, 1.71/36.78, 4.21/42.87; Draco 2.89/26.50, 4.83/38.32, 1.85/32.32, 4.25/41.31; MPEG Anchor 2.56/24.61, 4.89/38.65, 1.99/34.79, 4.16/40.93; Depoco 2.39/34.34, 4.98/40.01, 1.76/35.52, 4.13/39.12; PCGC 2.54/36.02, 4.91/40.22, 1.69/29.44, 4.09/39.97.]
Figure 6. Qualitative results on SemanticKITTI (the first two columns) and ShapeNet (the last two columns). From top to bottom: Ground Truth, Ours, G-PCC [12], Draco [11], MPEG Anchor [23], Depoco [40] and PCGC [38]. We use the distance between each point in the decompressed point cloud and its nearest neighbor in the ground truth as the error. The Bpp and PSNR metrics are averaged over the blocks of the full point clouds. Our method achieves both the most accurate geometry and the lowest bitrates.
Figure 7. Quantitative results on SemanticKITTI (the first row) and ShapeNet (the second row). Our method consistently achieves more accurate geometry and density recovery across the full range of bitrates.
4.2. Comparison with SOTA Methods

We first compare our method with the state of the art on the rate-distortion trade-off. In Fig 7, we show the per-block Chamfer Distance, PSNR and density metric of all methods against Bits per Point (Bpp). Our method yields more accurate reconstruction consistently across the full spectrum of Bpp on both SemanticKITTI and ShapeNet. Note that the differences are more evident under the density metric.

Fig 6 shows qualitative results at various bitrates. Draco [11] and MPEG Anchor [23] typically need a high Bpp (e.g. >4) to achieve a satisfactory reconstruction, and they perform poorly at low bitrates due to quantization. Depoco [40] often generates clustered points caused by feature replication. PCGC [38] tends to miss continuous chunks of points, because it regards decompression as a binary classification process (occupied or not), which has extremely imbalanced data due to the intrinsic sparsity of point clouds; besides, it also significantly alters the density. Although G-PCC [12] recovers the overall geometry successfully, it loses local details due to voxelization. Our method achieves the highest compression performance in terms of both geometry and local density while spending the lowest bitrates.

Complexity analysis. Table 1 shows the per-block latency and memory footprint of different methods. For G-PCC [12], Draco [11] and MPEG Anchor [23], we use the sizes of their executable files. For Depoco [40] and PCGC [38], we use their checkpoint sizes. Our model is competitive in computational efficiency, only second to Depoco [40], but achieves a better rate-distortion trade-off.

                      Enc. time (ms)   Dec. time (ms)   Size (MB)
G-PCC [12]            180/165          163/152          3.49
Draco [11]            147/153          147/153          2.49
MPEG Anchor [23]      151/142          136/130          27.8
Depoco [40]           32/126           2/2              0.54
PCGC [38]             130/96           24/19            7.73
Ours                  80/81            24/31            0.70

Table 1. The average per-block encoding time, decoding time and model size of different methods on SemanticKITTI/ShapeNet, using a TITAN X GPU.

4.3. Ablation Study

For fair comparison, we conduct all the ablation experiments on SemanticKITTI while fixing the Bpp at 2.1.

Effectiveness of each component. We build a baseline model consisting of a point transformer encoder [46], an entropy encoder and a multi-branch MLP decoder [44]. The proposed components, including the dynamic upsampling factor û, local position embedding FP, density embedding FD, scale-adaptive upsampling block, sub-point convolution and upsampling refinement layer, are then added incrementally, as shown in Table 2. All the modules contribute to the reconstruction quality under a fixed Bpp.
Components            CD (10⁻²) ↓   PSNR ↑   DM ↓
Baseline              2.61          38.82    4.17
+û                    2.29          39.64    3.23
+FP                   1.67          40.96    3.02
+FD                   1.32          41.68    2.58
+Adaptive Scale       0.98          42.49    2.31
+Sub-point Conv       0.45          43.73    2.07
+Refinement           0.36          44.03    1.98

Table 2. The effectiveness of each component in our method. In each row a component is added on top of the previous row.

Effectiveness of our decoder. To show that our decoder, consisting of our upsampling block and sub-point convolution, is more effective in leveraging the information provided by the encoder for recovering density, we use various point upsampling modules from previous works as the decoder and jointly train them with our encoder, as shown in Table 3. Our decoder significantly outperforms the others on all the reconstruction quality metrics, indicating that it preserves geometry and local density better.

Decoders              CD (10⁻²) ↓   PSNR ↑   DM ↓
Yu et al. [44]        1.25          41.51    2.60
Wang et al. [43]      1.03          42.54    2.46
Li et al. [19]        0.98          42.57    2.45
Li et al. [20]        0.90          42.83    2.32
Qian et al. [26]      0.81          43.06    2.25
Ours                  0.36          44.03    1.98

Table 3. The effectiveness of our decoder. In each row, we replace our decoder with the decoder from another work.

Figure 8. Quantitative normal compression results. Left: SemanticKITTI; right: ShapeNet. Our method consistently performs better than Draco [11], G-PCC [12] and MPEG Anchor [23] across the bitrate spectrum.

4.4. Normal Compression

Besides positions, we also evaluate the capability of compressing attributes, using normals as an example. The normals are concatenated with the point locations and fed into our model. The decompressed locations and normals are then compared with the inputs by the per-block F1 score [6]. As modifying learning-based approaches such as PCGC [38] and Depoco [40] to support attribute compression is non-trivial, we only compare to Draco [11], G-PCC [12] and MPEG Anchor [23], as shown in Fig 8. Our method consistently outperforms the others, especially by a large margin on the SemanticKITTI dataset.

Figure 9. Quantitative results of downstream tasks. Left: surface reconstruction on RenderPeople; right: semantic segmentation on SemanticKITTI.

4.5. Impact on Downstream Tasks

Point cloud compression, as an upstream task, should not affect the performance of downstream applications much. In this section, we compare the impact of different compression algorithms on two downstream tasks: surface reconstruction and semantic segmentation. Since some methods do not support attribute compression, all methods only compress the positions for fair comparison.

In the surface reconstruction experiments, Poisson reconstruction [18] is run on the full decompressed point clouds. The reconstructed meshes are then compared with the ground truth using the symmetric point-to-plane Chamfer Distance [34]. For semantic segmentation, we train PolarNet [45] on raw point clouds from the SemanticKITTI training set, and test on the full decompressed point clouds. The mean intersection-over-union (IoU) is used as the metric, following [16]. As shown in Fig 9, our method consistently yields the best rate-distortion trade-off, which reiterates the importance of recovering local density. Please refer to the supplementary section for qualitative comparisons.

5. Conclusion

We introduce a novel deep point cloud compression framework that can preserve local density. Not only does it yield the best rate-distortion trade-off against prior arts, it also recovers local density more accurately under our density metric. Qualitative results show that our algorithm can mitigate the two main density issues of other methods: uniformly distributed and clustered points. Complexity-wise, our method is only second to Depoco while achieving much better accuracy.

Acknowledgments. This work was supported in part by NSFC under Grant (No. 62076067), SMSTM Project (2021SHZDZX0103), and Shanghai Research and Innovation Functional Program (17DZ2260900). Danhang Tang, Yinda Zhang and Yanwei Fu are the corresponding authors.
References

[1] Renderpeople. https://renderpeople.com/free-3d-people, 2018.
[2] Johannes Ballé, Valero Laparra, and Eero P Simoncelli. End-to-end optimized image compression. arXiv preprint arXiv:1611.01704, 2016.
[3] Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston. Variational image compression with a scale hyperprior. arXiv preprint arXiv:1802.01436, 2018.
[4] Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Cyrill Stachniss, and Jurgen Gall. Semantickitti: A dataset for semantic scene understanding of lidar sequences. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9297–9307, 2019.
[5] Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509–517, 1975.
[6] Sourav Biswas, Jerry Liu, Kelvin Wong, Shenlong Wang, and Raquel Urtasun. Muscle: Multi sweep compression of lidar using deep entropy models. arXiv preprint arXiv:2011.07590, 2020.
[7] Andrew Brock, Theodore Lim, James M Ritchie, and Nick Weston. Generative and discriminative voxel modeling with convolutional neural networks. arXiv preprint arXiv:1608.04236, 2016.
[8] Christian Bueno and Alan Hylton. On the representation power of set pooling networks. In Thirty-Fifth Conference on Neural Information Processing Systems, 2021.
[9] Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012, 2015.
[10] Ricardo L De Queiroz and Philip A Chou. Compression of 3d point clouds using a region-adaptive hierarchical transform. IEEE Transactions on Image Processing, 25(8):3947–3956, 2016.
[11] Frank Galligan, Michael Hemmer, Ondrej Stava, Fan Zhang, and Jamieson Brettle. Google/draco: a library for compressing and decompressing 3d geometric meshes and point clouds. https://github.com/google/draco, 2018.
[12] D Graziosi, O Nakagami, S Kuma, A Zaghetto, T Suzuki, and A Tabatabai. An overview of ongoing point cloud compression standardization activities: video-based (v-pcc) and geometry-based (g-pcc). APSIPA Transactions on Signal and Information Processing, 9, 2020.
[13] Yulan Guo, Hanyun Wang, Qingyong Hu, Hao Liu, Li Liu, and Mohammed Bennamoun. Deep learning for 3d point clouds: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
[14] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[15] Pedro Hermosilla, Tobias Ritschel, Pere-Pau Vázquez, Àlvar Vinacua, and Timo Ropinski. Monte carlo convolution for learning on non-uniformly sampled point clouds. ACM Transactions on Graphics (TOG), 37(6):1–12, 2018.
[16] Lila Huang, Shenlong Wang, Kelvin Wong, Jerry Liu, and Raquel Urtasun. Octsqueeze: Octree-structured entropy model for lidar compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1313–1323, 2020.
[17] Tianxin Huang and Yong Liu. 3d point cloud geometry compression on deep learning. In Proceedings of the 27th ACM International Conference on Multimedia, pages 890–898, 2019.
[18] Michael Kazhdan and Hugues Hoppe. Screened poisson surface reconstruction. ACM Transactions on Graphics (TOG), 32(3):1–13, 2013.
[19] Ruihui Li, Xianzhi Li, Chi-Wing Fu, Daniel Cohen-Or, and Pheng-Ann Heng. Pu-gan: a point cloud upsampling adversarial network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7203–7212, 2019.
[20] Ruihui Li, Xianzhi Li, Pheng-Ann Heng, and Chi-Wing Fu. Point cloud upsampling via disentangled refinement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 344–353, 2021.
[21] Daniel Maturana and Sebastian Scherer. Voxnet: A 3d convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 922–928. IEEE, 2015.
[22] Donald Meagher. Geometric modeling using octree encoding. Computer Graphics and Image Processing, 19(2):129–147, 1982.
[23] Rufael Mekuria, Kees Blom, and Pablo Cesar. Design, implementation, and evaluation of a point cloud codec for tele-immersive video. IEEE Transactions on Circuits and Systems for Video Technology, 27(4):828–842, 2016.
[24] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 652–660, 2017.
[25] Charles R Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. arXiv preprint arXiv:1706.02413, 2017.
[26] Guocheng Qian, Abdulellah Abualshour, Guohao Li, Ali Thabet, and Bernard Ghanem. Pu-gcn: Point cloud upsampling using graph convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11683–11692, 2021.
[27] Maurice Quach, Giuseppe Valenzise, and Frederic Dufaux. Learning convolutional transforms for lossy point cloud geometry compression. In 2019 IEEE International Conference on Image Processing (ICIP), pages 4320–4324. IEEE, 2019.
[28] Maurice Quach, Giuseppe Valenzise, and Frederic Dufaux. Improved deep point cloud geometry compression. In 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP), pages 1–6. IEEE, 2020.
[29] Zizheng Que, Guo Lu, and Dong Xu. Voxelcontext-net: An octree based framework for point cloud compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6042–6051, 2021.
[30] Gernot Riegler, Ali Osman Ulusoy, and Andreas Geiger. Octnet: Learning deep 3d representations at high resolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3577–3586, 2017.
[31] Radu Bogdan Rusu and Steve Cousins. 3d is here: Point cloud library (pcl). In 2011 IEEE International Conference on Robotics and Automation, pages 1–4. IEEE, 2011.
[32] Ruwen Schnabel and Reinhard Klein. Octree-based point-cloud compression. In PBG@SIGGRAPH, pages 111–120, 2006.
[33] Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P Aitken, Rob Bishop, Daniel Rueckert, and Zehan Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1874–1883, 2016.
[34] Danhang Tang, Saurabh Singh, Philip A Chou, Christian Hane, Mingsong Dou, Sean Fanello, Jonathan Taylor, Philip Davidson, Onur G Guleryuz, Yinda Zhang, et al. Deep implicit volume compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1293–1303, 2020.
[35] Hugues Thomas, Charles R Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, François Goulette, and Leonidas J Guibas. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6411–6420, 2019.
[36] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.
[37] Jianqiang Wang, Dandan Ding, Zhu Li, and Zhan Ma. Multiscale point cloud geometry compression. In 2021 Data Compression Conference (DCC), pages 73–82. IEEE, 2021.
[38] Jianqiang Wang, Hao Zhu, Haojie Liu, and Zhan Ma. Lossy point cloud geometry compression via end-to-end learning. IEEE Transactions on Circuits and Systems for Video Technology, 2021.
[39] Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon. Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics (TOG), 38(5):1–12, 2019.
[40] Louis Wiesmann, Andres Milioto, Xieyuanli Chen, Cyrill Stachniss, and Jens Behley. Deep compression for dense point cloud maps. IEEE Robotics and Automation Letters, 6(2):2060–2067, 2021.
[41] Wei Yan, Shan Liu, Thomas H Li, Zhu Li, Ge Li, et al. Deep autoencoder-based lossy geometry compression for point clouds. arXiv preprint arXiv:1905.03691, 2019.
[42] Yaoqing Yang, Chen Feng, Yiru Shen, and Dong Tian. Foldingnet: Point cloud auto-encoder via deep grid deformation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 206–215, 2018.
[43] Wang Yifan, Shihao Wu, Hui Huang, Daniel Cohen-Or, and Olga Sorkine-Hornung. Patch-based progressive 3d point set upsampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5958–5967, 2019.
[44] Lequan Yu, Xianzhi Li, Chi-Wing Fu, Daniel Cohen-Or, and Pheng-Ann Heng. Pu-net: Point cloud upsampling network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2790–2799, 2018.
[45] Yang Zhang, Zixiang Zhou, Philip David, Xiangyu Yue, Zerong Xi, Boqing Gong, and Hassan Foroosh. Polarnet: An improved grid representation for online lidar point clouds semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9601–9610, 2020.
[46] Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip HS Torr, and Vladlen Koltun. Point transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16259–16268, 2021.