Article
GN-CNN: A Point Cloud Analysis Method for Metaverse Applications
Qian Sun 1 , Yueran Xu 2 , Yidan Sun 3, * , Changhua Yao 1 , Jeannie Su Ann Lee 4 and Kan Chen 4
1 School of Electronic and Information Engineering, Nanjing University of Information Science and Technology,
Nanjing 210044, China
2 National Supercomputing Center in Tianjin, Tianjin 300456, China
3 Cyber Security Research Centre @ NTU, Nanyang Technological University, Singapore 637335, Singapore
4 Infocomm Technology Cluster, Singapore Institute of Technology, Singapore 138683, Singapore
* Correspondence: [email protected]
Abstract: Metaverse applications often require many new 3D point cloud models that are unlabeled
and that have never been seen before; this limited information results in difficulties for data-driven
model analyses. In this paper, we propose a novel data-driven 3D point cloud analysis network
GN-CNN that is suitable for such scenarios. We tackle the difficulties with a few-shot learning
(FSL) approach by proposing an unsupervised generative adversarial network GN-GAN to generate
prior knowledge and perform warm start pre-training for GN-CNN. Furthermore, the 3D models
in the Metaverse are mostly acquired with a focus on the models’ visual appearances instead of the
exact positions. Thus, conceptually, we also propose to augment the information by unleashing and
incorporating local variance information, which conveys the appearance of the model. This is realized
by introducing a graph convolution-enhanced combined multilayer perceptron operation (CMLP),
namely GCMLP, to capture the local geometric relationship as well as a local normal-aware GeoConv,
namely GNConv. The GN-GAN adopts an encoder–decoder structure and the GCMLP is used as the
core operation of the encoder. It can perform the reconstruction task. The GNConv is used as the
convolution-like operation in GN-CNN. The classification performance of GN-CNN is evaluated on
ModelNet10 with an overall accuracy of 95.9%. Its few-shot learning performance is evaluated on ModelNet40: when the training set size is reduced to 30%, the overall classification accuracy reaches 91.8%, which is 2.5% higher than that of Geo-CNN. Experiments show that the proposed method can
improve the accuracy in 3D point cloud classification tasks and under few-shot learning scenarios, compared with existing methods such as PointNet, PointNet++, DGCNN, and Geo-CNN, making it a beneficial method for Metaverse applications.
Keywords: 3D point cloud; Metaverse; convolutional neural network; generative adversarial network; few-shot
feature (experience) vector. Then the vector is plugged in as prior knowledge into the
backbone network of GNConv, which is used as the convolution-like operation in GN-CNN.
Using the original and downsampled input helps to learn multi-scale point cloud features.
Figure 1. The overall architecture for warm starting GN-CNN by pre-training a GN-GAN. GN-GAN
generates and captures prior knowledge as the experience knowledge (vector) of the point clouds.
The experience vector is plugged into GN-CNN for warm starting. For GN-GAN, the left orange
point clouds are the ground truth. We use two resolutions—the top is downsampled and the bottom
is the original one; this helps to learn multi-scale point cloud features. The green point clouds are the
generated results.
Furthermore, the 3D models in the Metaverse are usually acquired with a focus on the models' visual appearance rather than on exact point positions. For example, in driving [7] and indoor navigation [8], more attention is usually paid to whether the models look good than to their geometric accuracy, which is usually the focus of computer-aided design (CAD). Based on this observation, and unlike existing (few-shot) learning methods [9,10], we propose unleashing the model appearance to further augment the information. The model's local surface variance and normal information determine how light is reflected and how the model's appearance is perceived by viewers. Therefore, both are utilized in the proposed pipeline: first, in the encoder of the encoder/decoder framework of the GN-GAN generator, we introduce a graph convolution-enhanced combined multilayer perceptron operation (based on CMLP [11]), namely GCMLP, by adding several graph convolution layers; second, in GN-CNN, we utilize local neighborhood normal information.
Different from existing graph convolution-based methods such as DGCNN [12] and GeoConv [13], we use graph convolution to enhance CMLP [11], which by itself does not handle geometric relationships well. By integrating the capability of CMLP to learn features at different levels with the capability of graph convolution to learn local spatial connections, the local surface variance information can be better learned. Together with the normal information, the appearance of the model can be better captured.
In this paper, we focus on the classification and few-shot learning (GN-CNN) as well as reconstruction (GN-GAN) tasks of 3D point cloud analysis. Our contributions are summarized as follows:
(1) We propose a convolutional neural network framework GN-CNN that is suitable for Metaverse applications. We propose a few-shot learning approach using an unsupervised generative adversarial network GN-GAN to warm start GN-CNN by generating prior knowledge.
2. Related Work
2.1. Deep Learning on 3D Point Cloud Analysis
Unlike 2D images, 3D point clouds do not have a straightforward regular grid (pixels), so CNNs cannot be applied directly. Therefore, some early works convert point clouds into other regularized representations, such as 2D views [14–19] or volumetric representations [20–24]. Although voxelization-based and view-based methods can achieve good results on point cloud analysis tasks, the resolution of the voxel grid strongly affects the efficiency of the neural network, and view-based methods consume much time in data preparation. PointNet [1] and PointNet++ [25] were the first to take the 3D coordinates of points directly as input. Some works [26–29] also take point clouds as direct input and construct their own networks based on PointNet [1] and PointNet++ [25]. However, they extract features from each point independently and therefore cannot easily learn the geometric relationships among points of the point cloud.
2.4. Summary
Please refer to Table 1 for a comparison with other methods.
Current few-shot learning methods [9,10,33] for 3D point clouds mostly learn semantic information from the support point sets and then generalize it to analyze the query point sets. Different from these existing methods, we use a GAN-based and appearance-aware approach. Our method improves accuracy compared to voxel-based and non-graph-based methods; please refer to the results section.
3. Method
This section presents the proposed GN-CNN and GN-GAN and the core operations in the two networks. We first introduce the overall architecture, GN-CNN, and the warm-start strategy, and then describe the structures of GCMLP and GN-GAN.
Figure 2. The overall architecture of the proposed method. The architecture includes two essential
networks, GN-CNN and GN-GAN. GN-CNN is our backbone network for classification and few-shot
learning. GN-GAN is for the reconstruction task and is pre-trained to obtain the prior knowledge
(represented by the red box) to warm start GN-CNN.
$$R_{out} = GNCNN_D(A(F_v, E_v)) = GNCNN_D(A(GNCNN_E(P), GN_E(P))), \quad (1)$$
where $GN_E(\cdot)$ represents the encoder of GN-GAN, $P$ is a point cloud, and $A(\cdot)$ is the aggregation function; we use element-wise addition as the aggregation function.
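As an illustration of Equation (1), the following is a minimal PyTorch sketch of the warm start, assuming the experience vector and the GN-CNN feature share the same dimension and are aggregated by element-wise addition before the classifier head. The module names (gncnn_encoder, gncnn_decoder, gngan_encoder) are placeholders, and keeping the pre-trained encoder frozen is an assumption rather than a detail stated in the text.

```python
import torch
import torch.nn as nn

class WarmStartGNCNN(nn.Module):
    """Sketch of Equation (1): R_out = GNCNN_D(A(GNCNN_E(P), GN_E(P))) with A = element-wise add."""
    def __init__(self, gncnn_encoder, gncnn_decoder, gngan_encoder):
        super().__init__()
        self.gncnn_encoder = gncnn_encoder        # GNCNN_E, trainable
        self.gncnn_decoder = gncnn_decoder        # GNCNN_D, trainable classifier head
        self.gngan_encoder = gngan_encoder        # GN_E, pre-trained by GN-GAN reconstruction
        for p in self.gngan_encoder.parameters(): # assumed frozen: it only supplies prior knowledge
            p.requires_grad_(False)

    def forward(self, points):                    # points: (B, N, 3)
        f_v = self.gncnn_encoder(points)          # feature vector F_v
        with torch.no_grad():
            e_v = self.gngan_encoder(points)      # experience vector E_v
        return self.gncnn_decoder(f_v + e_v)      # aggregation A: element-wise addition
```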
3.3. GCMLP
CMLP [11] applies MLP layers, takes the maximum value in each channel of the low-level and high-level features (max pooling), and then concatenates the pooled vectors. CMLP can integrate low-level and high-level features to learn multi-level information better, but it does not handle the geometric relationship information among points.
Since graph convolution can handle geometric relationships among points, especially for learning the local neighborhoods of point clouds, we propose GCMLP, as shown in Figure 4, to improve CMLP by using a graph convolution layer named GMLP. In GCMLP, three MLP layers of CMLP are replaced with GMLP layers, while the low-level and high-level feature vectors are still concatenated after max pooling. For a GMLP layer, we first use k-NN for each point $p_i$ $(i = 1, 2, \ldots, N)$ among the $N$ points to find its $k$ nearest neighbors $p_j$ $(j = 1, 2, \ldots, k)$, take the $k$ neighboring points $p_j$ as source points and $p_i$ as the target point, and construct edges $p_j - p_i$ (including the self-loop from $p_i$ to $p_i$). Finally, we concatenate $[p_j - p_i, p_i]$ and pass the result through the MLP layer.
Figure 4. GCMLP in the generator of GN-GAN. We use three GMLP layers to replace MLP layers
in CMLP. Combining both low-level and high-level features, the local surface information can be
captured better.
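The GMLP layer described above can be sketched as follows; this is a minimal PyTorch illustration, not the authors' code. The max reduction over the k neighbors after the shared MLP and the batch-normalization/activation choices are assumptions.

```python
import torch
import torch.nn as nn

def knn_indices(points, k):
    """Indices of the k nearest neighbors of each point (self-loop included). points: (B, N, 3)."""
    dist = torch.cdist(points, points)                   # (B, N, N) pairwise distances
    return dist.topk(k, dim=-1, largest=False).indices   # (B, N, k)

class GMLP(nn.Module):
    """Graph-enhanced MLP: edge features [p_j - p_i, p_i] -> shared MLP -> reduction over neighbors."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(2 * in_dim, out_dim, kernel_size=1),
            nn.BatchNorm2d(out_dim),
            nn.LeakyReLU(0.2),
        )

    def forward(self, feats, idx):
        # feats: (B, N, C) per-point features; idx: (B, N, k) neighbor indices
        B, N, C = feats.shape
        k = idx.shape[-1]
        neighbors = torch.gather(                         # (B, N, k, C) features of p_j
            feats.unsqueeze(1).expand(B, N, N, C), 2,
            idx.unsqueeze(-1).expand(B, N, k, C))
        center = feats.unsqueeze(2).expand(B, N, k, C)    # (B, N, k, C) features of p_i
        edge = torch.cat([neighbors - center, center], dim=-1)  # concatenate [p_j - p_i, p_i]
        out = self.mlp(edge.permute(0, 3, 1, 2))          # (B, out_dim, N, k)
        return out.max(dim=-1).values.permute(0, 2, 1)    # (B, N, out_dim), assumed max over neighbors
```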
GCMLP first uses an MLP to encode the original point cloud to dimension [64], then uses three GMLP layers to encode the dimensions to [64, 128, 256], and finally applies two MLP layers to encode the dimensions to [512, 1024]. We max pool the outputs of the last four layers and concatenate the four resulting vectors, whose lengths are [128, 256, 512, 1024].
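Reusing the GMLP layer sketched above, the GCMLP encoder can be assembled as in the following minimal sketch, which follows the layer dimensions given in the text (64, then 64/128/256 via GMLP, then 512/1024) and concatenates the max-pooled outputs of the last four layers into a 1920-dimensional vector. Details not stated in the text (activations, where the k-NN graph is built) are assumptions.

```python
import torch
import torch.nn as nn

class GCMLP(nn.Module):
    """Sketch of the GCMLP encoder; the k-NN graph is assumed to be built once on the input points."""
    def __init__(self, k=16):
        super().__init__()
        self.k = k
        self.mlp0 = nn.Sequential(nn.Linear(3, 64), nn.LeakyReLU(0.2))
        self.gmlp1, self.gmlp2, self.gmlp3 = GMLP(64, 64), GMLP(64, 128), GMLP(128, 256)
        self.mlp1 = nn.Sequential(nn.Linear(256, 512), nn.LeakyReLU(0.2))
        self.mlp2 = nn.Sequential(nn.Linear(512, 1024), nn.LeakyReLU(0.2))

    def forward(self, points):                  # points: (B, N, 3)
        idx = knn_indices(points, self.k)
        f0 = self.mlp0(points)                  # (B, N, 64)
        f1 = self.gmlp1(f0, idx)                # (B, N, 64)
        f2 = self.gmlp2(f1, idx)                # (B, N, 128)
        f3 = self.gmlp3(f2, idx)                # (B, N, 256)
        f4 = self.mlp1(f3)                      # (B, N, 512)
        f5 = self.mlp2(f4)                      # (B, N, 1024)
        # Max pool the outputs of the last four layers and concatenate: 128 + 256 + 512 + 1024 = 1920.
        return torch.cat([f.max(dim=1).values for f in (f2, f3, f4, f5)], dim=-1)
```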
Figure 6. The structure of the decoder in the GN-GAN generator. The decoder uses reshape, Conv, and FC layers to output two point clouds with the same resolutions as the input.
The reconstruction loss of GN-GAN is defined using the Chamfer distance, which is the average nearest squared distance between two point clouds. Define the ground truth point cloud as $P_{real} = \{p_1, p_2, \ldots, p_N\}$ and the reconstructed point cloud as $P_{fake} = \{p'_1, p'_2, \ldots, p'_N\}$; then the Chamfer distance $L_{CD}$ between $P_{fake}$ and $P_{real}$ is divided into two parts, $CD_1$ and $CD_2$:
$$CD_1 = \frac{1}{|P_{real}|} \sum_{i=1}^{N} \min_{j \in [1,N]} \| p_i - p'_j \|_2^2, \quad (2)$$
$$CD_2 = \frac{1}{|P_{fake}|} \sum_{j=1}^{N} \min_{i \in [1,N]} \| p_i - p'_j \|_2^2. \quad (3)$$
where $L_{CD_h}$ is the Chamfer distance between the high-resolution reconstructed point cloud and the ground truth, $L_{CD_l}$ is the Chamfer distance between the low-resolution reconstructed point cloud and the corresponding ground truth, and $\alpha$ is a hyperparameter.
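A minimal sketch of the Chamfer-distance reconstruction loss in Equations (2) and (3) is shown below. The exact way the two directions and the high-/low-resolution terms are combined with $\alpha$ is not fully shown in this excerpt, so the final comment line is only a hypothetical combination.

```python
import torch

def chamfer_distance(p_real, p_fake):
    """p_real, p_fake: (B, N, 3). Returns CD1 + CD2 (Equations (2) and (3)) averaged over the batch."""
    d = torch.cdist(p_real, p_fake) ** 2       # (B, N, N) squared pairwise distances
    cd1 = d.min(dim=2).values.mean(dim=1)      # ground truth -> reconstruction (Equation (2))
    cd2 = d.min(dim=1).values.mean(dim=1)      # reconstruction -> ground truth (Equation (3))
    return (cd1 + cd2).mean()

# Hypothetical reconstruction loss combining the two resolutions with hyperparameter alpha:
# loss_rec = chamfer_distance(gt_high, pred_high) + alpha * chamfer_distance(gt_low, pred_low)
```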
The structure of the discriminator is shown in Figure 5. We divide the adversarial loss of the GN-GAN discriminator into two parts, corresponding to feeding the ground truth point cloud and the reconstructed point cloud into the discriminator, respectively. With the generator mapping defined as $G(\cdot)$ and the discriminator mapping defined as $D(\cdot)$, the adversarial loss is defined as:
3.5. GNConv
GNConv is designed to capture the visual variance of the local surface formed by the
point cloud and augment the information with it. Since the normal of a point can depict
the information and appearance of the local neighborhood, so we use normals. As shown
in Figure 7, dnormal represents the distance between the two normals, reflecting the visual
variance of the surface. Conceptually, when dnormal is large, the local surface appearances
vary violently, which means there are more features (important information).
Figure 7. We use the distance between two normals to reflect the visual variance of the local surface formed by the point cloud. The blue arrows represent the normals of the two triangle meshes. $d_{normal}$ is the distance between the two normals.
Inspired by GeoConv [13], GNConv includes the distance of normals between the center point and the neighboring points on the local neighborhood graph in each iteration. Define the point cloud as $P = \{p_1, p_2, \ldots, p_N\}$, where $p_i = \{x_i, y_i, z_i\}$. Taking each point of $P$ as the center point, take the spherical neighborhood $S(p_i) = \{p_j \mid \|p_i - p_j\| \leq r\}$ with radius $r$, and let the normal set be $N_P = \{n_{p_1}, n_{p_2}, \ldots, n_{p_N}\}$. As the iterations progress, $r$ is gradually increased to obtain a larger receptive field. Define $X^l_{p_i} \in \mathbb{R}^C$ as the feature of $p_i$ obtained after the $l$th iteration of GNConv; then GNConv is defined as follows:
$$X^l_{p_i} = W_c X^{l-1}_{p_i} + W_e \sum_{p_j \in S(p_i)} \frac{f_d(p_i, p_j, r)\, f_n(p_i, p_j)}{\sum_{p_j \in S(p_i)} f_d(p_i, p_j, r)}. \quad (9)$$
As shown in Equation (9), the feature of $p_i$ at the $l$th layer has two parts. The first part is the output of the previous layer $X^{l-1}_{p_i}$ multiplied by the center point weight $W_c$ to extract the center point feature, and the second part is the edge property between the center point and the neighboring points in the spherical neighborhood. The corresponding feature is multiplied by the weight $W_e$ to enlarge the feature dimension to be the same as the dimension of the first part. In the second part, $f_n(p_i, p_j)$ is defined as:
Before each iteration, we take each point $p_i$ as the center of the local spherical neighborhood and determine the neighborhood $S(p_i)$ with radius $r$. The normal edge feature $n(p_i, p_j)$ between the center point and the neighboring points in the spherical neighborhood is defined as
$$n(p_i, p_j) = n_{p_j} - n_{p_i}, \quad (11)$$
where $p_j \in S(p_i)$ and $W_n$ is the weight of the normal edge features. The edge feature $X^{l-1}_{e_{ij}}$ is defined as
$$X^{l-1}_{e_{ij}} = X^{l-1}_{p_j} - X^{l-1}_{p_i}. \quad (12)$$
The distance weighting function $f_d(p_i, p_j, r)$ is defined as
$$f_d(p_i, p_j, r) = (r - \| p_i - p_j \|)^2. \quad (13)$$
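A minimal sketch of GNConv (Equation (9)) is given below. The exact form of $f_n(p_i, p_j)$ is not shown in this excerpt; here it is assumed to apply a learned weight to the concatenation of the feature edge $X^{l-1}_{e_{ij}}$ and the normal edge $n(p_i, p_j)$, which is an assumption rather than the paper's exact formulation. Shapes and module names are illustrative.

```python
import torch
import torch.nn as nn

class GNConv(nn.Module):
    """Sketch of Equation (9): center term W_c X^{l-1} plus a distance-weighted sum of edge terms."""
    def __init__(self, in_dim, out_dim, radius):
        super().__init__()
        self.r = radius
        self.w_c = nn.Linear(in_dim, out_dim)        # center-point weight W_c
        self.w_e = nn.Linear(in_dim + 3, out_dim)    # assumed combined edge/normal weights (W_e, W_n)

    def forward(self, points, normals, feats):
        # points, normals: (B, N, 3); feats: (B, N, C) features X^{l-1}
        dist = torch.cdist(points, points)                       # (B, N, N)
        inside = (dist <= self.r).float()                        # spherical neighborhood S(p_i)
        f_d = inside * (self.r - dist).clamp(min=0) ** 2         # Equation (13), zero outside S(p_i)
        f_d = f_d / (f_d.sum(dim=-1, keepdim=True) + 1e-8)       # normalize over the neighborhood
        edge_feat = feats.unsqueeze(1) - feats.unsqueeze(2)      # X^{l-1}_{e_ij}, (B, N, N, C)
        edge_norm = normals.unsqueeze(1) - normals.unsqueeze(2)  # n(p_i, p_j), (B, N, N, 3)
        f_n = self.w_e(torch.cat([edge_feat, edge_norm], dim=-1))  # assumed form of f_n
        neighborhood = (f_d.unsqueeze(-1) * f_n).sum(dim=2)      # weighted sum over p_j in S(p_i)
        return self.w_c(feats) + neighborhood                    # Equation (9)
```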
4. Experiments
4.1. Implementation Details
In our experiments, the GPU used in the classification experiment and the ablation experiment was an NVIDIA GeForce RTX 2080 Ti; the GPU used in the few-shot learning experiment and the visualization of the GN-GAN reconstruction results was an NVIDIA GeForce GTX 1070.
In the 3D point cloud classification experiment and the few-shot learning experiment,
the input 3D point cloud of GN-CNN contained 1024 points. The radii for three GNConv
layers were 0.15, 0.3, and 0.6. The first branch and the second branch both used the Adam
optimizer and the cosine annealing strategy. The training batch_size was 10, and the test
batch_size was 8.
When pre-training GN-GAN, the input point cloud resolution was 1024 points and 64
points after IFPS downsampling with K = 16. The generator used the Adam optimizer and
StepLR to update the parameters every 40 epochs. The discriminator also used the Adam
optimizer and StepLR to update the parameters every 40 epochs. The training batch_size
was 16, and the test batch_size was 8.
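For reference, a generic implementation of iterative farthest point sampling (IFPS), used here to downsample the 1024-point input to 64 points for the low-resolution branch, is sketched below; this is a standard FPS routine, not the authors' code.

```python
import torch

def farthest_point_sampling(points, m):
    """points: (N, 3). Returns indices of m points chosen by iterative farthest point sampling."""
    n = points.shape[0]
    chosen = torch.zeros(m, dtype=torch.long)
    min_dist = torch.full((n,), float("inf"))
    chosen[0] = torch.randint(n, (1,))                       # random seed point
    for i in range(1, m):
        d = ((points - points[chosen[i - 1]]) ** 2).sum(dim=1)
        min_dist = torch.minimum(min_dist, d)                # distance to the closest chosen point
        chosen[i] = min_dist.argmax()                        # pick the farthest remaining point
    return chosen

# Example: low_res_cloud = cloud[farthest_point_sampling(cloud, 64)]  # 1024 -> 64 points
```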
4.2. Dataset
Our experiments used ModelNet40 [20] and ModelNet10 [20]. For classification,
we carried out experiments on both ModelNet10 and ModelNet40. The input point
cloud resolution was 1024 points, and we used all samples from the training and test sets.
For few-shot learning, we used ModelNet40 and randomly removed 50%, 70%, 90%, and 95% of the training samples; that is, we used 50%, 30%, 10%, and 5% of the total number of training samples. We kept the number of testing samples unchanged, so the few-shot learning experiment used five training set sizes, including the original training set. We used ModelNet40 to pre-train GN-GAN to obtain the experience knowledge.
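The few-shot protocol described above can be summarized with the following sketch: the test set is untouched, and the training set is randomly subsampled to the given fraction. Variable and function names are illustrative.

```python
import random

def subsample_training_set(train_samples, keep_fraction, seed=0):
    """Return a random subset containing keep_fraction of the training samples."""
    rng = random.Random(seed)
    n_keep = int(len(train_samples) * keep_fraction)
    return rng.sample(train_samples, n_keep)

# Five training set sizes: the original set plus the 50%, 30%, 10%, and 5% subsets.
# subsets = {f: subsample_training_set(train_samples, f) for f in (1.0, 0.5, 0.3, 0.1, 0.05)}
```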
4.3. Classification
We used warm-start GN-CNN to perform classification experiments on ModelNet40
and ModelNet10. The experiment’s results are shown in Tables 2 and 3, where MA is the
mean per-class accuracy and OA is the overall accuracy.
Table 3. Classification results on ModelNet10.

Method               OA (%)
3DShapeNets [20]     83.5
VoxNet [21]          92.0
MLH-MV [40]          94.8
KCNet [41]           94.4
SO-Net [42]          95.7
LP-3DCNN [43]        94.4
MHBN [19]            95.0
3DCapsule [44]       94.7
VIPGAN [45]          94.1
Point2Sequence [46]  95.3
Ours                 95.9
The results on ModelNet40 are shown in Table 2. Using our method, MA is 90.5%
and OA is 93.0%; OA increased by 3.8% compared with PointNet [1], and 2.3% compared
with PointNet++ [25]. Through a warm start strategy, the OA of our method increased
by 0.1% compared to the milestone work DGCNN [12] in the field of graph convolution.
Note that our result did not reach the reported OA of 93.4% in the original Geo-CNN [13]
when training 200 epochs using the same network structure and parameter settings, but it
is slightly better than the OA of the reproduced Geo-CNN, which is 92.9% (the same as
DGCNN). In our opinion, there are two main reasons: first, the reproduction process differs from the original paper, which leads to a different training result; second, the hardware and the number of epochs are different. With more epochs, results similar to those in the original paper should be achievable.
The OA of our method on ModelNet10 can reach 95.9%. As shown in Table 3, the OA
increased by 12.4% compared with 3DShapeNets [20], and increased by 1.5% compared
with KCNet. Compared with some work proposed in recent years, such as LP-3DCNN,
3DCapsule, VIPGAN, Point2Sequence, and MHBN, our result on ModelNet10 is the best.
Moreover, on the official website of Princeton ModelNet, the overall accuracy ranking of
the work performed on ModelNet10 for classification tasks is provided as the ModelNet
benchmark leaderboard, and the performance of GN-CNN is ranked fifth.
We can see from Table 4 that GN-CNN on the original training set achieves results similar to DGCNN and Geo-CNN, but after the size of the training set is reduced, the OA and MA of GN-CNN are better than those of Geo-CNN. When the training set size is reduced to 30%, the OA of GN-CNN classification increases by 2.5% compared with Geo-CNN, and the MA increases by 3.1%. Moreover, since the ModelNet40 training set contains 9843 samples and the test set contains 2468 samples, there are only about 500 samples in the training set when we reduce its size to 5% of the original, and the number of training samples then accounts for only about 20% of the number of test samples. In this case, compared with DGCNN, the OA of GN-CNN increased by 3.3%, and the MA increased by 4.6%.
Figure 8. Reconstruction heatmap of GN-GAN. We first obtain the difference between each point
in the ground truth and the reconstructed point cloud, and then the difference is projected onto the
corresponding point of ground truth. The heat map is formed according to the difference.
Table 5. Ablation study results.

Method                     OA (%)
Baseline                   80.3
Baseline (GNConv)          81.1
Baseline (GNConv) + CMLP   82.4
Baseline + GCMLP           81.6
GN-CNN                     82.6
To further compare the performance of GCMLP and CMLP, the two GCMLP layers in GN-GAN are replaced with CMLP, and the resulting network is named GN-GAN (CMLP). As shown in Table 6, we use the ground truth to predicted (GT-pred) error to evaluate the performance. The GT-pred error computes the average squared distance from each point in the ground truth to its closest point in the reconstructed point cloud. We observed that when using GCMLP to extract features, the error is smaller, which means that the reconstruction result of GN-GAN is better than that of GN-GAN (CMLP). This indicates that GCMLP has a stronger feature-extracting ability.
Table 6. GT-pred error of GN-GAN (CMLP) and GN-GAN (GCMLP).

Method   GT-Pred Error
CMLP     0.002403
GCMLP    0.002281
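The GT-pred error used in Table 6 can be sketched as a one-directional nearest-neighbor term, i.e., the average squared distance from each ground truth point to its closest reconstructed point; the following is an illustrative implementation, not the authors' evaluation script.

```python
import torch

def gt_pred_error(gt, pred):
    """gt, pred: (N, 3) point clouds. Mean nearest squared distance from ground truth to prediction."""
    d = torch.cdist(gt, pred) ** 2         # (N, N) squared pairwise distances
    return d.min(dim=1).values.mean()      # closest reconstructed point for each ground truth point
```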
5. Conclusions
In this paper, we propose a convolutional neural network framework GN-CNN with
normal-aware convolution operation GNConv, which is suitable for Metaverse applications.
To solve the problem of few-shot learning of point clouds, we propose an unsupervised gen-
erative adversarial network GN-GAN, based on graph convolution-enhanced multilayer
perceptron operation GCMLP for point cloud reconstruction (for extracting experience
knowledge to warm start GN-CNN). The performance was evaluated on ModelNet40
and ModelNet10.
It is still challenging to reconstruct all local detailed features, and the accuracy still has room for improvement. In the future, we plan to study more geometric characteristics of the point cloud to improve feature extraction performance and to explore more prior knowledge of point cloud geometric information to promote the learning of the backbone network.
Author Contributions: Conceptualization, Q.S. and Y.S.; methodology, Q.S., Y.X. and C.Y.; software,
Q.S. and Y.X.; validation, C.Y.; investigation, Q.S., Y.X. and Y.S.; writing—original draft preparation,
Q.S., Y.X., Y.S., J.S.A.L. and K.C.; writing—review and editing, J.S.A.L. and K.C.; supervision, Y.S. and
K.C. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by The Startup Foundation for Introducing Talent of NUIST
grant number U22B2002, National Natural Science Foundation of China grant number 2022r075 and
Natural Science Foundation of Jiangsu Province grant number BK20191329.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: 3rd Party Data.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
Abbreviation Explanation
CNN convolutional neural network
(C)MLP (combined) multilayer perceptron
GAN generative adversarial network
Conv convolution
FC fully connected
CD Chamfer distance
IFPS iterative farthest point sampling
MA mean per-class accuracy
OA overall accuracy
References
1. Qi, C.; Su, H.; Kaichun, M.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation.
In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July
2017; pp. 77–85. [CrossRef]
2. Sarmad, M.; Lee, H.J.; Kim, Y.M. RL-GAN-Net: A Reinforcement Learning Agent Controlled GAN Network for Real-Time
Point Cloud Shape Completion. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition
(CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5891–5900. [CrossRef]
3. He, C.; Zeng, H.; Huang, J.; Hua, X.S.; Zhang, L. Structure Aware Single-Stage 3D Object Detection From Point Cloud.
In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA,
13–19 June 2020; pp. 11870–11879. [CrossRef]
4. Shi, W.; Rajkumar, R. Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud. In Proceedings of the 2020
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1708–1716.
[CrossRef]
5. Garcez, A.S.D.; Broda, K.B.; Gabbay, D.M., Neural-Symbolic Learning Systems. In Neural-Symbolic Cognitive Reasoning; Springer:
Berlin/Heidelberg, Germany, 2009; pp. 35–54. [CrossRef]
6. Hu, Z.; Ma, X.; Liu, Z.; Hovy, E.; Xing, E. Harnessing Deep Neural Networks with Logic Rules. In Proceedings of the 54th Annual
Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016.
7. Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3D Object Detection Network for Autonomous Driving. In Proceedings of the
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6526–6534.
[CrossRef]
8. Zhu, Y.; Mottaghi, R.; Kolve, E.; Lim, J.J.; Gupta, A.; Fei-Fei, L.; Farhadi, A. Target-driven visual navigation in indoor scenes using
deep reinforcement learning. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA),
Singapore, 29 May–3 June 2017; pp. 3357–3364. [CrossRef]
9. Zhao, N.; Chua, T.S.; Lee, G.H. Few-shot 3d point cloud semantic segmentation. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8873–8882.
10. Kang, D.; Cho, M. Integrative Few-Shot Learning for Classification and Segmentation. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9979–9990.
11. Huang, Z.; Yu, Y.; Xu, J.; Ni, F.; Le, X. PF-Net: Point Fractal Network for 3D Point Cloud Completion. In Proceedings of the 2020
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 7659–7667.
[CrossRef]
12. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic Graph CNN for Learning on Point Clouds.
ACM Trans. Graph. 2019, 38, 1–12. [CrossRef]
13. Lan, S.; Yu, R.; Yu, G.; Davis, L.S. Modeling Local Geometric Structure of 3D Point Clouds Using Geo-CNN. In Proceedings of
the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019;
pp. 998–1008. [CrossRef]
14. Su, H.; Maji, S.; Kalogerakis, E.; Learned-Miller, E. Multi-view Convolutional Neural Networks for 3D Shape Recognition.
In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015;
pp. 945–953. [CrossRef]
15. Yang, Z.; Wang, L. Learning Relationships for Multi-View 3D Object Recognition. In Proceedings of the 2019 IEEE/CVF
International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7504–7513.
[CrossRef]
16. Wei, X.; Yu, R.; Sun, J. View-GCN: View-Based Graph Convolutional Network for 3D Shape Analysis. In Proceedings of the 2020
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1847–1856.
[CrossRef]
17. Lawin, F.J.; Danelljan, M.; Tosteberg, P.; Bhat, G.; Khan, F.S.; Felsberg, M. Deep Projective 3D Semantic Segmentation.
In Proceedings of the International Conference on Computer Analysis of Images and Patterns, Ystad, Sweden, 22–24 August 2017;
Felsberg, M., Heyden, A., Krüger, N., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 95–107.
18. Feng, Y.; Zhang, Z.; Zhao, X.; Ji, R.; Gao, Y. GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition.
In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22
June 2018; pp. 264–272. [CrossRef]
19. Yu, T.; Meng, J.; Yuan, J. Multi-view Harmonized Bilinear Network for 3D Object Recognition. In Proceedings of the 2018
IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 186–194.
[CrossRef]
20. Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3D ShapeNets: A deep representation for volumetric shapes.
In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June
2015; pp. 1912–1920. [CrossRef]
21. Maturana, D.; Scherer, S. VoxNet: A 3D Convolutional Neural Network for real-time object recognition. In Proceedings of the
2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–3 October
2015; pp. 922–928. [CrossRef]
22. Riegler, G.; Ulusoy, A.O.; Geiger, A. OctNet: Learning Deep 3D Representations at High Resolutions. In Proceedings of the 2017
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6620–6629.
[CrossRef]
23. Huang, J.; You, S. Point cloud labeling using 3D Convolutional Neural Network. In Proceedings of the 2016 23rd International
Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 2670–2675. [CrossRef]
24. Rethage, D.; Wald, J.; Sturm, J.; Navab, N.; Tombari, F. Fully-Convolutional Point Networks for Large-Scale Point Clouds.
In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu, C.,
Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 625–640.
25. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Proceedings
of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran
Associates Inc.: Red Hook, NY, USA, 2017; pp. 5105–5114.
26. Aoki, Y.; Goforth, H.; Srivatsan, R.A.; Lucey, S. PointNetLK: Robust & Efficient Point Cloud Registration Using PointNet.
In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA,
15–20 June 2019; pp. 7156–7165. [CrossRef]
27. Zhao, H.; Jiang, L.; Fu, C.W.; Jia, J. PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing. In Proceedings
of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019;
pp. 5560–5568. [CrossRef]
28. Lin, H.; Xiao, Z.; Tan, Y.; Chao, H.; Ding, S. Justlookup: One Millisecond Deep Feature Extraction for Point Clouds By
Lookup Tables. In Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China,
8–12 July 2019; pp. 326–331. [CrossRef]
29. Sun, X.; Lian, Z.; Xiao, J. SRINet: Learning Strictly Rotation-Invariant Representations for Point Cloud Classification and
Segmentation. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; Association
for Computing Machinery: New York, NY, USA, 2019; pp. 980–988. [CrossRef]
30. Liu, Y.; Fan, B.; Xiang, S.; Pan, C. Relation-Shape Convolutional Neural Network for Point Cloud Analysis. In Proceedings of
the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019;
pp. 8887–8896. [CrossRef]
31. Achlioptas, P.; Diamanti, O.; Mitliagkas, I.; Guibas, L. Learning Representations and Generative Models for 3D Point Clouds.
In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018.
32. Valsesia, D.; Fracastoro, G.; Magli, E. Learning Localized Generative Models for 3D Point Clouds via Graph Convolution.
In Proceedings of the ICLR 2019, New Orleans, LA, USA, 6–9 May 2019.
33. Dong, N.; Xing, E.P. Few-shot semantic segmentation with prototype learning. In Proceedings of the BMVC, Newcastle, UK,
3–6 September 2018; Volume 3.
34. Kontschieder, P.; Fiterau, M.; Criminisi, A.; Bulò, S.R. Deep Neural Decision Forests. In Proceedings of the 2015 IEEE International
Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1467–1475. [CrossRef]
35. Zaheer, M.; Kottur, S.; Ravanbhakhsh, S.; Póczos, B.; Salakhutdinov, R.; Smola, A.J. Deep Sets. In Proceedings of the
31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017;
Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 3394–3404.
36. Simonovsky, M.; Komodakis, N. Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs.
In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July
2017; pp. 29–38. [CrossRef]
37. Xu, Y.; Fan, T.; Xu, M.; Zeng, L.; Qiao, Y. SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters.
In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu, C.,
Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 90–105.
38. Klokov, R.; Lempitsky, V. Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models. In Proceedings
of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 863–872. [CrossRef]
39. Atzmon, M.; Maron, H.; Lipman, Y. Point Convolutional Neural Networks by Extension Operators. ACM Trans. Graph. 2018, 37.
[CrossRef]
40. Sarkar, K.; Hampiholi, B.; Varanasi, K.; Stricker, D. Learning 3D Shapes as Multi-layered Height-Maps Using 2D Convolutional
Networks. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M.,
Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 74–89.
41. Shen, Y.; Feng, C.; Yang, Y.; Tian, D. Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling.
In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA,
18–22 June 2018; pp. 4548–4557. [CrossRef]
42. Li, J.; Chen, B.M.; Lee, G.H. SO-Net: Self-Organizing Network for Point Cloud Analysis. In Proceedings of the
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018;
pp. 9397–9406. [CrossRef]
43. Kumawat, S.; Raman, S. LP-3DCNN: Unveiling Local Phase in 3D Convolutional Neural Networks. In Proceedings of the
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019;
pp. 4898–4907. [CrossRef]
44. Cheraghian, A.; Petersson, L. 3DCapsule: Extending the Capsule Architecture to Classify 3D Point Clouds. In Proceedings of
the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 January 2019;
pp. 1194–1202. [CrossRef]
45. Han, Z.; Shang, M.; Liu, Y.-S.; Zwicker, M. View Inter-Prediction GAN: Unsupervised Representation Learning for 3D Shapes by Learning Global
Shape Memories to Support Local View Predictions. In Proceedings of the 2019 AAAI Conference on Artificial Intelligence,
Honolulu, HI, USA, 27 January–1 February 2019.
46. Liu, X.; Han, Z.; Liu, Y.; Zwicker, M. Point2Sequence: Learning the Shape Representation of 3D Point Clouds with an Attention-
Based Sequence to Sequence Network. In Proceedings of the 2019 AAAI Conference on Artificial Intelligence, Honolulu, HI, USA,
27 January–1 February 2019; Volume 33, pp. 8778–8785. [CrossRef]