
3D Shape Analysis Using the CNN

Written by 周芃为 23020211153995, 周密 23020211153994, 郭思正 31520211154042, 曾威 31520211154020

Abstract

Convolutional neural networks (CNNs) have made great breakthroughs in 2D computer vision. Consequently, CNNs have also been applied to 3D geometry. There are several representations for 3D data, such as point clouds, voxels, meshes and implicit representations. Among these, polygonal meshes provide a relatively efficient representation for 3D shapes. Compared with point clouds, they explicitly capture the topology of a shape; compared with voxels, they represent only the boundary of an object and carry no redundant elements for the object's interior. Meshes leverage non-uniformity to represent large flat regions as well as intricate features. However, this non-uniformity and irregularity inhibits mesh analysis with CNNs. Based on MeshCNN, which presents specialized edge-based convolution and pooling, we attempt to improve the method, since its limited receptive field cannot capture global features. In this project, we still utilize the unique properties of the triangle mesh for a direct analysis of 3D shapes, including classification and segmentation, using an improved edition of MeshCNN, a convolutional neural network designed specifically for triangular meshes.

Introduction

The great success of deep convolutional neural networks (CNNs) in 2D computer vision has led to their generalisation to various disciplines, including 3D geometry. For computational reasons, and to facilitate data processing, various discrete approximations for 3D shapes have been suggested and utilized to represent shapes in an array of applications. PointNet [1] is a pioneering and representative approach for learning a feature representation of a point cloud, followed by more successful work in this domain. Apart from point clouds, 3D geometry learning has been extended to other forms of 3D data, such as voxels and meshes.

In this project, we consider the neural network MeshCNN, which aims to tap into the natural potential of the native mesh representation. The network exploits a unique property: every edge is incident to exactly two faces (triangles), which defines a natural fixed-sized convolutional neighborhood of four edges. However, we find a defect in this special convolutional operation: the convolution kernel of MeshCNN has a receptive field limited to the four edges adjacent to a given edge. We try to remedy this defect. Unlike conventional edge collapse, which removes the edges that introduce minimal geometric distortion, mesh pooling delegates the choice of which edges to collapse to the network in a task-specific manner. The purged edges are the ones whose features contribute the least to the objective used (see examples in Figures 1 and 2).

Figure 1

Figure 2

Related Work

Many of the operators that MeshCNN presents or uses are based on classic mesh processing techniques [8][9][10], or more specifically, mesh simplification techniques [11][12][13]. In particular, MeshCNN uses the edge-collapse technique [13] for its task-driven pooling operator. While classic mesh simplification techniques aim to reduce the number of mesh elements with minimal geometric distortion [14][15], MeshCNN uses mesh simplification to reduce the resolution of the feature maps within the context of a neural network. In the following, we briefly review relevant work on 3D geometric learning, organized by input representation type.

Multi-view 2D projections. One way of applying deep learning to geometric data is to transform 3D shapes into images, and methods that project 3D data into 2D have accordingly been exploited. Additionally, representing 3D shapes through their 2D projections from various viewpoints makes it possible to leverage existing techniques and architectures from the 2D domain. These sets of rendered images serve as input for subsequent processing by standard CNN models. [16] were the first to apply a multi-view CNN to the task of shape classification; however, this approach cannot perform semantic segmentation. [17] then presented a more comprehensive multi-view framework for shape segmentation, which fixes this defect.

Volumetric. Transforming a 3D shape into voxels provides a grid-based representation analogous to the 2D grid of an image, so that operations applied on 2D grids can be extended to 3D grids in a straightforward manner, allowing common image-based approaches to apply to 3D. [18] pioneered this concept and presented a CNN that processes voxelized shapes for classification and completion. Following that, [19] tackled shape reconstruction using a voxel-based variational autoencoder, and [20] combined trilinear interpolation and Conditional Random Fields (CRF) with a volumetric network to promote semantic shape segmentation. [21] used voxels to train a network that regresses grid-based warp fields for shape alignment, and applied the estimated deformation to the original mesh.

Point clouds. The point cloud is a classic candidate for data analysis, owing to its close relationship to data acquisition and its ease of conversion from other representations. [1] proposes to use 1x1 convolutions followed by global max pooling for order invariance. In its follow-up work, [3], points are partitioned to capture local structures better.

Meshes. A mesh representation is based on three types of geometric primitive: vertices, edges, and faces. We classify mesh deep learning methods according to which of these is treated as the primary data. The first is the vertex-based approach. One popular approach performs deep learning on 3D shapes by locally encoding the neighborhood of each vertex into a regular domain, whereupon convolution operations (or kernel functions) can imitate those used for images, as in [22][23][24]. The second is the edge-based method. In a 2-manifold mesh, every edge is adjacent to two faces, and to the four other edges of those two triangles. This property is exploited by [25] to define an order-invariant convolution. PD-MeshNet [26] first constructs a primal graph and a dual graph from the input mesh, then performs convolutions on these graphs using a graph attention network [27]. MeshWalker [29] employs random walks along edges to extract shape features, instead of exploiting regular neighborhood structures. The last is the face-based approach. Face-based methods focus on how to efficiently and effectively gather information from neighboring faces. [30] propose a rotationally invariant face-based method considering ring neighbors. [31] propose MeshSNet, which adopts graph-constrained mesh-cell nodes to integrate local-to-global geometric features. DNF-Net [32] denoises mesh normals on cropped local patches using multi-scale embedding and a residual learning strategy. TextureNet [33] parameterizes mesh patches and high-resolution textures as quadrilaterals to employ grid convolution. SubdivNet [34], presented this year, achieves the state-of-the-art results on shape classification and segmentation tasks.

Proposed Solution

Applying CNN on Meshes

The most fundamental and commonly used 3D data representation in computer graphics is the non-uniform polygonal mesh: large flat regions use a small number of large polygons, while detailed regions use a larger number of polygons. A mesh explicitly represents the topology of a surface, faithfully describing intricate structures while disambiguating proximity from nearby surfaces (see Figure 3).
Realizing our goal of applying the CNN paradigm directly to triangular meshes necessitates an analogous definition and implementation of the standard building blocks of CNN: the convolution and pooling layers. As opposed to images, which are represented on a regular grid of discrete values, the key challenge in mesh analysis is the mesh's inherent irregularity and non-uniformity.
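Despite this irregularity, the four-edge neighborhood mentioned in the Introduction gives every interior edge a fixed-size convolutional support. The following is a minimal sketch (our own illustration, not MeshCNN's actual code; the function name and data layout are assumptions) of recovering that neighborhood from the face list of a triangle mesh:

```python
# Sketch: for each edge of a triangle mesh, collect the other edges of its
# incident triangles. Interior edges get exactly four neighbors, which is
# the fixed-size support used by edge-based convolution.

def edge_neighborhoods(faces):
    """faces: list of (v0, v1, v2) vertex-index triples.
    Returns (edges, nbrs): edges[i] is a sorted vertex pair, and nbrs[i]
    lists the indices of the edges sharing a triangle with edge i
    (four for an interior edge, two on the boundary)."""
    edge_ids = {}    # sorted vertex pair -> edge index
    edge_faces = {}  # edge index -> incident face ids
    for f_id, (a, b, c) in enumerate(faces):
        for u, v in ((a, b), (b, c), (c, a)):
            key = (min(u, v), max(u, v))
            e = edge_ids.setdefault(key, len(edge_ids))
            edge_faces.setdefault(e, []).append(f_id)
    edges = sorted(edge_ids, key=edge_ids.get)
    nbrs = {e: [] for e in edge_faces}
    for e, f_list in edge_faces.items():
        for f_id in f_list:
            a, b, c = faces[f_id]
            for u, v in ((a, b), (b, c), (c, a)):
                other = edge_ids[(min(u, v), max(u, v))]
                if other != e:
                    nbrs[e].append(other)
    return edges, nbrs

# Two triangles sharing edge (1, 2): the shared edge has four neighbors.
faces = [(0, 1, 2), (1, 3, 2)]
edges, nbrs = edge_neighborhoods(faces)
shared = edges.index((1, 2))
print(len(nbrs[shared]))  # -> 4
```

The point of the sketch is that, unlike an image pixel, an edge has no canonical ordering of its neighbors, which is what motivates the symmetric convolution described in the Methods section.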

Figure 3

Methods

Mesh Convolution. We define a convolution operator for edges, where the spatial support is defined using the four incident neighbors (Figure 3). Recall that convolution is the dot product between a kernel k and a neighborhood; thus the convolution for an edge feature e and the features (a, b, c, d) of its four adjacent edges is

e · k0 + |a − c| · k1 + (a + c) · k2 + |b − d| · k3 + (b + d) · k4,

where the symmetric terms |a − c|, a + c, |b − d| and b + d make the result invariant to the ordering ambiguity of the two incident triangles, as defined in MeshCNN [6].

Mesh Pooling. We extend conventional pooling to irregular data by identifying three core operations that together generalize the notion of pooling:
1) define pooling regions given the adjacency;
2) merge the features in each pooling region;
3) redefine the adjacency for the merged features.

Mesh Unpooling. Each mesh unpooling layer is paired with a mesh pooling layer, to upsample the mesh topology and the edge features. The unpooling layer reinstates the upsampled topology (prior to mesh pooling) by storing the connectivity prior to pooling. Note that upsampling the connectivity is a reversible operation (just as in images). For unpooled edge feature computation, we retain a graph which stores the adjacencies from the original edges (prior to pooling) to the new edges (after pooling). Each unpooled edge feature is then a weighted combination of the pooled edge features. The case of average unpooling is demonstrated in Figure 4.

Figure 4
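The order-invariant edge convolution above can be sketched for a single scalar feature channel as follows; this is our illustrative reimplementation of the published formula, not the authors' code, and the kernel values are arbitrary:

```python
# Sketch of MeshCNN-style edge convolution for one scalar channel.
# (a, b) are the two other edges of one incident triangle, (c, d) of the
# other. The kernel acts on symmetric combinations of the neighbors, so
# swapping the two triangles, (a, b, c, d) -> (c, d, a, b), cannot change
# the result.

def edge_conv(e, a, b, c, d, k):
    """k: five kernel weights (k0..k4)."""
    sym = (e, abs(a - c), a + c, abs(b - d), b + d)
    return sum(w * s for w, s in zip(k, sym))

k = (1.0, 0.5, 0.25, 0.5, 0.25)  # arbitrary illustrative weights
# Relabeling the two incident triangles leaves the output unchanged:
assert edge_conv(2.0, 1.0, 3.0, 4.0, 0.5, k) == edge_conv(2.0, 4.0, 0.5, 1.0, 3.0, k)
```

In the real network the same idea is applied per channel with learned kernels; the sketch only demonstrates the invariance that the symmetric terms buy.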
Plan

Our project is based on MeshCNN. Considering the limitation of the convolution in MeshCNN mentioned above, we attempt to modify the original convolution pattern or the pipeline of the network to obtain better experimental results on the tasks of 3D shape classification and segmentation.

Experiments

Data Processing

Geometric mesh decimation helps to reduce the input resolution, and with it the network capacity required for training, so we simplified each mesh to roughly the same number of edges. Since our task is shape classification, the required shape resolution is relatively low (about 750 edges).

Augmentation. Since our input features are similarity-invariant, applying rotation, translation and isotropic scaling does not generate new input features. However, applying anisotropic scaling to the vertex locations in x, y and z does generate new features. Moreover, we shift the vertices of each mesh to different locations and augment the tessellation of each object by performing random edge flips.

Mesh Classification

SHREC. We performed classification on 30 classes from the SHREC dataset, with 20 examples per class. The split-16 and split-10 versions of the SHREC dataset use 16 and 10 training examples per class, respectively; we use split 16 in our task. We stop training after 200 epochs. Figure 5 shows the test results.

Figure 5

We also visualize some examples of mesh pooling simplifications of this dataset in Figure 6.

Figure 6

Cube engraving. The dataset of cubes is modeled with shallow icon engravings (see Figure 7).
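The anisotropic-scaling augmentation described in the Experiments section can be sketched as follows; the scale-factor range here is an assumption for illustration, not the value used in our experiments:

```python
# Sketch: scale vertex coordinates by independent random factors along
# x, y and z. This changes similarity-invariant edge features, whereas
# rotation, translation and uniform scaling would not.
import random

def anisotropic_scale(verts, lo=0.9, hi=1.1, rng=random):
    """verts: list of (x, y, z) vertex positions.
    Returns a new vertex list with each axis scaled independently."""
    sx, sy, sz = (rng.uniform(lo, hi) for _ in range(3))
    return [(x * sx, y * sy, z * sz) for (x, y, z) in verts]

verts = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(anisotropic_scale(verts))  # three vertices, axes scaled independently
```

Random edge flips, the other tessellation augmentation we use, operate on connectivity rather than coordinates and are not shown here.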

We train our network to classify the cubes. We show the test accuracy in Figure 8 and visualize the effect of mesh pooling in Figure 9.

Figure 7

Figure 8

Figure 9

Conclusion

We have presented MeshCNN, a general method for employing neural networks directly on irregular triangular meshes. The key contribution of our work is the definition and application of convolution and pooling operations tailored to irregular and non-uniform structures. These operations facilitate a direct analysis of shapes represented as meshes in their native form, and hence benefit from the unique properties associated with the representation of surface manifolds with non-uniform structures.

References

[1] Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2017a. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proc. Computer Vision and Pattern Recognition (CVPR). IEEE.
[2] Yangyan Li, Rui Bu, Mingchao Sun, and Baoquan Chen. 2018. PointCNN. CoRR abs/1801.07791 (2018).
[3] Charles R. Qi, Li Yi, Hao Su, and Leonidas J. Guibas. 2017b. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Advances in Neural Information Processing Systems (NIPS). 5105–5114.
[4] Roman Klokov and Victor S. Lempitsky. 2017. Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models. In ICCV. IEEE Computer Society, 863–872.
[5] Peng-Shuai Wang, Yang Liu, Yu-Xiao Guo, Chun-Yu Sun, and Xin Tong. 2017. O-CNN: octree-based convolutional neural networks for 3D shape analysis. ACM Trans. Graph. 36, 4 (2017), 72:1–72:11.
[6] Rana Hanocka, Amir Hertz, Noa Fish, Raja Giryes, Shachar Fleishman, and Daniel Cohen-Or. 2019. MeshCNN: a network with an edge. ACM Trans. Graph. 38, 4 (2019), 90:1–90:12.
[7] Alon Lahav and Ayellet Tal. 2020. MeshWalker: deep mesh understanding by random walks. ACM Trans. Graph. 39, 6 (2020), 263:1–263:13.
[8] Hugues Hoppe. 1999. New quadric metric for simplifying meshes with appearance attributes. In Visualization '99. Proceedings. IEEE, 59–510.
[9][10] Szymon Rusinkiewicz and Marc Levoy. 2000. QSplat: A Multiresolution Point Rendering System for Large Meshes. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '00). ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 343–352. https://doi.org/10.1145/344779.344940
[11] Hugues Hoppe, Tony DeRose, Tom Duchamp, John McDonald, and Werner Stuetzle. 1993. Mesh optimization. 19–26 pages.
[12] Michael Garland and Paul S. Heckbert. 1997. Surface simplification using quadric error metrics. In Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. ACM Press/Addison-Wesley Publishing Co., 209–216.
[13] Hugues Hoppe. 1997. View-dependent refinement of progressive meshes. In Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. ACM Press/Addison-Wesley Publishing Co., 189–198.
[14] Marco Tarini, Nico Pietroni, Paolo Cignoni, Daniele Panozzo, and Enrico Puppo. 2010. Practical quad mesh simplification. In Computer Graphics Forum, Vol. 29. Wiley Online Library, 407–418.
[15] Xifeng Gao, Daniele Panozzo, Wenping Wang, Zhigang Deng, and Guoning Chen. 2017. Robust structure simplification for hex re-meshing. ACM Transactions on Graphics 36, 6 (2017).
[16] Hang Su, Subhransu Maji, Evangelos Kalogerakis, and Erik Learned-Miller. 2015. Multi-view Convolutional Neural Networks for 3D Shape Recognition. In International Conference on Computer Vision (ICCV).
[17] Evangelos Kalogerakis, Melinos Averkiou, Subhransu Maji, and Siddhartha Chaudhuri. 2017. 3D shape segmentation with projective convolutional networks. In Proc. CVPR, Vol. 1. 8.
[18] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 2015. 3D ShapeNets: A deep representation for volumetric shapes. In Computer Vision and Pattern Recognition (CVPR). 1912–1920.
[19] Andrew Brock, Theodore Lim, J. M. Ritchie, and Nick Weston. 2016. Generative and Discriminative Voxel Modeling with Convolutional Neural Networks. In NIPS 3D Deep Learning Workshop.
[20] Lyne P. Tchapmi, Christopher B. Choy, Iro Armeni, JunYoung Gwak, and Silvio Savarese. 2017. SEGCloud: Semantic Segmentation of 3D Point Clouds. In 3DV.
[21] Rana Hanocka, Noa Fish, Zhenhua Wang, Raja Giryes, Shachar Fleishman, and Daniel Cohen-Or. 2018. ALIGNet: Partial-Shape Agnostic Alignment via Unsupervised Learning. ACM Trans. Graph. 38, 1, Article 1 (Dec. 2018), 14 pages. https://doi.org/10.1145/3267347
[22] Jonathan Masci, Davide Boscaini, Michael M. Bronstein, and Pierre Vandergheynst. 2015. Geodesic Convolutional Neural Networks on Riemannian Manifolds. In ICCV Workshops. IEEE Computer Society, 832–840.
[23] Davide Boscaini, Jonathan Masci, Emanuele Rodolà, and Michael M. Bronstein. 2016. Learning shape correspondence with anisotropic convolutional neural networks. In NIPS. 3189–3197.
[24] Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Jan Svoboda, and Michael M. Bronstein. 2017. Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs. In CVPR. IEEE Computer Society, 5425–5434.
[25] Rana Hanocka, Amir Hertz, Noa Fish, Raja Giryes, Shachar Fleishman, and Daniel Cohen-Or. 2019. MeshCNN: a network with an edge. ACM Trans. Graph. 38, 4 (2019), 90:1–90:12.
[26] Francesco Milano, Antonio Loquercio, Antoni Rosinol, Davide Scaramuzza, and Luca Carlone. 2020. Primal-Dual Mesh Convolutional Neural Networks. In NeurIPS.
[27] Federico Monti, Oleksandr Shchur, Aleksandar Bojchevski, Or Litany, Stephan Günnemann, and Michael M. Bronstein. 2018. Dual-Primal Graph Convolutional Networks. CoRR abs/1806.00770 (2018). arXiv:1806.00770
[28] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In ICLR (Poster). OpenReview.net.
[29] Alon Lahav and Ayellet Tal. 2020. MeshWalker: deep mesh understanding by random walks. ACM Trans. Graph. 39, 6 (2020), 263:1–263:13.
[30] Haotian Xu, Ming Dong, and Zichun Zhong. 2017. Directionally Convolutional Networks for 3D Shape Segmentation. In ICCV. IEEE Computer Society, 2717–2726.
[31] Chunfeng Lian, Li Wang, Tai-Hsien Wu, Mingxia Liu, Francisca Durán, Ching-Chang Ko, and Dinggang Shen. 2019. MeshSNet: Deep Multi-scale Mesh Feature Learning for End-to-End Tooth Labeling on 3D Dental Surfaces. In MICCAI (6) (Lecture Notes in Computer Science), Vol. 11769. Springer, 837–845.
[32] Xianzhi Li, Ruihui Li, Lei Zhu, Chi-Wing Fu, and Pheng-Ann Heng. 2020. DNF-Net: a Deep Normal Filtering Network for Mesh Denoising. CoRR abs/2006.15510 (2020).
[33] Jingwei Huang, Haotian Zhang, Li Yi, Thomas A. Funkhouser, Matthias Nießner, and Leonidas J. Guibas. 2019. TextureNet: Consistent Local Parametrizations for Learning From High-Resolution Signals on Meshes. In CVPR. Computer Vision Foundation / IEEE, 4440–4449.
[34] Shi-Min Hu, Zheng-Ning Liu, Meng-Hao Guo, Jun-Xiong Cai, Jiahui Huang, Tai-Jiang Mu, and Ralph R. Martin. 2021. Subdivision-Based Mesh Convolution Networks. https://arxiv.org/abs/2106.02285
