3D Tooth Segmentation and Labeling Using Deep Convolutional Neural Networks
Abstract—In this paper, we present a novel approach for 3D dental model segmentation via deep Convolutional Neural Networks (CNNs). Traditional geometry-based methods tend to produce undesirable results due to the complex appearance of human teeth (e.g., missing/rotten teeth, feature-less regions, crowded teeth, and extra medical attachments). Furthermore, labeling of individual teeth is rarely supported by traditional tooth segmentation methods. To address these issues, we propose to learn a generic and robust segmentation model by exploiting deep CNNs. The segmentation task is achieved by labeling each mesh face. We extract a set of geometry features as the face feature representation. In the training step, the network is fed with these features and produces a probability vector, of which each element indicates the probability that a face belongs to the corresponding model part. To this end, we extensively experiment with various network structures and eventually arrive at a 2-level hierarchical CNN structure for tooth segmentation: one level for teeth-gingiva labeling and the other for inter-teeth labeling. Furthermore, we propose a novel boundary-aware tooth simplification method to significantly improve efficiency in the feature extraction stage. After CNN prediction, we perform graph-based label optimization and further refine the boundary with an improved version of fuzzy clustering. The accuracy of our mesh labeling method exceeds that of the state-of-the-art geometry-based methods, reaching 99.06% measured by area, which makes it directly applicable in orthodontic CAD systems. It is also robust to foreign matter on the model surface, e.g., air bubbles, dental accessories, and many more.
Index Terms—Boundary-aware simplification, 3D mesh segmentation, deep convolutional neural networks, fuzzy clustering.
1 INTRODUCTION
image as in [5]. By doing so, we get a sequence of image-label pairs as CNN input. Usually, about 50% of the triangle faces on a dental mesh belong to the gingiva; the others belong to 14~16 teeth. Besides, the quality of the segmentation boundary is vital for subsequent orthodontic treatments (e.g., root planning). To deal with the data imbalance and to improve boundary accuracy, we extensively experiment with various network structures and finally arrive at a hierarchical labeling architecture consisting of two CNNs: one for teeth-gingiva labeling (which we call TGCNNs) and the other for inter-teeth labeling (which we call TTCNNs). Raw dental models used in orthodontic treatments are usually very large (200,000~400,000 triangles), which poses a large overhead on the feature extraction step (up to dozens of hours). To improve the efficiency of feature extraction while leaving the quality of the CNNs unaffected, we design a boundary-aware mesh simplification algorithm and a correspondence-free mapping algorithm to pre-process and post-process the dental meshes.

For a newly arrived dental mesh S, which is to be labeled, we first simplify it to a mesh S' using our boundary-aware simplification algorithm. Then we extract the feature images per face on S' and feed them into the TGCNNs for teeth-gingiva separation. In an intermediate step, we perform label optimization to smooth the labeling boundary. We then apply the TTCNNs to label each individual tooth face. After that, we perform label optimization to smooth the labeling results again and back-project the labeling results to the original model S. Finally, we optimize the segmentation boundary via an improved fuzzy clustering to achieve the final result.

Our data-driven tooth labeling method is capable of segmenting various dental models regardless of their geometric variations. It is not only effective and efficient but also, to the best of our knowledge, the most accurate tooth segmentation and labeling model in the literature so far, achieving a practical precision of 99.06%. The main contributions of our method are:
• A simultaneous and robust teeth segmentation and labeling framework which achieves 99.06% accuracy and can be directly applied in industrial orthodontic CAD systems;
• A carefully designed 2-level hierarchical CNN model trained on 1,000 dental meshes, which is robust and generalizes well on new data;
• A boundary-aware mesh simplification method to enable efficient feature extraction;
• An improved fuzzy clustering boundary optimization algorithm coupling network prediction with geometry optimization.
2 RELATED WORK

This paper proposes a data-driven method for dental mesh segmentation. We first review the literature on general mesh segmentation. Then, we discuss approaches for dental mesh segmentation. Finally, we briefly introduce recent data-driven shape analysis methods in geometry processing.

General Mesh Segmentation 3D mesh segmentation is a fundamental task for mesh understanding and processing. It divides 3D shapes into several parts under reasonable criteria. Common approaches are covered in the surveys [6], [7]. These approaches rely more or less on geometry information and can be grouped into two categories: region-based and boundary-based segmentation approaches. Region-based approaches attempt to partition a mesh into several regions such that faces within a region share similar characteristics while faces in different regions differ greatly. Well-known region-based works include K-means [8], clustering [9], decomposition [10], fitting primitives [11], watersheds [12], random walks [13], and fast marching [14]. Boundary-based approaches, in contrast, concentrate on finding the optimal curves that separate two neighbouring parts; they determine the final curves by maximizing the difference between parts. Main methods include normalized and randomized cuts [15], core extraction [16], the shape diameter function [17], and active contours or scissoring [18], [19]. However, geometry-based methods tend to fail when meshes become special and complicated.

As meshes vary greatly in appearance, it is nearly impossible to separate a mesh into the desired parts with a fully automatic approach. Since manual segmentation is tiring as well as time-consuming, sketch-based semi-automatic methods have become popular. They provide simple and user-friendly interfaces for users to add their suggestions as start points or optimization constraints. The studies [20], [21] survey a number of sketch-based segmentation methods. For example, Ji et al. [22] introduced an improved region-growing algorithm for segmentation. Fan et al. [23] adopted an efficient local graph-cut-based optimization algorithm and obtained satisfying results. Studies [24], [25], [26], [27] integrated harmonic field theory with sketch-based segmentation, which possesses a solid theoretical basis and works well. However, sketch-based methods require a balance between user input and automatic computation.

Since 3D mesh databases, e.g., the Princeton Segmentation Benchmark [28], were released, data-driven methods have been proposed for mesh segmentation. Both non-supervised and semi-supervised learning methods try to learn from the database a model for separating a mesh meaningfully and verify it on new meshes. Some recent works include [29], [30], [31].

Dental Mesh Segmentation Numerous segmentation approaches have been proposed to separate dental models. According to the input format, we divide the existing approaches into two categories: 2D image and 3D mesh.

Researchers have proposed effective segmentation algorithms based on 2D projection images. Yamany et al. [2] encoded the curvature and surface normal information into a 2D image and designed an image segmentation tool to extract structures of high/low curvature; by iteratively removing these structures, individual teeth surfaces are obtained. Similarly, Kondo et al. [32] presented an automated method for tooth segmentation from 3D digitized images captured by a laser scanner. Grzegorzek et al. [33] presented a multi-stage approach for tooth segmentation from 3D dentition surfaces based on a 2D model-based contour retrieval algorithm. Wongwaen et al. [34] converted the 3D panoramic model to 2D space to find the cutting points for segmentation of individual teeth, followed by converting the 2D image back to 3D space for the remaining operations.

The literature [3] subdivides the methods that take a 3D mesh as input into three categories.
Fig. 1. The pipeline of our method. Our method takes a raw teeth model as input, simplifies the model, extracts its features and feeds them into a 2-level hierarchical network to generate a label prediction, followed by label optimization and back-projection to get the final segmentation result. (Stage labels in the figure: original models, simplified models, predict result, label optimization, refinement result, boundary optimization, ANN back-projection, sticky-teeth separation, final result.)
The first is the curvature-based method, which separates the dental model relying on surface curvature. Yuan et al. [35] analysed the regions of the 3D dental model and classified them based on the minimum curvatures of the surface. Zhao et al. [4] proposed an interactive segmentation method based on the curvature values of the triangle mesh. The system designed by [36] requires users to provide a one-time setting of a certain curvature threshold via an intuitive slider. Others, including the snake-based active contour method [37], the "fast marching watershed" method [38], and the morphologic skeleton extraction method [1], are all related to curvature information to some extent.

The second is the contour-line-based method, which is a relatively accurate segmentation method as it allows human interaction. In the studies [39] and [40], users specify the boundary between each tooth and the gum in the form of surface points by mouse clicks. The algorithm then connects each pair of neighbouring points according to geodesic information, and the generated segments form the desired boundary. Although the boundary is accurate, this method relies too much on user interaction: users have to rotate/translate the mesh and zoom in/out repeatedly to make their suggestions, which is tiring and time-consuming.
The third is the harmonic-field-based method. Zou et al. [3] proposed a harmonic-field-based segmentation method which requires only a limited number of surface points as a prior. It saves users' time and achieves reasonable results.

Data-driven Shape Analysis Recently, data-driven geometry processing algorithms have been developed in both the computer graphics and computer vision communities [41]. The commonly known shape analysis techniques can be grouped into several topics, such as classification [42], [43], [44], matching [45], [46], [47], reconstruction [48], [49], [50], and modeling and synthesis [51], [52], [53]. Data-driven segmentation methods are classified into supervised [54], [55], [56], semi-supervised [30], [57], [58] and unsupervised [58], [59], [60]. In order to design a data-driven algorithm, sufficient shape databases are necessary. Famous databases include [28], [61], which are maintained by universities, and [58], which is collected from the web. Another way to gather adequate shape models is to create synthetic datasets, e.g., [62]. Xu et al. [63] give a comprehensive survey of the existing online data collections.

Traditional learning methods are mainly designed for finding a different representation of the 3D mesh. In recent years, deep neural networks have shown excellent performance in extracting latent features, as well as in automatically building mappings between input and output [64], [65]. In particular, deep CNNs do well in tasks with image-format input [66], [67]. Researchers in the computer graphics community are making efforts to feed 3D mesh data into CNNs. Guo et al. [5] extracted a 20 × 30 geometric feature image for each triangle face and fed it into a typical classification network together with the ground-truth face label. Maron et al. [68] parameterized a sphere-like mesh to get a 2D image and used it to train a modified FCN-like [69] network. The works [5], [68], [70] show that, if well designed, CNNs are also capable of 3D mesh segmentation.

3 OVERVIEW

Fig. 1 illustrates our pipeline. Due to the computational burden of features for large meshes [29], we perform mesh simplification on each dental model to reduce the number of faces. To preserve informative geometric features for segmentation, we design a boundary-aware mesh simplification algorithm that maintains the features along the teeth-gingiva and tooth-tooth regions.

We extract global and local features of each face on the simplified model. We use a similar set of features as in [29], and add positional features to boost the network performance. These features are reorganized into a 20 × 30 image to feed into the network.
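To make the input format concrete, the following is a minimal sketch of reorganizing a per-face feature vector into such a 20 × 30 feature image. The 600-dimensional size and the row-major ordering are assumptions of this sketch; the exact feature layout follows [5], [29] and is not detailed in this excerpt.

```python
import numpy as np

def face_feature_image(features, shape=(20, 30)):
    """Reorganize a per-face feature vector into a 2D 'image' for the CNN.

    `features` is assumed here to be a 600-dimensional vector (geometric
    descriptors as in [29] plus positional features); the authors' exact
    ordering is not specified in this excerpt.
    """
    features = np.asarray(features, dtype=np.float32)
    assert features.size == shape[0] * shape[1]
    return features.reshape(shape)

# Example: a batch of 4 faces -> (4, 1, 20, 30) single-channel network input.
batch = np.stack([face_feature_image(np.random.rand(600))
                  for _ in range(4)])[:, None, :, :]
```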
We design a 2-level hierarchical network for face labeling. We train two CNNs with similar layers, for teeth-gingiva and inter-teeth labeling respectively. The CNN architecture consists of convolution, pooling and fully-connected layers, with carefully tuned hyperparameters, e.g., the number of layers and the activation functions.
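The exact layer configuration is not listed in this excerpt (training was done with Caffe [77]); the PyTorch sketch below only illustrates the kind of convolution/pooling/fully-connected classifier described, with illustrative (assumed) layer sizes.

```python
import torch
import torch.nn as nn

class FaceLabelCNN(nn.Module):
    """Illustrative conv/pool/fully-connected classifier for 20x30 per-face
    feature images. Layer sizes are assumptions of this sketch, not the
    paper's exact configuration."""
    def __init__(self, num_labels=2):   # 2 labels for TGCNNs; more for TTCNNs
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),            # 20x30 -> 10x15
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),            # 10x15 -> 5x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 5 * 7, 256), nn.ReLU(),
            nn.Linear(256, num_labels), # per-face label scores
        )

    def forward(self, x):               # x: (N, 1, 20, 30)
        return self.classifier(self.features(x))

logits = FaceLabelCNN(num_labels=2)(torch.randn(8, 1, 20, 30))
```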
Right after each network prediction, we employ label optimization to correct wrongly predicted labels, which usually appear on the boundary. We further improve the boundary between teeth and gingiva, as well as between individual teeth, by graph optimization. We also employ PCA analysis to split sticky teeth (i.e., pairs of teeth which are adjacent and get the same label after optimization), which occasionally appear in regions with missing/rotten teeth or at the front teeth. Finally, we back-project the labels of the simplified model onto the original model and further refine the boundary. Below we discuss the various algorithmic design choices in detail.
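Fig. 1 labels the back-projection step as ANN (approximate nearest neighbor) mapping. One plausible realization, sketched below, transfers labels by nearest-neighbor lookup on face centroids using a k-d tree; the authors' correspondence-free mapping may differ in detail.

```python
import numpy as np
from scipy.spatial import cKDTree

def back_project_labels(simplified_centroids, simplified_labels,
                        original_centroids):
    """Transfer per-face labels from the simplified mesh back to the
    original mesh via nearest-neighbor lookup on face centroids.
    One plausible realization of the ANN back-projection step."""
    tree = cKDTree(simplified_centroids)        # build once per model
    _, nearest = tree.query(original_centroids) # closest coarse face per fine face
    return np.asarray(simplified_labels)[nearest]
```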
the three characteristics mentioned before, which are z-axis
coordinate, geodesic distance to the nearest sharp points and
4 A LGORITHM Euclidean distance to mesh center in XY plane. Specifically,
each item is defined as
4.1 Boundary-aware Tooth Simplification
Eu1 (li = 0) = (zi − zmin )/H
A dental model, acquired by CT scanning, is very precise,
Eu2 (li = 0) = 1r− gdi /gdmax
containing more than 200, 000 faces. A direct computation
of geometric features [29] on such fine model is extremely xi − xmesh 2 yi − ymesh 2 (3)
Eu3 (li = 0) = ( ) +( )
time-consuming in either training or testing stage (Section 0.5L 0.5W
4.2). Mesh simplification, therefore, is necessary for pre- E1 (li = 1) = 1 − E1 (li = 0)
processing dental meshes. Traditional feature-preserving
mesh simplification methods tend to lose semantic informa- where xi , yi , zi are the x, y, z-axis coordinate of triangle face
tion, e.g., details on the teeth-gingiva boundary. Clear and i. [L, W, H] are the length (in the x-axis direction), width
accurate tooth-tooth, together with teeth-gingiva boundary (in the y-axis direction) and height (in the z-axis direction)
plays an important role in the learning procedure (Section of the axis-aligned bounding box of a dental model. gdi is
5). Thus we design a boundary-aware tooth simplification the geodesic distance from face i to the nearest sharp point
algorithm to preserve such semantic information as much and gdmax is the maximum value for all gdi . xmesh and
as possible. ymesh are the x, y-axis coordinate of mesh center (set α1 =
To preserve the boundary information, first we need to 0.4, α2 = 0.5, α3 = 0.1). We detect the sharp feature points
identify them. Our aim is to divide a dental model into as local shape extremities [3]. The probability field is shown
three regions: gingiva, teeth, and teeth-gingiva boundary, in Fig.2b.
shown in Fig.2d. The gingiva region occupies a large part of Faces on teeth-gingiva boundary usually have negative
triangle faces but provides little discriminative information curvature. We use pairwise term E2 to measure this as
for classification as it is feature-less. The teeth region pos-
1
sesses more important geometric details than gingiva, and
AD(αij )
, li 6= lj
should not be simplified too much. The boundary region is E2 (li , lj ) = 1 + avg(AD) (4)
the most important for segmentation, whose details should
0, li = lj
be retained as much as possible. To this end, we modify the
traditional mesh simplification method [71]. We multiply The angular distance is AD(αij ) = η(1 − cos αij ), where
the edge-collapse-cost with different weights in different αij is the angle between the normal of face i and j . The
regions. definition here is the same as [10]. For convex angles, η =
Our task now is to identify those regions. We observe 0.05, and for concave angles η = 1.
that the dental meshes are usually scanned on the same We use graph cuts algorithm to solve the optimization
CT platform, whose bottom parts are planar, as shown in problem in Eqn. 1. As a result, the mesh is divided into
Fig.2a. This makes the classification task easier. We first two regions: teeth and gingiva. We extend the teeth-gingiva
identify the largest plane using a greedy floodfill algorithm boundary using Breadth First Search(BFS)(5 iterations in our
and align the normal of the largest plane with the z-axis. experiments) to get three regions (Fig.2d). Then we shall
It is easy to align x-axis and y-axis by PCA analysis. Then conduct a detail-preserved mesh simplification (Fig.2c). Em-
for an upper-part dental mesh, shown in Fig.2, the majority pirically, we set the collapsing weights for edges in gingiva,
of teeth faces appear in regions which are of larger z-axis teeth, and gingiva-teeth boundary regions as 1, 20, and 500
coordinate values, far from the mesh center, and close to respectively. The simplification ratio is 0.2. The simplified
teeth sharp points [3], i.e., the red points in Fig.2a. For teeth- model has around 40, 000 triangle faces.
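A minimal sketch of the region-weighted collapse cost follows. The base cost comes from any standard edge-collapse metric such as [71]; the weights are those given above. How an edge crossing two regions is weighted is an assumption of this sketch (here, the more protected region wins).

```python
# Region weights from Section 4.1: gingiva edges collapse cheaply,
# teeth edges are penalized, boundary edges are almost never collapsed.
REGION_WEIGHT = {"gingiva": 1.0, "teeth": 20.0, "boundary": 500.0}

def weighted_collapse_cost(base_cost, region_u, region_v):
    """Scale a standard edge-collapse cost (e.g., the progressive-meshes
    metric of [71]) by the region of the edge's endpoints. Taking the
    more protected of the two regions is an assumption of this sketch."""
    w = max(REGION_WEIGHT[region_u], REGION_WEIGHT[region_v])
    return w * base_cost
```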
Neither the geometry nor the CNN predictions are reliable enough to determine the optimal boundary by themselves. We propose to combine them, using an improved fuzzy clustering algorithm [10] to refine the boundary. The improved fuzzy clustering method takes both geometry and CNN predictions into consideration and works well on dental models.
For each tooth with label l, we first perform BFS from the current boundary to visit a group of nearby faces, which make up the fuzzy region.
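A minimal sketch of collecting the fuzzy region by BFS over the face adjacency graph follows; the fixed BFS depth is an assumption here (this excerpt does not state the depth used for this step).

```python
from collections import deque

def fuzzy_region(boundary_faces, face_adjacency, depth=5):
    """Collect faces within `depth` BFS steps of the current tooth
    boundary; these form the fuzzy region to be re-labeled."""
    visited = set(boundary_faces)
    frontier = deque((f, 0) for f in boundary_faces)
    while frontier:
        face, d = frontier.popleft()
        if d == depth:
            continue
        for nb in face_adjacency[face]:
            if nb not in visited:
                visited.add(nb)
                frontier.append((nb, d + 1))
    return visited
```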
For faces on the border of the region, we collect those adjacent to tooth l as the set S, and the others as the set T. The modified capacity augments the angular-distance-based capacity of [10] with the CNN prediction probabilities.

Our dataset is divided as follows:

| Group           | Training | Validation | Test |
| Number of Cases | 1000     | 50         | 150  |

Metrics. We evaluate the performance from two different aspects. One is from a global perspective, i.e., we calculate the percentage of the area of correctly labeled faces [5], expressed as

$$\text{Accuracy} = \sum_{t \in T} a_t\, g_t(l_t) \Big/ \sum_{t \in T} a_t \qquad (12)$$

where $a_t$ is the area of triangle $t$ and $g_t(l_t)$ is 1 if the predicted label $l_t$ matches the ground truth and 0 otherwise.
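Eqn. 12 is straightforward to compute; a small NumPy sketch (the helper name is ours):

```python
import numpy as np

def area_weighted_accuracy(areas, pred_labels, gt_labels):
    """Eqn. 12: fraction of the total surface area whose faces are
    labeled correctly."""
    areas = np.asarray(areas, dtype=np.float64)
    correct = (np.asarray(pred_labels) == np.asarray(gt_labels))
    return (areas * correct).sum() / areas.sum()
```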
stages allows for more comprehensive fine-tuning, for example, of the label optimization parameter λ. As for GR-DNN, we tried two versions of the anchor graph [79]: one creates the anchor graph directly in the 600-dimensional feature space, and the other uses the Euclidean distance among the faces to account for local connectivity. However, in both cases we do not observe much difference, which indicates that learning a better embedding does not necessarily guarantee a better classification result.
To evaluate the effectiveness of our pipeline, we make a comparison with the method of [5], which performs very well in general mesh segmentation. We prepare the training data in the same way for the two methods, using mesh simplification followed by feature extraction. In the first comparison, we compare the two methods as they are: the differences lie in the network structure, the simplification, and the optimization methods, and we use their own features, network, and post-optimization. The number of output labels of the network proposed by [5] is set to seventeen, with the purpose of segmenting all teeth and the gingiva at once. After network prediction, [5] only uses the α-β swap to do label optimization. Table 3 shows that on a large-scale training set, our method outperforms [5] significantly. In the second comparison, we verify the usefulness of the COORD feature: we add the 7-dimension coordinate feature mentioned in Section 4.2 to [5] and conduct 17-label classification. The performance rises significantly (Table 3), which reveals the effectiveness of our new feature. This is because teeth are usually aligned and symmetric, so nearby teeth can confuse the network under the original feature representation; the coordinate features largely help distinguish, for example, left from right.
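Section 4.2, which defines the 7-dimension COORD feature, is not part of this excerpt, so the composition below (normalized face center, face normal, and a height ratio) is only an assumed, plausible stand-in to illustrate how such a positional descriptor would be appended to the per-face features:

```python
import numpy as np

def append_coord_feature(face_features, centers, normals, bbox_min, bbox_max):
    """Append a positional descriptor to each face's geometric features.
    The paper's COORD feature is 7-dimensional (Sec. 4.2, not in this
    excerpt); the composition used here is an assumption of this sketch."""
    centers = (centers - bbox_min) / (bbox_max - bbox_min)  # in [0, 1]^3
    height = centers[:, 2:3]                                # z ratio
    coord = np.hstack([centers, normals, height])           # (N, 7)
    return np.hstack([face_features, coord])
```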
Fig. 11 shows some of our representative results. Note that our method is robust to various complex circumstances in human teeth, such as missing/rotten teeth, irregular teeth arrangements, noise, bubbles, foreign attachments, as well as feature-less regions, thanks to our well-designed network and the improved boundary refinement algorithm. Our networks are also very efficient: for a tooth model with 40,000 triangles, prediction takes less than 1 s. The simplification, ANN mapping, and fuzzy refinement take around 5 s. The most time-consuming step is the feature extraction, which takes 5 minutes per model; however, this is significantly faster than on a raw model, i.e., without mesh simplification (12 hours per model).
Verification of the effectiveness of boundary-aware simplification. To prove that the proposed boundary-aware simplification is effective, we build two sets of simplified models (at simplification ratio 0.2): one simplified using the boundary-aware algorithm, the other simplified uniformly. We then extract features to build two sets of training data, of 200 models and 1,000 models, so that the networks are trained on two different amounts of training data in order to rule out the influence of data scale. Results are shown in Table 4. To keep consistent with the training data, all test data are simplified under the same rule. The rows named TGCNNs and TTCNNs give the results of the CNN predictions; the Final row refers to the final results of the whole pipeline. Judging by the numerical values, boundary-aware simplification scores slightly higher than uniform simplification. Admittedly, the difference within each (boundary-aware, uniform) pair is too small by itself to evidence the effectiveness of the boundary-aware algorithm; this is expected, as the boundary regions are indeed a small portion of the whole model. However, the visualization of these results shows that the boundary-aware algorithm indeed helps the networks predict much better near the boundary region. Fig. 7 indicates that our boundary-aware simplification algorithm effectively prevents over-prediction and under-prediction in boundary regions. Such clean and accurate boundaries can largely benefit the subsequent processes in digital tooth treatments, for example, tooth root region reconstruction and tooth alignment. Table 5 shows that the mean error of the boundary-aware algorithm is smaller than that of the uniform simplification method.

TABLE 4
Experiments with boundary-aware and uniform simplification.

| Training Data  | U(200)         |         | U(1000)        |         |
| Simplification | Boundary-aware | Uniform | Boundary-aware | Uniform |
| TGCNNs         | 98.55%         | 98.37%  | 98.93%         | 98.62%  |
| TTCNNs         | 95.24%         | 95.03%  | 97.50%         | 97.25%  |
| Final          | 98.61%         | 98.11%  | 99.06%         | 98.81%  |

Fig. 7. Comparing boundary-aware and uniform simplification. (a) Teeth-gingiva prediction produced by CNNs on uniformly simplified models. (b) The final results of (a). (c) Teeth-gingiva prediction produced on boundary-aware simplified models. (d) The final results of (c).

TABLE 5
Mean errors of the boundary-aware and uniform simplification methods.

| Training Data  | U(200)         |         | U(1000)        |         |
| Simplification | Boundary-aware | Uniform | Boundary-aware | Uniform |
| Mean errors/mm | 0.0939         | 0.0951  | 0.0848         | 0.0867  |

Verification of the effectiveness of label optimization. We employ label optimization in two stages: one after the TGCNNs, to smooth the labeling results for the next stage, and the other after the TTCNNs, to smooth the labeling results again. Table 6 shows the effectiveness of label optimization.

TABLE 6
Verification of label optimization.

|        |         | CNN Prediction | Label Optimization |
| TGCNNs | U(1000) | 98.93%         | 99.43%             |
|        | L(1000) | 98.88%         | 99.43%             |
| TTCNNs | U(1000) | 97.50%         | 98.56%             |
|        | L(1000) | 97.37%         | 98.43%             |
TABLE 3
Labeling accuracy of different network variants.

|         | Guo et al. [5] | [5] + COORD | WLF    | NN     | LeNet  | GR-DNN-600 | GR-DNN-3 | Ours   |
| U(1000) | 84.81%         | 95.32%      | 98.81% | 98.35% | 98.51% | 97.04%     | 97.00%   | 99.06% |
| L(1000) | 82.95%         | 95.16%      | 98.26% | 98.04% | 98.23% | 96.60%     | 93.42%   | 98.79% |
7 CONCLUSION

In this paper, we propose a learning-based dental mesh segmentation method. It receives a detailed 3D dental model as input, and outputs the label list of each triangle face.
Fig. 11. (a) Teeth-gingiva classification. (b) Inter-teeth classification. (c) Our results. (d) Another view of our results. (e) Ground truth.
Fig. 12. (a) Neural network structure. (b) GR-DNN structure. (Both take a 600-dimension input; the diagrams show hidden layers of 500-2000 nodes.)

REFERENCES

[1] K. Wu, L. Chen, J. Li, and Y. Zhou, "Tooth segmentation on dental meshes using morphologic skeleton," Computers and Graphics, vol. 38, pp. 199–211, 2014.
[2] S. M. Yamany and A. R. Elbialy, "Efficient free-form surface representation with application in orthodontics," Proceedings of SPIE, vol. 3640, no. 1, pp. 115–124, 1999.
[3] B. J. Zou, S. J. Liu, S. H. Liao, X. Ding, and Y. Liang, "Interactive tooth partition of dental mesh based on tooth-target harmonic field," Computers in Biology and Medicine, vol. 56, pp. 132–144, 2015.
[4] M. Zhao, L. Ma, W. Tan, and D. Nie, "Interactive tooth segmentation of dental models," in Proc. 27th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE-EMBS 2005), 2006, pp. 654–657.
[5] K. Guo, D. Zou, and X. Chen, "3D mesh labeling via deep convolutional neural networks," ACM Transactions on Graphics, vol. 35, no. 1, pp. 1–12, 2015.
[6] M. Attene, S. Katz, M. Mortara, G. Patane, M. Spagnuolo, and A. Tal, "Mesh segmentation - a comparative study," in IEEE International Conference on Shape Modeling and Applications, 2006, pp. 7–7.
[7] A. Shamir, "A survey on mesh segmentation techniques," Computer Graphics Forum, vol. 27, no. 6, pp. 1539–1556, 2008.
[8] S. Shlafman, A. Tal, and S. Katz, "Metamorphosis of polyhedral surfaces using decomposition," Computer Graphics Forum, vol. 21, no. 3, pp. 219–228, 2002.
[9] G. Lavoue, F. Dupont, and A. Baskurt, "A new CAD mesh segmentation method, based on curvature tensor analysis," Computer-Aided Design, vol. 37, no. 10, pp. 975–987, 2005.
[10] S. Katz and A. Tal, "Hierarchical mesh decomposition using fuzzy clustering and cuts," ACM Transactions on Graphics, vol. 22, no. 3, 2003.
[11] M. Attene, B. Falcidieno, and M. Spagnuolo, "Hierarchical mesh segmentation based on fitting primitives," The Visual Computer, vol. 22, no. 3, pp. 181–193, 2006.
[12] A. P. Mangan and R. T. Whitaker, "Partitioning 3D surface meshes using watershed segmentation," IEEE Transactions on Visualization and Computer Graphics, vol. 5, no. 4, pp. 308–321, 1999.
[13] Y. Lai, S. Hu, R. R. Martin, and P. L. Rosin, "Fast mesh segmentation using random walks," in Proc. ACM Symposium on Solid and Physical Modeling, 2008, pp. 183–191.
[14] A. Koschan, "Perception-based 3D triangle mesh segmentation using fast marching watersheds," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2003.
[15] A. Golovinskiy and T. Funkhouser, "Randomized cuts for 3D mesh analysis," ACM Transactions on Graphics, vol. 27, no. 5, p. 145, 2008.
[16] S. Katz, G. Leifman, and A. Tal, "Mesh segmentation using feature point and core extraction," The Visual Computer, vol. 21, no. 8, pp. 649–658, 2005.
[17] L. Shapira, A. Shamir, and D. Cohen-Or, "Consistent mesh partitioning and skeletonisation using the shape diameter function," The Visual Computer, vol. 24, no. 4, pp. 249–259, 2008.
[18] Y. Lee, S. Lee, A. Shamir, D. Cohen-Or, and H. P. Seidel, "Intelligent mesh scissoring using 3D snakes," in Proc. Pacific Conference on Computer Graphics and Applications (PG 2004), 2004, pp. 279–287.
[19] Y. Lee, S. Lee, A. Shamir, D. Cohen-Or, and H. Seidel, "Mesh scissoring with minima rule and part salience," Computer Aided Geometric Design, vol. 22, no. 5, pp. 444–465, 2005.
[20] L. Fan, M. Meng, and L. Liu, "Sketch-based mesh cutting," Graphical Models, vol. 74, no. 6, pp. 292–301, 2012.
[21] M. Meng, L. Fan, and L. Liu, "A comparative evaluation of foreground/background sketch-based mesh segmentation algorithms," Computers and Graphics, vol. 35, no. 3, pp. 650–660, 2011.
[22] Z. Ji, L. Liu, Z. Chen, and G. Wang, "Easy mesh cutting," Computer Graphics Forum, vol. 25, no. 3, pp. 283–291, 2006.
[23] L. Fan, L. Liu, and K. Liu, "Paint mesh cutting," Computer Graphics Forum, vol. 30, no. 2, pp. 603–612, 2011.
[24] Y. Zheng, C. Tai, and O. K. Au, "Dot scissor: A single-click interface for mesh segmentation," IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 8, pp. 1304–1312, 2012.
[25] M. Meng, L. Fan, and L. Liu, "iCutter: a direct cutout tool for 3D shapes," Computer Animation and Virtual Worlds, vol. 22, no. 4, pp. 335–342, 2011.
[26] Y. Zheng and C. Tai, "Mesh decomposition with cross-boundary brushes," Computer Graphics Forum, vol. 29, no. 2, pp. 527–535, 2010.
[27] O. K. Au, Y. Zheng, M. Chen, P. Xu, and C. Tai, "Mesh segmentation with concavity-aware fields," IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 7, pp. 1125–1134, 2012.
[28] X. Chen, A. Golovinskiy, and T. Funkhouser, "A benchmark for 3D mesh segmentation," ACM Transactions on Graphics, vol. 28, no. 3, p. 73, 2009.
[29] E. Kalogerakis, A. Hertzmann, and K. Singh, "Learning 3D mesh segmentation and labeling," ACM Transactions on Graphics, vol. 29, no. 3, 2010.
[30] Y. Wang, S. Asafi, O. Van Kaick, H. Zhang, D. Cohen-Or, and B. Chen, "Active co-analysis of a set of shapes," ACM Transactions on Graphics, vol. 31, no. 6, p. 165, 2012.
[31] H. Benhabiles, G. Lavoue, J. Vandeborre, and M. Daoudi, "Learning boundary edges for 3D mesh segmentation," Computer Graphics Forum, vol. 30, no. 8, pp. 2170–2182, 2011.
[32] T. Kondo, S. H. Ong, and K. W. C. Foong, "Tooth segmentation of dental study models using range images," IEEE Transactions on Medical Imaging, vol. 23, no. 3, pp. 350–362, 2004.
[33] M. Grzegorzek, M. Trierscheid, D. Papoutsis, and D. Paulus, "A multi-stage approach for 3D teeth segmentation from dentition surfaces," in Proc. ICISP. Springer, 2010, pp. 521–530.
[34] N. Wongwaen and C. Sinthanayothin, "Computerized algorithm for 3D teeth segmentation," in International Conference on Electronics and Information Engineering, 2010, pp. V1-277–V1-280.
[35] T. Yuan, W. Liao, N. Dai, X. Cheng, and Q. Yu, "Single-tooth modeling for 3D dental model," International Journal of Biomedical Imaging, vol. 2010, pp. 1029–1034, 2010.
[36] Y. Kumar, R. Janardan, B. E. Larson, and J. Moon, "Improved segmentation of teeth in dental models," Computer-Aided Design and Applications, vol. 8, no. 2, pp. 211–224, 2013.
[37] T. Kronfeld, D. Brunner, and G. Brunnett, "Snake-based segmentation of teeth from virtual dental casts," Computer-Aided Design and Applications, vol. 7, no. 2, pp. 221–233, 2013.
[38] Z. Li, X. Ning, and Z. Wang, "A fast segmentation method for STL teeth model," in Proc. IEEE/ICME International Conference on Complex Medical Engineering (CME 2007), 2007, pp. 163–166.
[39] C. Sinthanayothin and W. Tharanont, "Orthodontics treatment simulation by teeth segmentation and setup," in International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2008, pp. 81–84.
[40] Y. Ma and Z. Li, "Computer aided orthodontics treatment by virtual segmentation and adjustment," in International Conference on Image Analysis and Signal Processing, 2010, pp. 336–339.
[41] K. Xu, V. G. Kim, Q. Huang, and E. Kalogerakis, "Data-driven shape analysis and processing," Computer Graphics Forum, vol. 36, no. 1, 2017.
[42] Z. Barutcuoglu and C. Decoro, "Hierarchical shape classification using bayesian aggregation," in IEEE International Conference on Shape Modeling and Applications, 2006, pp. 44–44.
[43] J. W. H. Tangelder and R. C. Veltkamp, "A survey of content based 3D shape retrieval methods," Multimedia Tools and Applications, vol. 39, no. 3, p. 441, 2008.
[44] R. Litman, A. Bronstein, M. Bronstein, and U. Castellani, "Supervised learning of bag-of-features shape descriptors using sparse coding," Computer Graphics Forum, vol. 33, no. 5, pp. 127–136, 2014.
[45] Q. Huang, F. Wang, and L. Guibas, "Functional map networks for analyzing and exploring large shape collections," ACM Transactions on Graphics, vol. 33, no. 4, pp. 1–11, 2014.
[46] O. Van Kaick, H. Zhang, G. Hamarneh, and D. Cohen-Or, "A survey on shape correspondence," Computer Graphics Forum, vol. 30, no. 6, pp. 1681–1707, 2011.
[47] M. Ovsjanikov, M. Ben-Chen, J. Solomon, A. Butscher, and L. Guibas, "Functional maps: a flexible representation of maps between shapes," ACM Transactions on Graphics, vol. 31, no. 4, p. 30, 2012.
[48] X. Guo, J. Lin, K. Xu, and X. Jin, "Creature grammar for creative modeling of 3D monsters," Graphical Models, vol. 76, no. 5, pp. 376–389, 2014.
[49] C. Cao, Q. Hou, and K. Zhou, "Displaced dynamic expression regression for real-time facial tracking and animation," ACM Transactions on Graphics, vol. 33, no. 4, p. 43, 2014.
[50] C. H. Shen, H. Fu, K. Chen, and S. M. Hu, "Structure recovery by part assembly," ACM Transactions on Graphics, vol. 31, no. 6, pp. 1–11, 2012.
[51] X. Xie, K. Xu, N. J. Mitra, D. Cohen-Or, and B. Chen, "Sketch-to-design: Context-based part assembly," in Computer Graphics Forum, 2013, pp. 233–245.
[52] S. Chaudhuri, E. Kalogerakis, S. Giguere, and T. Funkhouser, "Attribit: content creation with semantic attributes," in Proc. 26th Annual ACM Symposium on User Interface Software and Technology. ACM, 2013, pp. 193–202.
[53] L. Fan, R. Wang, L. Xu, J. Deng, and L. Liu, "Modeling by drawing with shadow guidance," Computer Graphics Forum, vol. 32, no. 7, pp. 157–166, 2013.
[54] Y. Wang, M. Gong, T. Wang, D. Cohen-Or, H. Zhang, and B. Chen, "Projective analysis for 3D shape segmentation," ACM Transactions on Graphics, vol. 32, no. 6, p. 192, 2013.
[55] W. Xu, Z. Shi, M. Xu, K. Zhou, J. Wang, B. Zhou, J. Wang, and Z. Yuan, "Transductive 3D shape segmentation using sparse reconstruction," Computer Graphics Forum, vol. 33, no. 5, pp. 107–115, 2014.
[56] Z. Xie, K. Xu, L. Liu, and Y. Xiong, "3D shape segmentation and labeling via extreme learning machine," Computer Graphics Forum, vol. 33, no. 5, pp. 85–95, 2014.
[57] J. Lv, X. Chen, J. Huang, and H. Bao, "Semi-supervised mesh segmentation and labeling," Computer Graphics Forum, vol. 31, no. 7, pp. 2241–2248, 2012.
[58] V. G. Kim, W. Li, N. J. Mitra, S. Chaudhuri, S. Diverdi, and T. Funkhouser, "Learning part-based templates from large collections of 3D shapes," ACM Transactions on Graphics, vol. 32, no. 4, p. 70, 2013.
[59] O. Sidi, O. Van Kaick, Y. Kleiman, H. Zhang, and D. Cohen-Or, "Unsupervised co-segmentation of a set of shapes via descriptor-space spectral clustering," in SIGGRAPH Asia Conference, 2011, p. 126.
[60] R. Hu, L. Fan, and L. Liu, "Co-segmentation of 3D shapes via subspace clustering," Computer Graphics Forum, vol. 31, no. 5, pp. 1703–1713, 2012.
[61] P. Shilane, P. Min, M. Kazhdan, and T. Funkhouser, "The Princeton shape benchmark," in Shape Modeling International, 2004, pp. 167–178.
[62] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, "Real-time human pose recognition in parts from single depth images," in IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 1297–1304.
[63] K. Xu, V. G. Kim, Q. Huang, and E. Kalogerakis, "Data-driven shape analysis and processing," in SIGGRAPH Asia, 2015, p. 4.
[64] Y. Bengio et al., "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[65] G. Hinton, "A practical guide to training restricted Boltzmann machines," Momentum, vol. 9, no. 1, p. 926, 2010.
[66] C. Farabet, C. Couprie, L. Najman, and Y. LeCun, "Learning hierarchical features for scene labeling," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1915–1929, 2012.
[67] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in International Conference on Neural Information Processing Systems, 2012, pp. 1097–1105.
[68] H. Maron, M. Galun, N. Aigerman, M. Trope, N. Dym, E. Yumer, V. G. Kim, and Y. Lipman, "Convolutional neural networks on surfaces via seamless toric covers," SIGGRAPH, 2017.
[69] E. Shelhamer, J. Long, and T. Darrell, "Fully convolutional networks for semantic segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 640–651, 2014.
[70] P.-S. Wang, Y. Liu, Y.-X. Guo, C.-Y. Sun, and X. Tong, "O-CNN: Octree-based convolutional neural networks for 3D shape analysis," ACM Transactions on Graphics (SIGGRAPH), vol. 36, no. 4, 2017.
[71] H. Hoppe, "Progressive meshes," in Proc. 23rd Annual Conference on Computer Graphics and Interactive Techniques. ACM, 1996, pp. 99–108.
[72] G. Ran and D. Cohen-Or, "Salient geometric features for partial shape matching and similarity," ACM Transactions on Graphics, vol. 25, no. 1, pp. 130–150, 2006.
[73] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape contexts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, 2002.
[74] L. Shapira, S. Shalom, A. Shamir, D. Cohen-Or, and H. Zhang, "Contextual part analogies in 3D objects," International Journal of Computer Vision, vol. 89, no. 2-3, pp. 309–326, 2010.
[75] A. E. Johnson and M. Hebert, "Using spin images for efficient object recognition in cluttered 3D scenes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 5, pp. 433–449, 1999.
[76] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001.
[77] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proc. 22nd ACM International Conference on Multimedia. ACM, 2014, pp. 675–678.
[78] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[79] S. Yang, L. Li, S. Wang, W. Zhang, and Q. Huang, "A graph regularized deep neural network for unsupervised image representation learning," in Computer Vision and Pattern Recognition, 2017, pp. 7053–7061.
[80] L. Kaplansky and A. Tal, "Mesh segmentation refinement," in Computer Graphics Forum, vol. 28, no. 7. Wiley Online Library, 2009, pp. 1995–2003.

Xiaojie Xu is a postgraduate student at the School of Information Science and Technology (SIST), ShanghaiTech University. He obtained his B.E. from the College of Information Science & Electronic Engineering at Zhejiang University. His research interests include geometry segmentation, deep learning, and virtual reality.

Chang Liu is a postgraduate student at the School of Information Science and Technology (SIST), ShanghaiTech University. She obtained her B.E. from the School of Electronic Engineering at Xidian University. Her research interests include 3D object reconstruction and deep learning for image classification.

Youyi Zheng is a Researcher (PI) at the State Key Lab of CAD&CG, College of Computer Science, Zhejiang University. He obtained his PhD from the Department of Computer Science and Engineering at Hong Kong University of Science & Technology, and his M.Sc. and B.Sc. degrees from the Department of Mathematics, Zhejiang University. His research interests include geometric modeling, imaging, and human-computer interaction.