3D Tooth Segmentation and Labeling Using Deep Convolutional Neural Networks

This document summarizes a research paper that proposes using deep convolutional neural networks (CNNs) to segment and label 3D dental models. The researchers extract geometry features from each face of a dental mesh and use those as inputs to train a 2-level hierarchical CNN model. The first level labels faces as teeth or gingiva, while the second level labels faces between individual teeth. They also introduce a boundary-aware simplification method to improve feature extraction efficiency. Evaluation shows their deep learning approach achieves 99.06% accurate labeling, outperforming traditional geometry-based methods. It provides a generic, robust solution to the challenges of dental model segmentation.


This article has been accepted for publication in a future issue of this journal, but has not been

fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2018.2839685, IEEE
Transactions on Visualization and Computer Graphics
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. XX, NO. XX 1

3D Tooth Segmentation and Labeling Using Deep Convolutional Neural Networks
Xiaojie Xu1 , Chang Liu1 , and Youyi Zheng†

Abstract—In this paper, we present a novel approach for 3D dental model segmentation via deep Convolutional Neural Networks (CNNs). Traditional geometry-based methods tend to produce undesirable results due to the complex appearance of human teeth (e.g., missing/rotten teeth, feature-less regions, crowded teeth, extra medical attachments, etc.). Furthermore, labeling of individual teeth is hardly supported by traditional tooth segmentation methods. To address these issues, we propose to learn a generic and robust segmentation model by exploiting deep neural networks. The segmentation task is achieved by labeling each mesh face. We extract a set of geometry features as face feature representations. In the training step, the network is fed with these features and produces a probability vector, each element of which indicates the probability of a face belonging to the corresponding model part. To this end, we extensively experiment with various network structures and eventually arrive at a 2-level hierarchical CNN structure for tooth segmentation: one for teeth-gingiva labeling and the other for inter-teeth labeling. Further, we propose a novel boundary-aware tooth simplification method to significantly improve efficiency in the feature extraction stage. After CNN prediction, we perform graph-based label optimization and further refine the boundary with an improved version of fuzzy clustering. The accuracy of our mesh labeling method exceeds that of state-of-the-art geometry-based methods, reaching 99.06% measured by area, which makes it directly applicable in orthodontic CAD systems. It is also robust to foreign matter on the model surface, e.g., air bubbles, dental accessories, and many more.

Index Terms—Boundary-aware simplification, 3D mesh segmentation, deep convolutional neural networks, fuzzy clustering.

1 INTRODUCTION

As computers developed, computer-aided design (CAD) systems have appeared in more and more fields. They take advantage of hardware-supported computer graphics technology to effectively and efficiently perform tasks that are traditionally labour-intensive. Most dental clinics around the world use CAD systems to develop treatment plans, e.g., for orthodontics. Orthodontic CAD systems play an important role in modern dentistry. They accept a three-dimensional (3D) dental model, specified by the patient's own impression, as input and assist dentists to extract, move, delete, and rearrange teeth to simulate the treatment's outcome. With an automatic processing system, dentists are set free from this time-consuming and tedious task.

Tooth segmentation and labeling is the most fundamental and critical component of these CAD systems, and it remains unsolved. The major challenges are as follows. As a part of the human body, teeth, similar to fingerprints, vary from one person to another. There is no deterministic parametric description that covers every individual tooth of all people. Besides, dental models from patients often exhibit abnormalities, for example, the crowding problem, in which two neighbouring teeth are misaligned, hence the boundary between them is implicit and the normal interstices disappear. Furthermore, missing/rotten teeth and holes are commonly seen among people, which bring additional challenges. These properties defeat traditional geometry-based methods. Curvature-based methods tend to divide a surface into several parts along the concave discontinuities of the tangent plane, and thus are not reliable for feature-less regions with smoothly varying curvatures, e.g., the lingual portion of the tooth [1]. Another challenge is the noise generated during the scanning of plaster models, such as air bubbles and inaccuracies of the plaster models, which usually occur deep in the mouth, around the wisdom teeth. Moreover, some patients may wear dental accessories during scanning. Such foreign matter disturbs the feature distribution of each individual tooth and thus has an adverse impact on the segmentation task.

Due to these challenges, traditional geometry-based methods are less suitable for the tooth segmentation task in practice, as they lack robustness to complex tooth shapes and tooth arrangements. Besides, other existing image-based or interactive methods [2], [3], [4] are either labour-intensive or not accurate enough, which makes the development of an automatic, generic, and accurate tooth segmentation framework demanding.

In this paper, we propose a data-driven method for 3D dental mesh segmentation. In particular, we exploit a deep Convolutional Neural Network (CNN) model for the task of tooth segmentation. The network is designed for labeling each tooth triangle. We extract 600-dimension geometry features for each mesh face and pack them into a 20 × 30

1 co-first author
† corresponding author.
• X. Xu and C. Liu are with the Chinese Academy of Sciences, Shanghai Institute of Microsyst & Information Technology, Shanghai 200050, People's R. China. They are also with ShanghaiTech University, School of Information Science & Technology, Shanghai 201210, and the University of Chinese Academy of Sciences, People's R. China. E-mail: xuxj,[email protected]
• Y. Zheng is with the State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China, 310058. E-mail: [email protected]

1077-2626 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

image as in [5]. By doing so, we get a sequence of image-label pairs as CNN input. Usually, 50% of the triangle faces on a dental mesh belong to gingiva; the others belong to 14~16 teeth. Besides, the quality of the segmentation boundary is vital in subsequent orthodontic treatments (e.g., root planning). To deal with data imbalance and improve boundary accuracy, we extensively experiment with various network structures and finally arrive at a hierarchical labeling architecture, which consists of two CNNs, one for teeth-gingiva labeling (which we call TGCNNs) and the other for inter-teeth labeling (which we call TTCNNs). Raw dental models used in orthodontic treatments are usually very large (200,000~400,000 triangles), which poses a large overhead on the feature extraction step (up to dozens of hours). To improve the efficiency of feature extraction while leaving the quality of the CNNs unaffected, we design a boundary-aware mesh simplification algorithm and a correspondence-free mapping algorithm to pre-process and post-process the dental meshes.

For a newly arrived dental mesh S, which is to be labeled, we first simplify it to a mesh S′ using our boundary-aware simplification algorithm. Then we extract the feature images per face on S′ and feed them into the TGCNNs net for teeth-gingiva separation. In an intermediate step, we perform label optimization to smooth the labeling boundary. We then apply the TTCNNs to label each individual tooth face. After that, we perform label optimization to smooth the labeling results again and back-project the labeling result to the original model S. Finally, we optimize the segmentation boundary via an improved fuzzy clustering to achieve the final result.

Our data-driven tooth labeling method is capable of segmenting various dental models regardless of their geometric variations. It is not only effective and efficient, but also, to the best of our knowledge, the most accurate tooth segmentation and labeling model in the literature so far, achieving a practical precision of 99.06%. The main contributions of our method are:
• A simultaneous and robust teeth segmentation and labeling framework which achieves 99.06% accuracy and can be directly applied in industrial orthodontic CAD systems;
• A carefully designed 2-level hierarchical CNN model trained on 1,000 dental meshes, which is robust and generalizes well on new data;
• A boundary-aware mesh simplification method to enable efficient feature extraction;
• An improved fuzzy clustering boundary optimization algorithm coupling network prediction with geometry optimization.

2 RELATED WORK

This paper proposes a data-driven method for dental mesh segmentation. We first review the literature on general mesh segmentation. Then, we discuss approaches for dental mesh segmentation. Finally, we briefly introduce recent data-driven shape analysis methods in geometry processing.

General Mesh Segmentation 3D mesh segmentation is a fundamental task for mesh understanding and processing. It divides 3D shapes into several parts under reasonable criteria. Common approaches are surveyed in [6], [7]. These approaches rely on geometry information to varying degrees. They can be grouped into two categories: region-based and boundary-based segmentation approaches. Region-based approaches attempt to partition meshes into several regions in which mesh faces share similar characteristics, while faces in different regions differ greatly. Well-known region-based works include K-means [8], clustering [9], decomposition [10], fitting primitives [11], watersheds [12], random walks [13], and fast marching [14]. Boundary-based approaches, in contrast, concentrate on finding the optimal curves to separate two neighbouring parts. They determine the final curves by maximizing the difference between parts. Main methods include normalized and randomized cuts [15], core extraction [16], the shape diameter function [17], and active contours or scissoring [18], [19]. However, geometry-based methods tend to fail when meshes become special and complicated.

As meshes vary from each other in appearance, it is difficult to separate a mesh into the desired parts with a fully automatic approach. Since manual segmentation is tiring as well as time-consuming, sketch-based semi-automatic methods have become popular. They provide simple and user-friendly interfaces for users to add their suggestions as start points or optimization constraints. The surveys [20], [21] briefly describe a number of sketch-based segmentation methods. For example, Ji et al. [22] introduced an improved region-growing algorithm for segmentation. Fan et al. [23] adopted an efficient local graph-cut-based optimization algorithm and obtained satisfying results. Studies [24], [25], [26], [27] integrated harmonic field theory with sketch-based segmentation, which possesses a solid theoretical basis and works well. However, sketch-based methods require a balance between user input and automatic computation.

Since 3D mesh databases, e.g., the Princeton Segmentation Benchmark [28], were released, data-driven methods have been proposed for mesh segmentation. Both unsupervised and semi-supervised learning methods try to learn from a database a model for separating a mesh meaningfully and verify it on new meshes. Some recent works include [29], [30], [31].

Dental Mesh Segmentation Numerous segmentation approaches have been proposed to separate dental models. According to the input format, we divide the existing approaches into two categories, 2D image and 3D mesh. Researchers have proposed effective segmentation algorithms based on 2D projection images. Yamany et al. [2] encoded the curvature and surface normal information into a 2D image, and designed an image segmentation tool to extract structures of high/low curvature. By iteratively removing these structures, individual tooth surfaces are obtained. Similarly, Kondo et al. [32] presented an automated method for tooth segmentation from 3D digitized images captured by a laser scanner. Grzegorzek et al. [33] presented a multi-stage approach for tooth segmentation from 3D dentition surfaces based on a 2D model-based contour retrieval algorithm. Wongwaen et al. [34] converted the 3D panorama to 2D space to find the cutting points for segmentation of individual teeth, then converted the 2D image back to 3D space for the remaining operations.

Literature [3] subdivided those methods which take a 3D mesh as input into 3 categories. The first is curvature-


[Figure 1: pipeline diagram. Training branch: label-based simplification → feature extraction → train network (teeth-gingiva labeling caffemodel; inter-teeth labeling caffemodel, trained without gingiva faces). Testing branch: non-label simplification → feature extraction → teeth-gingiva prediction → label optimization → teeth prediction → label optimization → sticky-teeth separation → ANN back-projection → boundary optimization → final result.]

Fig. 1. The pipeline of our method. Our method takes a raw teeth model as input, simplifies the model, extracts its features and feeds them into a 2-level hierarchical network to generate a label prediction, followed by label optimization and back-projection to get the final segmentation result.
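The stage order in Fig. 1 can be sketched as plain control flow. In the sketch below every stage function is a trivial stand-in, not the paper's actual algorithm, and all names (simplify_boundary_aware, back_project, etc.) are hypothetical labels chosen to mirror the pipeline boxes:

```python
def simplify_boundary_aware(faces):
    # Stand-in for Sec. 4.1: keep ~20% of the faces
    # (the paper's simplification ratio is 0.2).
    return faces[:max(1, int(len(faces) * 0.2))]

def extract_feature_images(faces):
    # Stand-in for Sec. 4.2: one 600-dimensional feature vector
    # (later reshaped to 20 x 30) per face.
    return [[0.0] * 600 for _ in faces]

def predict_labels(feature_images, num_labels):
    # Stand-in for the CNN prediction (Sec. 4.3): always class 0 here.
    return [0 for _ in feature_images]

def optimize_labels(labels):
    # Stand-in for graph-cut label optimization (Sec. 4.4): identity here.
    return labels

def back_project(simplified_labels, original_faces):
    # Stand-in for ANN back-projection: copy the label of the "nearest"
    # simplified face (here simply clamped by index).
    n = len(simplified_labels)
    return [simplified_labels[min(i, n - 1)] for i in range(len(original_faces))]

def segment_dental_mesh(original_faces):
    simplified = simplify_boundary_aware(original_faces)
    feats = extract_feature_images(simplified)
    gingiva_teeth = optimize_labels(predict_labels(feats, 2))  # level 1
    tooth_ids = optimize_labels(predict_labels(feats, 8))      # level 2
    # 0 = gingiva, 1..16 = individual teeth (left/right split omitted).
    labels = [t + 1 if g == 1 else 0 for g, t in zip(gingiva_teeth, tooth_ids)]
    return back_project(labels, original_faces)
```

Only the data flow is meaningful here; each box would be replaced by the corresponding component from Sections 4.1-4.4.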

based method, which separates the dental models relying on surface curvature. Yuan et al. [35] analysed the regions of the 3D dental model and classified them based on the minimum curvatures of the surface. Zhao et al. [4] proposed an interactive segmentation method based on the curvature values of the triangle mesh. The system designed by [36] requires users to provide a one-time setting of a certain curvature threshold via an intuitive slider. Others, including the snake-based active contour method [37], the "fast marching watershed" method [38] and the morphologic skeleton extraction method [1], are all related to curvature information to some extent.

The second is the contour-line-based method, which is a relatively accurate segmentation method as it allows human interaction. In studies [39] and [40], users assign the boundary between each tooth and the gum in the form of surface points by mouse click. The algorithm then connects each pair of neighbouring points depending on the geodesic information. The generated segments are the desired boundary. Although the boundary is accurate, this method relies too much on user interaction. Users have to rotate/translate the mesh and zoom in/out repetitively to make their suggestions, which is tiring and time-consuming.

The third is the harmonic-field-based method. Zou et al. [3] proposed a harmonic-field-based segmentation method which requires only a limited number of surface points as a prior. It saves users' time and achieves reasonable results.

Data-driven Shape Analysis Recently, data-driven geometry processing algorithms have been developed in both the computer graphics and computer vision communities [41]. The commonly known shape analysis techniques can be grouped into several topics, such as classification [42], [43], [44], matching [45], [46], [47], reconstruction [48], [49], [50], and modeling and synthesis [51], [52], [53]. Data-driven segmentation methods are classified into supervised [54], [55], [56], semi-supervised [30], [57], [58] and unsupervised [58], [59], [60]. In order to design a data-driven algorithm, sufficient shape databases are necessary. Famous databases include [28], [61], which are maintained by universities, and [58], which is collected from the web. Another way to gather adequate shape models is to create a synthetic dataset, e.g., [62]. Xu et al. [63] made a comprehensive survey of the existing online data collections.

Traditional learning methods are mainly designed for finding a different representation of the 3D mesh. In recent years, deep neural networks have shown excellent performance in extracting latent features, as well as in automatically building mappings between input and output [64], [65]. In particular, deep CNNs do well in tasks with image-format input [66], [67]. Researchers in the computer graphics community are making efforts to feed 3D mesh data into CNNs. Guo et al. [5] extracted a 20 × 30 geometric feature image for each triangle face and fed it into a typical classification network together with the ground-truth face label. Maron et al. [68] parameterized sphere-like meshes to get a 2D image and used it to train a modified FCN-like [69] network. [5], [68], [70] show that, if well designed, CNNs are also capable of 3D mesh segmentation.

3 OVERVIEW

Fig.1 illustrates our pipeline. Due to the computational burden of features for large meshes [29], we perform mesh simplification on each dental model to reduce the face count. To preserve informative geometric features for segmentation, we design a boundary-aware mesh simplification algorithm that maintains the features along the teeth-gingiva and tooth-tooth regions.

We extract global and local features of each face on the simplified model. We use a similar set of features as in [29], and add positional features to boost network performance. These features are reorganized into a 20 × 30


image to feed into the network. We design a 2-level hierarchical network for face labeling. We train two CNNs with similar layers for teeth-gingiva and inter-teeth labeling respectively. The CNN architecture consists of convolution, pooling and fully-connected layers, with carefully tuned parameters, e.g., the number of layers and the activation functions.

Right after each network prediction, we employ label optimization to correct the wrongly predicted labels, which usually appear on the boundary. We further improve the boundary between teeth and gingiva, as well as between individual teeth, by graph optimization. We also employ PCA analysis to split sticky teeth (i.e., pairs of teeth which are adjacent and get the same label after optimization), which occasionally appear in regions with missing/rotten teeth or around the front teeth. Finally, we back-project the labels of the simplified model onto the original model and further refine the boundary. Below we discuss the various algorithmic design choices in detail.

4 ALGORITHM

4.1 Boundary-aware Tooth Simplification

A dental model, acquired by CT scanning, is very precise, containing more than 200,000 faces. A direct computation of geometric features [29] on such a fine model is extremely time-consuming in either the training or testing stage (Section 4.2). Mesh simplification, therefore, is necessary for pre-processing dental meshes. Traditional feature-preserving mesh simplification methods tend to lose semantic information, e.g., details on the teeth-gingiva boundary. Clear and accurate tooth-tooth and teeth-gingiva boundaries play an important role in the learning procedure (Section 5). Thus we design a boundary-aware tooth simplification algorithm to preserve such semantic information as much as possible.

To preserve the boundary information, we first need to identify it. Our aim is to divide a dental model into three regions: gingiva, teeth, and teeth-gingiva boundary, shown in Fig.2d. The gingiva region occupies a large part of the triangle faces but provides little discriminative information for classification, as it is feature-less. The teeth region possesses more important geometric details than the gingiva, and should not be simplified too much. The boundary region is the most important for segmentation, and its details should be retained as much as possible. To this end, we modify the traditional mesh simplification method [71]. We multiply the edge-collapse cost by different weights in different regions.

Our task now is to identify those regions. We observe that the dental meshes are usually scanned on the same CT platform, whose bottom parts are planar, as shown in Fig.2a. This makes the classification task easier. We first identify the largest plane using a greedy floodfill algorithm and align the normal of the largest plane with the z-axis. It is then easy to align the x-axis and y-axis by PCA analysis. For an upper-part dental mesh, shown in Fig.2, the majority of teeth faces appear in regions which have larger z-axis coordinate values, are far from the mesh center, and are close to teeth sharp points [3], i.e., the red points in Fig.2a. The teeth-gingiva boundary is usually continuous, i.e., the green points in Fig.2a.

We denote a dental mesh as G = ⟨V, E, F⟩, where V is the vertex set, E the edge set, and F the face set. The label set L consists of the label li ∈ [0, 1] of each triangle face, where 0 represents definite gingiva and 1 represents definite teeth. The labels L are determined by optimizing

    arg min_{li, i∈F}  Σ_{i∈F} E1(li) + λ Σ_{i,j∈F} E2(li, lj)    (1)

where λ is a non-negative constant to balance the two terms (empirically set λ = 100).

The unary term is defined as

    E1(li) = α1 Eu1(li) + α2 Eu2(li) + α3 Eu3(li),  s.t. α1 + α2 + α3 = 1    (2)

where Eu1, Eu2, Eu3 are the probability energies given by the three characteristics mentioned before, which are the z-axis coordinate, the geodesic distance to the nearest sharp points, and the Euclidean distance to the mesh center in the XY plane. Specifically, each item is defined as

    Eu1(li = 0) = (zi − zmin)/H
    Eu2(li = 0) = 1 − gdi/gdmax
    Eu3(li = 0) = sqrt( ((xi − xmesh)/(0.5L))^2 + ((yi − ymesh)/(0.5W))^2 )    (3)
    E1(li = 1) = 1 − E1(li = 0)

where xi, yi, zi are the x-, y-, z-axis coordinates of triangle face i. [L, W, H] are the length (in the x-axis direction), width (in the y-axis direction) and height (in the z-axis direction) of the axis-aligned bounding box of a dental model. gdi is the geodesic distance from face i to the nearest sharp point and gdmax is the maximum value over all gdi. xmesh and ymesh are the x-, y-axis coordinates of the mesh center (we set α1 = 0.4, α2 = 0.5, α3 = 0.1). We detect the sharp feature points as local shape extremities [3]. The probability field is shown in Fig.2b.

Faces on the teeth-gingiva boundary usually have negative curvature. We use the pairwise term E2 to measure this:

    E2(li, lj) = 1 / (1 + AD(αij)/avg(AD)),  if li ≠ lj
    E2(li, lj) = 0,                          if li = lj    (4)

The angular distance is AD(αij) = η(1 − cos αij), where αij is the angle between the normals of faces i and j. The definition here is the same as in [10]. For convex angles, η = 0.05, and for concave angles, η = 1.

We use the graph cuts algorithm to solve the optimization problem in Eqn. 1. As a result, the mesh is divided into two regions: teeth and gingiva. We extend the teeth-gingiva boundary using Breadth First Search (BFS) (5 iterations in our experiments) to get three regions (Fig.2d). Then we conduct a detail-preserving mesh simplification (Fig.2c). Empirically, we set the collapsing weights for edges in the gingiva, teeth, and gingiva-teeth boundary regions to 1, 20, and 500 respectively. The simplification ratio is 0.2. The simplified model has around 40,000 triangle faces.
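The per-face energy terms of Eqns. (1)-(4) can be written directly in code. The following sketch uses the paper's stated constants (α = 0.4/0.5/0.1, η = 0.05 for convex and 1 for concave angles); all function names are our own illustrative choices:

```python
import math

# Empirical constants from Sec. 4.1.
A1, A2, A3 = 0.4, 0.5, 0.1          # alpha_1..alpha_3 (sum to 1)

def unary_gingiva(z, z_min, H, gd, gd_max, x, y, x_c, y_c, L, W):
    """E1(l_i = 0) from Eqns. (2)-(3): weighted sum of the three cues."""
    eu1 = (z - z_min) / H                              # normalized height
    eu2 = 1.0 - gd / gd_max                            # distance to sharp points
    eu3 = math.sqrt(((x - x_c) / (0.5 * L)) ** 2 +
                    ((y - y_c) / (0.5 * W)) ** 2)      # XY distance to center
    return A1 * eu1 + A2 * eu2 + A3 * eu3

def unary_teeth(*args):
    """E1(l_i = 1) = 1 - E1(l_i = 0)."""
    return 1.0 - unary_gingiva(*args)

def angular_distance(cos_alpha, convex):
    """AD(alpha_ij) = eta * (1 - cos alpha_ij); eta = 0.05 convex, 1 concave."""
    return (0.05 if convex else 1.0) * (1.0 - cos_alpha)

def pairwise(li, lj, ad, avg_ad):
    """E2 from Eqn. (4): cutting across a strongly concave edge is cheap."""
    return 0.0 if li == lj else 1.0 / (1.0 + ad / avg_ad)
```

These terms would then be handed to a standard graph-cuts solver, as the paper does for Eqn. 1.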


Fig. 2. (a) Sharp points are colored in red and negative-curvature points are colored in green. (b) The probability field of belonging to the teeth. Light color indicates large probability. (c) Our simplification result. (d) Three regions. After optimizing Eqn. 1 by graph cuts, we roughly divide the model into teeth and gingiva regions. Then we extend the teeth-gingiva boundary using BFS and get the three regions, i.e., gingiva (gray), tooth (blue), and the area near the teeth-gingiva boundary (green).

4.2 Feature Extraction

After mesh simplification, we extract 8 different types of geometric features, which form a 600-dimension vector for each triangle face. The feature vector is then reorganized into a 20 × 30 feature image, so that it can be fed into the CNNs. These geometric features include curvature (CUR) [72], PCA features (PCA) [29], shape context (SC) [73], the shape diameter function (SDF) [74], the spin image (SI) [75] and coordinates (COORD). Other than the coordinates, they are the same as those used in [5]¹. These features are computed at several scales to capture both local and global information. They make up the first 593 dimensions of the feature vector. For the last 7 dimensions, we introduce coordinate features based on the natural distribution of dental models. For example, incisors are always found at a specific relative location in all dental models. The coordinate feature consists of the 3D Cartesian coordinates (x, y, z) (after alignment as in Section 4.1), the spherical coordinates (ρ, θ, φ) and the absolute value of φ.

4.3 Networks Architecture

In recent years, CNNs have gone through rapid development. Traditional CNNs have become deeper and wider to learn as much as possible from large-scale training data. Besides, specially designed network structures have been proposed for high-level computer vision tasks. However, basic CNN structures are sufficient for studies like [5]. Such CNNs possess clear structures with fewer parameters to train, so the training complexity decreases a lot. Therefore, we design a CNN architecture based on [5] for dental mesh segmentation. Notice that the CNN proposed by [5] is a typical architecture for a classification task: each face is set with the most probable class label. We exactly follow this key idea.

4.3.1 Hierarchical mesh labeling

A dental model (upper or lower part) consists of gingiva and sixteen teeth; however, nearly half of the triangle faces belong to gingiva, while the others are subdivided into sixteen groups. This imbalanced label distribution poses difficulties in network training. We experimented with a few design variants (refer to Section 5), and found that the following 2-level hierarchical network architecture achieves the best performance, as shown in Fig.1.

The first level of our hierarchy conducts teeth-gingiva separation; we train a 2-label classification network. The second level separates each tooth. Taking the symmetry of dental models into consideration, we train an 8-label classification network. We later use geometric information to separate the left and right parts and finally get a dental model with seventeen labels. The structures of the two networks are identical, as shown in Table 1.

4.3.2 Networks structure

Our network takes the reorganized 20 × 30 feature image, denoted as X0, generated in Section 4.2, as input and outputs a face label L = {0, 1} or L = {0, · · · , 7}. We divide the network layers into three blocks: two convolution blocks (CB) and one fully-connected block (FB), see Table 1. In CB1, the convolution layer conv1 is applied to X0 as

    Yi^l = Wi^l ⊗ X^(l−1) + bi^l,  i = 1, . . . , 16,  l = 1    (5)

where ⊗ indicates the convolution operation. The bias bi^l is uniformly added to each pixel of Wi^l ⊗ X^(l−1). The output Yi^l then goes through an activation function, called parametric ReLU (pReLU):

    f(yi) = yi,     if yi > 0
    f(yi) = ai yi,  if yi ≤ 0    (6)

where ai is a trainable parameter. We thus obtain sixteen feature maps {Xi^1}, i = 1..16, computed by

    Xi^l = f(Yi^l),  l = 1    (7)

For conv2, we obtain {Xi^2}, i = 1..32, by setting l = 2 in Eqn. 5 and Eqn. 7. After that, we feed the thirty-two feature

TABLE 1
Network structure.

       layer     parameters
CB1    conv1     3 × 5, 16
       conv2     3 × 3, 32
       pool1     2 × 2, max
CB2    conv3     3 × 3, 64
       conv4     3 × 3, 128
       pool2     2 × 2, max
FB     fc1       100
       dropout   0.5
       fc2       2 (or 8)
       softmax

1. Source code can be found at http://people.cs.umass.edu/~kalo/papers/LabelMeshes/.


maps to pooling layer pool1, which picks up the maximum value of each 2 × 2 non-overlapping patch to represent it. The feature maps output by pool1, denoted as {X̂_i^2} (i = 1..32), are 1/4 of the input size.

The block CB2 has a similar composition to CB1: {X̂_i^2} (i = 1..32) acts as the input of conv3. We reorganize the output of pool2, denoted as {X̂_i^4} (i = 1..128), into a vector to feed into the fully-connected layer fc1. To simplify the network's complexity and reduce over-fitting, we randomly drop out 50% of the layer nodes during each training iteration. Then fc2, together with the softmax layer, generates the probability vector of length 2 (or 8).
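As a sketch, the 2 × 2 non-overlapping max pooling of pool1/pool2 (which leaves 1/4 of the input size) can be written in a few lines of numpy:

```python
import numpy as np

def max_pool_2x2(fm):
    # Represent each non-overlapping 2x2 patch by its maximum value;
    # the output has half the height and half the width (1/4 the size).
    h2, w2 = fm.shape[0] // 2, fm.shape[1] // 2
    return fm[: h2 * 2, : w2 * 2].reshape(h2, 2, w2, 2).max(axis=(1, 3))

fm = np.arange(16.0).reshape(4, 4)
pooled = max_pool_2x2(fm)  # shape (2, 2)
```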

4.4 Label Optimization

The CNN prediction generates a label and a probability vector for each face of the testing mesh. The prediction results are rough and inaccurate on the boundary: small fragments appear where they are not supposed to be (Fig. 3a, 3c). To fix this, we adopt the multi-label graph cuts method [76] to refine the prediction result after each network prediction.

Fig. 3. (a) Teeth-gingiva CNNs prediction. (b) Teeth-gingiva refinement after label optimization. (c) Inter-teeth classification with eight labels after CNNs prediction. (d) Inter-teeth refinement after label optimization.
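The per-face output of the networks can be sketched as follows; the two-class ordering (gingiva first) is a hypothetical choice for illustration:

```python
import numpy as np

def predict_faces(probs):
    # probs: (num_faces, num_classes) softmax output of the CNN.
    # Each face gets its most likely label l_i together with the
    # probability p_i(l_i), which the graph cuts step later reuses.
    labels = probs.argmax(axis=1)
    confidence = probs[np.arange(len(probs)), labels]
    return labels, confidence

probs = np.array([[0.9, 0.1],    # confident face
                  [0.4, 0.6]])   # uncertain face, typical near boundaries
labels, conf = predict_faces(probs)
```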
Triangle face i is labeled by the CNNs with l_i under the probability p_i. The neighbouring faces of i are denoted as N_i. The label optimization problem is solved by optimizing

    arg min_{l_i, i∈F}  Σ_{i∈F} ξ_U(p_i, l_i) + λ Σ_{i∈F, j∈N_i} ξ_S(p_i, p_j, l_i, l_j)    (8)

where λ is a non-negative constant (λ = 20 for teeth-gingiva classification, λ = 100 for inter-teeth classification). The first term is defined as ξ_U(p_i, l_i) = −log(p_i(l_i)); the penalty ξ_U rises when the probability p_i(l_i) drops. For a dental mesh, the teeth-gingiva boundary tends to be concave, so we define the second term as

    ξ_S(p_i, p_j, l_i, l_j) =
        0,                           l_i = l_j
        −log(θ_ij/π) φ_ij,           l_i ≠ l_j, θ_ij is concave    (9)
        −β_ij log(θ_ij/π) φ_ij,      l_i ≠ l_j, θ_ij is convex

with β_ij = 1 + |n̂_i · n̂_j| and φ_ij = ‖c_i − c_j‖_2, where n̂_i is the face normal of triangle i, c_i is the barycenter of i, and θ_ij is the dihedral angle between i and j. We add the term β_ij to push the optimization toward concave regions, as the teeth-gingiva and tooth-tooth boundaries are usually concave. After label optimization, the small fragments are removed, as shown in Fig. 3b, 3d.

Handling sticky cases. Because teeth vary among people in shape, size, and the number of missing parts and holes, the hierarchical structure occasionally leads to incorrect predictions around missing/rotten teeth. Besides, the central incisors inherently share the same label (Fig. 3d). In these cases (which occur at a rate of 7.72%, statistics collected from 150 test cases), the multi-label graph cuts algorithm labels two or more nearby teeth as the same. We apply special treatments to these problems.

After inter-teeth segmentation, to distinguish one tooth from another, we run a PCA analysis on the classified teeth. Because the teeth in our dataset have no root, the height may vary across models. We leave out the height axis, which is normally aligned with the z-axis, and only consider the breadth axes of the teeth, which are more reliable for measuring the change.

For each tooth, if its longest breadth axis is longer than a constant τ_1 (set as 1.4 times the per-tooth mean value, calculated by PCA analysis on 1000 training data), we break it into two teeth and repeat the process if needed. This is regarded as another graph cuts problem and can be solved by a labeling optimization procedure similar to the one above. We refer to the appendix for details.

Fig. 4. (a) Sticky teeth. (b) PCA analysis. (c) Separated teeth.

4.5 ANN Mapping

After labeling the simplified mesh properly, we need to project the results back onto the original detailed model. As the two meshes are coordinate-aligned, we employ ANN² mapping, followed by additional label optimization (Eqn. 8) on the boundary, as shown in Fig. 5. Note that the probabilities from the CNNs prediction are projected at the same time.

2. The ANN library can be found at https://www.cs.umd.edu/∼mount/ANN/.

4.6 Boundary Smoothing

An accurate and smooth boundary is very important in dental treatment, as it affects further processing such as virtual gingiva generation, tooth rearrangement, and dental appliance production. However, neither geometric information
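A minimal sketch of the pairwise term of Eqn. 9; the concave/convex test on the dihedral angle θ_ij is passed in as a flag here, and the function name is ours:

```python
import numpy as np

def xi_S(l_i, l_j, theta_ij, n_i, n_j, c_i, c_j, concave):
    # Zero cost when neighbouring faces agree on the label.
    if l_i == l_j:
        return 0.0
    # phi_ij = ||c_i - c_j||_2
    phi = np.linalg.norm(np.asarray(c_i) - np.asarray(c_j))
    cost = -np.log(theta_ij / np.pi) * phi
    if not concave:
        # beta_ij = 1 + |n_i . n_j| penalises convex edges more,
        # pushing the cut toward concave creases.
        cost *= 1.0 + abs(np.dot(n_i, n_j))
    return cost
```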


nor the CNNs predictions are reliable enough to determine the optimal boundary by themselves. We propose to combine them and use an improved fuzzy clustering algorithm [10] to refine the boundary. The improved fuzzy clustering method takes both geometry and CNNs predictions into consideration and works well on dental models.

For each tooth labeled by l, we first run a BFS from the current boundary to visit a group of nearby faces, which make up the fuzzy region. For faces on the border of the region, we collect those adjacent to tooth l as set S, and the others as set T. The modified capacity is

    Cap_new(i, j) = Cap(i, j) / (1 + exp(−x/σ))²    (10)

    Cap(i, j) = 1 / (1 + AD(α_ij)/avg(AD)),   i, j ∉ S, T
                ∞,                            else          (11)

where x is the geodesic distance from the face center to the nearest current boundary (σ = 0.05). Cap(i, j) is identical to [10], and AD(α_ij) is defined as before. The equation makes faces close to the current boundary highly likely to lie on the final boundary. In other words, we consider the CNNs predictions an important factor in determining the final location of the boundaries. The CNNs prediction is usually better than traditional geometry refinement in feature-less (e.g., flat) regions (Fig. 6). We further refine the boundary using a simple shortest-path dynamic programming procedure; see the appendix for details.

Fig. 5. (a) Segmentation results on the simplified mesh. (b) The results on the original mesh after ANN mapping (the boundary is not smooth). (c) The refined results on the original mesh after label optimization and boundary smoothing.

Fig. 6. (a) The results after ANN mapping. (b) Ground truth. (c) Boundary smoothing using fuzzy clustering and cuts [10]. As some boundary areas are very flat, it is hard to preserve the CNNs prediction by directly applying [10]. (d) Our method respects the CNNs prediction results.

5 EVALUATION AND RESULTS

We conduct a number of experiments to show the effectiveness of our approach and to validate the various algorithmic components of our method, including mesh simplification, choice of features, network design, label optimization, and boundary refinement. Our CNN models are trained with Caffe [77] on an i7-6700K CPU together with a GTX 1080 GPU.

Dataset. It is hardly possible to build a dental model database large enough for network training by ourselves. Luckily, a professional orthodontic company was willing to cooperate with us on this project. They provided us with manually labeled dental meshes, which can be regarded as the ground truth. We divide the dataset into 3 groups, shown in Table 2. All the results in this section are calculated on the Test set.

TABLE 2
Dental mesh dataset.

Group            Training  Validation  Test
Number of Cases  1000      50          150

Metrics. We evaluate the performance from two different aspects. One is from a global perspective, i.e., calculating the percentage of the area of correctly labeled faces [5], which is expressed as

    Accuracy = Σ_{t∈T} a_t g_t(l_t) / Σ_{t∈T} a_t    (12)

where a_t and l_t are the area and predicted label of triangle face t, and g_t(l_t) is 1 if the prediction is correct, otherwise 0. As we have emphasized before, boundary accuracy is very important in the tooth segmentation task, so the other evaluation concentrates on the boundary between gingiva and teeth. We adopt the Directional Cut Discrepancy (DCD) [28] to evaluate the boundary mean errors. For simplicity, we denote the upper tooth model as U and the lower tooth model as L in the following paragraphs.

Design choice of CNNs. As noted earlier, neural networks have proved very effective for common classification tasks, and many variants could achieve similar results. Thus we explicitly explore different variants of network architectures to find the best match for our problem. The alternatives we tried include LeNet [78], traditional Neural Networks (NNs) with a modified deeper structure, 1-level CNNs with a weighted loss function (WLF), which is commonly used on imbalanced datasets, and the locally connected graph autoencoder [79] (GR-DNN), which has been shown to be very effective in representation learning. For LeNet, we directly employ it for the 9-class teeth classification using our 20 × 30 feature as input. The structure of the second network, NNs, is shown in the appendix, Fig. 12a; its input is our 600-dimensional feature vector. In WLF, we set class weights for the imbalanced data so that the 2-level structure is reduced to 1 level. Lastly, we implement GR-DNN and modify an intermediate layer of the original structure (from size 2 to size 100) to allow for a proper embedding learning, as our feature size is much larger. We also modify the softmax layer for the final 9-class classification (see the structure in Fig. 12b).

The performance of these alternatives is shown in Table 3. Interestingly, all these alternatives have similar performance, and all are able to get good results for tooth segmentation. However, our method achieves the best performance. We suspect that a wrap into images allows the network to seek more potentially unknown relations among the extracted features than a single vector could, and a device of two
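The modified capacities of Eqns. 10-11 above can be sketched as below; reading the garbled exponent of Eqn. 10 as a squared sigmoid is our assumption, as is passing the angular-distance function AD as a callable:

```python
import numpy as np

def cap(alpha_ij, avg_AD, AD, in_cut_sets=False):
    # Eqn. 11, as in Katz and Tal [10]: infinite capacity for edges
    # touching the seed sets S, T; otherwise inversely related to the
    # angular distance AD(alpha_ij) normalised by its average.
    if in_cut_sets:
        return np.inf
    return 1.0 / (1.0 + AD(alpha_ij) / avg_AD)

def cap_new(x, base_cap, sigma=0.05):
    # Eqn. 10: scale the capacity by a sigmoid factor of the geodesic
    # distance x to the current boundary, so edges near the
    # CNN-predicted boundary are cheaper to cut and the final cut
    # stays close to it.
    return base_cap / (1.0 + np.exp(-x / sigma)) ** 2
```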


stages allows for more comprehensive fine-tuning, for example, of the label optimization parameter λ. As for GR-DNN, we tried two versions of the anchor graph [79]: one creates the anchor graph directly in the 600-dimensional feature space, and the other uses the Euclidean distance among the faces to account for local connectivity. However, in both cases we do not observe much difference, which indicates that learning a better embedding does not necessarily guarantee a better classification result.

To evaluate the effectiveness of our pipeline, we make a comparison with the method of [5], which performs very well in general mesh segmentation. We prepare training data in the same way for both methods, using mesh simplification followed by feature extraction. In the first comparison, we compare the two methods as they are; the differences lie in the network structure, simplification, and optimization methods, and we use their own features, network, and post-optimization. The number of output labels of the network proposed by [5] is set to seventeen, with the purpose of segmenting all teeth and gingiva at once. After network prediction, [5] only uses the α−β swap to do label optimization. Table 3 shows that on a large-scale training set, our method outperforms [5] significantly. In the second comparison, we verify the usefulness of the COORD feature; that is, we add the 7-dimensional coordinate feature described in Section 4.2 to [5] and conduct 17-label classification. The performance rises significantly (Table 3), which reveals the effectiveness of our new feature. This is because teeth are usually aligned and symmetric, so nearby teeth can confuse the network in the original feature representation; the coordinate features largely help distinguish, for example, left from right.

Fig. 11 shows some of our representative results. Note that our method is robust to various complex circumstances in human teeth, such as missing/rotten teeth, irregular teeth arrangements, noise, bubbles, foreign attachments, as well as feature-less regions, thanks to our well-designed network and the improved boundary refinement algorithm. Our networks are also very efficient: for a tooth model with 40,000 triangles, prediction takes less than 1 s. The simplification, ANN mapping, and fuzzy refinement take around 5 s. The most time-consuming step is the feature extraction, which takes 5 minutes per model. However, this is significantly faster than on a raw model, i.e., without mesh simplification (which takes 12 hours per model).

Verification of the effectiveness of boundary-aware simplification. To prove that the proposed boundary-aware simplification is effective, we make two sets of simplified models (at simplification ratio 0.2). One is simplified using the boundary-aware algorithm, the other is simplified uniformly. We then extract features to build two sets of training data, of 200 and 1000 models; the networks are trained on the two different amounts of training data in order to rule out the influence of data scale. Results are shown in Table 4. To stay consistent with the training data, all test data are simplified under the same rule. The rows named TGCNNs and TTCNNs are the results of the CNNs prediction; the Final row refers to the final results of the whole pipeline. Judging by the numerical values, boundary-aware simplification scores slightly higher than the uniform one. Admittedly, the difference within each (boundary-aware, uniform) pair is a bit small to evidence the effectiveness of the boundary-aware algorithm. This is expected, as the boundary regions are indeed a small portion of the whole model. However, the visualization of these results supports that the boundary-aware algorithm indeed helps the networks predict much better in the boundary region. Fig. 7 indicates that our boundary-aware simplification algorithm effectively prevents over-prediction and under-prediction in boundary regions. Such clean and accurate boundaries can largely benefit the subsequent processes in digital tooth treatment, for example, tooth root region reconstruction and tooth alignment. Table 5 shows that the mean errors of the boundary-aware algorithm are smaller than those of the uniform simplification method.

TABLE 4
Experiments with boundary-aware and uniform simplification.

Training Data   U(200)                   U(1000)
Simplification  Boundary-aware  Uniform  Boundary-aware  Uniform
TGCNNs          98.55%          98.37%   98.93%          98.62%
TTCNNs          95.24%          95.03%   97.50%          97.25%
Final           98.61%          98.11%   99.06%          98.81%

Fig. 7. Comparing boundary-aware and uniform simplification. (a) Teeth-gingiva prediction produced by CNNs on uniformly simplified models. (b) The final results of (a). (c) Teeth-gingiva prediction produced on boundary-aware simplified models. (d) The final results of (c).

TABLE 5
Mean errors of the boundary-aware and uniform simplification methods.

Training Data   U(200)                   U(1000)
Simplification  Boundary-aware  Uniform  Boundary-aware  Uniform
Mean errors/mm  0.0939          0.0951   0.0848          0.0867

Verification of the effectiveness of label optimization. We employ label optimization in two stages: one after the TGCNNs, to smooth the labeling results for the next stage, and the other after the TTCNNs, to again smooth the labeling results. Table 6 shows the effectiveness of label optimization.

TABLE 6
Verification of label optimization.

                  CNN Prediction  Label Optimization
TGCNNs  U(1000)   98.93%          99.43%
        L(1000)   98.88%          99.43%
TTCNNs  U(1000)   97.50%          98.56%
        L(1000)   97.37%          98.43%
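The area-weighted accuracy of Eqn. 12 behind these percentages can be sketched as:

```python
import numpy as np

def labeling_accuracy(areas, predicted, ground_truth):
    # Eqn. 12: fraction of the total surface area whose triangle
    # faces carry the correct label (area-weighted, not face-counted).
    areas = np.asarray(areas, dtype=float)
    correct = np.asarray(predicted) == np.asarray(ground_truth)
    return areas[correct].sum() / areas.sum()

# Three faces; the large face is mislabeled, so accuracy is 2/4 = 0.5.
acc = labeling_accuracy([1.0, 1.0, 2.0], [1, 2, 3], [1, 2, 4])
```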


TABLE 3
Labeling accuracy of different network variants.

         Guo et al. [5]  [5] + COORD  WLF     NN      LeNet   GR-DNN-600  GR-DNN-3  Ours
U(1000)  84.81%          95.32%       98.81%  98.35%  98.51%  97.04%      97.00%    99.06%
L(1000)  82.95%          95.16%       98.26%  98.04%  98.23%  96.60%      93.42%    98.79%

Verification of the effectiveness of improved fuzzy refinement. For general mesh segmentation, label optimization is usually the last step in the pipeline, and for normal cases it is effective and efficient. Dental meshes, however, as a group of natural models, may have a boundary distribution different from the theoretical optimum. The proposed boundary smoothing algorithm slightly adjusts the boundary to make the result much closer to the ground truth. The evaluation criterion here is the boundary mean error, i.e., the average of DCD(Sp ⇒ Sgt) and DCD(Sgt ⇒ Sp), in which Sp is the computed boundary and Sgt is the ground-truth boundary. Table 7 shows its effectiveness.

TABLE 7
Directional Cut Discrepancy (DCD) before and after improved fuzzy refinement.

             DCD(Sp ⇒ Sgt)/mm   DCD(Sgt ⇒ Sp)/mm
Tooth Model  Before    After    Before    After
U(1000)      0.0935    0.0842   0.0960    0.0831
L(1000)      0.0917    0.0849   0.0947    0.0871

Compared with traditional methods [1], [3], we obtain better boundary mean errors, as shown in Table 8; Table 9 shows the distribution of our boundary mean errors. Besides, those methods either require user interaction or are sensitive to curvature, while ours does not.

TABLE 8
Comparison of boundary mean errors.

Method          Wu et al. [1]  Zou et al. [3]  Ours
Mean errors/mm  0.1218         0.1300          0.0848

TABLE 9
Boundary error distribution.

         Range/mm         [0, 0.25)  [0.25, 0.5)  [0.5, ∞)
U(1000)  DCD(Sp ⇒ Sgt)    92.95%     4.78%        2.27%
         DCD(Sgt ⇒ Sp)    93.74%     4.80%        1.46%
L(1000)  DCD(Sp ⇒ Sgt)    93.04%     4.69%        2.27%
         DCD(Sgt ⇒ Sp)    93.57%     4.66%        1.77%

Fig. 8 visualizes the boundary improved by the additional fuzzy refinement. It shows that the wrongly predicted triangle area between two neighbouring teeth is corrected after the refinement. Fuzzy refinement is also able to make up incomplete parts on the tooth surface. Such mistakes are commonly seen in results produced by label optimization alone, so the improved fuzzy algorithm is indispensable.

Fig. 8. Comparing results before and after improved fuzzy refinement. Each row represents a dental model. (a)(c) are the results before improved fuzzy refinement, i.e., the results right after label optimization. (b)(d) are the fuzzy refinement results.

Performance with increasing training data. For most learning-based algorithms, the quantity and quality of training data play an important role in the learning step. Although the dental mesh dataset contains more than 1,000 models, it is necessary to explore how many models are exactly enough to train a good model. According to Fig. 9, 1,000 is enough for teeth-gingiva segmentation, as the polyline in blue increases very slowly within the interval [400, 1000]. As for inter-teeth segmentation, the accuracy is still growing in the front half of the red polyline. If we augmented the training data, inter-teeth segmentation and the final results (in brown) might grow further.

Fig. 9. Experiments with increasing training data at 0.1 simplification ratio. [Plot: accuracy (94-100%) of TGCNNs, TTCNNs, and Final vs. number of training models (0-1,000).]

Performance with increasing simplification ratio. The original dental mesh contains too many tiny triangle faces, which makes feature extraction time-consuming. On the other hand, an excessively simplified mesh lacks detailed information, so the extracted features may be less representative. We investigate the relationship between segmentation accuracy and different simplification ratios, shown
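The boundary mean error used in Tables 7-9 averages the two directional cut discrepancies; a sketch with boundaries as point samples (the exact sampling scheme of [28] may differ):

```python
import numpy as np

def dcd(cut_a, cut_b):
    # DCD(A => B): mean distance from each sample of boundary A
    # to its closest sample on boundary B.
    cut_a, cut_b = np.asarray(cut_a), np.asarray(cut_b)
    d = np.linalg.norm(cut_a[:, None, :] - cut_b[None, :, :], axis=2)
    return d.min(axis=1).mean()

a = [[0.0, 0.0], [1.0, 0.0]]            # computed boundary Sp
b = [[0.0, 0.1], [1.0, 0.1]]            # ground-truth boundary Sgt
mean_err = (dcd(a, b) + dcd(b, a)) / 2  # symmetric boundary mean error
```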


in Fig. 10. It shows that as the simplification ratio becomes lower, the accuracy decreases. When the simplification ratio decreases to 0.02, the absolute face number is too small (4,000 triangles) to retain sufficient features around the boundary; in this condition, boundary-aware simplification is not helpful. We suggest a feasibly larger ratio value, for example 0.2, to guarantee enough features for the network to learn.

Fig. 10. Experiments with increasing simplification ratio on 200 (dashed polylines) and 1000 (solid polylines) training models. [Plot: accuracy (94-100%) vs. simplification ratio (0.05-0.2).] The colors blue, red, and brown represent teeth-gingiva segmentation, inter-teeth segmentation, and the final results, respectively.

6 LIMITATIONS

The proposed dental mesh segmentation method has a few limitations. The first is that when the boundary between two teeth is corrupted by the simplification process, it leads to an inaccurate prediction that is hard to fix even with the improved fuzzy refinement; thus we encourage a larger simplification ratio. Second, the varying appearance of wisdom teeth across dental meshes brings in a large portion of the inaccuracy. Some models keep two wisdom teeth, some hold just one, and others lack both; some wisdom teeth have erupted completely, while others show up only partly. These make the training step difficult, as the amount of wisdom teeth data is very small. As a result, the trained network may tend to mislabel them as gingiva. Another shortcoming is the 2-level hierarchical network structure. Such a structure is suitable for imbalanced dental meshes, but it also possesses the common deficiency of decision tree models: the higher-level results have a negative effect on the lower level. Taking a wisdom tooth as an example, if the teeth-gingiva classification regards it as gingiva, it will no longer take part in the inter-teeth classification step; under this circumstance, the subsequent label optimization and fuzzy refinement will not help.

7 CONCLUSION

In this paper, we propose a learning-based dental mesh segmentation method. It receives a detailed 3D dental model as input and outputs a label for each triangle face. Different from previous mesh segmentation methods, we propose a label-free mesh simplification method particularly tailored to preserving teeth boundary information while greatly improving the efficiency of the networks. Moreover, we design a hierarchical classification structure based on two CNNs, both trained on 1,000 dental meshes; they are not only robust but also generalize well to new models. Last but not least, an improved fuzzy clustering boundary refinement algorithm is proposed for the final boundary adjustment. This simultaneous and robust dental mesh segmentation and labeling framework significantly advances the current state-of-the-art geometry-based teeth segmentation methods and achieves 99.06% accuracy for the upper dental model and 98.79% for the lower dental model. It directly satisfies industrial clinical treatment demands and is also robust to possible foreign matter on the dental model surface, e.g., air bubbles, dental accessories, and many more.

ACKNOWLEDGMENTS

We thank the reviewers for their insightful comments. We are also grateful to all friends for proofreading. This work was supported in part by the National Natural Science Foundation of China No. 61502306 and the China Young 1000 Talents Program.

APPENDIX

To separate sticky teeth, we solve the following optimization problem:

    arg min_{l_i, i∈F}  Σ_{i∈F} ξ_U(p_i, l_i) + λ Σ_{i∈F, j∈N_i} ξ_S(p_i, p_j, l_i, l_j)    (13)

The first term is defined as

    ξ_U(p_i, l_i) = −log(p_i),       l_i is tooth 1
                    −log(1 − p_i),   l_i is tooth 2       (14)

    p_i = γ (P_max − P_min) + 0.5
    γ = max(min((c_i − C) · d̂ / |L|, P_max), P_min)      (15)

where c_i is the barycenter of face i, and C is the barycenter of the sticky teeth. The constants are P_min = 10^−8, P_max = 1 − P_min, λ = 50. d̂ and |L| are the longest axis's direction and length, respectively. The second term is identical to Eqn. 9.

To further smooth the boundary, we apply a shortest-path algorithm within a ring-like area (Fig. 13), denoted as R, containing the current boundary. For an edge v1v2, with v1, v2 ∈ R, we denote the two faces that share v1v2 as i, j. The weight of edge v1v2 is

    w_{v1v2} = ‖v1 − v2‖_2 + λ w(α_ij)

    w(α_ij) = η (1 + cos α_ij)/2,   α_ij < τ_2
              (1 + cos α_ij)/2,     else          (16)
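The unary term of Eqns. 14-15 projects each face barycenter onto the longest PCA axis; a sketch copied as printed (the function name is ours, and only the tooth-1 branch is exercised here since the formula as printed can push p_i above 1):

```python
import numpy as np

P_MIN, P_MAX = 1e-8, 1.0 - 1e-8

def sticky_unary(c_i, C, d_hat, L, label):
    # Eqn. 15: signed offset of the face barycenter c_i along the
    # longest axis d_hat, clamped, then mapped to a probability p_i
    # of belonging to tooth 1. Faces at the centre get p_i ~ 0.5.
    gamma = np.dot(np.asarray(c_i) - np.asarray(C), d_hat) / L
    gamma = max(min(gamma, P_MAX), P_MIN)
    p_i = gamma * (P_MAX - P_MIN) + 0.5
    # Eqn. 14: negative log-likelihood of the chosen tooth label.
    return -np.log(p_i) if label == 1 else -np.log(1.0 - p_i)

# A face exactly at the sticky-teeth barycenter is maximally ambiguous.
val = sticky_unary([0.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                   np.array([1.0, 0.0, 0.0]), 2.0, label=1)
```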


Fig. 11. (a) Teeth-gingiva classification. (b) Inter-teeth classification. (c) Our results. (d) Another view of our results. (e) Ground truth.

where α_ij is the same as in Eqn. 11. The constants are λ = 0.02, τ_2 = π/6, η = 4. The equation agrees with the fact that edges in a groove are more likely to be on the boundary. We solve the problem with dynamic programming. Although there are algorithms particularly designed for boundary curve refinement [80], they are complicated to implement and do not pay attention to any prior knowledge, such as the prediction results in our case. Our simpler strategy works well in our experiments.

Fig. 12. (a) Neural network structure. (b) GR-DNN structure. [Diagrams: both networks take the 600-dimension input and end in a 9-class prediction; (b) additionally has an anchor-graph reconstruction branch. Per-layer sizes are not reproduced here.]

Fig. 13. (a) The black curves represent a narrow ring. The red curve is the initial route. (b) Find a short cut V of the ring and copy its nodes into cut V′. (c) Expand the ring from the cut and get a strap. (d) Find the shortest route from V to V′ (the blue curve).

REFERENCES

[1] K. Wu, L. Chen, J. Li, and Y. Zhou, "Tooth segmentation on dental meshes using morphologic skeleton," Computers and Graphics, vol. 38, pp. 199–211, 2014.
[2] S. M. Yamany and A. R. Elbialy, "Efficient free-form surface representation with application in orthodontics," Proceedings of SPIE, vol. 3640, no. 1, pp. 115–124, 1999.
[3] B. J. Zou, S. J. Liu, S. H. Liao, X. Ding, and Y. Liang, "Interactive tooth partition of dental mesh base on tooth-target harmonic


field," Computers in Biology and Medicine, vol. 56, no. C, pp. 132–144, 2015.
[4] M. Zhao, L. Ma, W. Tan, and D. Nie, "Interactive tooth segmentation of dental models," in 27th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE-EMBS 2005). IEEE, 2006, pp. 654–657.
[5] K. Guo, D. Zou, and X. Chen, "3D mesh labeling via deep convolutional neural networks," ACM Transactions on Graphics, vol. 35, no. 1, pp. 1–12, 2015.
[6] M. Attene, S. Katz, M. Mortara, G. Patane, M. Spagnuolo, and A. Tal, "Mesh segmentation - a comparative study," in IEEE International Conference on Shape Modeling and Applications, 2006, pp. 7–7.
[7] A. Shamir, "A survey on mesh segmentation techniques," Computer Graphics Forum, vol. 27, no. 6, pp. 1539–1556, 2008.
[8] S. Shlafman, A. Tal, and S. Katz, "Metamorphosis of polyhedral surfaces using decomposition," Computer Graphics Forum, vol. 21, no. 3, pp. 219–228, 2002.
[9] G. Lavoue, F. Dupont, and A. Baskurt, "A new CAD mesh segmentation method, based on curvature tensor analysis," Computer-Aided Design, vol. 37, no. 10, pp. 975–987, 2005.
[10] S. Katz and A. Tal, "Hierarchical mesh decomposition using fuzzy clustering and cuts," ACM Transactions on Graphics, vol. 22, no. 3, 2003.
[11] M. Attene, B. Falcidieno, and M. Spagnuolo, "Hierarchical mesh segmentation based on fitting primitives," The Visual Computer, vol. 22, no. 3, pp. 181–193, 2006.
[12] A. P. Mangan and R. T. Whitaker, "Partitioning 3D surface meshes using watershed segmentation," IEEE Transactions on Visualization and Computer Graphics, vol. 5, no. 4, pp. 308–321, 1999.
[13] Y. Lai, S. Hu, R. R. Martin, and P. L. Rosin, "Fast mesh segmentation using random walks," Statistical Methods and Applications, pp. 183–191, 2008.
[14] A. Koschan, "Perception-based 3D triangle mesh segmentation using fast marching watersheds," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2. IEEE, 2003, pp. II–II.
[15] A. Golovinskiy and T. Funkhouser, "Randomized cuts for 3D mesh analysis," in International Conference on Computer Graphics and Interactive Techniques, vol. 27, no. 5, p. 145, 2008.
[16] S. Katz, G. Leifman, and A. Tal, "Mesh segmentation using feature point and core extraction," The Visual Computer, vol. 21, no. 8, pp. 649–658, 2005.
[17] L. Shapira, A. Shamir, and D. Cohenor, "Consistent mesh partitioning and skeletonisation using the shape diameter function," The Visual Computer, vol. 24, no. 4, pp. 249–259, 2008.
[18] Y. Lee, S. Lee, A. Shamir, D. Cohen-Or, and H. P. Seidel, "Intelli-
[29] E. Kalogerakis, A. Hertzmann, and K. Singh, "Learning 3D mesh segmentation and labeling," ACM Transactions on Graphics, vol. 29, no. 3, 2010.
[30] Y. Wang, S. Asafi, O. Van Kaick, H. Zhang, D. Cohenor, and B. Chen, "Active co-analysis of a set of shapes," ACM Transactions on Graphics, vol. 31, no. 6, p. 165, 2012.
[31] H. Benhabiles, G. Lavoue, J. Vandeborre, and M. Daoudi, "Learning boundary edges for 3D mesh segmentation," Computer Graphics Forum, vol. 30, no. 8, pp. 2170–2182, 2011.
[32] T. Kondo, S. H. Ong, and K. W. C. Foong, "Tooth segmentation of dental study models using range images," IEEE Transactions on Medical Imaging, vol. 23, no. 3, pp. 350–362, 2004.
[33] M. Grzegorzek, M. Trierscheid, D. Papoutsis, and D. Paulus, "A multi-stage approach for 3D teeth segmentation from dentition surfaces," in ICISP. Springer, 2010, pp. 521–530.
[34] N. Wongwaen and C. Sinthanayothin, "Computerized algorithm for 3D teeth segmentation," in International Conference on Electronics and Information Engineering, 2010, pp. V1-277–V1-280.
[35] T. Yuan, W. Liao, N. Dai, X. Cheng, and Q. Yu, "Single-tooth modeling for 3D dental model," International Journal of Biomedical Imaging, vol. 2010, pp. 1029–1034, 2010.
[36] Y. Kumar, R. Janardan, B. E. Larson, and J. Moon, "Improved segmentation of teeth in dental models," Computer-Aided Design and Applications, vol. 8, no. 2, pp. 211–224, 2013.
[37] T. Kronfeld, D. Brunner, and G. Brunnett, "Snake-based segmentation of teeth from virtual dental casts," Computer-Aided Design and Applications, vol. 7, no. 2, pp. 221–233, 2013.
[38] Z. Li, X. Ning, and Z. Wang, "A fast segmentation method for STL teeth model," in IEEE/ICME International Conference on Complex Medical Engineering (CME 2007). IEEE, 2007, pp. 163–166.
[39] C. Sinthanayothin and W. Tharanont, "Orthodontics treatment simulation by teeth segmentation and setup," in International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2008, pp. 81–84.
[40] Y. Ma and Z. Li, "Computer aided orthodontics treatment by virtual segmentation and adjustment," in International Conference on Image Analysis and Signal Processing, 2010, pp. 336–339.
[41] K. Xu, V. G. Kim, Q. Huang, and E. Kalogerakis, "Data-driven shape analysis and processing," Computer Graphics Forum, vol. 36, no. 1, 2017.
[42] Z. Barutcuoglu and C. Decoro, "Hierarchical shape classification using Bayesian aggregation," in IEEE International Conference on Shape Modeling and Applications, 2006, pp. 44–44.
[43] J. W. H. Tangelder and R. C. Veltkamp, "A survey of content based 3D shape retrieval methods," Multimedia Tools and Applications,
gent mesh scissoring using 3d snakes,” in Computer Graphics and vol. 39, no. 3, p. 441, 2008.
Applications, 2004. PG 2004. Proceedings. Pacific Conference on, 2004, [44] R. Litman, A. Bronstein, M. Bronstein, and U. Castellani, “Su-
pp. 279–287. pervised learning of bagoffeatures shape descriptors using sparse
[19] Y. Lee, S. Lee, A. Shamir, D. Cohenor, and H. Seidel, “Mesh coding,” Computer Graphics Forum, vol. 33, no. 5, pp. 127–136, 2014.
scissoring with minima rule and part salience,” Computer Aided [45] Q. Huang, F. Wang, and L. Guibas, “Functional map networks for
Geometric Design, vol. 22, no. 5, pp. 444–465, 2005. analyzing and exploring large shape collections,” Acm Transactions
[20] L. Fan, M. Meng, and L. Liu, “Sketch-based mesh cutting,” Graph- on Graphics, vol. 33, no. 4, pp. 1–11, 2014.
ical Models graphical Models and Image Processing computer Vision,
[46] O. Van Kaick, H. Zhang, G. Hamarneh, and D. CohenOr, “A sur-
Graphics, and Image Processing, vol. 74, no. 6, pp. 292–301, 2012.
vey on shape correspondence,” Computer Graphics Forum, vol. 30,
[21] M. Meng, L. Fan, and L. Liu, “A comparative evaluation of
no. 6, pp. 1681–1707, 2011.
foreground/background sketch-based mesh segmentation algo-
rithms,” Computers and Graphics, vol. 35, no. 3, pp. 650–660, 2011. [47] M. Ovsjanikov, M. Ben-Chen, J. Solomon, A. Butscher, and
[22] Z. Ji, L. Liu, Z. Chen, and G. Wang, “Easy mesh cutting,” Computer L. Guibas, “Functional maps: a flexible representation of maps
Graphics Forum, vol. 25, no. 3, pp. 283–291, 2006. between shapes,” Acm Transactions on Graphics, vol. 31, no. 4, p. 30,
2012.
[23] L. Fan, L. Liu, and K. Liu, “Paint mesh cutting,” Computer Graphics
Forum, vol. 30, no. 2, pp. 603–612, 2011. [48] X. Guo, J. Lin, K. Xu, and X. Jin, “Creature grammar for creative
[24] Y. Zheng, C. Tai, and O. K. Au, “Dot scissor: A single-click interface modeling of 3d monsters,” Graphical Models, vol. 76, no. 5, pp.
for mesh segmentation,” IEEE Transactions on Visualization and 376–389, 2014.
Computer Graphics, vol. 18, no. 8, pp. 1304–1312, 2012. [49] C. Cao, Q. Hou, and K. Zhou, “Displaced dynamic expression
[25] M. Meng, L. Fan, and L. Liu, “icutter: a direct cutout tool for 3d regression for real-time facial tracking and animation,” ACM
shapes,” Computer Animation and Virtual Worlds, vol. 22, no. 4, pp. Transactions on graphics (TOG), vol. 33, no. 4, p. 43, 2014.
335–342, 2011. [50] C. H. Shen, H. Fu, K. Chen, and S. M. Hu, “Structure recovery
[26] Y. Zheng and C. Tai, “Mesh decomposition with crossboundary by part assembly,” Acm Transactions on Graphics, vol. 31, no. 6, pp.
brushes,” Computer Graphics Forum, vol. 29, no. 2, pp. 527–535, 1–11, 2012.
2010. [51] X. Xie, K. Xu, N. J. Mitra, D. Cohen-Or, and B. Chen, “Sketch-to-
[27] O. K. Au, Y. Zheng, M. Chen, P. Xu, and C. Tai, “Mesh segmen- design: Context-based part assembly,” in Computer Graphics Forum,
tation with concavity-aware fields,” IEEE Transactions on Visualiza- 2013, p. 233245.
tion and Computer Graphics, vol. 18, no. 7, pp. 1125–1134, 2012. [52] S. Chaudhuri, E. Kalogerakis, S. Giguere, and T. Funkhouser,
[28] X. Chen, A. Golovinskiy, and T. Funkhouser, “A benchmark for 3d “Attribit: content creation with semantic attributes,” in Proceedings
mesh segmentation,” international conference on computer graphics of the 26th annual ACM symposium on User interface software and
and interactive techniques, vol. 28, no. 3, p. 73, 2009. technology. ACM, 2013, pp. 193–202.

1077-2626 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2018.2839685, IEEE
Transactions on Visualization and Computer Graphics

[53] L. Fan, R. Wang, L. Xu, J. Deng, and L. Liu, “Modeling by drawing with shadow guidance,” Computer Graphics Forum, vol. 32, no. 7, pp. 157–166, 2013.
[54] Y. Wang, M. Gong, T. Wang, D. Cohen-Or, H. Zhang, and B. Chen, “Projective analysis for 3d shape segmentation,” ACM Transactions on Graphics, vol. 32, no. 6, p. 192, 2013.
[55] W. Xu, Z. Shi, M. Xu, K. Zhou, J. Wang, B. Zhou, J. Wang, and Z. Yuan, “Transductive 3d shape segmentation using sparse reconstruction,” Computer Graphics Forum, vol. 33, no. 5, pp. 107–115, 2014.
[56] Z. Xie, K. Xu, L. Liu, and Y. Xiong, “3d shape segmentation and labeling via extreme learning machine,” Computer Graphics Forum, vol. 33, no. 5, pp. 85–95, 2014.
[57] J. Lv, X. Chen, J. Huang, and H. Bao, “Semi-supervised mesh segmentation and labeling,” Computer Graphics Forum, vol. 31, no. 7, pp. 2241–2248, 2012.
[58] V. G. Kim, W. Li, N. J. Mitra, S. Chaudhuri, S. Diverdi, and T. Funkhouser, “Learning part-based templates from large collections of 3d shapes,” ACM Transactions on Graphics, vol. 32, no. 4, p. 70, 2013.
[59] O. Sidi, O. Van Kaick, Y. Kleiman, H. Zhang, and D. Cohen-Or, “Unsupervised co-segmentation of a set of shapes via descriptor-space spectral clustering,” in SIGGRAPH Asia Conference, 2011, p. 126.
[60] R. Hu, L. Fan, and L. Liu, “Co-segmentation of 3d shapes via subspace clustering,” Computer Graphics Forum, vol. 31, no. 5, pp. 1703–1713, 2012.
[61] P. Shilane, P. Min, M. Kazhdan, and T. Funkhouser, “The princeton shape benchmark,” in Shape Modeling International, 2004, pp. 167–178.
[62] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, “Real-time human pose recognition in parts from single depth images,” in IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 1297–1304.
[63] K. Xu, V. G. Kim, Q. Huang, and E. Kalogerakis, “Data-driven shape analysis and processing,” in SIGGRAPH Asia, 2015, p. 4.
[64] Y. Bengio et al., “Learning deep architectures for AI,” Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[65] G. Hinton, “A practical guide to training restricted boltzmann machines,” Momentum, vol. 9, no. 1, p. 926, 2010.
[66] C. Farabet, C. Couprie, L. Najman, and Y. Lecun, “Learning hierarchical features for scene labeling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1915–1929, 2012.
[67] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in International Conference on Neural Information Processing Systems, 2012, pp. 1097–1105.
[68] H. Maron, M. Galun, N. Aigerman, M. Trope, N. Dym, E. Yumer, V. G. Kim, and Y. Lipman, “Convolutional neural networks on surfaces via seamless toric covers,” SIGGRAPH, 2017.
[69] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 640–651, 2014.
[70] P.-S. Wang, Y. Liu, Y.-X. Guo, S. Chun-Yu, and X. Tong, “O-CNN: Octree-based convolutional neural networks for 3d shape analysis,” ACM Transactions on Graphics (SIGGRAPH), vol. 36, no. 4, 2017.
[71] H. Hoppe, “Progressive meshes,” in Proceedings of the 23rd annual conference on Computer graphics and interactive techniques. ACM, 1996, pp. 99–108.
[72] G. Ran and D. Cohen-Or, “Salient geometric features for partial shape matching and similarity,” ACM Transactions on Graphics, vol. 25, no. 1, pp. 130–150, 2006.
[73] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, 2002.
[74] L. Shapira, S. Shalom, A. Shamir, D. Cohen-Or, and H. Zhang, “Contextual part analogies in 3d objects,” International Journal of Computer Vision, vol. 89, no. 2-3, pp. 309–326, 2010.
[75] A. E. Johnson and M. Hebert, “Using spin images for efficient object recognition in cluttered 3d scenes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 5, pp. 433–449, 1999.
[76] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001.
[77] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in Proceedings of the 22nd ACM international conference on Multimedia. ACM, 2014, pp. 675–678.
[78] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[79] S. Yang, L. Li, S. Wang, W. Zhang, and Q. Huang, “A graph regularized deep neural network for unsupervised image representation learning,” in Computer Vision and Pattern Recognition, 2017, pp. 7053–7061.
[80] L. Kaplansky and A. Tal, “Mesh segmentation refinement,” in Computer Graphics Forum, vol. 28, no. 7. Wiley Online Library, 2009, pp. 1995–2003.

Xiaojie Xu is a postgraduate student at the School of Information Science and Technology (SIST), ShanghaiTech University. He obtained his B.E. from the College of Information Science & Electronic Engineering at Zhejiang University. His research interests include geometry segmentation, deep learning, and virtual reality.

Chang Liu is a postgraduate student at the School of Information Science and Technology (SIST), ShanghaiTech University. She obtained her B.E. from the School of Electronic Engineering at Xidian University. Her research interests include 3D object reconstruction and deep learning for image classification.

Youyi Zheng is a Researcher (PI) at the State Key Lab of CAD&CG, College of Computer Science, Zhejiang University. He obtained his PhD from the Department of Computer Science and Engineering at Hong Kong University of Science & Technology, and his M.Sc. and B.Sc. degrees from the Department of Mathematics, Zhejiang University. His research interests include geometric modeling, imaging, and human-computer interaction.
