Automatic Thyroid Ultrasound Image Segmentation Based on U-shaped Network
Abstract—Automatic tumor segmentation of thyroid ultrasound images is quite challenging due to poor image quality. Recently, U-shaped networks, especially U-Net, have achieved good results in medical image segmentation. In this paper, we propose a modified U-Net model (ReAgU-Net), which embeds improved residual units into the skip connections between the encoding and decoding paths and introduces an attention gate mechanism to weight the feature maps obtained from shallow and deep layers. In addition, a hyperparameter is introduced to combine the Focal-Tversky loss, Dice loss and cross-entropy loss to jointly guide the model optimization process. The experimental results demonstrate that the proposed approach outperforms other U-shaped models.

Keywords—thyroid ultrasound image; automatic segmentation; U-Net; ReAgU-Net

I. INTRODUCTION

Thyroid nodules are among the most common diseases in adults. Although most nodules are benign, the incidence of thyroid cancer has risen quickly in recent years. In the global cancer statistics for 2018, the incidence and mortality of thyroid cancer rank ninth and sixth respectively [1]. A computer-aided diagnosis (CAD) system can describe nodules objectively and quantitatively, eliminate the subjectivity of individual doctors, and provide a useful reference for diagnosis. Automatic thyroid ultrasound image segmentation is a key step in CAD, and it is very challenging due to low contrast, speckle noise, weak boundaries and artifacts.

Following their great success in natural scene image analysis, more and more deep learning methods have been applied to medical image segmentation [2-8], including thyroid ultrasound image segmentation [9-11]. The major strategy of these approaches is to apply a Convolutional Neural Network (CNN) to encode the image and to upsample the deep features to decode it. Although some results have been achieved, several problems remain to be studied in depth.

Natural scene images are easy to access and label, and large datasets are available; the deep learning models designed for them usually have deep hierarchies and many parameters. Medical images, in contrast, are difficult to obtain and label, and the datasets are small. Directly applying an existing deep learning model therefore leaves the training data sparsely distributed in the feature space, which causes over-fitting and hurts the generalization ability of the model. Therefore, we need to adapt the existing deep learning models to fit medical images.

To solve the above problem, this paper proposes an improved U-shaped model. On the basis of U-Net, residual substructures and attention gates are embedded in the skip connections to narrow the semantic gap between shallow and deep features. In addition, considering the small objects in medical images, this paper improves the loss function so that the model gains sensitivity while retaining its attention on the overlap.

II. MATERIALS AND METHODS

A. Patients and imaging acquisition

192 patients (148 females, mean age 46.31±9.79 years, range 11~67 years; 44 males, mean age 54.9±11.7 years, range 29~81 years) and a total of 1936 images were evaluated in the study. The mean size of the nodules was 1.74 cm (range 0.77~2.64 cm).

Ultrasonography (US) acquisition was performed with the HITACHI Vision 900 and HIVISION Preirus (Hitachi Medical System, Tokyo, Japan) and the Siemens S2000 (Siemens Medical Solutions), equipped with a linear probe with a central frequency of 7.5~14.0 MHz. All the examinations were conducted by two experienced sonographers, each with more than 6 years' experience, and all the nodules used in this study were delineated by them as the ground truth. Sample images are shown in Fig. 1.

Figure 1. Sample images from dataset
B. Embedding residual units in skip connections

The prototype of the residual unit [12] is shown in Fig. 2(a). The structure of the new residual learning unit proposed in this paper is shown in Fig. 2(b); it reduces the number of parameters while increasing the depth of the model. It can effectively combine the semantic-level features from the encoder with the abstract features from the decoder, thus bridging the semantic gap to a certain extent.
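Fig. 2(b) itself cannot be reproduced in text, so the following PyTorch code is only a minimal sketch of a residual unit in the style of [12]; the layer counts, kernel sizes and the 1x1 projection are assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Minimal residual unit sketch (He et al. [12] style).

    Assumes two 3x3 convolutions with batch normalization and a 1x1
    projection on the identity path when the channel counts differ;
    the exact unit in Fig. 2(b) may be configured differently.
    """
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection so the identity path matches the body's channels
        self.skip = (nn.Identity() if in_ch == out_ch
                     else nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.body(x) + self.skip(x))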
In the task of object detection, Lin et al. [14] compared single-stage and multi-stage detection methods and found that single-stage detectors are often faster and simpler, but their accuracy lags behind cascade detectors. Their research shows that the most important problem is the severe imbalance between foreground and background categories. Since a single-stage segmentation method is used in this paper, the Focal-Tversky loss is introduced, and the three losses (cross-entropy, Dice and Focal-Tversky) are combined as in equation (1):

$L_c = \varepsilon \cdot FTL_c + (1 - \varepsilon) \cdot (CE_c + DL_c)$  (1)

where $c$ indexes the class, $FTL_c$, $CE_c$ and $DL_c$ are the Focal-Tversky, cross-entropy and Dice losses, and $\varepsilon$ is a weighting hyperparameter.
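Equation (1) can be implemented directly. Below is a minimal PyTorch sketch for the binary (foreground/background) case; the Tversky weights alpha/beta, the focal exponent gamma and the smoothing constant are assumptions, since their values are not stated in this excerpt.

import torch
import torch.nn.functional as F

def combined_loss(logits, target, eps_weight=0.5,
                  alpha=0.7, beta=0.3, gamma=0.75, smooth=1e-6):
    """Sketch of equation (1): L = eps*FTL + (1-eps)*(CE + DL).

    logits, target: (N, 1, H, W); target is a float mask in {0, 1}.
    alpha/beta/gamma are placeholder hyperparameter values.
    """
    prob = torch.sigmoid(logits)
    tp = (prob * target).sum()
    fp = (prob * (1 - target)).sum()
    fn = ((1 - prob) * target).sum()

    # Tversky index and the Focal-Tversky loss built on it
    tversky = (tp + smooth) / (tp + alpha * fn + beta * fp + smooth)
    ftl = (1 - tversky) ** gamma

    # Soft Dice loss
    dice = (2 * tp + smooth) / (2 * tp + fp + fn + smooth)
    dl = 1 - dice

    # Pixel-wise binary cross-entropy
    ce = F.binary_cross_entropy_with_logits(logits, target)

    return eps_weight * ftl + (1 - eps_weight) * (ce + dl)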
E. Training ReAgU-Net

On the basis of U-Net, residual units and attention gates are embedded in the skip connections to narrow the semantic gap between shallow and deep features. The whole network structure of our model, ReAgU-Net, is shown in Fig. 4.
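The attention gate multiplies the skip-connection features by weights computed from both shallow and deep feature maps (see the abstract). The exact gating used in ReAgU-Net is not reproduced here; the sketch below follows the common additive-attention design, so the 1x1 convolutions and sigmoid weighting are assumptions.

import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate sketch.

    Computes a spatial weight map from the decoder (gating) feature and
    the encoder skip feature, then re-weights the skip feature with it.
    Assumes both inputs share the same spatial resolution.
    """
    def __init__(self, skip_ch: int, gate_ch: int, inter_ch: int):
        super().__init__()
        self.w_x = nn.Conv2d(skip_ch, inter_ch, kernel_size=1, bias=False)
        self.w_g = nn.Conv2d(gate_ch, inter_ch, kernel_size=1, bias=False)
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # x: encoder skip feature; g: decoder feature at the same scale
        att = torch.sigmoid(self.psi(self.relu(self.w_x(x) + self.w_g(g))))
        return x * att  # weighted skip feature passed to the decoder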
The model is trained with the Adam optimizer [15]; the training procedure is summarized in Fig. 5.

Require: α: learning rate
Require: β_1 ∈ [0,1): exponential decay rate for the 1st moment estimate
Require: β_2 ∈ [0,1): exponential decay rate for the 2nd moment estimate
Require: ε: small constant for numerical stability
Require: m: batch size
Require: f(θ): stochastic objective function with parameters θ
Require: θ_0: initial parameter vector
m_0 ← 0 (initialize 1st moment vector)
v_0 ← 0 (initialize 2nd moment vector)
t ← 0 (initialize timestep)
1  while θ_t not converged do
2    t ← t + 1
3    randomly select m training pairs (x_i, y_i), i = 1, 2, ..., m
4    ŷ_i ← f(x_i; θ_{t-1}), i = 1, 2, ..., m (compute the segmented image for x_i)
5    g_t ← ∇_θ L(θ_{t-1}) (get gradients w.r.t. the stochastic objective at timestep t)
6    m_t ← β_1 · m_{t-1} + (1 − β_1) · g_t (update biased 1st moment estimate)
7    v_t ← β_2 · v_{t-1} + (1 − β_2) · g_t² (update biased 2nd raw moment estimate)
8    m′_t ← m_t / (1 − β_1^t) (compute bias-corrected 1st moment estimate)
9    v′_t ← v_t / (1 − β_2^t) (compute bias-corrected 2nd raw moment estimate)
10   θ_t ← θ_{t-1} − α · m′_t / (√v′_t + ε) (update parameters)
11 end while
12 return θ_t (resulting parameters)

Figure 5. Training the ReAgU-Net
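The pseudocode in Fig. 5 maps directly onto a standard training loop, since steps 6-10 are exactly the Adam update [15]. A minimal sketch, assuming a segmentation `model`, a `loss_fn` such as the combined loss of equation (1), and a DataLoader yielding image/mask pairs (these names are placeholders, not the authors' code):

import torch
from torch.utils.data import DataLoader

def train(model, loader: DataLoader, loss_fn, lr=1e-4,
          betas=(0.9, 0.999), eps=1e-8, epochs=50, device="cpu"):
    """Minimal sketch of the Fig. 5 training loop using Adam [15]."""
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr, betas=betas, eps=eps)
    for _ in range(epochs):                # "while not converged"
        for images, masks in loader:       # randomly selected mini-batches
            images, masks = images.to(device), masks.to(device)
            logits = model(images)         # compute the segmented image
            loss = loss_fn(logits, masks)
            opt.zero_grad()
            loss.backward()                # gradients w.r.t. the objective
            opt.step()                     # Adam moment updates + parameter step
    return model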
III. RESULTS

A. Evaluation Criteria

In this paper, four commonly used indexes are used to evaluate the segmentation algorithm: mean Intersection over Union (mIoU), Dice Similarity Coefficient (DSC), Precision and Recall. The computation of the indexes is shown in equations (2)-(5).

$mIoU = \frac{1}{c}\sum_{i=1}^{c} \frac{TP_i}{TP_i + FN_i + FP_i}$  (2)

$DSC = \frac{2\,TP}{2\,TP + FP + FN}$  (3)

$Precision = \frac{TP}{TP + FP}$  (4)

$Recall = \frac{TP}{TP + FN}$  (5)

where $i$ ranges over the $c$ classes (foreground and background), and TP (True Positive), TN (True Negative), FP (False Positive) and FN (False Negative) count the pixels predicted by the algorithm.
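Equations (2)-(5) translate into a few lines of NumPy. A minimal sketch for binary masks (a small epsilon guards against empty masks; it is not part of the paper's definitions):

import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray):
    """Compute mIoU, DSC, Precision and Recall (equations (2)-(5)).

    pred, gt: binary masks of the same shape. mIoU averages the IoU
    over the c = 2 classes, foreground and background.
    """
    eps = 1e-9
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()

    iou_fg = tp / (tp + fn + fp + eps)
    iou_bg = tn / (tn + fp + fn + eps)   # background treated as a class
    miou = (iou_fg + iou_bg) / 2
    dsc = 2 * tp / (2 * tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return miou, dsc, precision, recall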
B. Experimental Results

The dataset was divided into training, validation and test sets in the ratio 7:2:1. The training set is used to train the model iteratively, and convergence is judged by the error on the validation set. The trained model is then used to segment the test set. The segmentation performance is shown in Table 1.

Table 1. Comparison of segmentation results

model       mIoU    DSC     Precision   Recall
U-Net       0.722   0.820   0.829       0.811
UNet++      0.765   0.854   0.872       0.837
ReAgU-Net   0.788   0.869   0.873       0.865

From the table, we can see that the ReAgU-Net model improves mIoU, DSC, Precision and Recall by 6.6%, 4.9%, 4.4% and 5.4% compared with U-Net, and by 2.3%, 1.5%, 0.1% and 2.8% compared with UNet++. This shows that ReAgU-Net can recognize the location and contour of nodules better and has higher accuracy. Some results are shown in Fig. 6.

Figure 6. Segmentation results (columns: Original Image, U-Net, ReAgU-Net, U-Net++, Ground Truth)

In order to better demonstrate the effectiveness of each improvement and to compare their roles in the model, we add each improvement to the U-Net model in turn and compare their contributions. The results are shown in Table 2.

Table 2. Segmentation results of different improvement points

model                           mIoU    DSC     Precision   Recall
U-Net                           0.722   0.820   0.829       0.811
U-Net + AG                      0.754   0.844   0.838       0.850
U-Net + R-RB                    0.787   0.867   0.871       0.863
U-Net + AG + R-RB + Dice loss   0.766   0.851   0.858       0.844
ReAgU-Net                       0.788   0.869   0.873       0.865

From the table, we can see that, compared with the attention gate mechanism, the improved residual units contribute more to the performance, and, compared with the Dice loss alone, the loss function proposed in this paper further improves the performance.

C. Performance comparison under different data sets

In order to test the generalization ability of the model, in addition to the data provided by the Affiliated Hospital of Qingdao University, an open dataset of 428 thyroid ultrasound images from DDTI (Digital Database Thyroid Image) [16] was used. These images were collected with TOSHIBA Nemio 30
and TOSHIBA Nemio MX scanners. The frequency of the linear detector is 12 MHz. The location of the nodule in each image is recorded in an XML file, so the corresponding mask image can be obtained by parsing the XML file. Some examples are shown in Fig. 6.
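Turning the XML annotation into a mask amounts to rasterizing the recorded boundary. A minimal sketch; the element names below (a `point` element with `x`/`y` attributes) are placeholders, since the actual DDTI schema is not reproduced here and must be adapted to.

import xml.etree.ElementTree as ET
import numpy as np
from PIL import Image, ImageDraw

def mask_from_xml(xml_path: str, width: int, height: int) -> np.ndarray:
    """Sketch: rasterize a polygon annotation into a binary mask.

    Assumes the XML stores the nodule boundary as a list of (x, y)
    points; adapt the element and attribute names to the real schema.
    """
    root = ET.parse(xml_path).getroot()
    polygon = [(float(p.get("x")), float(p.get("y")))
               for p in root.iter("point")]   # placeholder element name
    canvas = Image.new("L", (width, height), 0)
    ImageDraw.Draw(canvas).polygon(polygon, outline=1, fill=1)
    return np.array(canvas, dtype=np.uint8)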
IV. CONCLUSION

…information loss caused by increasing the horizontal depth of the network. (3) Combining the advantages of the Dice loss, cross-entropy loss and Focal-Tversky loss, a new loss function is designed to effectively address the imbalance between foreground and background categories in medical image segmentation. Experiments show that ReAgU-Net improves mIoU, DSC, Precision and Recall by 6.6%, 4.9%, 4.4% and 5.4% compared with U-Net, and it also performs well on different data sets.

However, automatic segmentation of thyroid ultrasound images remains a very challenging task. There are still some images that the algorithm cannot segment well; some examples are shown in Fig. 7.
REFERENCES

[1] B. Freddie, F. Jacques, S. Isabelle, et al., "Global Cancer Statistics 2018: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries," CA Cancer J Clin, vol. 68, pp. 394-424, 2018.
[2] O. Ronneberger, P. Fischer and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, Cham, pp. 234-241, 2015.
[3] J. Long, E. Shelhamer and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431-3440, 2015.
[4] S. Zheng, S. Jayasumana, B. Romera-Paredes, et al., "Conditional Random Fields as Recurrent Neural Networks," Proceedings of the IEEE International Conference on Computer Vision, pp. 1529-1537, 2015.
[5] M. Alom, M. Hasan, C. Yakopcic, et al., "Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation," arXiv preprint arXiv:1802.06955, 2018.
[6] Z. Zhou, M. Siddiquee, N. Tajbakhsh, et al., "UNet++: A Nested U-Net Architecture for Medical Image Segmentation," Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Springer, Cham, pp. 3-11, 2018.
[7] Y. Xue, T. Xu, H. Zhang, et al., "SegAN: Adversarial Network with Multi-scale L1 Loss for Medical Image Segmentation," Neuroinformatics, vol. 16, pp. 383-392, 2018.
[8] M. Rezaei, K. Harmuth, W. Gierke, et al., "A Conditional Adversarial Network for Semantic Segmentation of Brain Tumor," International MICCAI Brainlesion Workshop, Springer, Cham, pp. 241-252, 2017.
[9] H. Li, J. Weng, Y. Shi, et al., "An improved deep learning approach for detection of thyroid papillary cancer in ultrasound images," Scientific Reports, vol. 8, pp. 1-12, 2018.
[10] P. Poudel, A. Illanes, D. Sheet, et al., "Evaluation of Commonly Used Algorithms for Thyroid Ultrasound Images Segmentation and Improvement Using Machine Learning Approaches," Journal of Healthcare Engineering, pp. 1-13, 2018.
[11] J. Ma, F. Wu, T. Jiang, et al., "Ultrasound image-based thyroid nodule automatic segmentation using convolutional neural networks," International Journal of Computer Assisted Radiology and Surgery, vol. 12, pp. 1895-1910, 2017.
[12] K. He, X. Zhang, S. Ren, et al., "Deep Residual Learning for Image Recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
[13] J. Fu, H. Zheng and T. Mei, "Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4438-4446, 2017.
[14] T. Lin, P. Goyal, R. Girshick, et al., "Focal Loss for Dense Object Detection," Proceedings of the IEEE International Conference on Computer Vision, pp. 2980-2988, 2017.
[15] D. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv preprint arXiv:1412.6980, 2014.
[16] L. Pedraza, C. Vargas, F. Narváez, et al., "An open access thyroid ultrasound image database," 10th International Symposium on Medical Information Processing and Analysis, International Society for Optics and Photonics, 2015.