
2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI)

April 3-7, 2020, Iowa City, Iowa, USA

AF-SEG: AN ANNOTATION-FREE APPROACH FOR IMAGE SEGMENTATION BY SELF-SUPERVISION AND GENERATIVE ADVERSARIAL NETWORK

Fei Yu1*, Hexin Dong1*, Mo Zhang1, Jie Zhao2, Bin Dong3,1,2, Quanzheng Li4,2, Li Zhang1,2

1 Center for Data Science, Peking University, Beijing, China
2 Center for Data Science in Health and Medicine, Peking University, Beijing, China
3 Beijing International Center for Mathematical Research (BICMR), Peking University, Beijing, China
4 MGH/BWH Center for Clinical Data Science, Boston, MA 02115, USA

* These authors contributed equally.

ABSTRACT

Traditional segmentation methods are annotation-free but usually produce unsatisfactory results. The latest deep learning methods improve the results but require expensive and time-consuming pixel-level manual annotations. In this work, we propose a novel method based on self-supervision and a generative adversarial network (GAN), which achieves high performance and requires no manual annotations. First, we apply traditional segmentation methods to obtain coarse segmentations. Then, we use a GAN to generate a synthetic image whose foreground corresponds pixel-to-pixel to the coarse segmentation. Finally, we train the segmentation model with the data pairs of synthetic images and coarse segmentations. We evaluate our method on two types of segmentation tasks: red blood cell (RBC) segmentation on microscope images and vessel segmentation on digital subtraction angiographies (DSA). The results show that our annotation-free method provides a considerable improvement over the traditional methods and achieves accuracies comparable to fully supervised methods.

Index Terms— Image segmentation, Generative adversarial network, Annotation free, Deep learning

1. INTRODUCTION

With the increasing accumulation of digital medical images, automated segmentation has become one of the most important needs for quantitatively analyzing large-scale medical image datasets. Traditionally, researchers have proposed various automated segmentation models, including threshold segmentation, the Level set [1], and Hessian analysis for vessel segmentation [2]. The advantage of these methods is the avoidance of massive manual interaction. However, it is difficult for such models to achieve accurate segmentation results. To obtain more satisfactory segmentation results, researchers often train segmentation models in a supervised fashion.

In recent years, convolutional neural networks (CNNs) have made great progress in supervised semantic segmentation [3, 4, 5]. Such segmentation methods can produce highly reliable and accurate results after learning from a large number of manual pixel-level annotations. However, pixel-level annotations are usually time-consuming, frustrating and even infeasible, especially in the field of medical image segmentation. To solve this problem, researchers have developed a variety of weakly supervised learning methods, which aim to train segmentation models with less complex annotations. Dai et al. exploit bounding boxes to supervise convolutional networks for semantic segmentation [6]. Lin et al. propose to segment images with semantic scribbles [7]. In addition, Wei et al. demonstrate the validity of an adversarial erasing approach using only image-level annotations [8]. Recently, Yu et al. report an annotation-free segmentation method for coronary arteries on DSA images [9]. The method uses a shape consistent generative adversarial network (SC-GAN) to generate the image foreground and the image background separately, which are later combined to train the segmentation model more effectively.

Although SC-GAN achieves annotation-free coronary artery segmentation, it has two major limitations. First, SC-GAN needs an auxiliary labeled dataset (such as the fundus images in [9]) as the source domain data in the process of knowledge transfer. Second, to train the segmentation model, it requires synthesizing the data with additional images that have clean backgrounds, which are often difficult to obtain. In this work, inspired by SC-GAN, we propose a more general framework for annotation-free segmentation, which requires neither labeled data nor additional clean-background images. We call this new framework AF-SEG, and the details of the method are introduced in the next section. We evaluate the performance of the proposed method on a sickle cell disease (SCD) RBC dataset and a DSA dataset using several metrics. Qualitative and quantitative results show that our method provides a considerable improvement over the traditional methods and achieves accuracies comparable to fully supervised methods.

Corresponding author: Li Zhang, zhangli [email protected]

Fig. 1. The flowchart of the proposed method.

Fig. 2. The illustration of the background generating method. The masks (green circles) produced by our method are made larger in order to mask out the entire false negatives.

Fig. 3. The illustration of the corresponding image synthesis and segmentation method.
2. METHOD
In this work, we first perform traditional annotation-free methods to obtain coarse segmentations. We then synthesize an image corresponding to each coarse segmentation using a GAN. Finally, we use the coarse segmentation as a pixel-level annotation of the synthetic image and train a high-quality segmentation model in a supervised manner. Inspired by SC-GAN, the foreground and background of the synthetic image are generated separately and then fused, which ensures that the foreground of the image is consistent with the coarse segmentation. The flowchart of our method is shown in Figure 1, and each step is described in detail below.
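For orientation, the sketch below outlines how the three steps fit together. The helper names (`coarse_segment`, `generate_clean_background`, `train_synthesis_and_segmentation`) are hypothetical placeholders for the procedures detailed in Sections 2.1-2.3, not functions defined in this paper.

```python
# High-level sketch of the AF-SEG pipeline; the three helpers are
# hypothetical stand-ins for the steps detailed in Sections 2.1-2.3.
def af_seg_pipeline(images):
    coarse_masks = [coarse_segment(img) for img in images]          # Sec. 2.1
    backgrounds = [generate_clean_background(img, m)                 # Sec. 2.2
                   for img, m in zip(images, coarse_masks)]
    # Sec. 2.3: GAN synthesis + supervised training of the segmentor on
    # (synthetic image, coarse mask) pairs.
    segmentor = train_synthesis_and_segmentation(images, coarse_masks, backgrounds)
    return segmentor
```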
2.1. Coarse Segmentation

We first use a traditional annotation-free method to generate coarse segmentation results. For example, in cell segmentation we adopt the classic Level set method, while in vessel segmentation Hessian analysis is a good choice. Since the coarse segmentation obtained in this step will be used as the annotation for the subsequent supervised learning, we apply a morphological erosion operation to reduce its false positive rate.
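As a concrete illustration of this step, the following sketch uses scikit-image to obtain and shrink a coarse mask; the morphological Chan-Vese routine stands in for the Level set method and the Frangi filter for Hessian analysis. The thresholds, iteration counts and erosion radius are illustrative assumptions, not the settings used in the paper.

```python
# Illustrative coarse-segmentation step (Section 2.1) using scikit-image.
# Parameter values are assumptions for the sketch, not the paper's settings.
from skimage import filters, morphology, segmentation

def coarse_cell_mask(gray_image, n_iter=200):
    """Level-set style coarse segmentation for cell images."""
    mask = segmentation.morphological_chan_vese(gray_image, n_iter)
    return mask.astype(bool)

def coarse_vessel_mask(gray_image, vesselness_thresh=0.05):
    """Hessian-analysis (Frangi) coarse segmentation for vessel images."""
    vesselness = filters.frangi(gray_image)
    return vesselness > vesselness_thresh

def shrink_mask(mask, radius=2):
    """Morphological erosion used to reduce false positives in the coarse mask."""
    return morphology.binary_erosion(mask, morphology.disk(radius))
```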
2.2. Background Generating

In this step, we remove the false negative responses left by the previous step and generate a clean background image for the final synthesis. Following the idea of Deep Image Prior (DIP), we propose a new strategy called Deep Segmentation Prior (DSP) to enrich the coarse segmentation from Section 2.1 (to remove the false negatives). As shown in Figure 2, to perform DSP we train a simple segmentation network whose input is the original image and whose label is the corresponding coarse segmentation. DSP assumes that the model detects all label-like structures in the early stage of training. Therefore, we apply an early-stopping strategy during training. The stopping point is controlled by a hyper-parameter λ, which specifies how many times larger than the coarse segmentation we want the final prediction to be. In this experiment, we simply set λ = 2. Using the results of DSP, we can effectively mask out the false negatives in the background. Finally, we use image inpainting techniques, such as DIP, to generate clean (object-free) background images.
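A minimal sketch of the DSP early-stopping rule is given below, assuming a PyTorch segmentation network trained with binary cross-entropy on (original image, coarse label) pairs; the inpainting step that follows DSP is omitted. The per-epoch area check and all names are illustrative choices, since the paper only specifies that training stops once the prediction grows to λ times the size of the coarse segmentation.

```python
# Sketch of the DSP early-stopping rule under the assumptions stated above.
import torch
import torch.nn.functional as F

def train_dsp(model, loader, optimizer, lam=2.0, max_epochs=100, device="cuda"):
    model.to(device)
    for epoch in range(max_epochs):
        pred_area, coarse_area = 0.0, 0.0
        for image, coarse in loader:              # coarse: float {0,1} masks
            image, coarse = image.to(device), coarse.to(device)
            logits = model(image)
            loss = F.binary_cross_entropy_with_logits(logits, coarse)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            with torch.no_grad():
                pred_area += (torch.sigmoid(logits) > 0.5).float().sum().item()
                coarse_area += coarse.sum().item()
        # Early stop: the enriched prediction has reached lambda x the coarse
        # size, so it should now cover the false negatives missed by the mask.
        if pred_area >= lam * coarse_area:
            break
    return model
```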

2.3. Corresponding Image Synthesis and Segmentation

We integrate image synthesis and segmentation into an end-to-end model. Figure 3 shows the three major components of the model: the generator, the discriminator and the segmentor. In the following text, we explain the meanings of the terms in Figure 3, such as Real, Fake, Real*, etc.

The generator (G) uses U-Net as its network backbone; its input is an original image (Real) and its output is a synthetic image (Fake). We introduce a shape-consistency loss (an L1 loss) to ensure that Real and Fake are consistent within the coarse segmentation (Weak),

$$L_{shape}(G) = \mathbb{E}_{x \sim p_{data}(x)}\big[\, \| label * (G(x) - x) \|_1 \,\big] \tag{1}$$

where label represents the coarse segmentation (Weak).


The function of the discriminator (D) is to ensure that the synthetic image has a clean background outside of the coarse segmentation (Weak). First, we use the coarse segmentation to extract the background region of the synthetic image (Fake) and of the clean background image (Real*) generated in Section 2.2,

$$Fake_{bg} = \neg label * G(x), \qquad Real_{bg} = \neg label * Real^* \tag{2}$$

Using adversarial training, the discriminator removes the possible objects in the background of Fake, and the adversarial loss can be expressed as

$$L_{GAN}(G, D) = \mathbb{E}_{x \sim p_{data}(x)}\big[\log D(Real_{bg})\big] + \mathbb{E}_{z \sim p_{data}(z)}\big[\log(1 - D(Fake_{bg}))\big] \tag{3}$$
The network architecture of the segmentor (S) is the classical U-Net, whose objective function is

$$L_{seg}(S) = -\big[\, y \log \hat{y} + (1 - y)\log(1 - \hat{y}) \,\big] \tag{4}$$

where ŷ is the prediction and y is the Weak label.


Finally, the overall objective function is

$$L(G, D, S) = L_{GAN}(G, D) + L_{seg}(S) + \mu L_{shape}(G) \tag{5}$$

where μ is a hyper-parameter, set to μ = 50 in our experiments.
periments.
3. EXPERIMENTS AND RESULTS

3.1. Data and experiment details

To evaluate the generalization and effectiveness of our method, we perform experiments on two datasets. 1) The SCD RBC dataset: referring to [10], we use 308 raw microscope images from 5 different SCD patients. The raw image resolution is 1920×1080. Following [10], we preprocess the images by removing the margins on both sides and resizing them to 512×512. 2) The DSA dataset: referring to [9], we use 1092 coronary angiographies as our experimental data. Following [9], we preprocess them with median filtering and contrast-limited adaptive histogram equalization [11]. In addition, both datasets are randomly split into a training set (50%), a validation set (20%) and a test set (30%). Finally, we apply a grayscale transform and randomly crop 256×256 patches as the inputs of our model.

In all experiments, we use the Adam optimizer [12] with a learning rate of 2e-4. The learning rate remains constant for the first 50 epochs and then decreases linearly to 0 over another 50 epochs. In addition, the networks defined in Figure 3 use instance normalization [13] instead of batch normalization [14], and LeakyReLU instead of ReLU.
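A minimal sketch of this optimizer and learning-rate schedule, assuming a PyTorch model `net`, could look as follows.

```python
# Adam with lr = 2e-4, constant for 50 epochs, then linear decay to 0
# over another 50 epochs (a sketch of the schedule described above).
import torch

def make_optimizer_and_scheduler(net, lr=2e-4, const_epochs=50, decay_epochs=50):
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)

    def lr_lambda(epoch):
        if epoch < const_epochs:
            return 1.0
        # Linear decay from 1.0 to 0.0 over the remaining epochs.
        return max(0.0, 1.0 - (epoch - const_epochs + 1) / float(decay_epochs))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```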

Table 1. The quantitative results of image segmentation on the two datasets. The Level set [1] and Hessian Analysis [2] represent the traditional annotation-free methods. The Supervised U-Net is trained with manual annotations, which should be the upper bound of our method.

Cell dataset (annotation-free: Level Set and Proposed; U-Net is fully supervised):
  Metric      Level Set      U-Net          Proposed
  Accuracy    0.982±0.016    0.996±0.003    0.990±0.007
  Precision   0.894±0.072    0.961±0.018    0.906±0.029
  Recall      0.806±0.124    0.974±0.012    0.926±0.055
  Dice        0.839±0.070    0.967±0.011    0.915±0.029

Vessel dataset (annotation-free: Hessian Analysis and Proposed; U-Net is fully supervised):
  Metric      Hessian Analysis    U-Net          Proposed
  Accuracy    0.927±0.014         0.976±0.007    0.953±0.010
  Precision   0.943±0.057         0.892±0.025    0.833±0.036
  Recall      0.481±0.048         0.937±0.021    0.810±0.047
  Dice        0.636±0.046         0.913±0.017    0.820±0.028
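For reference, the metrics in Table 1 can be computed from binary masks as in the sketch below; this is the standard formulation and not necessarily the authors' exact evaluation code.

```python
# Standard accuracy / precision / recall / Dice on binary NumPy masks.
import numpy as np

def segmentation_metrics(pred, gt, eps=1e-8):
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    accuracy = (tp + tn) / (tp + tn + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    return dict(accuracy=accuracy, precision=precision, recall=recall, dice=dice)
```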

3.2. Background and Image Synthesis Results

Figure 4 shows some examples of the background synthesis. We find that the synthetic backgrounds are very realistic and that almost all the objects are eliminated, which demonstrates the effectiveness of our background generating method.

Fig. 4. The results of background synthesis. (a) Original cell images, (b) Synthetic cell background, (c) Original vessel images, (d) Synthetic vessel background.

Figure 5 shows some examples of the corresponding image synthesis. We can see that the generated synthetic images not only have a realistic background but also preserve the main object structures corresponding to the coarse labels.

Fig. 5. The results of corresponding image synthesis. (a) Original images, (b) Weak labels, (c) Synthetic images.

3.3. Image Segmentation Results

Table 1 compares the quantitative results of the different methods on the two datasets. The traditional methods (Level set and Hessian Analysis) have Dice scores of 0.839±0.070 and 0.636±0.046, respectively, which serve as the baseline results. The proposed AF-SEG achieves higher Dice scores (0.915±0.029 and 0.820±0.028), which demonstrates its effectiveness. The Supervised U-Net, trained with manual annotations, has Dice scores of 0.967±0.011 and 0.913±0.017, respectively. Figure 6 shows some results of image segmentation. Compared to the traditional methods, the proposed method produces more accurate segmentations.

Fig. 6. The results of image segmentation. (a) Original images, (b) Traditional methods, (c) Proposed methods, (d) Ground truth.
4. CONCLUSION

In this paper, we propose an annotation-free segmentation method that improves the segmentation accuracy of traditional methods. Using adversarial training, we effectively synthesize a target image corresponding to the traditional coarse segmentation. The synthetic images and the coarse segmentations allow us to better train segmentation models. The experimental results on both cell and vessel segmentation tasks show that our method obtains a significant improvement compared to traditional methods, demonstrating its good generalization and effectiveness. Of course, our method has limitations. Although it requires no manual labeling and meets the requirements of a wide range of medical quantitative analyses, it is slightly inferior to fully supervised methods in applications that require additionally high accuracy of image details. In the future, improvement of the method along this dimension could be fruitful.

5. ACKNOWLEDGMENTS

This work was supported by the Natural Science Foundation of China (NSFC) under Grants 81801778, 71704024 and 11831002; the National Key R&D Program of China (No. 2018YFC0910700); and the Beijing Natural Science Foundation (Z180001).

6. REFERENCES

[1] Chunming Li, Rui Huang, Zhaohua Ding, J Chris Gatenby, Dimitris N Metaxas, and John C Gore, "A level set method for image segmentation in the presence of intensity inhomogeneities with application to MRI," IEEE Transactions on Image Processing, vol. 20, no. 7, pp. 2007-2016, 2011.

[2] Alejandro F Frangi, Wiro J Niessen, Koen L Vincken, and Max A Viergever, "Multiscale vessel enhancement filtering," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 1998, pp. 130-137.

[3] Jonathan Long, Evan Shelhamer, and Trevor Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440.

[4] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234-241.

[5] Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam, "Rethinking atrous convolution for semantic image segmentation," arXiv preprint arXiv:1706.05587, 2017.

[6] Jifeng Dai, Kaiming He, and Jian Sun, "BoxSup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1635-1643.

[7] Di Lin, Jifeng Dai, Jiaya Jia, Kaiming He, and Jian Sun, "ScribbleSup: Scribble-supervised convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3159-3167.

[8] Yunchao Wei, Huaxin Xiao, Honghui Shi, Zequn Jie, Jiashi Feng, and Thomas S Huang, "Revisiting dilated convolution: A simple approach for weakly- and semi-supervised semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7268-7277.

[9] Fei Yu, Jie Zhao, Yanjun Gong, Zhi Wang, Yuxi Li, Fan Yang, Bin Dong, Quanzheng Li, and Li Zhang, "Annotation-free cardiac vessel segmentation via knowledge transfer from retinal images," arXiv preprint arXiv:1907.11483, 2019.

[10] Mo Zhang, Xiang Li, Mengjia Xu, and Quanzheng Li, "RBC semantic segmentation for sickle cell disease based on deformable U-Net," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 695-702.

[11] Karel Zuiderveld, "Contrast limited adaptive histogram equalization," in Graphics Gems IV. Academic Press Professional, Inc., 1994, pp. 474-485.

[12] Diederik P Kingma and Jimmy Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.

[13] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky, "Instance normalization: The missing ingredient for fast stylization," arXiv preprint arXiv:1607.08022, 2016.

[14] Sergey Ioffe and Christian Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
