AF-SEG An Annotation-Free Approach For Image Segmentation by Self-Supervision and Generative Adversarial Network
Fei Yu1∗ , Hexin Dong1∗ , Mo Zhang1 , Jie Zhao2 , Bin Dong3,1,2 , Quanzheng Li4,2 , Li Zhang1,2
1 Center for Data Science, Peking University, Beijing, China; 2 Center for Data Science in Health and Medicine, Peking University, Beijing, China; 3 Beijing International Center for Mathematical Research (BICMR), Peking University, Beijing, China; 4 MGH/BWH Center for Clinical Data Science, Boston, MA 02115, USA.
* These authors contributed equally.
Fig. 1. The flowchart of the proposed method.

Fig. 2. The illustration of the background generating method. The masks (green circles) produced by our method are enlarged in order to mask out entire false negatives.

method provides a considerable improvement over the traditional methods and achieves accuracies comparable with fully supervised methods.
2. METHOD
In this work, we first perform traditional annotation-free methods to obtain coarse segmentations. We then synthesize an image corresponding to the coarse segmentation using a GAN. Finally, we use the coarse segmentation as a pixel-level annotation of the synthetic image and train a high-quality segmentation model in a supervised manner. Inspired by SC-GAN, the foreground and background of the synthetic image are generated separately and then fused, which ensures that the foreground of the image is consistent with the coarse segmentation. The flowchart of our method is shown in Figure 1, and each step is described in detail below.
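For illustration, a minimal sketch of this fusion step is given below: a generated foreground is composited into a clean, inpainted background using the coarse segmentation as a binary mask, so that the foreground of the composite coincides with the coarse segmentation. The tensor names and shapes are illustrative assumptions, not taken from the original implementation.

import torch

def fuse_foreground_background(foreground: torch.Tensor,
                               background: torch.Tensor,
                               coarse_mask: torch.Tensor) -> torch.Tensor:
    """Compose a synthetic image from a separately generated foreground and a
    clean (object-free) background, using the coarse segmentation as the mask.

    All tensors are assumed to share the shape (N, C, H, W); `coarse_mask` is
    binary (1 inside the coarse foreground, 0 elsewhere).
    """
    return coarse_mask * foreground + (1.0 - coarse_mask) * background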
2.1. Coarse Segmentation

We first use traditional annotation-free methods to generate coarse segmentation results. For example, in cell segmentation we adopt the classic level set method, while in vessel segmentation Hessian analysis is a good choice. Since the coarse segmentation obtained in this step will be used as the annotation for the later supervised learning, we apply a morphological erosion operation to reduce the false positive rate of the coarse segmentation.
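As a concrete example of this step, the sketch below produces a coarse vessel mask by Hessian (Frangi) analysis followed by binary erosion, using scikit-image. The Otsu threshold and the erosion radius are illustrative choices rather than values from the paper, and the cell branch would use a level set method instead.

import numpy as np
from skimage.filters import frangi, threshold_otsu
from skimage.morphology import binary_erosion, disk

def coarse_vessel_mask(image: np.ndarray, erosion_radius: int = 2) -> np.ndarray:
    """Coarse vessel segmentation: Hessian-based (Frangi) vesselness, a global
    threshold, and a morphological erosion that trims false positives before
    the mask is used as a pseudo pixel-level annotation."""
    vesselness = frangi(image)                          # Hessian vesselness response
    mask = vesselness > threshold_otsu(vesselness)      # binarize the response
    return binary_erosion(mask, disk(erosion_radius))   # erode to reduce false positives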
2.2. Background Generating
In this step, we remove the false negative responses left by the previous step and generate a clean background image for the final synthesis. Following the idea of Deep Image Prior (DIP), we propose a new strategy, called Deep Segmentation Prior (DSP), to enrich the coarse segmentation from Section 2.1 (i.e., to remove the false negatives). As shown in Figure 2, to perform DSP we train a simple segmentation network whose input is the original image and whose label is the corresponding coarse segmentation. DSP relies on the observation that such a model detects all label-like structures in the early stage of training. Therefore, we apply an early-stopping strategy: the stopping time is controlled by a hyper-parameter λ, which specifies the multiple of the coarse-segmentation size that we want the final prediction to reach. In this experiment, we simply set λ = 2. Using the results of DSP, we can effectively mask out the false negatives in the background. Finally, we use image inpainting techniques, such as DIP, to generate clean (object-free) background images.
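A minimal sketch of the DSP loop is shown below: a small segmentation network is fitted to the coarse label, and training stops once the predicted foreground has grown to roughly λ times the coarse-label area, so that label-like false negatives are also captured. This stopping rule is one reading of the λ-controlled early stop; the network, optimizer, and step budget are placeholders.

import torch
import torch.nn.functional as F

def deep_segmentation_prior(model, optimizer, image, coarse_label,
                            lam: float = 2.0, max_steps: int = 2000):
    """Fit `model` to the coarse label and stop early once the predicted
    foreground area reaches `lam` times the coarse-label area.

    `image` is a (1, C, H, W) tensor and `coarse_label` is a (1, 1, H, W)
    float tensor with values in {0, 1}.  Returns the enlarged binary mask
    that is later used to mask out false negatives before inpainting.
    """
    target_area = lam * coarse_label.sum()
    for _ in range(max_steps):
        optimizer.zero_grad()
        logits = model(image)
        loss = F.binary_cross_entropy_with_logits(logits, coarse_label)
        loss.backward()
        optimizer.step()

        with torch.no_grad():
            pred_area = (torch.sigmoid(logits) > 0.5).float().sum()
        if pred_area >= target_area:   # prediction is now ~lam x the coarse size
            break

    with torch.no_grad():
        return (torch.sigmoid(model(image)) > 0.5).float()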
2.3. Corresponding Image Synthesis and Segmentation

We integrate image synthesis and segmentation into an end-to-end model. Figure 3 shows the three major components of the model: the generator, the discriminator, and the segmenter. In the following text, we explain the meanings of the terms in Figure 3, such as Real, Fake, and Real*.

Fig. 3. The illustration of the corresponding image synthesis and segmentation method.

The generator (G) uses U-Net as its network backbone; its input is an original image (Real) and its output is a synthetic image (Fake). We introduce a shape-consistency loss (an L1 loss) to ensure that Real and Fake are consistent within the coarse segmentation (Weak).
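The shape-consistency term can be written as an L1 penalty restricted to the coarse mask. The sketch below is one such formulation of "consistent within the coarse segmentation"; the tensor names follow the labels in Figure 3, and the normalization by the mask area is our own choice.

import torch

def shape_consistency_loss(real: torch.Tensor,
                           fake: torch.Tensor,
                           weak: torch.Tensor) -> torch.Tensor:
    """L1 loss computed only inside the coarse segmentation (Weak), so the
    synthetic image (Fake) keeps the appearance of the original image (Real)
    within the coarse foreground.

    `weak` is a binary mask broadcastable to the image shape; the epsilon
    guards against an empty mask.
    """
    diff = torch.abs(real - fake) * weak
    return diff.sum() / (weak.sum() + 1e-8)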
Table 1. The quantitative results of image segmentation on the two datasets. Level set [1] and Hessian analysis [2] are the traditional annotation-free methods. The supervised U-Net is trained with manual annotations and serves as the upper bound of our method.
Datasets           Cell dataset                                  Vessel dataset
Methods            Level Set    U-Net        Proposed     Hessian Analysis  U-Net        Proposed
Annotation-free    X            –            X            X                 –            X
Accuracy           0.982±0.016  0.996±0.003  0.990±0.007  0.927±0.014       0.976±0.007  0.953±0.010
Precision          0.894±0.072  0.961±0.018  0.906±0.029  0.943±0.057       0.892±0.025  0.833±0.036
Recall             0.806±0.124  0.974±0.012  0.926±0.055  0.481±0.048       0.937±0.021  0.810±0.047
Dice               0.839±0.070  0.967±0.011  0.915±0.029  0.636±0.046       0.913±0.017  0.820±0.028
[1] Chunming Li, Rui Huang, Zhaohua Ding, J. Chris Gatenby, Dimitris N. Metaxas, and John C. Gore, “A level set method for image segmentation in the presence of intensity inhomogeneities with application to MRI,” IEEE Transactions on Image Processing, vol. 20, no. 7, pp. 2007–2016, 2011.

[2] Alejandro F. Frangi, Wiro J. Niessen, Koen L. Vincken, and Max A. Viergever, “Multiscale vessel enhancement filtering,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 1998, pp. 130–137.

[3] Jonathan Long, Evan Shelhamer, and Trevor Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[11] Karel Zuiderveld, “Contrast limited adaptive histogram equalization,” in Graphics Gems IV. Academic Press Professional, Inc., 1994, pp. 474–485.

[12] Diederik P. Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.

[13] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky, “Instance normalization: The missing ingredient for fast stylization,” arXiv preprint arXiv:1607.08022, 2016.

[14] Sergey Ioffe and Christian Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.