Two-Stage Cascaded U-Net: 1st Place Solution to BraTS Challenge 2019
1 Introduction
Gliomas are the most common type of primary brain tumors. Automatic three-dimensional brain tumor segmentation can save doctors time and provide a reliable basis for further tumor analysis and monitoring. Recently, deep learning approaches have consistently outperformed traditional brain tumor segmentation methods [6,10,17,20,24,27].
The multimodal brain tumor segmentation challenge (BraTS) is aimed at
evaluating state-of-the-art methods for the segmentation of brain tumors [1–
4,13]. The BraTS 2019 training dataset, which comprises 259 cases of high-grade
gliomas (HGG) and 76 cases of low-grade gliomas (LGG), is manually annotated
by both clinicians and board-certified radiologists. For each patient, a native pre-contrast (T1), a post-contrast T1-weighted (T1Gd), a T2-weighted (T2) and a T2 Fluid Attenuated Inversion Recovery (T2-FLAIR) scan are provided. An example image set is presented in Fig. 1. Each tumor is segmented into the enhancing tumor, the peritumoral edema, and the necrotic and non-enhancing tumor core. A number of metrics (Dice score, Hausdorff distance (95%), sensitivity and specificity) are used to evaluate the quality of the segmentation.
We use a variant of the network of Myronenko [15] as the first stage network to train a coarse prediction. In the second stage, we increase the width of the network and use two decoders so as to boost performance. The second stage refines the prediction map by concatenating the preliminary prediction map with the original input, thereby exploiting auto-context. We do not use any additional training data and participate only in the segmentation task in the testing phase.
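As a hedged illustration of this cascade, the sketch below wires two placeholder 3D networks together in PyTorch: the second stage receives the four MRI channels concatenated with the three coarse probability maps (the 7-channel input of Table 2). The class and variable names are ours, and the stand-in sub-networks are for shape checking only, not the authors' released code.

```python
import torch
import torch.nn as nn

class TwoStageCascade(nn.Module):
    """Sketch of the two-stage cascade: stage 2 sees the raw input plus
    the coarse prediction of stage 1 (auto-context)."""
    def __init__(self, stage1_net: nn.Module, stage2_net: nn.Module):
        super().__init__()
        self.stage1 = stage1_net  # 4 MRI channels -> 3 probability maps
        self.stage2 = stage2_net  # 4 + 3 = 7 channels -> 3 probability maps

    def forward(self, x):
        # x: (batch, 4, D, H, W) patches from T1, T1Gd, T2 and T2-FLAIR
        coarse = self.stage1(x)
        refined = self.stage2(torch.cat([x, coarse], dim=1))
        return coarse, refined

# Shape check with trivial stand-in sub-networks:
cascade = TwoStageCascade(nn.Conv3d(4, 3, 1), nn.Conv3d(7, 3, 1))
coarse, refined = cascade(torch.randn(1, 4, 32, 32, 32))
assert coarse.shape == refined.shape == (1, 3, 32, 32, 32)
```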
2 Methods
Myronenko [15] proposed an asymmetrical U-Net with a variational autoencoder
branch [5,11]. In this paper, we take a variant of this approach as the basic
segmentation architecture, and we further propose a two-stage cascaded U-Net. The details are described below.
Due to GPU memory limitations, our networks are designed to take input patches of size 128 × 128 × 128 voxels with a batch size of one. The network architecture consists of a larger encoding path, which extracts complex semantic features, and a smaller decoding path, which recovers a segmentation map of the same size as the input. The architecture of the first stage network is presented in Fig. 3.
The 3D U-Net has an encoder and a decoder path, each of which has four spatial levels. At the beginning of the encoder, patches of size 128 × 128 × 128 voxels with four channels are extracted from the brain tumor images as input, followed by an initial 3 × 3 × 3 3D convolution with 16 filters. We also apply dropout with a rate of 0.2 after the initial encoder convolution. The encoder part uses pre-activated residual blocks [7,8]. Each of these blocks consists of two 3 × 3 × 3 convolutions, each preceded by Group Normalization [23] with a group size of 8 and a Rectified Linear Unit (ReLU) activation, followed by an additive identity skip connection. The number of pre-activated residual blocks is 1, 2, 2, and 4 within each spatial level, respectively. Moreover, a convolution layer with a 3 × 3 × 3 filter and a stride of 2 is used to halve the resolution of the feature maps and simultaneously double the number of feature channels.
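A minimal PyTorch sketch of these two building blocks, assuming the GN → ReLU → Conv ordering given in Table 1; the class names are our own, not from a released implementation.

```python
import torch
import torch.nn as nn

class PreActResBlock(nn.Module):
    """Pre-activated residual block: (GN -> ReLU -> Conv3) x 2, then an
    additive identity skip connection (the '+' in Table 1)."""
    def __init__(self, channels, groups=8):
        super().__init__()
        self.gn1 = nn.GroupNorm(groups, channels)
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.gn2 = nn.GroupNorm(groups, channels)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv1(self.relu(self.gn1(x)))
        out = self.conv2(self.relu(self.gn2(out)))
        return x + out

class EnDown(nn.Module):
    """Stride-2 3x3x3 convolution: halves the spatial resolution and
    doubles the number of feature channels."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv3d(channels, 2 * channels, kernel_size=3,
                              stride=2, padding=1)

    def forward(self, x):
        return self.conv(x)

# Shape check: 16 channels at 32^3 -> 32 channels at 16^3.
y = EnDown(16)(PreActResBlock(16)(torch.randn(1, 16, 32, 32, 32)))
assert y.shape == (1, 32, 16, 16, 16)
```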
Unlike the encoder, the decoder uses a single pre-activated residual block at each spatial level. Before up-sampling, we use 1 × 1 × 1 convolutions to halve the number of feature channels. In contrast to [15], we use a deconvolution with kernel size 2 × 2 × 2 and a stride of 2, rather than trilinear interpolation, in order to double the spatial size of the feature maps. The network features shortcut connections between corresponding encoder and decoder layers at the same spatial resolution.
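The decoder upsampling step can be sketched as follows, assuming (per the "+EnBlock" entries in Table 1) that the encoder features are added rather than concatenated; DeUp is our own name for the module.

```python
import torch
import torch.nn as nn

class DeUp(nn.Module):
    """Decoder upsampling: a 1x1x1 convolution halves the channels, a
    2x2x2 transposed convolution (stride 2) doubles the spatial size,
    and the matching encoder feature map is added as a shortcut."""
    def __init__(self, in_channels):
        super().__init__()
        out_channels = in_channels // 2
        self.reduce = nn.Conv3d(in_channels, out_channels, kernel_size=1)
        self.up = nn.ConvTranspose3d(out_channels, out_channels,
                                     kernel_size=2, stride=2)

    def forward(self, x, skip):
        return self.up(self.reduce(x)) + skip

# Shape check at the deepest level of Table 1: 128 x 16^3 -> 64 x 32^3.
x, skip = torch.randn(1, 128, 16, 16, 16), torch.randn(1, 64, 32, 32, 32)
assert DeUp(128)(x, skip).shape == (1, 64, 32, 32, 32)
```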
2.4 Loss
The Dice Similarity Coefficient (DSC) measures the degree of overlap between the prediction map and the ground truth. The DSC is calculated by Eq. 1, where S is the output of the network, R is the ground truth label and | · | denotes the volume of the region.

DSC = 2|S ∩ R| / (|S| + |R|)    (1)
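The remainder of the loss formulation falls on a page that is cut off here; as a hedged illustration, the following is a standard multi-label soft Dice loss built directly from Eq. 1 (the eps smoothing term is our own addition for numerical stability, not taken from the paper).

```python
import torch

def soft_dice_loss(pred, target, eps=1e-5):
    """Multi-label soft Dice loss derived from Eq. 1.

    pred:   sigmoid probabilities, shape (batch, 3, D, H, W)
    target: binary ground truth,   shape (batch, 3, D, H, W)
    """
    dims = (2, 3, 4)  # sum over the three spatial dimensions
    intersection = (pred * target).sum(dims)
    denominator = pred.sum(dims) + target.sum(dims)
    dsc = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - dsc.mean()  # average over batch and the 3 tumor labels
```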
Table 1. The first stage network structure, where + stands for additive identity skip connection, Conv3 - 3 × 3 × 3 convolution, Conv1 - 1 × 1 × 1 convolution, GN - group normalization, ConvTranspose - deconvolution with kernel size 2 × 2 × 2.

U-Net 1

Name       Details                              Repeat  Size
Input                                                   4 × 128 × 128 × 128

Encoder
  InitConv   Conv3, Dropout                       1       16 × 128 × 128 × 128
  EnBlock1   GN, ReLU, Conv3, GN, ReLU, Conv3, +  1       16 × 128 × 128 × 128
  EnDown1    Conv3 stride 2                       1       32 × 64 × 64 × 64
  EnBlock2   GN, ReLU, Conv3, GN, ReLU, Conv3, +  2       32 × 64 × 64 × 64
  EnDown2    Conv3 stride 2                       1       64 × 32 × 32 × 32
  EnBlock3   GN, ReLU, Conv3, GN, ReLU, Conv3, +  2       64 × 32 × 32 × 32
  EnDown3    Conv3 stride 2                       1       128 × 16 × 16 × 16
  EnBlock4   GN, ReLU, Conv3, GN, ReLU, Conv3, +  4       128 × 16 × 16 × 16

Decoder
  DeUp3      Conv1, ConvTranspose, +EnBlock3      1       64 × 32 × 32 × 32
  DeBlock3   GN, ReLU, Conv3, GN, ReLU, Conv3, +  1       64 × 32 × 32 × 32
  DeUp2      Conv1, ConvTranspose, +EnBlock2      1       32 × 64 × 64 × 64
  DeBlock2   GN, ReLU, Conv3, GN, ReLU, Conv3, +  1       32 × 64 × 64 × 64
  DeUp1      Conv1, ConvTranspose, +EnBlock1      1       16 × 128 × 128 × 128
  DeBlock1   GN, ReLU, Conv3, GN, ReLU, Conv3, +  1       16 × 128 × 128 × 128
  EndConv    Conv1                                1       3 × 128 × 128 × 128
  Sigmoid    Sigmoid                              1       3 × 128 × 128 × 128
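For concreteness, a compact, self-contained re-creation of Table 1 in PyTorch is sketched below. The module names and the plain nn.Dropout choice are our assumptions; this is an illustration of the table, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Pre-activated residual block (the EnBlock/DeBlock rows of Table 1)."""
    def __init__(self, ch, groups=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.GroupNorm(groups, ch), nn.ReLU(inplace=True),
            nn.Conv3d(ch, ch, 3, padding=1),
            nn.GroupNorm(groups, ch), nn.ReLU(inplace=True),
            nn.Conv3d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class FirstStageUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.init_conv = nn.Sequential(nn.Conv3d(4, 16, 3, padding=1),
                                       nn.Dropout(0.2))
        self.en1 = ResBlock(16)
        self.down1 = nn.Conv3d(16, 32, 3, stride=2, padding=1)
        self.en2 = nn.Sequential(ResBlock(32), ResBlock(32))
        self.down2 = nn.Conv3d(32, 64, 3, stride=2, padding=1)
        self.en3 = nn.Sequential(ResBlock(64), ResBlock(64))
        self.down3 = nn.Conv3d(64, 128, 3, stride=2, padding=1)
        self.en4 = nn.Sequential(*[ResBlock(128) for _ in range(4)])
        self.up3 = nn.Sequential(nn.Conv3d(128, 64, 1),
                                 nn.ConvTranspose3d(64, 64, 2, stride=2))
        self.de3 = ResBlock(64)
        self.up2 = nn.Sequential(nn.Conv3d(64, 32, 1),
                                 nn.ConvTranspose3d(32, 32, 2, stride=2))
        self.de2 = ResBlock(32)
        self.up1 = nn.Sequential(nn.Conv3d(32, 16, 1),
                                 nn.ConvTranspose3d(16, 16, 2, stride=2))
        self.de1 = ResBlock(16)
        self.end = nn.Conv3d(16, 3, 1)

    def forward(self, x):
        e1 = self.en1(self.init_conv(x))   # 16 channels, full resolution
        e2 = self.en2(self.down1(e1))      # 32 channels, 1/2 resolution
        e3 = self.en3(self.down2(e2))      # 64 channels, 1/4 resolution
        e4 = self.en4(self.down3(e3))      # 128 channels, 1/8 resolution
        d3 = self.de3(self.up3(e4) + e3)   # +EnBlock3 shortcut
        d2 = self.de2(self.up2(d3) + e2)   # +EnBlock2 shortcut
        d1 = self.de1(self.up1(d2) + e1)   # +EnBlock1 shortcut
        return torch.sigmoid(self.end(d1))

# Shape check on a small cube (any size divisible by 8 works):
assert FirstStageUNet()(torch.randn(1, 4, 32, 32, 32)).shape == (1, 3, 32, 32, 32)
```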
3 Experiments
3.1 Data Pre-processing and Augmentation
Before the data is fed into the deep learning network, it is preprocessed as follows. Since MRI intensity values are non-standardized, we apply intensity normalization to each MRI modality of each patient independently, subtracting the mean and dividing by the standard deviation of the brain region only.
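A minimal numpy sketch of this per-modality normalization, assuming a boolean brain mask is available (the small epsilon guard is our own addition):

```python
import numpy as np

def normalize_modality(volume, brain_mask):
    """Z-score one MRI modality using statistics of the brain region only.

    volume:     3D array holding one modality of one patient
    brain_mask: boolean 3D array, True inside the brain
    """
    brain = volume[brain_mask].astype(np.float32)
    out = volume.astype(np.float32)
    out[brain_mask] = (brain - brain.mean()) / (brain.std() + 1e-8)
    return out  # voxels outside the brain are left unchanged
```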
Moreover, to prevent overfitting, we deploy three types of data augmentation. Firstly, we apply a random intensity shift between
Table 2. The second stage network structure, where + stands for additive identity skip connection, Conv3 - 3 × 3 × 3 convolution, Conv1 - 1 × 1 × 1 convolution, GN - group normalization, ConvTranspose - deconvolution with kernel size 2 × 2 × 2, Upsampling - trilinear interpolation. Decoder2 is used only during training.

U-Net 2

Name         Details                              Repeat  Size
Input                                                     7 × 128 × 128 × 128

Encoder
  InitConv     Conv3, Dropout                       1       32 × 128 × 128 × 128
  EnBlock1     GN, ReLU, Conv3, GN, ReLU, Conv3, +  1       32 × 128 × 128 × 128
  EnDown1      Conv3 stride 2                       1       64 × 64 × 64 × 64
  EnBlock2     GN, ReLU, Conv3, GN, ReLU, Conv3, +  2       64 × 64 × 64 × 64
  EnDown2      Conv3 stride 2                       1       128 × 32 × 32 × 32
  EnBlock3     GN, ReLU, Conv3, GN, ReLU, Conv3, +  2       128 × 32 × 32 × 32
  EnDown3      Conv3 stride 2                       1       256 × 16 × 16 × 16
  EnBlock4     GN, ReLU, Conv3, GN, ReLU, Conv3, +  4       256 × 16 × 16 × 16

Decoder1
  DeUp3        Conv1, ConvTranspose, +EnBlock3      1       128 × 32 × 32 × 32
  DeBlock3     GN, ReLU, Conv3, GN, ReLU, Conv3, +  1       128 × 32 × 32 × 32
  DeUp2        Conv1, ConvTranspose, +EnBlock2      1       64 × 64 × 64 × 64
  DeBlock2     GN, ReLU, Conv3, GN, ReLU, Conv3, +  1       64 × 64 × 64 × 64
  DeUp1        Conv1, ConvTranspose, +EnBlock1      1       32 × 128 × 128 × 128
  DeBlock1     GN, ReLU, Conv3, GN, ReLU, Conv3, +  1       32 × 128 × 128 × 128
  EndConv      Conv1                                1       3 × 128 × 128 × 128
  Sigmoid      Sigmoid                              1       3 × 128 × 128 × 128

Decoder2 (used only during training)
  DeUp3_1      Conv1, Upsampling, +EnBlock3         1       128 × 32 × 32 × 32
  DeBlock3_1   GN, ReLU, Conv3, GN, ReLU, Conv3, +  1       128 × 32 × 32 × 32
  DeUp2_1      Conv1, Upsampling, +EnBlock2         1       64 × 64 × 64 × 64
  DeBlock2_1   GN, ReLU, Conv3, GN, ReLU, Conv3, +  1       64 × 64 × 64 × 64
  DeUp1_1      Conv1, Upsampling, +EnBlock1         1       32 × 128 × 128 × 128
  DeBlock1_1   GN, ReLU, Conv3, GN, ReLU, Conv3, +  1       32 × 128 × 128 × 128
  EndConv_1    Conv1                                1       3 × 128 × 128 × 128
  Sigmoid_1    Sigmoid                              1       3 × 128 × 128 × 128
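The only structural difference between Decoder2 and Decoder1 is the upsampling operator. A sketch of the trilinear variant is below; this wiring is our reading of Table 2, with our own module name, not released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeUpInterp(nn.Module):
    """Decoder2 upsampling step: a 1x1x1 convolution halves the channels,
    trilinear interpolation doubles the spatial size, and the matching
    encoder features are added as a shortcut. The whole Decoder2 branch
    is only attached during training."""
    def __init__(self, in_channels):
        super().__init__()
        self.reduce = nn.Conv3d(in_channels, in_channels // 2, kernel_size=1)

    def forward(self, x, skip):
        x = F.interpolate(self.reduce(x), scale_factor=2, mode='trilinear',
                          align_corners=False)
        return x + skip

# Shape check: 256 x 16^3 -> 128 x 32^3, as in the DeUp3_1 row.
x, skip = torch.randn(1, 256, 16, 16, 16), torch.randn(1, 128, 32, 32, 32)
assert DeUpInterp(256)(x, skip).shape == (1, 128, 32, 32, 32)
```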
At testing time, we segment the whole brain region at once instead of using a sliding window. The interpolation decoder (Decoder2) is not used during the inference phase. To obtain a more robust prediction, we keep eight model snapshots from the end of the training process. For each snapshot, the input images are flipped differently before being fed into the network. Finally, we average the resulting eight segmentation probability maps.
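A hedged sketch of this snapshot-plus-flipping ensembling: pairing the eight axis-flip combinations one-to-one with the eight snapshots is our assumption consistent with the text, and each prediction is flipped back before averaging.

```python
import itertools
import torch

@torch.no_grad()
def predict_with_flip_tta(models, x):
    """Average sigmoid predictions over snapshots, each fed a differently
    flipped whole-brain input.

    models: list of 8 trained network snapshots
    x:      tensor of shape (1, 4, D, H, W)
    """
    flips = list(itertools.product([False, True], repeat=3))  # 8 variants
    total = 0
    for model, flip in zip(models, flips):
        dims = [d for d, f in zip((2, 3, 4), flip) if f]  # spatial axes
        xf = torch.flip(x, dims) if dims else x
        pred = model(xf)
        # Undo the flip so all predictions are in the same orientation.
        total = total + (torch.flip(pred, dims) if dims else pred)
    return total / len(models)
```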
3.4 Post-processing
To post-process our segmentation results, we relabel the predicted enhancing tumor as necrosis whenever the volume of the predicted enhancing tumor is less than a threshold. (The threshold is chosen for each experiment independently, depending on the performance on the BraTS 2019 validation dataset.)
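A short sketch of this rule, assuming the standard BraTS label convention (1 = necrotic and non-enhancing tumor core, 4 = enhancing tumor); the threshold value itself is tuned per experiment on the validation set.

```python
import numpy as np

def postprocess_enhancing_tumor(seg, threshold, et_label=4, ncr_label=1):
    """Relabel predicted enhancing tumor as necrosis when its total
    volume (voxel count) falls below the chosen threshold."""
    seg = seg.copy()
    et_voxels = seg == et_label
    if et_voxels.sum() < threshold:
        seg[et_voxels] = ncr_label
    return seg
```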
4 Results
The variability of a single model can be quite high. We use total five net-
works from the 5-fold cross-validation as an ensemble to predict segmentation for
BraTS 2019 validation dataset. Also, we use an ensemble of a set of 12 models,
which are trained from scratch using the entire training dataset. The best single
model is chosen from the set of 12 models.
We report the results of our approach on the BraTS 2019 validation dataset, which contains 125 cases of unknown glioma grade whose ground-truth segmentations are withheld. All reported values are computed via the online evaluation platform (https://ipp.cbica.upenn.edu/), which evaluates the Dice score, sensitivity, specificity and Hausdorff distance (95%). Validation set results can be found in Table 3. The performance of the best single model is slightly better than that of the 5-fold cross-validation ensemble. The ensemble of 12 models yields a further minor improvement over the best single model.
Testing set results are presented in Table 4. Our algorithm achieved the first
place out of more than 70 participating teams.
5 Conclusion
In this paper, we propose a two-stage cascaded U-Net that refines the prediction through a progressive cascade. Experiments on the BraTS 2019 validation set demonstrate that our method obtains very competitive segmentations even when using a single model. The testing results show that the proposed method achieves excellent performance, winning first place in the BraTS 2019 challenge segmentation task among more than 70 participating teams.
References
1. Bakas, S., et al.: Segmentation labels and radiomic features for the pre-operative
scans of the TCGA-GBM collection. Cancer Imaging Archive (2017). https://doi.
org/10.7937/K9/TCIA.2017.KLXWJJ1Q
2. Bakas, S., et al.: Segmentation labels and radiomic features for the pre-operative
scans of the TCGA-LGG collection. Cancer Imaging Archive (2017)
3. Bakas, S., et al.: Advancing the cancer genome atlas glioma MRI collections with
expert segmentation labels and radiomic features. Sci. Data 4, 170117 (2017).
https://doi.org/10.1038/sdata.2017.117
4. Bakas, S., et al.: Identifying the best machine learning algorithms for brain tumor
segmentation, progression assessment, and overall survival prediction in the BraTS
challenge. arXiv preprint arXiv:1811.02629 (2018)
5. Doersch, C.: Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908
(2016)
6. Havaei, M., et al.: Brain tumor segmentation with deep neural networks. Med.
Image Anal. 35, 18–31 (2017)
7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pp. 770–778 (2016)
8. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In:
Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp.
630–645. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_38
9. Kamnitsas, K., et al.: Ensembles of multiple models and architectures for robust
brain tumour segmentation. In: Crimi, A., Bakas, S., Kuijf, H., Menze, B., Reyes,
M. (eds.) BrainLes 2017. LNCS, vol. 10670, pp. 450–462. Springer, Cham (2018).
https://doi.org/10.1007/978-3-319-75238-9_38
10. Kamnitsas, K., et al.: Efficient multi-scale 3D CNN with fully connected CRF for
accurate brain lesion segmentation. Med. Image Anal. 36, 61–78 (2017)
11. Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint
arXiv:1312.6114 (2013)
12. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic
segmentation. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 3431–3440 (2015)
13. Menze, B.H., et al.: The multimodal brain tumor image segmentation benchmark
(BRATS). IEEE Trans. Med. Imaging 34(10), 1993–2024 (2015). https://doi.org/
10.1109/tmi.2014.2377694
14. Milletari, F., Navab, N., Ahmadi, S.A.: V-net: fully convolutional neural networks
for volumetric medical image segmentation. In: 2016 Fourth International Confer-
ence on 3D Vision (3DV), pp. 565–571. IEEE (2016)
15. Myronenko, A.: 3D MRI brain tumor segmentation using autoencoder regular-
ization. In: Crimi, A., Bakas, S., Kuijf, H., Keyvan, F., Reyes, M., van Walsum,
T. (eds.) BrainLes 2018. LNCS, vol. 11384, pp. 311–320. Springer, Cham (2019).
https://doi.org/10.1007/978-3-030-11726-9_28
16. Paszke, A., et al.: Automatic differentiation in PyTorch (2017)
17. Pereira, S., Pinto, A., Alves, V., Silva, C.A.: Brain tumor segmentation using
convolutional neural networks in MRI images. IEEE Trans. Med. Imaging 35(5),
1240–1251 (2016)
18. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomed-
ical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F.
(eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015).
https://doi.org/10.1007/978-3-319-24574-4_28
19. Roth, H.R., et al.: A multi-scale pyramid of 3D fully convolutional networks for
abdominal multi-organ segmentation. In: Frangi, A.F., Schnabel, J.A., Davatzikos,
C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp.
417–425. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00937-3_48
20. Shen, H., Wang, R., Zhang, J., McKenna, S.J.: Boundary-aware fully convolu-
tional network for brain tumor segmentation. In: Descoteaux, M., Maier-Hein, L.,
Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol.
10434, pp. 433–441. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66185-8_49