Jun Ma [0000-0002-9739-0855]
arXiv:2012.13871v1 [eess.IV] 27 Dec 2020
1 Introduction
Accurate segmentation of the left ventricular cavity, myocardium and right ventricle from cardiac magnetic resonance images plays an important role in the quantitative analysis of cardiac function, which can be used in clinical cardiology for patient management, disease diagnosis, risk evaluation, and therapy decisions [11]. In recent years, many deep learning-based methods have achieved unprecedented performance [1,3], especially when the testing cases have the same distribution as the training cases. However, segmentation accuracy can be greatly degraded when these methods are tested on unseen datasets acquired from distinct MRI scanners or clinical centres [12]. This problem makes it difficult to apply these methods consistently across multiple clinical centres, especially when subjects are scanned using different MRI protocols or machines.
2 J. Ma, Histogram Matching Augmentation
Fig. 1. Visual examples from different vendors. Images from different vendors have
significant appearance variations.
2 Proposed Method
Histogram matching is a widely used method that generates a processed image with a specified histogram [8]. We formally introduce histogram matching as follows.
Let S and T denote the continuous intensities (considered as random variables) of the source image and the target image, respectively, and let P_S and P_T denote the corresponding probability density functions (PDFs). Histogram equalization of the source image yields

r = M(S) = \int_0^S P_S(s)\, ds,   (1)

and the specified target PDF defines the transformation function

G(T) = \int_0^T P_T(t)\, dt.   (2)

It then follows from these two equations that M(S) = G(T) and, therefore, that T must satisfy the condition

T = G^{-1}[M(S)] = G^{-1}(r).   (3)
Equations (1)-(3) show that an image whose intensity levels have a specified probability density function can be obtained from a given image by the following four steps:
– Step 1. Obtain P_S from the source image and use Eq. (1) to obtain the values of r.
– Step 2. Use the specified PDF in Eq. (2) to obtain the transformation function G(T).
– Step 3. Obtain the inverse transformation T = G^{-1}(r).
– Step 4. Obtain the output image by first equalizing the input image using Eq. (1); the pixel values in this image are the r values. For each pixel with value r in the equalized image, perform the inverse mapping T = G^{-1}(r) to obtain the corresponding pixel in the output image. When all pixels have been processed, the PDF of the output image equals the specified PDF.
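The four steps above can be sketched in NumPy for a single image; this is a minimal illustration (the function name and the interpolation-based inversion of G are our own choices, not from the paper):

```python
import numpy as np

def match_histograms_1d(source, target):
    """Match the histogram of `source` to that of `target` by composing
    the source CDF (Eq. 1) with the inverse target CDF (Eqs. 2-3)."""
    # Step 1: equalize the source, r = M(S) -- the empirical source CDF.
    s_values, s_counts = np.unique(source.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size
    # Step 2: build the target transformation G(T) -- the empirical target CDF.
    t_values, t_counts = np.unique(target.ravel(), return_counts=True)
    t_cdf = np.cumsum(t_counts) / target.size
    # Steps 3-4: map every source pixel to r, then apply T = G^{-1}(r)
    # by interpolating the inverse of the target CDF.
    r = np.interp(source.ravel(), s_values, s_cdf)
    matched = np.interp(r, t_cdf, t_values)
    return matched.reshape(source.shape)
```

The output image keeps the spatial layout of the source but inherits the intensity distribution of the target.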
Scikit-image [13] has a built-in function, match_histograms, for histogram matching. In this paper, we use histogram matching to augment the training dataset so as to introduce the intensity distribution of the testing set. Specifically, we randomly select image pairs consisting of a labelled case and an unlabelled case, and then transfer the intensity distribution of the unlabelled case to the labelled case. In this way, we obtain many new training cases whose intensity distributions are similar to those of the unlabelled cases. Figure 2 presents some examples of the source images, target images and augmented images.
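This augmentation strategy can be sketched with scikit-image's match_histograms; the helper name and data-handling details below are illustrative assumptions, not the paper's released code:

```python
import numpy as np
from skimage.exposure import match_histograms

def augment_with_histogram_matching(labelled_images, labelled_masks,
                                    unlabelled_images, n_new, rng=None):
    """Create new labelled training cases whose intensity distributions
    mimic randomly chosen unlabelled (target-vendor) cases.  The masks
    are reused unchanged because histogram matching only alters intensities."""
    rng = rng or np.random.default_rng()
    new_images, new_masks = [], []
    for _ in range(n_new):
        i = rng.integers(len(labelled_images))    # random labelled source
        j = rng.integers(len(unlabelled_images))  # random unlabelled reference
        matched = match_histograms(labelled_images[i], unlabelled_images[j])
        new_images.append(matched)
        new_masks.append(labelled_masks[i])       # label is unchanged
    return new_images, new_masks
```

Because the spatial content is untouched, the original segmentation labels remain valid for every augmented image.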
3 Experiments
The M&Ms challenge cohort [2] comprises patients who were scanned in clinical centres in three different countries (Spain, Germany and Canada) using four different magnetic resonance scanner vendors (Siemens, General Electric, Philips and Canon). The training set contains 150 annotated images from two different MRI vendors (75 each) and 25 unlabelled images from a third vendor. The CMR images have been segmented by experienced clinicians from the respective institutions, including contours for the left (LV) and right ventricle (RV) blood pools, as well as for the left ventricular myocardium (MYO). The 200 test cases correspond to 50 new studies from each of the vendors provided in the training set and 50 additional studies from a fourth unseen vendor, which are used to test model generalization ability. 20% of these datasets are used for validation and the rest are reserved for testing and ranking participants.
During preprocessing, we resample all images to 1.25 × 1.25 × 8 mm³ and apply Z-score normalization (mean subtraction and division by the standard deviation) to each image. We employ nnU-Net [9] as the default network. The patch size is 288 × 288 × 14, and the batch size is 8. We train a 2D U-Net and a 3D U-Net with five-fold cross validation. Each fold is trained on a TITAN V100 GPU for 1000 epochs. For each fold, we save the best-epoch model² and the final-epoch model.
The code and trained models will be made publicly available to the research community after the anonymous review. We declare that the segmentation method has not used any pre-trained models or additional MRI datasets other than those provided by the organizers.
² The best-epoch model is the model that achieves the best Dice on the validation set.
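The Z-score normalization step of the preprocessing can be sketched as follows (a minimal illustration of the per-image operation; nnU-Net applies its own preprocessing pipeline internally):

```python
import numpy as np

def zscore_normalize(image, eps=1e-8):
    """Z-score intensity normalization: subtract the per-image mean and
    divide by the per-image standard deviation.  `eps` guards against
    division by zero for constant images."""
    mean, std = image.mean(), image.std()
    return (image - mean) / (std + eps)
```

After this step every image has approximately zero mean and unit variance, which removes vendor-specific global intensity offsets before training.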
Fig. 3. Visual examples of segmentation results: (a) Image; (b) Ground Truth; (c) 2D U-Net (Base); (d) 3D U-Net (Base); (e) 2D U-Net (HM); (f) 3D U-Net (HM). Base and HM stand for the baseline dataset and the histogram matching augmented dataset, respectively.
Table 1 presents the five-fold cross validation results of the final-epoch models of the 2D U-Net and 3D U-Net³. Basically, we found that the performance of the 3D U-Net is slightly better than that of the 2D U-Net. Figure 3 presents some visual segmentation results of the different models. The methods achieve the best Dice scores for the left ventricle, while the performance on the myocardium is inferior to that on the left and right ventricles, indicating that the myocardium is more challenging to segment accurately.
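For reference, the Dice score used throughout the evaluation can be computed for a pair of binary masks as follows (a standard definition, not code from the paper):

```python
import numpy as np

def dice_score(pred, gt):
    """Dice similarity coefficient between two binary masks:
    2 * |pred ∩ gt| / (|pred| + |gt|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom > 0 else 1.0
```

A score of 1.0 indicates perfect overlap and 0.0 indicates no overlap; in a multi-class setting it is computed separately for LV, MYO and RV.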
The validation set is hidden by the challenge organizers. We package our code and models in a Singularity container and submit it to the organizers. Due to the limited number of submission attempts, we do not submit the baseline models, because the models trained on the augmented datasets are expected to be better than the baseline models. In summary, we submit the following five solutions on the validation set.
Table 2 shows the quantitative results of the five solutions on the validation set. It can be observed that ensembling multiple models may improve the Dice scores, but can also
degrade the HD and ASSD. We also apply a paired t-test between solution 1 and the other four solutions to assess whether their performances are statistically significantly different. Surprisingly, p > 0.05 for all comparisons; in other words, the performances of solutions 2-5 show no statistically significant difference compared with solution 1. We also compare our results with the recent work [10] that uses a GAN for domain adaptation, and our method obtains better Dice scores for LV, Myo and RV. Finally, we select solution 1 as our final solution for the hidden testing set.

³ The results corresponding to the best-epoch models are not reported because they overfit the validation set, making the corresponding Dice scores meaningless.

Table 1. Five-fold cross validation results. It should be noted that the models trained on the default and the augmented datasets are not directly comparable, because the five-fold splits differ between the two datasets.
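The paired t-test comparison can be reproduced with SciPy as follows; the per-case Dice values below are synthetic placeholders for illustration, not the challenge data:

```python
import numpy as np
from scipy import stats

# Illustrative per-case Dice scores for two solutions evaluated on the
# SAME cases -- the pairing is what makes a paired t-test appropriate.
dice_sol1 = np.array([0.91, 0.89, 0.93, 0.90, 0.92, 0.88, 0.91, 0.90])
dice_sol2 = np.array([0.91, 0.90, 0.93, 0.89, 0.92, 0.89, 0.91, 0.90])

# Paired (related-samples) t-test on the per-case differences.
t_stat, p_value = stats.ttest_rel(dice_sol1, dice_sol2)
significant = p_value < 0.05   # significance level used in the paper
```

With p > 0.05, the difference between the two solutions would not be considered statistically significant, matching the conclusion drawn for solutions 2-5 versus solution 1.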
Table 2. Quantitative results of the five solutions and a GAN-based method [10] on the validation set.

Solution     LV Dice  LV HD  LV ASSD   Myo Dice  Myo HD  Myo ASSD   RV Dice  RV HD  RV ASSD
Solution 1   0.9130   8.089  1.017     0.8627    12.19   0.710      0.8937   11.88  0.9078
Solution 2   0.9131   8.091  1.015     0.8627    12.20   0.710      0.8935   11.85  0.9101
Solution 3   0.9166   8.090  0.958     0.8660    12.25   0.734      0.8927   11.35  0.9682
Solution 4   0.9166   8.103  0.959     0.8658    12.26   0.736      0.8927   11.34  0.9667
Solution 5   0.9166   8.099  0.959     0.8659    12.25   0.735      0.8927   11.35  0.9674
GAN [10]     0.903    -      -         0.859     -       -          0.865    -      -
Table 3 presents the average Dice, HD and ASSD for each vendor on the testing set. Overall, the performances on vendors C and D are lower than those on vendors A and B, because the training set does not contain annotated cases from vendors C and D. It can be observed that the performance on vendor C is better than the performance on vendor D, especially in HD and ASSD, with improvements of up to 5 mm, which could⁴ demonstrate the effectiveness of our histogram matching data augmentation.
Fig. 4. Segmentation results with low performance: (a, d) Image; (b, e) Ground truth; (c, f) Segmentation.
Table 3. Average Dice, HD and ASSD for each vendor on the testing set.

Vendor   LV Dice  LV HD  LV ASSD   Myo Dice  Myo HD  Myo ASSD   RV Dice  RV HD  RV ASSD
A        0.9148   10.78  1.003     0.8435    14.14   0.697      0.8771   12.84  1.142
B        0.9136   7.866  0.971     0.8675    10.14   0.717      0.8792   11.61  1.144
C        0.8943   9.231  1.389     0.8265    11.33   1.066      0.8732   10.82  1.152
D        0.8977   12.11  1.521     0.8243    14.34   0.958      0.8703   15.46  1.513
All      0.9051   9.996  1.221     0.8405    12.49   0.859      0.8749   12.68  1.238
⁴ We use 'could' here because we do not have corresponding testing results from a baseline model trained without histogram matching data augmentation.

Worse cases analysis Figures 4 and 5 show some segmentation results with low performance. Basically, it can be observed that most of the segmentation
errors occur at the top or bottom of the heart, because these regions usually have low contrast and ambiguous boundaries. We find that the worse segmentation results can still have reasonable shapes, even in cases of severe over-segmentation (e.g., the 2nd row of Figure 4). Some LV segmentation results are significantly smaller or larger than the ground truth, which motivates us to improve our network by imposing a size constraint, such as constraining the volume of the network outputs to be within a specified range.
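A size constraint of this kind could start as a simple volume plausibility check on the predicted mask; the volume bounds below are illustrative placeholders, not values from the paper:

```python
import numpy as np

def volume_plausibility_check(mask, spacing=(1.25, 1.25, 8.0),
                              min_ml=20.0, max_ml=400.0):
    """Flag segmentations whose volume falls outside a plausible range.
    `spacing` is the voxel size in mm (the resampled spacing used in
    preprocessing); the ml bounds are illustrative, not clinical values."""
    voxel_ml = np.prod(spacing) / 1000.0          # mm^3 per voxel -> ml
    volume_ml = float(mask.sum()) * voxel_ml      # total foreground volume
    return min_ml <= volume_ml <= max_ml, volume_ml
```

Predictions failing the check could be sent to a fallback model or flagged for manual review, rather than being silently accepted.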
Fig. 5. Segmentation results with low performance: (a, d) Image; (b, e) Ground truth; (c, f) Segmentation.
4 Conclusion
One of the challenging problems of current segmentation CNNs is that their performance can degrade when the trained model is applied to a new dataset. In this paper, we introduce histogram matching to augment the training set with cases whose intensity distributions are similar to those of the new (unlabelled) dataset. The method is very simple and can be used as a plug-and-play component with any segmentation CNN. Based on the quantitative results on the validation set, we believe that our method can serve as a strong baseline.
Acknowledgement
The authors of this paper declare that the segmentation method they imple-
mented for participation in the M&Ms challenge has not used any pre-trained
models nor additional MRI datasets other than those provided by the organizers.
We also thank the organizers for hosting this great challenge.
References
1. Bernard, O., Lalande, A., Zotti, C., Cervenansky, F., Yang, X., Heng, P.A., Cetin,
I., Lekadir, K., Camara, O., Ballester, M.A.G., et al.: Deep learning techniques for
automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem
solved? IEEE Transactions on Medical Imaging 37(11), 2514–2525 (2018)
2. Campello, V.M., Palomares, J.F.R., Guala, A., Marakas, M., Friedrich, M.,
Lekadir, K.: Multi-Centre, Multi-Vendor & Multi-Disease Cardiac Image Segmen-
tation Challenge (Mar 2020). https://doi.org/10.5281/zenodo.3715890
3. Chen, C., Qin, C., Qiu, H., Tarroni, G., Duan, J., Bai, W., Rueckert, D.: Deep
learning for cardiac image segmentation: A review. Frontiers in Cardiovascular
Medicine 7, 25 (2020)
4. Chen, C., Dou, Q., Jin, Y., Chen, H., Qin, J., Heng, P.A.: Robust multimodal brain
tumor segmentation via feature disentanglement and gated fusion. In: International
Conference on Medical Image Computing and Computer-Assisted Intervention. pp.
447–456. Springer (2019)
5. Dou, Q., de Castro, D.C., Kamnitsas, K., Glocker, B.: Domain generalization via
model-agnostic learning of semantic features. In: Advances in Neural Information
Processing Systems. pp. 6450–6461 (2019)
6. Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F.,
Marchand, M., Lempitsky, V.: Domain-adversarial training of neural networks. The
Journal of Machine Learning Research 17(1), 2096–2030 (2016)
7. Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.:
ImageNet-trained CNNs are biased towards texture; increasing shape bias improves
accuracy and robustness. In: International Conference on Learning Representations
(2018)
8. Gonzalez, R.C., Woods, R.E., Eddins, S.L.: Digital image processing using MAT-
LAB. Pearson Education India (2004)
9. Isensee, F., Jäger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a
self-configuring method for deep learning-based biomedical image segmentation.
Nature Methods (2020)
10. Li, H., Zhang, J., Menze, B.: Generalisable cardiac structure segmentation via
attentional and stacked image adaptation. arXiv preprint arXiv:2008.01216 (2020)
11. Mach, F., Baigent, C., Catapano, A.L., Koskinas, K.C., Casula, M., Badimon,
L., Chapman, M.J., De Backer, G.G., Delgado, V., Ference, B.A., et al.: 2019
ESC/EAS guidelines for the management of dyslipidaemias: lipid modification to reduce cardiovascular risk: The Task Force for the Management of Dyslipidaemias of the European Society of Cardiology (ESC) and European Atherosclerosis Society (EAS).
European Heart Journal 41(1), 111–188 (2020)
12. Tao, Q., Yan, W., Wang, Y., Paiman, E.H., Shamonin, D.P., Garg, P., Plein, S.,
Huang, L., Xia, L., Sramko, M., et al.: Deep learning–based method for fully automatic quantification of left ventricle function from cine MR images: a multivendor,
multicenter study. Radiology 290(1), 81–88 (2019)
13. Van der Walt, S., Schönberger, J.L., Nunez-Iglesias, J., Boulogne, F., Warner, J.D.,
Yager, N., Gouillart, E., Yu, T.: scikit-image: image processing in Python. PeerJ
2, e453 (2014)
14. Zhuang, X., Li, L.: Multi-sequence CMR based myocardial pathology segmentation challenge (Mar 2020). https://doi.org/10.5281/zenodo.3715932