
Histogram Matching Augmentation for Domain Adaptation with Application to Multi-Centre, Multi-Vendor and Multi-Disease Cardiac Image Segmentation

Jun Ma[0000−0002−9739−0855]
arXiv:2012.13871v1 [eess.IV] 27 Dec 2020

Department of Mathematics, Nanjing University of Science and Technology


[email protected]

Abstract. Convolutional Neural Networks (CNNs) achieve high accuracy for cardiac structure segmentation when training cases and testing cases are from the same distribution. However, performance degrades when the testing cases are from a distinct domain (e.g., new MRI scanners or clinical centers). In this paper, we propose a histogram matching (HM) data augmentation method to reduce the domain gap. Specifically, our method generates new training cases by using HM to transfer the intensity distribution of testing cases to existing training cases. The proposed method is simple and can be used in a plug-and-play way in many segmentation tasks. The method is evaluated on the MICCAI 2020 M&Ms challenge, and achieves average Dice scores of 0.9051, 0.8405, and 0.8749, and Hausdorff Distances of 9.996, 12.49, and 12.68 for the left ventricle, myocardium, and right ventricle, respectively. Our results ranked third in the MICCAI 2020 M&Ms challenge. The code and trained models are publicly available at https://github.com/JunMa11/HM_DataAug.

Keywords: Cardiac Segmentation · Deep learning · Domain adaptation · Histogram Matching · Generalization

1 Introduction

Accurate segmentation of the left ventricular cavity, myocardium and right ventricle from cardiac magnetic resonance images plays an important role in quantitative analysis of cardiac function, which can be used in clinical cardiology for patient management, disease diagnosis, risk evaluation, and therapy decision [11]. In recent years, many deep learning-based methods have achieved unprecedented performance ([1], [3]), especially when testing cases have the same distribution as training cases. However, segmentation accuracy can be greatly degraded when these methods are tested on unseen datasets acquired from distinct MRI scanners or clinical centres [12]. This problem makes it difficult to apply these methods consistently across multiple clinical centres, especially when subjects are scanned using different MRI protocols or machines.
2 J. Ma, Histogram Matching Augmentation

Fig. 1. Visual examples from different vendors. Images from different vendors have
significant appearance variations.

The M&Ms challenge is the first international competition to date on cardiac image segmentation combining data from different centres, vendors, diseases and countries at the same time. It evaluates the generalisation ability of machine/deep learning and cross-domain transfer learning techniques for cardiac image segmentation [2]. Figure 1 shows six cardiac MR cases from three vendors. It can be observed that the appearance varies remarkably among different vendors. Thus, developing a robust segmentation model that can generalize to different centers, vendors, and diseases is an important but challenging problem.
Recently, many approaches have been proposed to tackle this issue, such as domain adaptation ([6], [4]) and domain generalization [5]. Basically, domain adaptation aims to align the source and target domains in a domain-invariant high-level feature space, which usually needs a few annotated or unlabelled cases from the target domain during training. Domain generalization aims to train a model that can directly generalize to new domains without retraining, and does not use data from the target domain. In practice, both approaches need to modify network architectures or loss functions.
Motivated by a recent study [7] showing that CNNs are more sensitive to texture and intensity features, we aim to improve the generalization ability of CNNs by transferring the intensity distribution of the target dataset to the source dataset. Specifically, we use histogram matching to bring the intensity appearance of the target dataset to the source dataset. Instead of modifying the network architecture or loss function, our method only augments the training dataset, which is very simple and can be used as a plug-and-play method in any segmentation task.

2 Proposed Method

Histogram matching is a widely used method that generates a processed image with a specified histogram [8]. We give a formal introduction of histogram matching as follows. Let $S$ and $T$ denote continuous intensities (considered as random variables) of the source image and the target image, respectively, and let $P_S$ and $P_T$ denote their corresponding continuous probability density functions (PDFs). We can estimate $P_S$ from the source image, and $P_T$ is the target (specified) probability density function. Let $r$ be a random variable with the property

$$r = M(S) = (L - 1) \int_0^S P_S(x)\,dx, \tag{1}$$

where $L$ is the number of intensity levels and $x$ is a dummy variable of integration. Suppose that the random variable $T$ has the property

$$G(T) = (L - 1) \int_0^T P_T(x)\,dx = r. \tag{2}$$

It then follows from these two equations that $M(S) = G(T)$ and, therefore, that $T$ must satisfy the condition

$$T = G^{-1}[M(S)] = G^{-1}(r). \tag{3}$$

Equations (1)-(3) show that an image whose intensity levels have a specified probability density function can be obtained from a given image by using the following four steps:
– Step 1. Obtain $P_S$ from the source image and use Eq. (1) to obtain the value of $r$.
– Step 2. Use the specified PDF in Eq. (2) to obtain the transformation function $G(T)$.
– Step 3. Obtain the inverse transformation $T = G^{-1}(r)$.
– Step 4. Obtain the output image by first equalizing the input image using Eq. (1); the pixel values in this image are the $r$ values. For each pixel with value $r$ in the equalized image, perform the inverse mapping $T = G^{-1}(r)$ to obtain the corresponding pixel in the output image. When all pixels have been thus processed, the PDF of the output image will be equal to the specified PDF.
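The four steps can be sketched for discrete integer intensities as follows. This is a minimal illustration rather than the paper's implementation: the function names `cdf` and `match_histogram` are hypothetical, images are flattened into plain lists, and the common factor (L − 1) in Eqs. (1) and (2) is dropped because it cancels when the two cumulative distributions are compared.

```python
from bisect import bisect_left
from itertools import accumulate


def cdf(pixels, levels):
    """Cumulative distribution of integer intensities in [0, levels)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    return [c / total for c in accumulate(hist)]


def match_histogram(source, target, levels=256):
    """Map source intensities so their distribution follows the target's."""
    m = cdf(source, levels)  # Step 1: M(S), Eq. (1) without the (L - 1) factor
    g = cdf(target, levels)  # Step 2: G(T), Eq. (2) without the (L - 1) factor
    # Step 3: discrete inverse of G, i.e. the smallest level t with G(t) >= M(s).
    lookup = [min(bisect_left(g, m[s]), levels - 1) for s in range(levels)]
    # Step 4: apply the composed mapping G^{-1}(M(s)) to every source pixel.
    return [lookup[p] for p in source]
```

In the discrete case the output histogram only approximates the specified one, since whole intensity levels are moved together.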
Scikit-image [13] has a built-in function match_histograms (see footnote 1) for histogram matching. In this paper, we use histogram matching to augment the training dataset so as to introduce the intensity distribution of the testing set. Specifically, we randomly select image pairs from labelled cases and unlabelled cases, and then transfer the intensity distribution of the unlabelled case to the labelled case. In this way, we can obtain many new training cases whose intensity distributions are similar to those of the unlabelled cases. Figure 2 presents some examples of the source images, target images and augmented images.
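The pairing procedure described above might look like the sketch below. The function name, its parameters and the fixed seed are assumptions made for illustration; the matching routine is passed in as `match_fn` so that the sketch does not depend on any particular library.

```python
import random


def augment_with_target_histograms(labelled, unlabelled, n_new, match_fn, seed=0):
    """Create n_new training cases from random labelled/unlabelled pairs.

    labelled: list of (image, label) tuples; unlabelled: list of images.
    match_fn(image, reference) returns the image with the reference's
    intensity distribution.
    """
    rng = random.Random(seed)
    new_cases = []
    for _ in range(n_new):
        image, label = rng.choice(labelled)
        reference = rng.choice(unlabelled)
        # Only intensities change, so the original label map is reused as-is.
        new_cases.append((match_fn(image, reference), label))
    return new_cases
```

With scikit-image installed, `skimage.exposure.match_histograms`, which takes the image followed by the reference, could be passed as `match_fn`.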

3 Experiments and Results


3.1 Dataset and training protocols
Dataset The M&Ms challenge cohort is composed of 350 patients with hypertrophic and dilated cardiomyopathies as well as healthy subjects. All subjects were scanned in clinical centres in three different countries (Spain, Germany and Canada) using four different magnetic resonance scanner vendors (Siemens, General Electric, Philips and Canon). The training set contains 150 annotated images from two different MRI vendors (75 each) and 25 unlabelled images from a third vendor. The CMR images have been segmented by experienced clinicians from the respective institutions, including contours for the left (LV) and right ventricle (RV) blood pools, as well as for the left ventricular myocardium (MYO). The 200 test cases correspond to 50 new studies from each of the vendors provided in the training set and 50 additional studies from a fourth unseen vendor, which are used to test model generalization ability. 20% of these datasets are used for validation and the rest are reserved for testing and ranking participants.

1 https://scikit-image.org/docs/stable/

(a) Source image (b) Target image (c) HM Augmented image

Fig. 2. Visual examples of the augmented images by histogram matching.

During preprocessing, we resample all the images to 1.25 × 1.25 × 8 mm³ and apply Z-score normalization (mean subtraction and division by the standard deviation) to each image. We employ nnU-Net [9] as the default network. The patch size is 288 × 288 × 14, and the batch size is 8. We train 2D U-Net and 3D U-Net with five-fold cross validation. Each fold is trained on a TITAN V100 GPU for 1000 epochs. For each fold, we save the best-epoch model (see footnote 2) and the final-epoch model. The code and trained models will be made publicly available to the research community after anonymous review. We declare that the segmentation method has not used any pre-trained models or additional MRI datasets other than those provided by the organizers.
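The two preprocessing steps can be illustrated with a short sketch. It only shows the arithmetic: `resampled_shape` and `z_score` are hypothetical helper names, the (x, y, z) axis order is an assumption, and the interpolation itself, which nnU-Net handles internally, is not shown.

```python
from statistics import fmean, pstdev

TARGET_SPACING = (1.25, 1.25, 8.0)  # mm, as stated in the text


def resampled_shape(shape, spacing, target=TARGET_SPACING):
    """Voxel grid size that keeps the physical extent at the target spacing."""
    return tuple(round(n * s / t) for n, s, t in zip(shape, spacing, target))


def z_score(voxels):
    """Subtract the mean and divide by the standard deviation."""
    mu = fmean(voxels)
    sigma = pstdev(voxels, mu) or 1.0  # fall back to 1.0 for a constant image
    return [(v - mu) / sigma for v in voxels]
```

For example, a 200 × 200 × 10 volume with 2.5 × 2.5 × 8 mm voxels would be resampled to 400 × 400 × 10 voxels.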

2 Best-epoch model stands for the model that achieves the best Dice on the validation set.

(a) Image (b) Ground Truth (c) 2D U-Net (Base) (d) 3D U-Net (Base) (e) 2D U-Net (HM) (f) 3D U-Net (HM)

Fig. 3. Visual examples of segmentation results. Base and HM stand for the baseline
dataset and the histogram matching augmented dataset.

3.2 Five-fold cross validation results

Table 1 presents the five-fold cross validation results of the final-epoch models of 2D U-Net and 3D U-Net (see footnote 3). Basically, we found that the performance of 3D U-Net is slightly better than that of 2D U-Net. Figure 3 presents some visual segmentation results of different models. The methods achieve the best Dice scores for the left ventricle, while the performance on the myocardium is inferior to that on the left and right ventricles, indicating that the myocardium is more challenging to segment accurately.
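For reference, the Dice score used throughout this section measures the overlap between a predicted mask P and a ground-truth mask G as 2|P ∩ G| / (|P| + |G|). A minimal sketch for one structure is given below; the function name, the flattened label maps and the integer label values are assumptions made for illustration, not the challenge's evaluation code.

```python
def dice(pred, truth, label):
    """Dice = 2|P ∩ G| / (|P| + |G|) for one label over flattened label maps."""
    p = [v == label for v in pred]
    g = [v == label for v in truth]
    overlap = sum(a and b for a, b in zip(p, g))
    denom = sum(p) + sum(g)
    # When the label is absent from both maps, treat the match as perfect.
    return 2 * overlap / denom if denom else 1.0
```

A score of 1 means perfect overlap and 0 means no overlap, so thin structures such as the myocardium are penalised more heavily for small boundary errors.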

3.3 Validation set results

The validation set is hidden by the challenge organizers. We package our code and model in a Singularity container and submit it to the organizers. Due to the limited number of submissions, we do not submit the baseline models, because the models trained on the augmented dataset are expected to outperform them. In summary, we submit the following five solutions on the validation set.

– Solution 1. 3D U-Net best-epoch model;
– Solution 2. 3D U-Net final-epoch model;
– Solution 3. Ensemble of 3D U-Net best-epoch model and 2D U-Net best-epoch model;
– Solution 4. Ensemble of 3D U-Net final-epoch model and 2D U-Net final-epoch model;
– Solution 5. Ensemble of the above four solutions.
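The text does not say how the ensembles are formed. One common choice, sketched here purely as an assumption, is to average the per-voxel class probabilities of the member models and then take the most likely class; `ensemble_labels` and the nested-list layout are hypothetical.

```python
def ensemble_labels(prob_maps):
    """Average per-voxel class probabilities across models, then take argmax.

    prob_maps[m][v][c] is model m's probability of class c at voxel v.
    """
    n_models = len(prob_maps)
    labels = []
    for voxel_probs in zip(*prob_maps):  # the same voxel across all models
        mean = [sum(cs) / n_models for cs in zip(*voxel_probs)]
        labels.append(max(range(len(mean)), key=mean.__getitem__))
    return labels
```

Averaging probabilities rather than voting on hard labels lets a confident model outweigh an uncertain one at a given voxel.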

3 The results corresponding to the best-epoch model are not reported because it overfits the validation set, which makes the corresponding Dice score meaningless.

Table 1. Five-fold cross validation results. Note that the models trained on the default and the augmented datasets are not comparable, because the five-fold splits are different between the two datasets.

Dataset                                Model     Fold  LV Dice  Myo Dice  RV Dice
Default Dataset (Baseline)             2D U-Net  0     0.9124   0.8695    0.8842
Default Dataset (Baseline)             2D U-Net  1     0.9254   0.8756    0.9074
Default Dataset (Baseline)             2D U-Net  2     0.9218   0.8753    0.8945
Default Dataset (Baseline)             2D U-Net  3     0.9265   0.8684    0.8845
Default Dataset (Baseline)             2D U-Net  4     0.9291   0.8645    0.8874
Default Dataset (Baseline)             3D U-Net  0     0.9256   0.8873    0.8981
Default Dataset (Baseline)             3D U-Net  1     0.9344   0.8879    0.9141
Default Dataset (Baseline)             3D U-Net  2     0.9399   0.8828    0.9010
Default Dataset (Baseline)             3D U-Net  3     0.9372   0.8760    0.8930
Default Dataset (Baseline)             3D U-Net  4     0.9308   0.8753    0.8944
Histogram Matching Augmented Dataset   2D U-Net  0     0.9832   0.9639    0.9762
Histogram Matching Augmented Dataset   2D U-Net  1     0.9871   0.9729    0.9825
Histogram Matching Augmented Dataset   2D U-Net  2     0.9871   0.9747    0.9803
Histogram Matching Augmented Dataset   2D U-Net  3     0.9831   0.9683    0.9784
Histogram Matching Augmented Dataset   2D U-Net  4     0.9835   0.9735    0.9772
Histogram Matching Augmented Dataset   3D U-Net  0     0.9895   0.9633    0.9760
Histogram Matching Augmented Dataset   3D U-Net  1     0.9887   0.9690    0.9826
Histogram Matching Augmented Dataset   3D U-Net  2     0.9824   0.9632    0.9719
Histogram Matching Augmented Dataset   3D U-Net  3     0.9871   0.9654    0.9796
Histogram Matching Augmented Dataset   3D U-Net  4     0.9870   0.9741    0.9762

Table 2 shows the quantitative results of the five solutions on the validation set. It can be observed that ensembling multiple models may slightly improve Dice, but can degrade HD and ASSD. We also apply a paired t-test between Solution 1 and the other four solutions to check whether their performances are significantly different. Surprisingly, p > 0.05 for all comparisons. In other words, the performances of Solutions 2-5 are not statistically significantly different from that of Solution 1. We also compare our results with the recent work [10] that uses a GAN for domain adaptation, and our methods obtain better Dice scores for LV, Myo and RV. Finally, we select Solution 1 as our final solution for the hidden testing set.
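The paired t-test compares two solutions case by case: the per-case score differences are averaged and divided by their standard error. The sketch below computes only the test statistic and the degrees of freedom; the function name is hypothetical and per-case scores are assumed to be available as two equally ordered lists.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired per-case scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample std / sqrt(n)).
    t = fmean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1
```

Turning the statistic into a p-value requires the t-distribution, which is not in the Python standard library; `scipy.stats.ttest_rel` returns both the statistic and the p-value.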

Table 2. Quantitative results of different solutions on the official hidden validation set. ‘-’ denotes not reported.

            LV                     Myo                    RV
Solution    Dice    HD     ASSD    Dice    HD     ASSD    Dice    HD     ASSD
Solution 1  0.9130  8.089  1.017   0.8627  12.19  0.710   0.8937  11.88  0.9078
Solution 2  0.9131  8.091  1.015   0.8627  12.20  0.710   0.8935  11.85  0.9101
Solution 3  0.9166  8.090  0.958   0.8660  12.25  0.734   0.8927  11.35  0.9682
Solution 4  0.9166  8.103  0.959   0.8658  12.26  0.736   0.8927  11.34  0.9667
Solution 5  0.9166  8.099  0.959   0.8659  12.25  0.735   0.8927  11.35  0.9674
GAN [10]    0.903   -      -       0.859   -      -       0.865   -      -

3.4 Testing set results

Table 3 presents the average Dice, HD and ASSD for each vendor on the testing set. Overall, the performance on vendors C and D is lower than the performance on vendors A and B, because the training set does not have annotated cases from vendors C and D. It can be observed that the performance on vendor C is better than the performance on vendor D, especially in HD and ASSD, with improvements of up to 5 mm, which could (see footnote 4) demonstrate the effectiveness of our histogram matching data augmentation.

(a) Image (b) Ground truth (c) Segmentation (d) Image (e) Ground truth (f) Segmentation

Fig. 4. Visual examples of the worst cases from vendors A and B.

Table 3. Quantitative results on the official hidden testing set.

        LV                     Myo                    RV
Vendor  Dice    HD     ASSD    Dice    HD     ASSD    Dice    HD     ASSD
A       0.9148  10.78  1.003   0.8435  14.14  0.697   0.8771  12.84  1.142
B       0.9136  7.866  0.971   0.8675  10.14  0.717   0.8792  11.61  1.144
C       0.8943  9.231  1.389   0.8265  11.33  1.066   0.8732  10.82  1.152
D       0.8977  12.11  1.521   0.8243  14.34  0.958   0.8703  15.46  1.513
All     0.9051  9.996  1.221   0.8405  12.49  0.859   0.8749  12.68  1.238
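The comparison between vendors C and D can be checked directly from the HD and ASSD columns of Table 3. The short script below is only an arithmetic aid: the dictionary layout is an assumption, and the numbers are copied from the table.

```python
# (HD, ASSD) per structure, copied from Table 3.
vendor_c = {"LV": (9.231, 1.389), "Myo": (11.33, 1.066), "RV": (10.82, 1.152)}
vendor_d = {"LV": (12.11, 1.521), "Myo": (14.34, 0.958), "RV": (15.46, 1.513)}

# Positive values mean vendor C has the smaller (better) distance.
gaps = {
    name: tuple(round(d - c, 3) for c, d in zip(vendor_c[name], vendor_d[name]))
    for name in vendor_c
}
for name, (hd_gap, assd_gap) in gaps.items():
    print(f"{name}: HD gap {hd_gap:.3f}, ASSD gap {assd_gap:.3f}")
```

The largest gap is the RV Hausdorff distance (4.64), which matches the "up to 5 mm" statement, while the Myo ASSD gap is slightly negative, meaning vendor D is marginally better on that one measure.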

Worst-case analysis Figures 4 and 5 show some segmentation results with low performance. Basically, it can be observed that most of the segmentation errors occur at the top or bottom of the heart, because these regions usually have low contrast and ambiguous boundaries. We find that even the worst segmentation results can have reasonable shapes, including cases of severe over-segmentation (e.g., the 2nd row of Figure 4). Some LV segmentation results are significantly smaller or larger than the ground truth, which motivates improving our network by imposing a size constraint, such as constraining the volume of the network outputs to be within a specified range.

4 Here we use ‘could’ because we do not have corresponding testing results of our baseline model where histogram matching data augmentation is not used.

(a) Image (b) Ground truth (c) Segmentation (d) Image (e) Ground truth (f) Segmentation

Fig. 5. Visual examples of the worst cases from vendors C and D.

Limitation Although histogram matching is a good pre-processing technique for reducing intensity distribution differences, it should be noted that histogram matching could obscure specific characteristics of other MRI modalities, for example LGE MRI. When the histogram changes, relevant information about scar tissue might be degraded. In the future, we need to verify the effects of histogram matching data augmentation on multi-sequence cardiac MR datasets, such as MyoPS [14].

4 Conclusion

One of the challenging problems of current segmentation CNNs is that performance degrades when the trained model is applied to a new dataset. In this paper, we use histogram matching to augment the training set with cases whose intensity distributions are similar to those of the new (unlabelled) dataset. The method is very simple and can be used in a plug-and-play way with any segmentation CNN. Based on the quantitative results on the validation set, we believe that our method can be a strong baseline.

Acknowledgement
The authors of this paper declare that the segmentation method they imple-
mented for participation in the M&Ms challenge has not used any pre-trained
models nor additional MRI datasets other than those provided by the organizers.
We also thank the organizers for hosting the great challenge.

References
1. Bernard, O., Lalande, A., Zotti, C., Cervenansky, F., Yang, X., Heng, P.A., Cetin, I., Lekadir, K., Camara, O., Ballester, M.A.G., et al.: Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Transactions on Medical Imaging 37(11), 2514–2525 (2018)
2. Campello, V.M., Palomares, J.F.R., Guala, A., Marakas, M., Friedrich, M., Lekadir, K.: Multi-Centre, Multi-Vendor & Multi-Disease Cardiac Image Segmentation Challenge (Mar 2020). https://doi.org/10.5281/zenodo.3715890
3. Chen, C., Qin, C., Qiu, H., Tarroni, G., Duan, J., Bai, W., Rueckert, D.: Deep learning for cardiac image segmentation: A review. Frontiers in Cardiovascular Medicine 7, 25 (2020)
4. Chen, C., Dou, Q., Jin, Y., Chen, H., Qin, J., Heng, P.A.: Robust multimodal brain tumor segmentation via feature disentanglement and gated fusion. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 447–456. Springer (2019)
5. Dou, Q., de Castro, D.C., Kamnitsas, K., Glocker, B.: Domain generalization via model-agnostic learning of semantic features. In: Advances in Neural Information Processing Systems. pp. 6450–6461 (2019)
6. Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., Lempitsky, V.: Domain-adversarial training of neural networks. The Journal of Machine Learning Research 17(1), 2096–2030 (2016)
7. Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.: ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In: International Conference on Learning Representations (2018)
8. Gonzalez, R.C., Woods, R.E., Eddins, S.L.: Digital image processing using MATLAB. Pearson Education India (2004)
9. Isensee, F., Jäger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods (2020)
10. Li, H., Zhang, J., Menze, B.: Generalisable cardiac structure segmentation via attentional and stacked image adaptation. arXiv preprint arXiv:2008.01216 (2020)
11. Mach, F., Baigent, C., Catapano, A.L., Koskinas, K.C., Casula, M., Badimon, L., Chapman, M.J., De Backer, G.G., Delgado, V., Ference, B.A., et al.: 2019 ESC/EAS guidelines for the management of dyslipidaemias: lipid modification to reduce cardiovascular risk: The task force for the management of dyslipidaemias of the European Society of Cardiology (ESC) and European Atherosclerosis Society (EAS). European Heart Journal 41(1), 111–188 (2020)
12. Tao, Q., Yan, W., Wang, Y., Paiman, E.H., Shamonin, D.P., Garg, P., Plein, S., Huang, L., Xia, L., Sramko, M., et al.: Deep learning–based method for fully automatic quantification of left ventricle function from cine MR images: a multivendor, multicenter study. Radiology 290(1), 81–88 (2019)
13. Van der Walt, S., Schönberger, J.L., Nunez-Iglesias, J., Boulogne, F., Warner, J.D., Yager, N., Gouillart, E., Yu, T.: scikit-image: image processing in Python. PeerJ 2, e453 (2014)
14. Zhuang, X., Li, L.: Multi-sequence CMR based myocardial pathology segmentation challenge (Mar 2020). https://doi.org/10.5281/zenodo.3715932
