3D_Segmentation_of_Necrotic_Lung_Lesions_in_CT_Images_Using_Self-Supervised_Contrastive_Learning

This document presents a novel approach for the 3D segmentation of necrotic lung lesions in CT images using self-supervised contrastive learning techniques. The proposed method incorporates two unique augmentation strategies to enhance model performance, particularly for underrepresented lesion types, and demonstrates significant improvements in segmentation accuracy compared to baseline models. The results indicate that the approach effectively reduces the reliance on labeled data while maintaining high performance across various lesion appearances.


Received 12 January 2024, accepted 28 January 2024, date of publication 7 February 2024, date of current version 6 March 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3363637

3D Segmentation of Necrotic Lung Lesions in CT Images Using Self-Supervised Contrastive Learning
YIQIAO LIU, SARAH HALEK, RANDOLPH CRAWFORD, KEITH PERSSON,
MICHAL TOMASZEWSKI, SHUBING WANG, RICHARD BAUMGARTNER,
JIANDA YUAN, GREGORY GOLDMACHER, AND ANTONG CHEN
Merck & Co., Inc., Rahway, NJ 07065, USA
Corresponding author: Antong Chen ([email protected])
This work involved human subjects or animals in its research. The authors confirm that all human/animal subject research procedures and
protocols are exempt from review board approval.

ABSTRACT Deep convolutional neural networks (CNN) are often trained on 2D annotations created by
radiologists following RECIST guidelines to segment lesions in 3D medical images. Three-dimensional
segmentation is conducted by segmenting each lesion slice-by-slice on the axial direction and stacking
the 2D segmentation masks into 3D. However, the performance of such models is inherently biased by
the appearance of most of the lesions in the training dataset. Herein we propose an approach to generate
accurate 3D segmentations of underrepresented necrotic lung lesions. Our proposed approach applies two
novel augmentation techniques for contrastive learning pretraining: dependency augmentation that captures
inter-slice dependencies, and distance transform-based mask-out augmentation imitating necrotic lesions.
In dependency augmentation, cosine similarity within the RECIST bounding box is applied to construct positive
pairs from 2D image slices of the same lesion in the current 3D volume and across longitudinal scans.
We further compared contrastive learning architectures, Momentum Contrast (MoCo) and Bootstrap Your
Own Latent (BYOL), based upon two internal 3D testing sets, one with regular lung lesions and the other
with necrotic lung lesions, and a public 3D DeepLesion lung lesion testing set. MoCo with both proposed
augmentations demonstrated the best performance among all methods that were compared. Specifically,
it 1) improved Dice similarity coefficient (DSC) by 8.42% over a baseline model trained from scratch and
2.40% over an ImageNet-pretrained model on the 3D necrotic lung lesion set; 2) achieved better segmentation
performance on necrotic lesions with 10% of labeled data for supervised fine-tuning compared with the
baseline model trained with all labels from scratch.

INDEX TERMS Deep learning, necrotic lung lesion, lesion segmentation, self-supervised learning, semantic
segmentation.

I. INTRODUCTION

In oncology clinical trials, efficacy is assessed by measuring the change in tumor burden over time, using the Response Evaluation Criteria in Solid Tumors (RECIST) [1]. Prior to treatment, radiologists find all malignant lesions, choose a subset to follow quantitatively, called “target” lesions, and make unidimensional size measurements (typically by manually drawing an outline of the lesion boundary on a single slice). They also document all other lesions, which are followed qualitatively as “non-target” lesions. At each on-treatment assessment timepoint, reviewers measure the target lesions, compare their aggregate size to prior timepoints, assess the non-target lesions for total disappearance or unequivocal growth, and search for new lesions, then combine these assessments into an “overall response” for that timepoint.

The associate editor coordinating the review of this manuscript and approving it for publication was Kumaradevan Punithakumar.

2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 12, 2024
Y. Liu et al.: 3D Segmentation of Necrotic Lung Lesions in CT Images

FIGURE 1. (a) Examples of poor segmentation results on necrotic lung lesions. (b) Contrastive learning with dependency
augmentation that captures inter-slice dependency and distance transform-based mask-out augmentation. (c) SiBA-Net
architecture. (d) Contrastive pretraining architecture derived from the core branch of SiBA-Net.

The sequence of overall responses for all timepoints during treatment is then used to determine endpoints such as progression-free survival and objective response rate.

RECIST-based endpoints are strong predictors of clinical outcome in large cohorts, but do not correlate well with survival in small cohorts, so there is strong interest and active research on better imaging-based methods of evaluating efficacy to replace RECIST. For these more sophisticated approaches, such as extracting radiomic features to assess changes in the tumor microenvironment [2] and modeling of lesion growth kinetics [3], unidimensional measurements of lesion size are insufficient, and accurate 3D segmentation of lesions in CT images becomes crucial.

Recently, deep convolutional neural networks (CNNs) trained on 2D RECIST image delineations have been applied to segment lesions in a slice-by-slice manner [4], [5], where 2D slices are stacked to generate 3D segmentations, thereby significantly reducing the labor required for 3D manual delineation. RECIST analysis done in the course of clinical trials is useful for 3D segmentation of lesions because it leverages the work already done by human experts to locate lesions in 3D CT scans, and thus provides the starting point for the 3D segmentation that would enable methods such as radiomics and lesion growth kinetics to eventually supersede RECIST. However, the performance of these segmentation models often suffers on lung lesions that show necrosis [6], [7], as illustrated in Fig. 1a. These lesions contain dying tumor tissue in response to treatment, leading to hypodense cores. Since the training of the 2D CNN-based segmentation models is often biased towards the majority of lesions showing regular appearance in the training dataset, the trained models tend to exclude the hypodense necrotic regions from the segmentation.

Existing approaches for handling the data imbalance issue include resampling of under- and over-represented classes in supervised learning [8], [9] and resampling of samples from hard tail classes in contrastive learning [10]. Self-supervised contrastive learning leverages large unlabeled datasets by contrasting augmented positive and/or negative image pairs to learn robust features that are closely relevant to the image domain, and the pretrained models can be tuned effectively with limited labels [11], [12], [13], [14]. Furthermore, contrastive learning was found to be more robust in handling under-represented classes than supervised ImageNet-based pretraining [15].

Under a contrastive learning framework, three-dimensional medical images and videos contain spatial and temporal continuity that could be learned as invariance and embedded in the learning process. In 3D CT and MR images, Zeng et al. [16] used the spatial continuity of lesions to generate positive pairs of neighboring slices and tracked lesions across multiple timepoints. Similar to the slice-wise transition in medical images, adjacent frames in videos could carry useful continuity information. Feichtenhofer et al. [17] selected multiple video clips within a one-minute timespan as positive pairs and found improvements across various contrastive learning frameworks and downstream tasks. Since objects could differ substantially across spatial and temporal dimensions, the spatial distance and time span may not be the best criteria for positive pair selection. Qian et al. [18] selected

FIGURE 2. Example distance transform-based mask-out in 7 z-slices of four lesion volumes. D is the distance from the current
image to the RECIST image along the axial direction. α is the distance transform threshold computed as a function of D using
Equation (4). The central area of the CT image with distance transform value > α is masked out. As the images approach the
upper and lower ends of the lesion, α gets larger, and the masked-out region is smaller.
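
As a rough illustration of the mask-out rule in this caption, the sketch below computes the threshold as α = min[0.5 + min(|D|/r, 0.4), 0.9] and blanks every pixel whose normalized distance-transform value exceeds α. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the use of plain nested lists, and the `fill_value` written into masked pixels are all hypothetical, and the normalized distance map (0 at the lesion boundary, 1 at the center) is assumed to be computed elsewhere.

```python
def mask_out_threshold(d, r):
    """Threshold alpha for a slice at axial distance d from the RECIST slice.

    d and r (lesion radius) are assumed to be in the same units.
    The result always lies in [0.5, 0.9].
    """
    return min(0.5 + min(abs(d) / r, 0.4), 0.9)


def mask_out(image, norm_dist, alpha, fill_value=0.0):
    """Replace pixels whose normalized distance value is greater than alpha.

    image and norm_dist are equally sized 2D lists. norm_dist is assumed to be
    0 at the lesion boundary (and outside the lesion) and 1 at the center.
    fill_value is a hypothetical choice; the source does not state it.
    """
    return [
        [fill_value if d > alpha else pixel for pixel, d in zip(img_row, dist_row)]
        for img_row, dist_row in zip(image, norm_dist)
    ]
```

On the RECIST slice (D = 0) the threshold is 0.5, so the largest core is removed; for slices far from the RECIST slice the threshold saturates at 0.9 and only a small central region is removed.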

two clips from the same video to generate positive pairs, with closer clips having higher probabilities of being selected as positive pairs. Han et al. [19] used optical flow to help select positive pairs among video clips. Peng et al. [20] proposed self-paced contrastive learning, where the network started with confident positive pairs and gradually learned from less confident ones, using confidence for positive pair selection and adaptively changing the confidence threshold as the model converges. However, self-paced learning is harder to converge, and selecting a good learning pace is challenging.

Artifacts have been introduced in the construction of positive pairs to improve the model’s robustness in artifact cases during contrastive learning. Xu et al. [21] perturbed the input and feature space to simulate shadow artifacts in prostate ultrasound images to learn consistent feature representations between normal and perturbed images. Neto et al. [22] generated human face images with face masks to learn semantic features robust to the presence of masks and therefore improved the performance of face recognition.

Herein we propose a novel approach for the robust and accurate segmentation of necrotic lung lesions using contrastive learning. Given the abundance of unlabeled 2D CT image slices and the relatively limited number of labeled 2D RECIST images, we introduce a framework conducting self-supervised contrastive learning based on RECIST delineations and unlabeled images, followed by supervised fine-tuning using 2D RECIST images and delineations. Contrastive learning relies on strong augmentation between positive pairs for the deep learning models to learn invariant features in the images. For the pretraining, in addition to the standard data augmentation methods, we utilize dependency augmentation that captures inter-slice dependencies: the slices of the lesion from the z-dimension and from longitudinal scans at different timepoints are selected using cosine similarity criteria to form positive pairs. This augmentation creates strong and natural variations within the positive pair and enables the encoder to learn the robust, invariant, and dependent features describing the lesion and its surrounding region between slices and timepoints. In addition, based on the RECIST delineations, we introduce a distance transform-based mask-out augmentation to simulate necrotic lung lesions and include them in the contrastive learning. By pairing a regular-looking lesion and a necrotic lesion into a positive pair, the model learns the invariant boundary part of the lesion, and therefore the hypodense necrotic core would not be excluded. Experiments were carried out on two private testing datasets, one for necrotic lung lesions and one for lesions with regular appearance, and one public testing dataset [23] of randomly selected lung lesions, all manually annotated in 3D by experts. We evaluated the proposed approach through a comprehensive ablation study, comparing different augmentation methods under two contrastive learning frameworks, MoCo [12], [13] and BYOL [14]. We also compared our proposed approach with two state-of-the-art CNN-based segmentation models, HRNet [24] and nnU-Net [25]. HRNet learns strong


FIGURE 3. Example negative pairs of image x are in the orange box. Candidates for inter-slice positive
pairs (from longitudinal CT scans and from the same CT scan) of resized cropped xRECIST are in the blue
boxes. The boxplots of cosine similarity in all candidate positive pairs for xRECIST are on the right, with pink
for longitudinal scans and blue for same CT scan. The cosine similarity values in positive pairs from the
same CT scan are higher than the positive pairs from longitudinal scans, indicating the lesion has more
variation between longitudinal scans than the z-direction variation in a particular scan. Overall, cosine
similarity is a robust and quantifiable metric for positive pair construction.
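
The pairing rule summarized in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: it treats each resized cropped image as a 2D list of intensities, computes the cosine similarity as the Frobenius inner product divided by the product of the two L2 norms, and labels a pair positive only when both images share a lesion identifier and the similarity exceeds the threshold T. The function names, the tuple form of the lesion identifier, and the zero-norm fallback are assumptions made for the example.

```python
import math


def cosine_similarity(img_a, img_b):
    """Cosine similarity between two equally sized 2D images (lists of rows)."""
    flat_a = [v for row in img_a for v in row]
    flat_b = [v for row in img_b for v in row]
    dot = sum(a * b for a, b in zip(flat_a, flat_b))  # Frobenius inner product
    norm_a = math.sqrt(sum(a * a for a in flat_a))
    norm_b = math.sqrt(sum(b * b for b in flat_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed fallback for an all-zero image
    return dot / (norm_a * norm_b)


def is_positive_pair(lesion_id_a, lesion_id_b, img_a, img_b, threshold):
    """Positive only if same lesion AND similarity strictly above threshold.

    Every other combination (different lesion, or same lesion with low
    similarity) is treated as a negative pair.
    """
    return lesion_id_a == lesion_id_b and cosine_similarity(img_a, img_b) > threshold
```

For example, with a hypothetical identifier such as `("trial1", "site2", "patient3", "lesion1")`, two crops of that lesion with similarity 0.95 form a positive pair at T = 0.9, whereas the same crops compared against any other lesion are negatives regardless of similarity.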

high-resolution representations by connecting high- and low-resolution convolutions in parallel, with repeated multi-scale fusions across the parallel convolutions [26]. HRNet has demonstrated strong segmentation and detection performance on real-world pictures, such as the Cityscapes and PASCAL datasets [24], as well as in medical images [27], [28], [29]. The nnU-Net utilizes the U-Net architecture with automatic configuration of preprocessing, network architecture, training, and post-processing for any new task, and is known to be one of the state-of-the-art approaches for various medical image segmentation tasks. Finally, we demonstrated the potential benefit of reducing the demand for labeled data in the supervised fine-tuning step following the pretraining step. The results on the three test datasets showed that the proposed approach led to a substantial improvement in segmentation accuracy on 3D necrotic lung lesions while maintaining performance on other lung lesions with regular appearance. As far as we know, there is no previous study focusing on using a contrastive learning approach to improve the segmentation of necrotic lung lesions.

II. IMAGE ANALYSIS METHODS

A. PREPROCESSING

The lesion images were cropped to sizes of 3r × 3r × 2r in the x, y, and z dimensions with the RECIST annotation at the center, where r is the radius of the lesion. We introduced random shifting in the x- and y-dimensions during the cropping process to prevent a centering effect of the lesion, which could bias the model. Positional bias occurs when an object consistently appears in the same location in the images, e.g., the center, causing the deep learning model to learn and predict based on that specific location. To mitigate the bias, random shifting is employed in the training images to train the model robustly, such that it is able to accurately segment the object of interest regardless of its location [30]. The 3D lesion volumes were resampled to a resolution of 0.75 mm × 0.75 mm × 0.75 mm, and then each 2D slice was resized to 112 × 112 pixels.

B. MODEL FRAMEWORKS

We assess two self-supervised contrastive learning frameworks, MoCo [12], [13] and BYOL [14], with a VGG-16 backbone from the SiBA-Net architecture [5], as shown in Fig. 1b-d. Under our MoCo framework, in addition to the queue that stores the embeddings of images in the original MoCo, we include two additional data fields, the lesion identifier and the resized cropped image, for each element of the queue to determine positive and negative pairs online. The lesion identifier consists of the trial number, site number, patient number, and lesion ID, which together form a unique ID for each lesion. Resized cropped images $x_{RECIST}$ are generated by cropping the lesion images x to the tight bounding boxes around the RECIST delineations, followed by resizing the cropped images to 112 × 112 pixels. Details of positive pair construction are in the next section. As multiple positive pairs exist for one query image, we modified the InfoNCE loss as shown in (1):

$$L = -\log \frac{\sum_{k} \exp\left(z_\theta \cdot z^{k}_{\varepsilon+} / \tau\right)}{\sum_{k} \exp\left(z_\theta \cdot z^{k}_{\varepsilon+} / \tau + z_\theta \cdot z^{k}_{\varepsilon-} / \tau\right)} \qquad (1)$$

where $z_\theta$ is the query embedding, and $z^{k}_{\varepsilon+}$ and $z^{k}_{\varepsilon-}$ are the embeddings from positive and negative pairs, respectively. The temperature is τ = 0.05, and ε = mε + (1 − m)θ with m = 0.999.

For each input image in BYOL, we randomly select one positive pair online within the current mini-batch. We use the original negative cosine similarity loss for BYOL as shown in (2):

$$L = -\log \frac{q_\theta(z_\theta) \cdot z^{k}_{\varepsilon+}}{\left\| q_\theta(z_\theta) \right\|_2 \cdot \left\| z^{k}_{\varepsilon+} \right\|_2} \qquad (2)$$

C. AUGMENTATION METHODS

In addition to conventional data augmentation methods, e.g., rotation, random cropping, and intensity perturbation, we introduce two novel augmentation methods.

First, multiple positive pairs are identified for each slice using a rule that captures inter-slice dependency: for a given


FIGURE 4. Training curve of contrastive learning with cosine similarity threshold 0.1 - 0.9 for MoCo
and BYOL.
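
For orientation, the MoCo curves track a multi-positive variant of the InfoNCE loss (1). The sketch below is one common reading of such a loss, in which the numerator sums the exponentiated, temperature-scaled similarities over all positives and the denominator sums them over positives and negatives together; that grouping of terms is an assumption and should be checked against (1). The momentum update ε = mε + (1 − m)θ is included for completeness. All function names are hypothetical, and the similarities are assumed to be precomputed dot products between the query embedding and each queued embedding.

```python
import math


def multi_positive_info_nce(pos_sims, neg_sims, tau=0.05):
    """-log( sum_pos exp(s / tau) / (sum_pos exp(s / tau) + sum_neg exp(s / tau)) ).

    pos_sims must contain at least one similarity; neg_sims may be empty.
    """
    pos = [s / tau for s in pos_sims]
    neg = [s / tau for s in neg_sims]
    shift = max(pos + neg)  # subtract the largest logit for numerical stability
    pos_sum = sum(math.exp(v - shift) for v in pos)
    all_sum = pos_sum + sum(math.exp(v - shift) for v in neg)
    return -math.log(pos_sum / all_sum)


def momentum_update(key_params, query_params, m=0.999):
    """Exponential moving average: eps <- m * eps + (1 - m) * theta."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]
```

Under this reading, raising the cosine similarity threshold moves more same-lesion slices into the negative set, which makes the negatives harder and is consistent with a higher training loss at larger thresholds.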

TABLE 1. Lesion segmentation performance for MoCo and BYOL with cosine similarity threshold of 0.1-0.9 on an independent 2D necrotic evaluation set
for selection of optimal cosine similarity threshold.

TABLE 2. Ablation study to illustrate the impact of dependency augmentation that captures inter-slice dependency (Aug1) and distance transform-based
mask-out (Aug2) on three test sets. Means are reported with standard deviations in the brackets. Best results on each dataset are in bold. ∗ indicates
statistical significance compared to results of MoCo with two augmentation methods.
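
The two metrics reported in this table can be illustrated with a small sketch. It is only an illustration under stated assumptions: segmentations are represented as Python sets of voxel coordinates, the Hausdorff distance is computed by brute force over whatever point sets are passed in (for a surface distance, these would be the boundary voxels, scaled to millimetres), and the convention of returning 1.0 for two empty masks is a choice made here rather than something stated in the source.

```python
import math


def dice(pred, truth):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    if not pred and not truth:
        return 1.0  # assumed convention for two empty masks
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(points_a, points_b), directed(points_b, points_a))
```

DSC rewards overall volume overlap, whereas the Hausdorff distance is driven entirely by the single worst boundary error, so a segmentation that misses a necrotic core or over-extends at the lesion ends can score well on one metric and poorly on the other.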

slice x, the slices from the same lesion on the current 3D scan and across all longitudinal scans are considered for the construction of positive pairs. Specifically, if the cosine similarity calculated by (3) between the two resized cropped images $x^{1}_{RECIST}$ and $x^{2}_{RECIST}$ is greater than a threshold T, the two images form a positive pair. Negative pairs are constructed with images from different lesions, as well as those from the same lesion but with cosine similarity lower than the threshold.

$$\text{Cosine Similarity} = \frac{\left\langle x^{1}_{RECIST},\, x^{2}_{RECIST} \right\rangle_F}{\left\| x^{1}_{RECIST} \right\|_2 \cdot \left\| x^{2}_{RECIST} \right\|_2} \qquad (3)$$

Second, given one unlabeled CT image and the corresponding RECIST delineation of the lesion, we perform distance transform inside the RECIST mask, normalize the value to


FIGURE 5. Visualization of segmentation result on examples of 7 necrotic (left) and 2 regular (right) lung lesions from Baseline
model, ImageNet pretrained model, MoCo with two augmentation methods, BYOL with two augmentation methods, HRNet, and
nnU-Net.

TABLE 3. Comparison with state-of-the-art segmentation models HRNet and nnU-Net on three test sets. Means are reported with standard deviations in
the brackets. Best results on each dataset are in bold. ∗ indicates statistical significance compared to results of MoCo with two augmentation methods.
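
The significance marks in these tables come from paired t-tests against MoCo with two augmentation methods. As a hedged illustration of what such a test computes, the sketch below derives the paired t statistic and its degrees of freedom from two lists of matched scores. What is paired (for example, per-lesion DSC values from the two models) is an assumption, since the pairing unit is not spelled out here, and the function name is hypothetical. Converting the statistic to a p-value requires the t distribution; in practice a library routine such as `scipy.stats.ttest_rel` does both steps.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for matched samples scores_a vs scores_b.

    Requires at least two pairs and non-identical differences, since the
    sample standard deviation of the differences is the denominator.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

A large positive t means the first model scores consistently higher than the second on the same cases; the result is called significant when the corresponding two-sided p-value falls below the chosen level (0.05 in this study).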

the range of [0, 1] with 0 at the boundary and 1 at the center of the lesion, and mask out the central area of the lesion on the CT image that has a distance transform value higher than the threshold α. Threshold α is calculated by (4):

$$\alpha = \min\left[0.5 + \min\left(\frac{|D|}{r},\, 0.4\right),\, 0.9\right] \qquad (4)$$

where D is the distance between the current slice and the RECIST slice, and r is the radius of the lesion. α is kept in the range of [0.5, 0.9]. As the slices approach the upper and lower ends of the lesion, α gets larger, and the masked-out region is smaller. This cropping-based augmentation is applied at a probability of p = 0.6. Example images of this augmentation are shown in Fig. 2.

III. EXPERIMENTAL METHODS

A. MATERIALS

Institutional Review Board approval was obtained from multiple sites, and informed consent was waived for this retrospective study compliant with the Health Insurance Portability and Accountability Act. We collected 1,683 lung lesions from 1,121 participating subjects in multiple multi-center trials conducted during 2013-2021. All scans were deidentified and collected centrally for independent review. For each lung lesion, there are multiple timepoints, which results in 7,024 3D volumes, or 115,620 2D CT slices. For each lesion, the RECIST slice with 2D delineation by a radiologist and the non-RECIST slices without delineation are utilized. For self-supervised pretraining, we used 115,620 CT slices from all lung lesions. For supervised fine-tuning, 5,309 and 395 internal RECIST slices with 2D delineations were used for training and validation, respectively. We had three test sets: the first was 250 internal 3D regular lung lesions, the second was 140 3D lung lesions randomly selected from the public DeepLesion dataset [23], and the third was 52 internal 3D necrotic lung lesions. All internal test data were from different trials than the training and validation datasets. Three imaging experts distributed the three test sets and conducted annotations with 3D Slicer software independently.

B. PRETRAINING AND FINE-TUNING

We used a queue size of 30,000, mini-batch size of 300, and 150 epochs for MoCo. We used a mini-batch size of 200 and 300 epochs for BYOL. The Adam optimizer with an initial learning rate of $10^{-3}$ and cosine learning rate decay was applied for both MoCo and BYOL. Besides the proposed augmentations, standard augmentations in contrastive learning, including random cropping, brightness and contrast change, horizontal flipping, random rotation in (0°, 90°, 180°, 270°), and addition of random noise, were used for all MoCo and BYOL models. For supervised fine-tuning, we fine-tuned the entire CNN without freezing any layers. We used a mini-batch size of 48, 100 epochs, and the Adam optimizer with a learning rate of $10^{-4}$. We used random cropping for data augmentation in supervised training. Early stopping was applied once the validation loss stopped decreasing for 10 epochs. No post-processing was performed.

C. STATISTICAL ANALYSIS

Segmentation performance was assessed using Dice similarity coefficient (DSC) [31] and Hausdorff distance (HD) [32], which are metrics used regularly for assessing the accuracy of automatic segmentations. DSC focuses on the volume overlap; its value varies between 0 and 1, with 1 indicating perfect overlap. HD assesses the largest distance between the segmentation surfaces, which shows the distance at the region with the worst segmentation quality.

For each test set, we repeated the supervised tuning 3 times to evaluate consistency at the model level. Differences between our proposed MoCo model with two augmentation methods and the other models were assessed using a paired t-test with the significance level set at p < 0.05.

IV. RESULTS

A. VALIDITY OF COSINE SIMILARITY

The validity of cosine similarity can be justified from three aspects: computational robustness, computational efficiency, and capability to describe image-level similarities. Firstly, it is robust to different CT scanners and contrast phases due to the intensity normalization process. Secondly, computational efficiency comes with torch GPU computation. Thirdly, we show example candidate positive pairs and negative pairs for one image along with the cosine similarities in Fig. 3. The positive pairs are determined by two criteria: 1) the two lesion images are from the same lesion across longitudinal scans in any z-slices; 2) the RECIST cropped images have cosine similarity greater than threshold T. Negative pairs are images from different lesions. The lesion images in temporal positive pairs have deformation due to treatment response. In inter-slice positive pairs, the lesions demonstrate natural shape changes along the z-direction. Since the shape changes in the time domain and z-direction vary from lesion to lesion, cosine similarity is a more robust metric to capture the similarity compared to the time interval in [17] and spatial distances in [16]. Besides, using the RECIST cropped image allows the pretrained network to focus on the lesions rather than the surroundings, which provides an additional layer of guidance.

FIGURE 6. Comparison of DSC on 3D necrotic testing set between Baseline, ImageNet pretraining, MoCo with two augmentations, and BYOL with two augmentations. We vary the number of labels at 1%, 10%, 20%, 40%, 60%, 80%, and 100%.

B. OPTIMIZATION FOR COSINE SIMILARITY THRESHOLD

MoCo has negative pairs and BYOL does not, and this difference results in different optimal cosine similarity thresholds. We varied the cosine similarity threshold in the range of 0.1-0.9 with a step size of 0.1. In Fig. 4, we show the training curves for contrastive learning with different cosine similarity thresholds for both MoCo and BYOL. The training loss of MoCo increases as we increase the cosine similarity threshold. Correspondingly, a cosine similarity threshold of 0.9 introduces the most similar negative pairs to the contrastive learning framework. The training loss of BYOL with cosine similarity thresholds of 0.1-0.5 converged to a higher loss compared to cosine similarity thresholds of 0.6-0.9. Among 0.6-0.9, threshold 0.7 has the lowest training loss.

An independent evaluation set (outside of all other training, validation, and testing sets) consisting of 2D RECIST slices and manual delineations for 38 necrotic lung lesions was used to optimize the cosine similarity threshold. As shown in Table 1, although a cosine similarity threshold of 0.9 in MoCo had the highest training loss, it had the best DSC on the evaluation set, suggesting that having challenging negative pairs could benefit the contrastive learning of MoCo. BYOL, in contrast, does not use negative pairs, and a cosine similarity threshold of 0.7 had the best mean DSC and mean HD. For both BYOL and MoCo, the segmentation performance dropped significantly with cosine similarity threshold ≤ 0.4.

C. ABLATION STUDY

We conducted an ablation study to evaluate the effect of the two proposed augmentation methods and also compared with


FIGURE 7. Volume rendering visualization of over-segmentation towards the superior and inferior ends of 3D lesion
volumes. Ground truth rendering is shown in cyan, our MoCo pretrained model, HRNet, and nnU-Net models are
rendered in yellow. The white arrows point to the over segmented regions.

training from scratch (Baseline) and tuning with an ImageNet pretrained model (ImageNet). The DSC and HD on the three testing sets are shown in Table 2. Our dependency augmentation and mask-out augmentation are both beneficial for the improvement of necrotic lesion segmentation. The best model is MoCo with both augmentations, bringing 8.42% and 2.40% DSC improvement on the 3D necrotic testing set compared to Baseline training from scratch and ImageNet transfer learning, respectively. For the regular and DeepLesion testing sets, MoCo with two augmentations also performed the best. Overall, MoCo performed better than BYOL. The original BYOL model performed even worse than the Baseline model trained from scratch. With our augmentation methods, the performance of BYOL substantially improved on all testing sets, demonstrating the effectiveness of our augmentation methods in BYOL’s contrastive learning framework with only positive pairs. Example necrotic and regular lesion segmentation results from the Baseline model, ImageNet pretrained model, MoCo with two augmentations, and BYOL with two augmentations are shown in Fig. 5.

D. COMPARISON WITH STATE-OF-THE-ART MODELS

We compared two state-of-the-art lesion segmentation models, HRNet and nnU-Net, with our proposed MoCo with two augmentations. HRNet and nnU-Net were trained with the same supervised training data, and we repeated the training 3 times to evaluate consistency at the model level. Results are shown in Table 3. Our proposed model has the highest DSC on all three test sets, and the DSC on the necrotic lung lesions was higher than those of the other two models by a significant margin.

E. LABELED DATA EFFICIENCY STUDY

We varied the amount of labeled data used in supervised fine-tuning at 1%, 10%, 20%, 40%, 60%, 80%, and 100%, and show the DSC results on the 3D necrotic testing set for the Baseline model, ImageNet pretrained model, MoCo with two augmentations, and BYOL with two augmentations in Fig. 6. MoCo with two augmentations and 10% labels reached better performance than the Baseline model trained with all labels. Similarly, MoCo with two augmentations and 20% labels reached performance comparable to the ImageNet pretrained model tuned with all labels. BYOL with two augmentations shared the same trend as MoCo but performed slightly worse than MoCo consistently.

V. DISCUSSION

The proposed MoCo model with two novel augmentations outperformed the other models, both pretrained and trained from scratch. A cosine similarity threshold of 0.9 in MoCo introduced challenging negative pairs and worked the best in practice; a cosine similarity threshold of 0.7 in BYOL introduced the optimal amount of variation in positive pairs compared to other cosine similarity thresholds. Without negative pairs, strong augmentations to introduce more variations in positive pairs could be important for the success of BYOL.

Since our data for supervised tuning are the labeled RECIST slices, while the testing images are from the entire 3D lesion volume, we need the model to not only learn from the labeled RECIST images, but also adapt to the adjacent slices where lesions appear to be smaller. Pretraining of the SiBA-Net model with our proposed augmentation methods allows better learning of a lesion’s dependent features in the z-dimension and in longitudinal scans, making the model more adaptive to the lesion size change between slices. The RECIST-based mask-out enables stable segmentation performance in the presence of a necrotic core, making the model more robust. In addition, our proposed SiBA-Net model, HRNet, and nnU-Net have 1,815,264, 65,846,401,


and 7,930,849 parameters, respectively. Thus, HRNet and [5] B. Zhou, R. Crawford, B. Dogdas, G. Goldmacher, and A. Chen, ‘‘A
nnU-Net are easier to overfit than the proposed model, progressively-trained scale-invariant and boundary-aware deep neural net-
work for the automatic 3D segmentation of lung lesions,’’ in Proc.
resulting in over segmentation in the superior and inferior IEEE Winter Conf. Appl. Comput. Vis. (WACV), Jan. 2019, pp. 1–10, doi:
ends of 3D lesion volumes as shown in Fig. 7. 10.1109/WACV.2019.00008.

VI. CONCLUSION
In this work, we proposed two effective augmentation methods, dependency augmentation to capture inter-slice dependency and distance transform-based mask-out, under the contrastive learning framework to improve the segmentation performance on necrotic lung lesions in CT images without sacrificing performance on regular lung lesions. Experiments demonstrated that the proposed MoCo method with the two augmentations led to the best performance on 3 test datasets, including one dedicated to necrotic lung lesions.

Our study has some limitations. Firstly, our pretrained model also shows some over-segmentation towards the superior and inferior ends of 3D lesion volumes, since only 2D annotated RECIST slices were used for fine-tuning. Using 3D delineated lesions for supervised fine-tuning in the future may resolve this issue. Secondly, the online calculation of cosine similarity between two images takes additional computation time. As a result, pretraining our proposed MoCo model takes about twice the time of the original MoCo model, but this is a one-time effort. In the future, other similarity measurements such as mutual information and the structural similarity index measure (SSIM) may be applied instead of cosine similarity, although this would require a fast PyTorch-based implementation. Thirdly, due to clinical demand, we only looked at necrotic lung lesions. In the future, we plan to extend the approach to other under-represented lesion subtypes in other organs.

With our proposed approach for improved segmentation of necrotic lung lesions, we are able to measure the change of features such as necrotic volume and the percentage of necrotic region in the entire lesion volume. These features could be used in combination with the regular radiomics features for the prediction of the patient’s response to treatments.
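
As an illustration of the features mentioned above, the following minimal sketch computes the necrotic volume and the percentage of necrotic region from two segmentation masks, and then the change between two time points. It is not the pipeline used in this work: the function names, the representation of a mask as a set of (z, y, x) voxel indices, the voxel spacing, and the toy cube-shaped lesions are all assumptions made for the example. Dense array masks would work equivalently.

```python
from itertools import product
from math import prod


def necrosis_features(lesion_voxels, necrotic_voxels, spacing_mm):
    """Return (lesion_mm3, necrotic_mm3, necrotic_percent).

    lesion_voxels, necrotic_voxels: sets of (z, y, x) voxel indices.
    spacing_mm: voxel size along (z, y, x) in millimetres.
    """
    # Count necrosis only where it lies inside the lesion segmentation.
    necrotic_inside = necrotic_voxels & lesion_voxels
    voxel_mm3 = prod(spacing_mm)
    lesion_mm3 = len(lesion_voxels) * voxel_mm3
    necrotic_mm3 = len(necrotic_inside) * voxel_mm3
    # Guard against an empty lesion mask.
    percent = 100.0 * necrotic_mm3 / lesion_mm3 if lesion_mm3 else 0.0
    return lesion_mm3, necrotic_mm3, percent


def cube(start, size):
    """Toy mask: all voxel indices of a cube with the given corner and edge."""
    return {
        tuple(start[i] + offset[i] for i in range(3))
        for offset in product(range(size), repeat=3)
    }


# Hypothetical baseline and follow-up masks of one lesion (2 x 1 x 1 mm voxels).
spacing = (2.0, 1.0, 1.0)
baseline = necrosis_features(cube((0, 0, 0), 4), cube((1, 1, 1), 2), spacing)
followup = necrosis_features(cube((0, 0, 0), 4), cube((0, 0, 0), 3), spacing)

# Change in the percentage of necrotic region between the two scans.
change_in_percent = followup[2] - baseline[2]
print(baseline, followup, change_in_percent)
```

In this toy case the lesion volume is unchanged between scans while the necrotic share grows, which is the kind of change that a whole-lesion volume measurement alone would not capture.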
REFERENCES [22] P. C. Neto, F. Boutros, J. R. Pinto, N. Damer, A. F. Sequeira, and
J. S. Cardoso, ‘‘FocusFace: Multi-task contrastive learning for masked face
[1] E. A. Eisenhauer, ‘‘New response evaluation criteria in solid tumours: recognition,’’ 2021, arXiv:2110.14940.
Revised RECIST guideline (version 1.1),’’ Eur. J. Cancer, vol. 45, no. 2, [23] K. Yan, X. Wang, L. Lu, and R. M. Summers, ‘‘DeepLesion: Automated
pp. 228–247, Jan. 2009, doi: 10.1016/j.ejca.2008.10.026. mining of large-scale lesion annotations and universal lesion detection with
[2] A. Chen, J. Saouaf, B. Zhou, R. Crawford, J. Yuan, J. Ma, R. Baumgartner, deep learning,’’ J. Med. Imag., vol. 5, no. 3, Jul. 2018, Art. no. 036501, doi:
S. Wang, and G. Goldmacher, ‘‘A deep learning-facilitated radiomics 10.1117/1.jmi.5.3.036501.
solution for the prediction of lung lesion shrinkage in non-small cell [24] K. Sun, Y. Zhao, B. Jiang, T. Cheng, B. Xiao, D. Liu, Y. Mu, X. Wang,
lung cancer trials,’’ in Proc. IEEE 17th Int. Symp. Biomed. Imag. (ISBI), W. Liu, and J. Wang, ‘‘High-resolution representations for labeling pixels
Apr. 2020, pp. 678–682, doi: 10.1109/ISBI45749.2020.9098561. and regions,’’ 2019, arXiv:1904.04514.
[3] M. L. Maitland, J. Wilkerson, S. Karovic, B. Zhao, J. Flynn, M. Zhou, [25] F. Isensee, P. F. Jaeger, S. A. A. Kohl, J. Petersen, and K. H. Maier-Hein,
P. Hilden, F. S. Ahmed, L. Dercle, C. S. Moskowitz, Y. Tang, ‘‘NnU-Net: A self-configuring method for deep learning-based biomed-
D. E. Connors, S. J. Adam, G. Kelloff, M. Gonen, T. Fojo, L. H. Schwartz, ical image segmentation,’’ Nature Methods, vol. 18, no. 2, pp. 203–211,
and G. R. Oxnard, ‘‘Enhanced detection of treatment effects on metastatic Feb. 2021, doi: 10.1038/s41592-020-01008-z.
colorectal cancer with volumetric CT measurements for tumor burden [26] J. Wang, ‘‘Deep high-resolution representation learning for visual
growth rate evaluation,’’ Clin. Cancer Res., vol. 26, no. 24, pp. 6464–6474, recognition,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 43,
Dec. 2020, doi: 10.1158/1078-0432.ccr-20-1493. no. 10, pp. 3349–3364, Oct. 2021, doi: 10.1109/tpami.2020.
[4] J. Cai, Y. Tang, L. Lu, A. P. Harrison, K. Yan, J. Xiao, L. Yang, and 2983686.
R. M. Summers, ‘‘Accurate weakly-supervised deep lesion segmentation [27] Y. Tang, J. Cai, K. Yan, L. Huang, G. Xie, J. Xiao, J. Lu, G. Lin, and L. Lu,
using large-scale clinical annotations: Slice-propagated 3D mask genera- ‘‘Weakly-supervised universal lesion segmentation with regional level set
tion from 2D RECIST,’’ 2018, arXiv:1807.01172. loss,’’ 2021, arXiv:2105.01218.


YIQIAO LIU received the B.S. degree in biomedical engineering from Sichuan University, Chengdu, Sichuan, China, in 2014, and the Ph.D. degree in biomedical engineering from Case Western Reserve University, Cleveland, OH, USA, in 2021. She joined Merck & Co., Inc., in 2022. She is currently a Data Scientist with Data Science and Scientific Informatics, Merck Research Laboratory (MRL) IT, Merck & Co., Inc.

SARAH HALEK received the B.S. degree in psychology with a minor in information systems from Drexel University, Philadelphia, PA, USA, in 2001. She was with ICON Medical Imaging, as a Director and in technical management, from 2005 to 2021. She is currently a Principal Scientist with Merck & Co., Inc.

RANDOLPH CRAWFORD received the B.S. degree in zoology from Michigan State University, Lansing, MI, USA, in 1982, and the M.S. degree in computer science from Johns Hopkins University, Baltimore, MD, USA, in 1990. He was with Merck & Co., Inc., for 17 years in image processing. He has 30 years of experience in AI and machine learning and ten years of experience in high performance computing.

KEITH PERSSON received the B.S. degree in computer science from Bellevue University, and the A.S. degree in occupational therapy from the Quinsigamond Community College. From 2001 to 2021, he was involved in various research, management, and director level roles with Parexel International Corporation. From 2021 to 2023, he was a Clinical Imaging Scientist with MSD.

MICHAL TOMASZEWSKI received the M.Phys. degree from the University of Edinburgh, U.K., in 2014, and the Ph.D. degree in physics and cancer research from the University of Cambridge, U.K., in 2018, where he developed a novel imaging method for tumor vasculature quantification. He is currently an Associate Principal Scientist with the Translational Imaging Department, Merck & Co., Inc., where he is responsible for development and application of magnetic resonance imaging methods used in drug discovery and development, primarily in the fields of neuroscience and metabolism, as well as MRI research support of clinical trials.

SHUBING WANG received the M.S. degree in applied mathematics from the University of Texas at Austin, Austin, TX, USA, in 2003, and the Ph.D. degree in statistics from the University of Wisconsin–Madison, Madison, WI, USA, in 2007. He joined Merck & Co., Inc., in 2007. He is currently a Senior Principal Scientist with the Biometrics Research Department, Biostatistics and Research Decision Sciences (BR-BARDS), Merck & Co., Inc.

RICHARD BAUMGARTNER received the B.S. degree from the Slovak University of Technology in Bratislava, in 1992, and the Ph.D. degree in electrical engineering from the University of Technology Vienna, Austria. He is currently the Senior Director of the Biometrics Research Department, Biostatistics and Research Decision Sciences (BR-BARDS), Merck & Co., Inc. While at Merck & Co., Inc., he has been supporting early clinical and preclinical studies with an imaging component, including functional magnetic resonance imaging (fMRI), dynamic contrast-enhanced MRI (DCE-MRI), and positron emission tomography (PET) imaging for neuroscience, inflammation, and cardiovascular therapeutic areas. He is also involved in several projects in AI and machine learning. Previously, he was an Associate Research Officer with the Institute for Biodiagnostics, National Research Council Canada, Winnipeg, Canada, where he pioneered development of methods for exploratory analysis of fMRI. At the Institute for Biodiagnostics, he was also involved in metabolomic applications to develop diagnostic biomarkers for prediction of pathogenic fungi and breast cancer.

JIANDA YUAN received the M.D. and Ph.D. degrees from the Fudan Medical College, Shanghai, China, in 1993 and 1998, respectively. He is currently the Senior Medical Director of the Head and Neck Product Development Team, Late Oncology Development Department, Merck Research Laboratory, Merck & Co., Inc. He oversees several Merck-sponsored Keytruda clinical trials and external academic collaborative clinical studies. He collaborates with internal and external scientific experts to develop a robust portfolio of research to answer critical questions about mechanisms of cancer immunotherapy response and resistance. Before he joined Merck in February 2016, he established and led the translational biomarker research with the Ludwig Center for Cancer Immunotherapy, Memorial Sloan Kettering Cancer Center, from 2002 to 2016. His research interests include translational medicine and biomarker discovery for immune checkpoint blockade immunotherapy, with approximately 80 peer-reviewed articles, including publications in Science, NEJM, Nature Medicine, Nature Immunology, PNAS, Blood, Journal of Immunology, Clinical Cancer Research, and Journal for Immunotherapy of Cancer. He served as a member of the Steering Committee of the CRI-CIC, from 2006 to 2011. He is a member of SITC, COHAN, AACR, and ASCO. He is the Group Chair of the SITC Biomarker Task Force and the Chair of the SITC Tumor Mutational Burden Subcommittee. He is an Associate Editor of the Journal for Immunotherapy of Cancer.

GREGORY GOLDMACHER received the M.D. and Ph.D. degrees from the UT Southwestern Medical Center, Dallas, TX, USA, and the M.B.A. degree from Temple University, Philadelphia, PA, USA. His clinical training was in diagnostic radiology. Prior to Merck & Co., Inc., he was a Senior Medical Director and the Head of Oncology Imaging at ICON Plc., and before that in academia. He has been with Merck & Co., Inc., since 2015, and he is currently an Associate VP of Clinical Research and the Head of Clinical Imaging and Pathology. With his team of radiologists, pathologists, and research scientists, he oversees endpoint assessments in ∼250 clinical trials in oncology, neuroscience, immunology, cardiovascular, infectious, and metabolic diseases. He also leads teams conducting research in radiomics, tumor growth kinetics, novel response criteria, and other innovative approaches to clinical trial imaging. He participates in and has held leadership positions with QIBA, PINTaD, CDISC, the RECIST Working Group, Project Data Sphere, and other industry and academic collaborations.

ANTONG CHEN received the B.S. degree from Xi’an Jiaotong University, Xi’an, Shaanxi, China, in 2003, the M.S. degree from the Rose-Hulman Institute of Technology, Terre Haute, IN, USA, in 2005, and the Ph.D. degree from Vanderbilt University, Nashville, TN, USA, in 2012. He is currently the Director of Data Science and Scientific Informatics, Merck Research Laboratory (MRL) IT, Merck & Co., Inc.