ABSTRACT
Content-based image retrieval (CBIR) with self-supervised learning (SSL) accelerates clinicians' interpretation of similar images without requiring manual annotations.
1 INTRODUCTION
Radiologists' interpretation of lesions (nodular, tumor-like, and surface-like) requires careful analysis. Lesion images exhibit a wide variety of sizes, shapes (e.g., elliptical, irregular), colors (e.g., grayscale), and textures (e.g., convex, nodular, distorted) (Hofman & Hicks (2016)), which makes manual image retrieval, or content-based image retrieval (CBIR) with annotations, time-consuming and error-prone. Annotating lesions is expensive because of intra-class variation: the same lesion type may not be visually similar, while different lesion types may appear similar when images are taken at similar stages of disease progression. In this work, we first develop an SSL method for lesion feature extraction that supports image retrieval and classification tasks. The overall pipeline is shown in Figure 1. Second, we develop a web-based open-source DICOM (Digital Imaging and Communications in Medicine) application for radiology image analysis and similar-lesion retrieval, shown in Figure 2. An easy Python installation is available at https://github.com/openhcimed/flask_search.
Figure 1: Pipeline of the front-end and back-end framework and downstream tasks.
Published as a Tiny Paper at ICLR 2023
Figure 2: The front-end UI can load multiple series of DICOM files. It offers a variety of functionalities, such as annotation, size measurement, contrast adjustment, navigation through image stacks, etc.
3 RESULTS
Classification results. We compare two baselines, SimCLR and a VAE (Variational AutoEncoder), both using ResNet-18. Table 1 shows that applying either the Frangi filter during preprocessing or a GeM pooling layer with L2-normalization improves on the baselines, but to different extents. The Frangi filter provides a much stronger improvement over SimCLR, and combining it with GeM yields the most effective feature extractor. However, the Frangi filter has limitations, as varying its parameters may produce discrepancies within the dataset or across datasets. We observe benefits from these combinations, but the components are off-the-shelf and are not compared against more recent SSL networks. Furthermore, performance on other medical datasets is unknown, which we leave for future work.
Table 1: Lesion type classification comparisons. SimCLR or VAE with ResNet-18 were trained as baseline models. Frangi: Frangi filter in preprocessing; +GeM: GeM pooling approach.
CBIR results. As shown in Appendix E Table 2, we assess retrieval within the same patient (intra-patient), across different patients (inter-patient), and over all patients, following commonly used clinical evaluations. The standard retrieval metrics, mAP@10 and Precision@k, show that intra-patient retrieval precision is higher because features within a single patient are highly similar, while features differ across patients. Furthermore, the results indicate that the SimCLR model (contrastive) outperforms the VAE model (generative).
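The retrieval metrics above can be sketched in a few lines. The following is a minimal pure-Python illustration of Precision@k and mAP@k over ranked lesion-type labels; the function names are ours, not from the released code:

```python
# Hedged sketch of the retrieval metrics reported above. A "relevant" candidate
# is one whose lesion-type label matches the query's label.

def precision_at_k(query_label, retrieved, k):
    """Fraction of the top-k retrieved labels matching the query label."""
    return sum(1 for label in retrieved[:k] if label == query_label) / k

def average_precision_at_k(query_label, retrieved, k):
    """AP@k: mean of precision values at each rank where a relevant item appears."""
    hits, precisions = 0, []
    for rank, label in enumerate(retrieved[:k], start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision_at_k(queries, k=10):
    """mAP@k over (query_label, ranked_candidate_labels) pairs."""
    return sum(average_precision_at_k(q, r, k) for q, r in queries) / len(queries)
```

With k = 10 this yields the mAP@10 figure; per-query Precision@k averages give the Precision@k columns.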
4 CONCLUSIONS
We present an open-source interactive application that facilitates lesion analysis and retrieval based on self-supervised learning (SSL). Built on the contrastive learning framework SimCLR, the Frangi filter, GeM pooling, and L2-normalization together improve lesion retrieval performance.
URM STATEMENT
The authors acknowledge that at least one key author of this work meets the URM criteria of ICLR
2023 Tiny Papers Track.
REFERENCES
Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for
contrastive learning of visual representations. In International conference on machine learning,
pp. 1597–1607. PMLR, 2020.
Alejandro F Frangi, Wiro J Niessen, Koen L Vincken, and Max A Viergever. Multiscale vessel
enhancement filtering. In International conference on medical image computing and computer-
assisted intervention, pp. 130–137. Springer, 1998.
R Gómez. Understanding ranking loss, contrastive loss, margin loss, triplet loss, hinge loss and all
those confusing names. Raúl Gómez blog, pp. 12, 2019.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recog-
nition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.
770–778, 2016.
Michael S Hofman and Rodney J Hicks. How we read oncologic FDG PET/CT. Cancer Imaging, 16(1):
1–14, 2016.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep con-
volutional neural networks. Advances in neural information processing systems, 25:1097–1105,
2012.
Dirk-Jan Kroon and M Schrijver. Hessian based frangi vesselness filter. MATLAB File Exchange,
24409, 2009.
Antonia Longo, Stefan Morscher, Jaber Malekzadeh Najafababdi, Dominik Jüstel, Christian Zakian,
and Vasilis Ntziachristos. Assessment of hessian-based frangi vesselness filter in optoacoustic
imaging. Photoacoustics, 20:100200, 2020.
Tri-Cong Pham, Chi-Mai Luong, Muriel Visani, and Van-Dung Hoang. Deep cnn and data augmen-
tation for skin lesion classification. In Asian Conference on Intelligent Information and Database
Systems, pp. 573–582. Springer, 2018.
Filip Radenović, Giorgos Tolias, and Ondřej Chum. Fine-tuning cnn image retrieval with no human
annotation. IEEE transactions on pattern analysis and machine intelligence, 41(7):1655–1668,
2018.
Mehreen Tariq, Sajid Iqbal, Hareem Ayesha, Ishaq Abbas, Khawaja Tehseen Ahmad, and Muham-
mad Farooq Khan Niazi. Medical image based breast cancer diagnosis: State of the art and future
directions. Expert Systems with Applications, 167:114095, 2021.
Gijs van Tulder and Marleen de Bruijne. Combining generative and discriminative representation
learning for lung ct analysis with convolutional restricted boltzmann machines. IEEE transactions
on medical imaging, 35(5):1262–1272, 2016.
Ke Yan, Xiaosong Wang, Le Lu, and Ronald M Summers. Deeplesion: automated mining of large-
scale lesion annotations and universal lesion detection with deep learning. Journal of medical
imaging, 5(3):036501, 2018.
Yang You, Igor Gitman, and Boris Ginsburg. Large batch training of convolutional networks. arXiv
preprint arXiv:1708.03888, 2017.
APPENDIX
V_f(λ) = (1 − e^{−R_A²/2α²}) · e^{−R_B²/2β²} · (1 − e^{−s²/2γ²}),  (1)
where R_A = |λ2|/|λ3|, R_B = |λ1|/√(|λ2 · λ3|), and s = √(λ1² + λ2² + λ3²). The modification enhances contours
by restricting λ such that |λ1| ≤ |λ2| ≤ |λ3| at scale s. To suppress background noise that does not form contours, we set λ3 = 0 if λ3 > 0, and raise λ1 and λ2 to high eigenvalues to keep R_B close to 1, which differentiates blob-like and plate-like lesion structures from others. The threshold parameters are α = 1, β = 0.6, and γ = 0.0444, chosen to balance the sensitivity of differentiating blob-like and plate-like lesions from background noise. To capture lesions of various sizes, we vary the multiscale value s from 1 to 9 with a step size of 0.2. A few examples comparing original ROIs and the responses after applying the modified function are shown in Figure 3.
Figure 3: Sample responses after the modified Frangi filter for lesion contour detection.
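The vesselness response in Eq. (1) can be computed directly from sorted Hessian eigenvalues. The following is a minimal sketch, assuming eigenvalues ordered so that |λ1| ≤ |λ2| ≤ |λ3|; the function name and the reduction of the λ3 > 0 suppression rule to an early return are our simplifications, not the released code:

```python
import numpy as np

def vesselness(lam1, lam2, lam3, alpha=1.0, beta=0.6, gamma=0.0444, eps=1e-12):
    """Eq. (1) at one voxel/scale, given sorted eigenvalues |lam1|<=|lam2|<=|lam3|."""
    if lam3 > 0:  # suppress responses that are background noise, not contours
        return 0.0
    r_a = abs(lam2) / (abs(lam3) + eps)              # plate vs. line structures
    r_b = abs(lam1) / (np.sqrt(abs(lam2 * lam3)) + eps)  # blob-ness
    s = np.sqrt(lam1**2 + lam2**2 + lam3**2)         # second-order structureness
    return ((1 - np.exp(-r_a**2 / (2 * alpha**2)))
            * np.exp(-r_b**2 / (2 * beta**2))
            * (1 - np.exp(-s**2 / (2 * gamma**2))))
```

In practice this would be evaluated over the multiscale range s described above and the maximum response kept per pixel.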
B DATASET
DeepLesion Dataset. Previously, most algorithms considered one or a few types of lesions (Pham et al. (2018); Tariq et al. (2021); van Tulder & de Bruijne (2016)). Yan et al. developed the large, publicly available DeepLesion dataset at the NIH Clinical Center (Yan et al. (2018)) toward universal lesion detection. In clinical practice, the DeepLesion dataset is widely accepted for monitoring cancer patients because its recorded diameters follow the Response Evaluation Criteria in Solid Tumors (RECIST), one of the most commonly used criteria. The dataset consists of 33,688 PACS-bookmarked CT images from 10,825 studies of 4,477 unique patients (Yan et al. (2018)). It includes various lesions across the human body, such as the lungs, lymph nodes, liver, etc. Four cardinal directions (left, top, right, bottom) enclose each lesion in every CT slice as a bounding box, providing coarse annotations with labels. We crop regions of interest (ROIs) based on these bounding boxes to obtain abundant instances for model training.
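The ROI cropping step above can be sketched as follows. This is an illustrative numpy snippet assuming a (left, top, right, bottom) box in pixel coordinates; the function name and clipping behavior are our assumptions, not the paper's exact code:

```python
import numpy as np

def crop_roi(ct_slice: np.ndarray, bbox):
    """Clip a (left, top, right, bottom) box to the slice bounds and crop it."""
    left, top, right, bottom = (int(round(v)) for v in bbox)
    h, w = ct_slice.shape[:2]
    left, right = max(0, left), min(w, right)
    top, bottom = max(0, top), min(h, bottom)
    return ct_slice[top:bottom, left:right]
```

Per Appendix C, each cropped ROI would then be resized to 64 × 64 before augmentation.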
C EXPERIMENTAL SETUP
Data pre-processing. We preprocess DeepLesion by cropping bounding boxes and applying random flips, color distortion, and Gaussian blur. We keep the image size at 64 × 64 and apply the Frangi filter.
Implementation Details. We train models with PyTorch (Krizhevsky et al. (2012)) on a single Nvidia DGX-A100 GPU. We implement ResNet-18 (He et al. (2016)) in SimCLR, followed by two additional convolutional layers and a GeM pooling layer with L2-normalization. We pre-train with SSL for 1000 epochs using the LARS optimizer (You et al. (2017)) with a learning rate of 0.05, a weight decay of 10^{−5}, and a cosine learning rate scheduler. For the CBIR task, we compute a contrastive loss with margin 0.8 and cosine distance from the sigmoid classification layer. We fine-tune the model with SGD for 50 iterations, using a learning rate of 0.01, momentum of 0.9, and a cosine schedule for the learning rate.
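The GeM pooling and L2-normalization head mentioned above can be sketched in numpy. This is a minimal fixed-p illustration of generalized-mean pooling (Radenović et al. (2018)); a framework implementation would typically make p a learnable parameter, and the function names here are ours:

```python
import numpy as np

def gem_pool(features: np.ndarray, p: float = 3.0, eps: float = 1e-6):
    """GeM pooling: (C, H, W) activation map -> (C,) descriptor.

    p = 1 reduces to average pooling; p -> infinity approaches max pooling.
    """
    clipped = np.clip(features, eps, None)          # keep the p-th root well-defined
    pooled = (clipped.reshape(features.shape[0], -1) ** p).mean(axis=1)
    return pooled ** (1.0 / p)

def l2_normalize(v: np.ndarray, eps: float = 1e-12):
    """Unit-normalize the descriptor so cosine similarity is a dot product."""
    return v / (np.linalg.norm(v) + eps)
```

L2-normalizing the pooled descriptor makes the cosine distance used for retrieval equivalent to a plain dot product between embeddings.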
Here, a is an anchor, p is a positive sample sharing the same lesion type as a, and n is a negative sample of a different type from a. m is the margin determining the separation enforced between negative and positive samples. We use the cosine similarity measure d to indicate how similar a retrieved candidate is to a given query:
d(i, j) = (i · j) / (∥i∥ ∥j∥),  (3)
where i is a query embedding and j is a lesion candidate embedding.
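Eq. (3) translates directly to code. A minimal numpy sketch, with an illustrative function name of our choosing:

```python
import numpy as np

def cosine_similarity(i: np.ndarray, j: np.ndarray, eps: float = 1e-12):
    """Eq. (3): cosine similarity between query i and candidate embedding j."""
    return float(i @ j / (np.linalg.norm(i) * np.linalg.norm(j) + eps))
```

Candidates would be ranked by this score in descending order to produce the retrieval lists evaluated in Appendix E.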
E CBIR RESULTS