
CoBooM: Codebook Guided Bootstrapping for

Medical Image Representation Learning

Azad Singh[0000−0002−6607−1130] and Deepak Mishra[0000−0002−4078−9400]

Indian Institute of Technology, Jodhpur 342307, Rajasthan, India


{singh.63,dmishra}@iitj.ac.in
arXiv:2408.04262v1 [cs.CV] 8 Aug 2024

Abstract. Self-supervised learning (SSL) has emerged as a promising paradigm for medical image analysis by harnessing unannotated data. Despite their potential, existing SSL approaches overlook the high anatomical similarity inherent in medical images. This makes it challenging for SSL methods to consistently capture diverse semantic content in medical images. This work introduces a novel and generalized solution that implicitly exploits anatomical similarities by integrating codebooks in SSL. The codebook serves as a concise and informative dictionary of visual patterns, which not only aids in capturing nuanced anatomical details but also facilitates the creation of robust and generalized feature representations. In this context, we propose CoBooM, a novel framework for self-supervised medical image learning that integrates continuous and discrete representations. The continuous component ensures the preservation of fine-grained details, while the discrete aspect facilitates coarse-grained feature extraction through a structured embedding space. To understand the effectiveness of CoBooM, we conduct a comprehensive evaluation on diverse medical datasets encompassing chest X-rays and fundus images. The experimental results reveal a significant performance gain in classification and segmentation tasks.

Keywords: Self-supervised Learning · Codebook · Chest X-ray

1 Introduction
Expensive annotations for medical images promote Self-Supervised Learning (SSL) [6,15,13,8]. Recent developments demonstrate its effectiveness across diverse modalities, such as X-rays, MRI, CT, and histopathology [16,23]. However, despite these advancements, existing methods like SimCLR [6], MoCo [15], BYOL [13], and VICReg [3] encounter challenges when applied to medical images in terms of effectively creating positive and negative pairs. The complexity arises from the inherent feature overlap among different anatomical sub-structures and across diverse image samples. Current SSL methods overlook this anatomical overlap and thus potentially compromise the model's performance and generalization capabilities.
In this work, we propose a simple yet effective technique involving the learning of generalized features guided by a codebook [24,32], enabling the capture of concise discrete features. By associating similar anatomical features with common codes and distinguishing features with distinct codes, the codebook facilitates a structured learning process, which overcomes the associated challenges, such as defining effective positive and negative pairs [27]. This establishes a systematic representation where recurring patterns are encoded consistently. For instance, the lung fields, ribs, and cardiac contours common across chest X-rays may share the same or similar codes, providing a concise and shared representation of prevalent features and creating a sparse but informative summary of the entire dataset. This introduces a strong structured inductive bias by implicitly guiding the SSL model toward making assumptions about the common patterns and structures present.
In this context, we propose an SSL framework named CoBooM: Codebook Guided Bootstrapping for Medical Image Representation Learning. Specifically, CoBooM comprises Context and Target encoders for learning continuous features and a Quantizer module that quantizes the features using a codebook and integrates them with the continuous features through the novel DiversiFuse sub-module. The DiversiFuse sub-module utilizes cross-attention mechanisms that capitalize on the complementary information offered by these two representations. The introduction of the codebook encourages the SSL model to recognize and prioritize shared, generalized common features during the training process. In addition, the complementary integration of the continuous and discrete representations allows the model to capture fine-grained features, contributing to a smooth and rich embedding space. This leads to a more holistic and refined understanding of the underlying data. We conduct experiments across diverse modalities, encompassing chest X-ray and fundus images, to validate its effectiveness. We evaluate the proposed approach under linear probing and semi-supervised evaluation protocols and observe more than 3% performance gains in downstream classification and segmentation tasks.

2 Background

Discriminative SSL Approaches: Discriminative SSL has seen advancements with approaches like SimCLR [6], MoCo [15,7], BYOL [13], and Barlow Twins [30], which capture generalized features by enhancing the similarity between positive pairs while maximizing the dissimilarity between negative pairs, either explicitly or implicitly. In the domain of medical images, discriminative SSL techniques, especially contrastive approaches, have gained substantial attention and found meaningful applicability. Various adaptations of contrastive methods, like MoCo-CXR [22] for chest X-rays, MICLe [2] using multiple patient images, and MedAug [25] with metadata-based positive pair selection, contribute to the improvement of medical image representations. Simultaneously, another approach, DiRA [14], unites discriminative, restorative, and adversarial learning to capture complementary features. Zhou et al. propose PCRL [34] for X-ray and CT modalities, later improved with PCRLv2 [33] addressing pixel-level restoration and scale information. Kaku et al. enhance contrastive learning with intermediate-layer closeness in their approach [18].
Fig. 1. The architecture overview of the proposed framework. EMA is the exponential moving average used to update the parameters of the Target encoder. gθ and gϕ are three-layer MLP networks that serve as projection heads for the Context and Target encoders. fθ′ serves as the decoder network.

In [9], SimCLR was used for pre-training on multiple unlabeled histopathology datasets, improving feature quality and yielding superior performance over ImageNet-pretrained networks. In other studies [19,4], the authors showcase the efficacy of different SSL methods on large-scale pathology data. While existing methods show advancements, they overlook the significant anatomical similarities present in medical data. The proposed approach implicitly harnesses these anatomical similarities to capture more informative features.
Codebook in Medical Image Analysis: Using codebooks in medical image analysis holds promising potential [12]. By discretizing the data, codebooks can simplify complex medical image features, making them easier to analyze [20,26]. Recent studies [11,31] highlight the effectiveness of learning discrete representations through codebooks across various domains in achieving interpretable and robust medical image retrieval, generation, recognition, and segmentation.

3 Methodology

Fig. 1 provides an architectural layout of the proposed SSL framework, comprising a Context encoder parameterized by θ, a Target encoder parameterized by ϕ, and a Quantizer module. Additionally, there are two projection heads, denoted qθ and pθ, and a decoder fθ′. The proposed framework adheres to the self-distillation-based non-contrastive SSL paradigm [13]. The parameters θ undergo updates through back-propagation of the loss, while the parameters ϕ are an exponential moving average (EMA) of earlier versions of θ. Given an input sample x, the framework creates two augmented views x1 and x2 by applying a random set of augmentations. x1 is processed by fϕ to output the feature map yϕ, while fθ produces yθ from x2. Further, yθ and yϕ, after passing through a global average pooling layer, are fed to the predictor heads gθ and gϕ to output the embeddings zθ and zϕ, which carry global features. Subsequently, the target feature map yϕ is quantized through the Quantizer module, utilizing a Codebook and the DiversiFuse module to represent and compress the features effectively. The following subsection provides details of the proposed quantization process.
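To make the overall training flow concrete, the following minimal PyTorch-style sketch shows the two-view forward pass and the EMA update of ϕ. The module structure, the momentum value of 0.996, and the pooling/projection details are illustrative assumptions, not the authors' released implementation.

```python
import torch

@torch.no_grad()
def ema_update(f_phi, f_theta, momentum=0.996):
    # phi <- momentum * phi + (1 - momentum) * theta (EMA of the context encoder parameters)
    for p_phi, p_theta in zip(f_phi.parameters(), f_theta.parameters()):
        p_phi.mul_(momentum).add_(p_theta, alpha=1.0 - momentum)

def forward_views(f_theta, f_phi, g_theta, g_phi, x1, x2):
    # Context branch (updated by back-propagation) and target branch (stop-gradient, EMA-updated).
    y_theta = f_theta(x2)                        # continuous feature map y_theta, shape (B, D, H, W)
    with torch.no_grad():
        y_phi = f_phi(x1)                        # continuous feature map y_phi from the target encoder
    z_theta = g_theta(y_theta.mean(dim=(2, 3)))  # global average pooling + head g_theta
    z_phi = g_phi(y_phi.mean(dim=(2, 3)))        # global average pooling + head g_phi
    return y_theta, y_phi, z_theta, z_phi
```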

3.1 Quantizer

The Quantizer module utilizes a codebook, a predefined table containing K discrete codewords represented as vectors ek, each of size D. These codewords are employed to quantize the lower-dimensional continuous feature maps yϕ received from the target encoder fϕ. The Quantizer module compares the features from yϕ with each of the K codewords in the codebook, measuring similarity by the Euclidean distance. The module identifies the closest codeword to the encoded data through an iterative process across the codebook. Subsequently, the module replaces the continuous encoded data yϕ with the selected codewords, effectively transforming the representation from continuous to discrete, yd. This quantization is executed with the objective of minimizing the quantization loss Lq = lcb + α·lce, comprising two terms: the codebook loss lcb = ||SG[yϕ] − ek||₂² and the commitment loss lce = ||yϕ − SG[ek]||₂². Here SG denotes the stop-gradient operator and α specifies the weight of lce. The codebook loss guides the adjustment of the codewords ek toward yϕ. Simultaneously, the commitment loss enforces yϕ to adhere to specific embeddings in the codebook, thus preventing unregulated expansion.
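A minimal sketch of such a quantizer is given below, assuming a VQ-VAE-style nearest-codeword lookup with an nn.Embedding codebook; the tensor layout and the commitment weight value are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Quantizer(nn.Module):
    def __init__(self, num_codes=1024, dim=512, alpha=0.5):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)  # K codewords e_k, each of size D
        self.alpha = alpha                            # weight of the commitment term l_ce (assumed value)

    def forward(self, y_phi):
        # y_phi: continuous feature map (B, D, H, W) -> flatten to (B*H*W, D) vectors.
        b, d, h, w = y_phi.shape
        flat = y_phi.permute(0, 2, 3, 1).reshape(-1, d)
        # Euclidean distance to every codeword; pick the closest one for each vector.
        dist = torch.cdist(flat, self.codebook.weight)
        e_k = self.codebook(dist.argmin(dim=1))
        # Codebook loss pulls e_k toward SG[y_phi]; commitment loss pulls y_phi toward SG[e_k].
        l_cb = F.mse_loss(e_k, flat.detach())
        l_ce = F.mse_loss(flat, e_k.detach())
        loss_q = l_cb + self.alpha * l_ce
        # Discrete representation y_d, reshaped back to the feature-map layout.
        y_d = e_k.reshape(b, h, w, d).permute(0, 3, 1, 2)
        return y_d, loss_q
```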

DiversiFuse (Feature Fusion with Multi-Head Cross Attention): Within the Quantizer, the DiversiFuse sub-module guides the model, through the discrete representations yd, in determining which parts of the continuous information yϕ are more relevant. It enables the model to learn to focus on different aspects of the continuous representation based on the specific values in the discrete features, potentially capturing more complex patterns and dependencies within the data. It involves a multi-head cross-attention mechanism where the quantized features yd pass through q to output ydq, and the continuous features yϕ pass through k and v to output yck and ycv, respectively. The similarity scores between the discrete queries ydq and the continuous keys yck are calculated as SScore(ydq, yck) = (ydq)ᵀ · yck. Subsequently, the scores are transformed into attention weights using the softmax function σ(SScore(ydq, yck)), denoted as ydc. The continuous values ycv are then weighted by the attention weights ydc and summed: WSum(ydc, ycv) = Σ ydc · ycv. The keys yck help determine which parts of the continuous information should be attended to, and the values provide the actual information to be attended to. The process is repeated for all attention heads. The resulting aggregated representation ydc is obtained through concatenation across all attention heads. This integration of discrete and continuous representations enables the exchange of complementary information, enhancing the model's ability to capture complex patterns and improve performance.
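As a rough sketch, this fusion step can be expressed with a standard multi-head cross-attention layer, as below. Note that nn.MultiheadAttention additionally scales the scores by 1/√d and learns the q, k, v projections internally, and the head count here is assumed, so this approximates the described mechanism rather than reproducing the authors' exact module.

```python
import torch
import torch.nn as nn

class DiversiFuse(nn.Module):
    """Illustrative fusion: discrete features attend over continuous features."""
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, y_d, y_phi):
        # Flatten spatial maps (B, D, H, W) into token sequences (B, H*W, D).
        q = y_d.flatten(2).transpose(1, 2)     # discrete queries y_d^q
        kv = y_phi.flatten(2).transpose(1, 2)  # continuous keys y_c^k and values y_c^v
        # softmax(q k^T) v per head, heads concatenated -> fused representation y_dc.
        y_dc, _ = self.attn(query=q, key=kv, value=kv)
        return y_dc
```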

3.2 Loss Function



The output of the Quantizer module, denoted as ydc, undergoes an average pooling layer and is subsequently projected into a lower-dimensional space using the projection head pθ. The resulting output of pθ is denoted as zθ′. To optimize the parameters θ, the similarity scores between zθ and zϕ, as well as between zθ and zθ′, are calculated using the loss functions defined in Equation (1).

L1 = ⟨zθ, zϕ⟩ / (||zθ||₂ · ||zϕ||₂),   L2 = ⟨zθ, zθ′⟩ / (||zθ||₂ · ||zθ′||₂)   (1)

Additionally, ydc is also fed to the decoder fθ′ to output the reconstructed image x′, enabling the model to capture local complementary features, formulated as Lr = ||x − x′||². The final loss is Lθ = α(L1 + L2) + Lq + γLr, where α and γ are set to 0.5. Additionally, the symmetric form of the loss Lθ is utilized by interchangeably feeding the views x1 and x2 to fθ and fϕ.
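A compact sketch of the full objective is shown below. Treating L1 and L2 as negative cosine similarities (so that maximizing similarity corresponds to minimizing the loss) and applying a stop-gradient on zϕ are assumptions consistent with the self-distillation paradigm, not an exact reproduction of the authors' code.

```python
import torch
import torch.nn.functional as F

def cobooM_loss(z_theta, z_phi, z_theta_prime, x, x_rec, loss_q, alpha=0.5, gamma=0.5):
    # L1 and L2 are the cosine similarities of Equation (1); their negatives are minimized.
    l1 = F.cosine_similarity(z_theta, z_phi.detach(), dim=-1).mean()
    l2 = F.cosine_similarity(z_theta, z_theta_prime, dim=-1).mean()
    # Reconstruction term L_r = ||x - x'||^2 using the decoder output x_rec.
    l_r = F.mse_loss(x_rec, x)
    # L_theta = alpha * (L1 + L2) + L_q + gamma * L_r, with alpha = gamma = 0.5.
    return -alpha * (l1 + l2) + loss_q + gamma * l_r
```

The symmetric form would then be obtained by calling the same function with the roles of the two views swapped and averaging the two values.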

4 Experimental Setup

Descriptions of Datasets: For pre-training, we utilize the publicly available official train set from NIH Chest X-ray14 [28], consisting of 86,524 X-ray images, and the fundus images from the EyePACS [10] dataset. The downstream classification task is performed on the officially available test set with 25,596 samples and on the retinal images from the MuReD [21] and ODIR [35,17] datasets. To assess the performance on the downstream segmentation task, we utilize the SIIM-ACR [1] dataset for pneumothorax detection.
Implementation Details: We train the models on an Nvidia RTX A6000 with the PyTorch framework. For the backbone encoders (fθ and fϕ), we use the ResNet18 architecture, with an input image size of 224×224, a batch size of 64, and 300 epochs. The number of codebook vectors is 1024, each of size 512. All projection and prediction heads are three-layer MLP networks with an output size of 256. For optimizing the parameters θ, we employ the LARS [29] optimizer with a base learning rate of 0.02. Additionally, we implement a cosine decay learning rate scheduler without restarts.
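For reference, a hypothetical setup reflecting these hyperparameters might look as follows. LARS is not part of core PyTorch, so plain SGD stands in here purely for illustration, and the momentum value is an assumption.

```python
import torch
import torchvision

# ResNet18 backbone, 224x224 inputs, batch size 64, 300 epochs, 1024 codewords of size 512.
backbone = torchvision.models.resnet18(weights=None)
cfg = dict(image_size=224, batch_size=64, epochs=300, num_codes=1024, code_dim=512, base_lr=0.02)

# Stand-in optimizer (the paper uses LARS [29]) with cosine decay and no restarts.
optimizer = torch.optim.SGD(backbone.parameters(), lr=cfg["base_lr"], momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=cfg["epochs"])
```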
Baselines for Comparison: To assess the performance of our proposed approach, we compare it with supervised learning with random initialization (Sup.) and several established SSL methods, encompassing contrastive, non-contrastive, and clustering-based techniques, including SimCLR [6], BYOL [13], VICReg [3], SwAV [5], DiRA [14], CAiD [23], and PCRLv2 [33]. Notably, we conduct the pre-training for the baselines following their official implementations and using the same training protocol as our proposed method.
Table 1. Performance evaluation of the proposed approach in terms of AUC score on the NIH, MuReD, and ODIR datasets, and dice score for pneumothorax segmentation (SIIM), under linear probing. The best results are in bold; SD is not shown due to low variability.

Methods          NIH 1%  NIH 5%  NIH 10%  NIH 30%  NIH All  SIIM All  MuReD 10%  ODIR 10%
Sup.              51.6    55.1    57.1     61.1     61.8     48.4      58.6       56.4
SimCLR            56.9    59.7    62.7     67.6     70.0     50.3      72.1       70.2
BYOL              54.7    58.3    61.7     66.3     69.0     49.8      70.5       67.4
SwAV              55.5    59.1    62.4     67.7     70.2     53.4      71.6       70.8
VICReg            58.7    60.7    62.7     66.2     67.3     48.7      72.4       66.5
CAiD              63.7    67.2    68.9     70.3     73.5     55.3      70.7       69.5
PCRLv2            61.9    66.4    68.3     71.5     73.8     56.4      72.6       72.4
DiRA              60.8    65.8    68.6     72.6     74.1     56.8      71.7       70.8
Ours w/o Dec.     65.1    70.1    72.0     73.6     74.8     55.6      75.8       76.0
Ours w/ Dec.      64.9    70.3    72.4     73.3     74.3     57.5      76.0       75.3
Ours w/o DF.      63.3    68.6    70.9     72.1     73.4     54.9      74.6       73.8

5 Results and Discussion

Linear Probing Evaluation: Table 1 presents the experimental results on the NIH and SIIM-ACR datasets under the linear probing protocol. Specifically, the parameters of the encoder fθ remain frozen while those of the linear layer are updated. For NIH, we evaluate the performance by sampling labeled subsets from the official train set and report the official test set results in terms of AUC score. Similarly, on the MuReD and ODIR datasets, the test set AUC score is reported after training on 10% of the labeled training data. For pneumothorax segmentation on SIIM-ACR, we report the results in terms of dice score by updating the parameters of the decoder network while those of the encoder remain frozen. Supervised learning (Sup.) notably yields lower AUC scores than the SSL methods. The proposed approach consistently outperforms the other baselines across varying degrees of labeled data; specifically, for the 1% subset from NIH, our approach achieves the highest AUC score of 65.1%, with an average performance gain of more than 3% over all the baseline methods. Fig. 2 presents the diagnostic maps for different pathological conditions, corresponding to the 10% labeled samples from NIH. A similar trend is observed for the MuReD and ODIR datasets, where the proposed approach outperforms the baselines by a considerable average margin of more than 3%. This indicates the method's ability to extract meaningful representations from unlabeled data for subsequent downstream training with limited labeled samples. Furthermore, a similar improvement in AUC scores is observed with increased labeled data. The proposed approach also achieves the highest dice score of 57.5% on pneumothorax segmentation, an improvement of 1% over the best-performing baseline.
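As an illustration of the linear probing protocol (frozen backbone, trainable linear head), a minimal sketch is given below. The 14-way head reflects the 14 NIH thorax-disease labels, while the optimizer choice, learning rate, and multi-label loss are assumptions; the pre-trained CoBooM weights would be loaded into the backbone before probing.

```python
import torch
import torch.nn as nn
import torchvision

# Pre-trained CoBooM weights would be loaded into the backbone here (omitted).
backbone = torchvision.models.resnet18(weights=None)
for p in backbone.parameters():
    p.requires_grad = False                 # freeze the encoder f_theta
backbone.fc = nn.Linear(512, 14)            # fresh linear head for the 14 NIH labels (stays trainable)

optimizer = torch.optim.SGD(backbone.fc.parameters(), lr=0.01, momentum=0.9)
criterion = nn.BCEWithLogitsLoss()          # multi-label targets, evaluated with the AUC score
```

For the semi-supervised protocol discussed below, the same setup would apply except that the backbone parameters stay trainable and are fine-tuned together with the head.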
Fig. 2. Diagnostic maps for Atelectasis, Effusion, Cardiomegaly, and Mass correspond-
ing to the X-ray images from NIH indicate that CoBooM captures pathological features
effectively compared to other best-performing baseline methods. The bounding box in-
dicates the ground truth.

Table 2. Semi-supervised fine-tuning evaluation in terms of AUC score (%) on the NIH, MuReD, and ODIR datasets, and dice score for pneumothorax segmentation (SIIM).

Methods          NIH 1%  NIH 5%  NIH 10%  NIH 30%  NIH All  SIIM All  MuReD 10%  ODIR 10%
Sup.              57.7    62.7    65.6     70.7     74.1     51.2      66.7       63.2
SimCLR            62.1    65.7    68.9     72.2     75.6     53.3      80.9       73.4
BYOL              61.0    65.2    67.7     71.6     74.8     52.8      78.6       71.3
SwAV              61.7    65.6    66.9     72.1     75.8     54.4      79.4       72.7
VICReg            60.0    64.8    68.4     71.8     75.4     54.4      78.3       72.9
CAiD              64.4    69.6    71.3     73.8     77.4     56.5      81.0       73.1
PCRLv2            63.0    68.7    70.6     73.1     76.1     57.3      82.4       74.6
DiRA              62.7    67.3    71.2     74.5     77.8     58.8      81.6       73.4
Ours w/o Dec.     65.8    70.6    72.3     76.7     79.6     57.8      84.4       75.8
Ours w/ Dec.      65.6    70.8    72.1     77.1     79.3     59.6      84.8       75.7
Ours w/o DF.      63.7    70.0    72.2     76.3     78.9     57.1      83.1       74.2

Semi-Supervised Evaluation: Table 2 presents the test set performance of the baseline methods and the proposed approach under semi-supervised evaluation on the NIH, SIIM-ACR, MuReD, and ODIR datasets, where we fine-tune the parameters of the backbone encoder fθ along with the linear layer. We present the official test set performance in terms of AUC score on NIH, MuReD, and ODIR by fine-tuning the model using various subsets of labeled data extracted from the training samples. We observe consistently superior performance of the proposed approach over existing SSL methods across all the subsets. Notably, our method achieves the highest AUC score of 65.8% with 1% of the training samples, surpassing the baselines by a margin exceeding 2%. The trend persists as the labeled data increases to 100%, with the proposed approach consistently outperforming the baselines and maintaining an average gain of 2%. The MuReD and ODIR datasets show a similar performance gain, with the highest AUC scores of 84.8 and 75.8, respectively. For pneumothorax segmentation as well, we observe the highest dice score of 59.6%, with a margin of more than 2% compared to the best-performing baseline method.
Optimal Performance with Minimal Fine-Tuning: Upon comparing the results presented in Tables 1 and 2, a noteworthy observation is that our proposed method demonstrates minimal or no need for fine-tuning of the backbone encoder, especially with lower numbers of labeled training samples. Specifically, at 1%, the proposed method achieves AUC scores of 65.1% and 65.8% under the linear-probing and semi-supervised fine-tuning evaluation protocols, respectively. Similarly, for 5% and 10% labeled training samples, our method's AUC scores remain comparable with negligible margins. This trend contrasts with the baseline methods, where a substantial performance gain is observed from linear probing to semi-supervised fine-tuning. This highlights the capacity of the proposed approach to reach near-optimal performance with minimal fine-tuning when adapting to different tasks, and signals its potential to derive meaningful and transferable representations, which aligns with the practical requirements of real-world settings where computational resources may be limited.

Ablation Studies: We conduct an ablation study to examine the impact of different components of the proposed approach under both the linear probing and semi-supervised evaluation protocols. In the first study, we evaluate the model's performance by performing pre-training with and without the decoder while keeping the DiversiFuse module. For another study, we pre-train the model without the DiversiFuse module and the decoder. Tables 1 and 2 present the test set results across various downstream tasks for these studies. We observe no effect of the decoder on the model's performance during classification tasks in the downstream evaluations. However, when evaluating the performance on the segmentation task, we observe superior performance when pre-training the model with the decoder under both evaluation protocols. When pre-training the model without the DiversiFuse sub-module in the Quantizer, we observe a decline of around 2% across all tasks when evaluating the model's performance under linear probing. Under semi-supervised evaluation, the model maintains its performance even without the DiversiFuse sub-module; however, for classification with 1% labeled samples from NIH, we observe a 2% degradation in AUC score. This highlights the importance of the DiversiFuse sub-module in improving the quality of the learned representations with the help of discrete features.

6 Conclusion

In this work, we propose an efficient SSL pre-training approach that integrates discrete and continuous features with the help of a codebook. We propose a novel DiversiFuse sub-module, which guides the model toward learning generalized and better representations and requires little fine-tuning, especially when labeled data is limited. Through empirical studies, we highlight the proposed model's ability to capture complex medical attributes with limited resource availability. We evaluate the performance of the proposed approach by comparing it with various SSL methods under both linear probing and semi-supervised evaluations for both classification and segmentation tasks. This highlights its effectiveness in handling various tasks associated with medical image analysis.

References
1. Society for imaging informatics in medicine: Siim-acr pneumothorax segmentation
(2019), https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/overview/description
2. Azizi, S., Mustafa, B., Ryan, F., Beaver, Z., Freyberg, J., Deaton, J., Loh, A.,
Karthikesalingam, A., Kornblith, S., Chen, T., Natarajan, V., Norouzi, M.: Big
self-supervised models advance medical image classification (2021)
3. Bardes, A., Ponce, J., LeCun, Y.: Vicreg: Variance-invariance-covariance regular-
ization for self-supervised learning (2022)
4. Boyd, J., Liashuha, M., Deutsch, E., Paragios, N., Christodoulidis, S.,
Vakalopoulou, M.: Self-supervised representation learning using visual field expan-
sion on digital pathology. In: Proceedings of the IEEE/CVF International Confer-
ence on Computer Vision. pp. 639–647 (2021)
5. Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A.: Unsupervised
learning of visual features by contrasting cluster assignments. Advances in Neural
Information Processing Systems 33, 9912–9924 (2020)
6. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for con-
trastive learning of visual representations (2020)
7. Chen, X., Fan, H., Girshick, R., He, K.: Improved baselines with momentum con-
trastive learning. arXiv preprint arXiv:2003.04297 (2020)
8. Chen, X., He, K.: Exploring simple siamese representation learning (2020)
9. Ciga, O., Xu, T., Martel, A.L.: Self supervised contrastive learning for digital
histopathology. Machine Learning with Applications 7, 100198 (2022)
10. Dugas, E., Jared, Jorge, Cukierski, W.: Diabetic retinopathy detection (2015),
https://kaggle.com/competitions/diabetic-retinopathy-detection
11. Gangloff, H., Pham, M.T., Courtrai, L., Lefèvre, S.: Leveraging vector-quantized
variational autoencoder inner metrics for anomaly detection. In: 2022 26th Inter-
national Conference on Pattern Recognition (ICPR). pp. 435–441. IEEE (2022)
12. Gorade, V., Mittal, S., Jha, D., Bagci, U.: Synergynet: Bridging the gap between
discrete and continuous representations for precise medical image segmentation. In:
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer
Vision. pp. 7768–7777 (2024)
13. Grill, J.B., Strub, F., Altché, F., Tallec, C., Richemond, P.H., Buchatskaya, E., Do-
ersch, C., Pires, B.A., Guo, Z.D., Azar, M.G., Piot, B., Kavukcuoglu, K., Munos,
R., Valko, M.: Bootstrap your own latent: A new approach to self-supervised learn-
ing (2020)
14. Haghighi, F., Taher, M.R.H., Gotway, M.B., Liang, J.: Dira: Discriminative,
restorative, and adversarial learning for self-supervised medical image analysis.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition. pp. 20824–20834 (2022)
15. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised
visual representation learning (2020)
16. Huang, S.C., Pareek, A., Jensen, M., Lungren, M.P., Yeung, S., Chaudhari, A.S.:
Self-supervised learning for medical image classification: a systematic review and
implementation guidelines. NPJ Digital Medicine 6(1), 74 (2023)
17. Kaggle: Ocular disease recognition, https://www.kaggle.com/andrewmvd/ocular-disease-recognition-odir5k
18. Kaku, A., Upadhya, S., Razavian, N.: Intermediate layers matter in momentum
contrastive self supervised learning. Advances in Neural Information Processing
Systems 34, 24063–24074 (2021)
19. Kang, M., Song, H., Park, S., Yoo, D., Pereira, S.: Benchmarking self-supervised
learning on diverse pathology datasets. In: Proceedings of the IEEE/CVF Confer-
ence on Computer Vision and Pattern Recognition. pp. 3344–3354 (2023)
20. Kobayashi, K., Hataya, R., Kurose, Y., Miyake, M., Takahashi, M., Nakagawa, A.,
Harada, T., Hamamoto, R.: Decomposing normal and abnormal features of medical
images for content-based image retrieval of glioma imaging. Medical image analysis
74, 102227 (2021)
21. Rodríguez, M.A., AlMarzouqi, H., Liatsis, P.: Multi-label retinal disease classi-
fication using transformers. IEEE Journal of Biomedical and Health Informatics
(2022)
22. Sowrirajan, H., Yang, J., Ng, A.Y., Rajpurkar, P.: Moco-cxr: Moco pretraining
improves representation and transferability of chest x-ray models (2021)
23. Taher, M.R.H., Haghighi, F., Gotway, M.B., Liang, J.: Caid: Context-aware in-
stance discrimination for self-supervised learning in medical imaging. In: Interna-
tional Conference on Medical Imaging with Deep Learning. pp. 535–551. PMLR
(2022)
24. Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning.
Advances in neural information processing systems 30 (2017)
25. Vu, Y.N.T., Wang, R., Balachandar, N., Liu, C., Ng, A.Y., Rajpurkar, P.: Medaug:
Contrastive learning leveraging patient metadata improves representations for
chest x-ray interpretation (2021)
26. Wang, J., Han, X.H., Xu, Y., Lin, L., Hu, H., Jin, C., Chen, Y.W., et al.: Sparse
codebook model of local structures for retrieval of focal liver lesions using multi-
phase medical images. International journal of biomedical imaging 2017 (2017)
27. Wang, J., Zeng, Z., Chen, B., Dai, T., Xia, S.T.: Contrastive quantization with code
memory for unsupervised image retrieval. In: Proceedings of the AAAI Conference
on Artificial Intelligence. vol. 36, pp. 2468–2476 (2022)
28. Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M.: Chestx-ray8:
Hospital-scale chest x-ray database and benchmarks on weakly-supervised classi-
fication and localization of common thorax diseases. In: Proceedings of the IEEE
conference on computer vision and pattern recognition. pp. 2097–2106 (2017)
29. You, Y., Gitman, I., Ginsburg, B.: Large batch training of convolutional networks.
arXiv preprint arXiv:1708.03888 (2017)
30. Zbontar, J., Jing, L., Misra, I., LeCun, Y., Deny, S.: Barlow twins: Self-supervised
learning via redundancy reduction (2021)
31. Zhang, Y., Sun, K., Liu, Y., Ou, Z., Shen, D.: Vector quantized multi-modal guid-
ance for alzheimer’s disease diagnosis based on feature imputation. In: International
Workshop on Machine Learning in Medical Imaging. pp. 403–412. Springer (2023)
32. Zheng, C., Vedaldi, A.: Online clustered codebook. In: Proceedings of the
IEEE/CVF International Conference on Computer Vision. pp. 22798–22807 (2023)
33. Zhou, H.Y., Lu, C., Chen, C., Yang, S., Yu, Y.: A unified visual information preser-
vation framework for self-supervised pre-training in medical image analysis. IEEE
Transactions on Pattern Analysis and Machine Intelligence (2023)
34. Zhou, H.Y., Lu, C., Yang, S., Han, X., Yu, Y.: Preservational learning improves
self-supervised medical image models by reconstructing diverse contexts. In: Pro-
ceedings of the IEEE/CVF International Conference on Computer Vision. pp.
3499–3509 (2021)
35. Zhou, Y., Wang, B., Huang, L., Cui, S., Shao, L.: A benchmark for studying di-
abetic retinopathy: segmentation, grading, and transferability. IEEE Transactions
on Medical Imaging 40(3), 818–828 (2020)
