CoBooM: Codebook Guided Bootstrapping for Medical Image Representation Learning
1 Introduction
Expensive annotations for medical images have promoted Self-Supervised Learning
(SSL) [6,15,13,8]. Recent developments demonstrate its effectiveness across diverse
modalities, such as X-rays, MRIs, CT, and histopathology [16,23]. However,
despite these advancements, existing methods like SimCLR [6], MoCo [15], BYOL [13],
and VICReg [3] face challenges when applied to medical images, particularly in
effectively creating positive and negative pairs. The difficulty arises from the
inherent feature overlap among different anatomical sub-structures and across
diverse image samples. Current SSL methods overlook this anatomical overlap
and thus potentially compromise the model's performance and generalization
capabilities.
In this work, we propose a simple yet effective technique for learning
generalized features guided by a codebook [24,32], enabling the capture of
concise discrete features. By associating similar anatomical features with common
codes and distinguishing features with distinct codes, the codebook facilitates a
structured learning process, which overcomes challenges such as
defining effective positive and negative pairs [27]. This establishes a systematic
representation where recurring patterns are encoded consistently. For instance,
lung fields, ribs, and cardiac contours, common across chest
X-rays, may share the same or similar codes, providing a concise and shared
representation of prevalent features and creating a sparse but informative summary
of the entire dataset. This introduces a strong structured inductive bias by
implicitly guiding the SSL model toward assumptions about the common
patterns and structures present in the data.
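To make the codebook idea concrete, the following is a minimal sketch of vector quantization in the spirit of VQ-VAE [24]: continuous encoder features are mapped to their nearest learnable code, so recurring anatomy tends to reuse the same code. The class name, codebook size, and dimensions are illustrative assumptions, not CoBooM's exact implementation.

```python
import torch
import torch.nn as nn


class Codebook(nn.Module):
    """Minimal VQ-style codebook (illustrative sketch, not the paper's design)."""

    def __init__(self, num_codes: int = 512, code_dim: int = 256):
        super().__init__()
        self.codes = nn.Embedding(num_codes, code_dim)
        nn.init.uniform_(self.codes.weight, -1.0 / num_codes, 1.0 / num_codes)

    def forward(self, z: torch.Tensor):
        # z: (batch, code_dim) continuous features from the encoder
        # squared Euclidean distance from each feature to every code
        dists = (
            z.pow(2).sum(dim=1, keepdim=True)
            - 2 * z @ self.codes.weight.t()
            + self.codes.weight.pow(2).sum(dim=1)
        )
        indices = dists.argmin(dim=1)        # shared anatomy -> shared code
        z_q = self.codes(indices)
        # straight-through estimator keeps gradients flowing to the encoder
        z_q = z + (z_q - z).detach()
        return z_q, indices


# toy usage: chest X-rays with similar structures should map to similar codes
features = torch.randn(4, 256)
quantizer = Codebook()
quantized, code_ids = quantizer(features)
```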
In this context, we propose an SSL framework named CoBooM: Codebook
Guided Bootstrapping for Medical Image Representation Learning. Specifically,
CoBooM comprises Context and Target Encoders for learning continuous
features and a Quantizer module that quantizes the features using a codebook and
integrates them with the continuous features through the novel DiversiFuse sub-module.
The DiversiFuse sub-module employs cross-attention to capitalize
on the complementary information offered by these two representations. The
codebook encourages the SSL model to recognize and prioritize
shared, generalized features during training. In addition,
the complementary integration of the continuous and discrete representations
allows the model to capture fine-grained features, contributing to a smooth and
rich embedding space. This leads to a more holistic and refined understanding of
the underlying data. We conduct experiments across diverse modalities, including
chest X-ray and fundus images, to validate its effectiveness. We evaluate
the proposed approach under linear probing and semi-supervised evaluation
protocols and observe more than 3% performance gains in downstream classification
and segmentation tasks.
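The following sketch illustrates one plausible way a DiversiFuse-style fusion could work: the continuous features attend to their quantized counterparts, so the fused embedding keeps fine-grained detail while absorbing the codebook's shared structure. The query/key assignment, residual connection, and layer sizes are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """DiversiFuse-like cross-attention fusion (hedged sketch, not the paper's module)."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, continuous: torch.Tensor, quantized: torch.Tensor):
        # continuous, quantized: (batch, tokens, dim)
        fused, _ = self.attn(query=continuous, key=quantized, value=quantized)
        # residual connection preserves fine-grained continuous detail
        return self.norm(continuous + fused)


# toy usage with patch tokens from a shared encoder
cont = torch.randn(2, 49, 256)   # e.g. a 7x7 feature map flattened to tokens
quant = torch.randn(2, 49, 256)  # codebook-quantized version of the same tokens
fusion = CrossAttentionFusion()
out = fusion(cont, quant)        # (2, 49, 256)
```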
2 Background
3 Methodology
3.1 Quantizer
4 Experimental Setup
Fig. 2. Diagnostic maps for Atelectasis, Effusion, Cardiomegaly, and Mass correspond-
ing to the X-ray images from NIH indicate that CoBooM captures pathological features
effectively compared to other best-performing baseline methods. The bounding box in-
dicates the ground truth.
Table 2. Semi-supervised fine-tuning evaluation in terms of AUC score (%) on the NIH,
MuRed, and ODIR datasets, and Dice score for pneumothorax segmentation.
We observe the highest Dice score of 59.6%, exceeding the best-performing
baseline method by more than 2%.
Optimal Performance with Minimal Fine-Tuning: Comparing the results presented
in Tables 1 and 2, a noteworthy observation is that our proposed method
requires minimal or no fine-tuning of the backbone encoder, especially with
lower numbers of labeled training samples. Specifically, with 1% labeled data,
the proposed method achieves AUC scores of 65.1% and 65.8% under
the linear-probing and semi-supervised fine-tuning evaluation protocols, respectively.
Similarly, for 5% and 10% labeled training samples, our method's AUC
scores remain comparable with negligible margins. This trend contrasts with baseline
methods, where a substantial performance gain is observed from linear probing
to semi-supervised fine-tuning. This highlights the effectiveness of our proposed
method and demonstrates a remarkable capacity to achieve optimal performance
with minimal fine-tuning when adapting to different tasks. It signifies the
proposed approach's adaptability and its potential to derive meaningful
and transferable representations with minimal fine-tuning, which aligns with
the practical requirements of real-world settings where computational resources
may be limited.
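For clarity, the sketch below contrasts the two evaluation protocols discussed above: linear probing freezes the pretrained backbone and trains only a classifier head, while semi-supervised fine-tuning updates the entire network on the labeled subset. The ResNet-18 backbone and helper name are stand-ins for illustration, not necessarily the encoder used in the paper.

```python
import torch.nn as nn
from torchvision.models import resnet18


def build_evaluation_model(num_classes: int, linear_probe: bool = True) -> nn.Module:
    """Build a classifier for either linear probing or full fine-tuning (sketch)."""
    # in practice the backbone would be initialized with SSL-pretrained weights
    backbone = resnet18(weights=None)
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)

    if linear_probe:
        # freeze everything except the newly added classification head
        for name, param in backbone.named_parameters():
            param.requires_grad = name.startswith("fc.")
    return backbone


# e.g. linear probing with a 14-class multi-label head for NIH chest X-rays
model = build_evaluation_model(num_classes=14, linear_probe=True)
```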
6 Conclusion
References
1. Society for imaging informatics in medicine: Siim-acr pneumothorax segmentation
(2019), https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/
overview/description
2. Azizi, S., Mustafa, B., Ryan, F., Beaver, Z., Freyberg, J., Deaton, J., Loh, A.,
Karthikesalingam, A., Kornblith, S., Chen, T., Natarajan, V., Norouzi, M.: Big
self-supervised models advance medical image classification (2021)
3. Bardes, A., Ponce, J., LeCun, Y.: Vicreg: Variance-invariance-covariance regular-
ization for self-supervised learning (2022)
4. Boyd, J., Liashuha, M., Deutsch, E., Paragios, N., Christodoulidis, S.,
Vakalopoulou, M.: Self-supervised representation learning using visual field expan-
sion on digital pathology. In: Proceedings of the IEEE/CVF International Confer-
ence on Computer Vision. pp. 639–647 (2021)
5. Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A.: Unsupervised
learning of visual features by contrasting cluster assignments. Advances in Neural
Information Processing Systems 33, 9912–9924 (2020)
6. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for con-
trastive learning of visual representations (2020)
7. Chen, X., Fan, H., Girshick, R., He, K.: Improved baselines with momentum con-
trastive learning. arXiv preprint arXiv:2003.04297 (2020)
8. Chen, X., He, K.: Exploring simple siamese representation learning (2020)
9. Ciga, O., Xu, T., Martel, A.L.: Self supervised contrastive learning for digital
histopathology. Machine Learning with Applications 7, 100198 (2022)
10. Dugas, E., Jared, Jorge, Cukierski, W.: Diabetic retinopathy detection (2015),
https://kaggle.com/competitions/diabetic-retinopathy-detection
11. Gangloff, H., Pham, M.T., Courtrai, L., Lefèvre, S.: Leveraging vector-quantized
variational autoencoder inner metrics for anomaly detection. In: 2022 26th Inter-
national Conference on Pattern Recognition (ICPR). pp. 435–441. IEEE (2022)
12. Gorade, V., Mittal, S., Jha, D., Bagci, U.: Synergynet: Bridging the gap between
discrete and continuous representations for precise medical image segmentation. In:
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer
Vision. pp. 7768–7777 (2024)
13. Grill, J.B., Strub, F., Altché, F., Tallec, C., Richemond, P.H., Buchatskaya, E., Do-
ersch, C., Pires, B.A., Guo, Z.D., Azar, M.G., Piot, B., Kavukcuoglu, K., Munos,
R., Valko, M.: Bootstrap your own latent: A new approach to self-supervised learn-
ing (2020)
14. Haghighi, F., Taher, M.R.H., Gotway, M.B., Liang, J.: Dira: Discriminative,
restorative, and adversarial learning for self-supervised medical image analysis.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition. pp. 20824–20834 (2022)
15. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised
visual representation learning (2020)
16. Huang, S.C., Pareek, A., Jensen, M., Lungren, M.P., Yeung, S., Chaudhari, A.S.:
Self-supervised learning for medical image classification: a systematic review and
implementation guidelines. NPJ Digital Medicine 6(1), 74 (2023)
17. Kaggle: Ocular disease recognition, https://www.kaggle.com/andrewmvd/
ocular-disease-recognition-odir5k
18. Kaku, A., Upadhya, S., Razavian, N.: Intermediate layers matter in momentum
contrastive self supervised learning. Advances in Neural Information Processing
Systems 34, 24063–24074 (2021)
19. Kang, M., Song, H., Park, S., Yoo, D., Pereira, S.: Benchmarking self-supervised
learning on diverse pathology datasets. In: Proceedings of the IEEE/CVF Confer-
ence on Computer Vision and Pattern Recognition. pp. 3344–3354 (2023)
20. Kobayashi, K., Hataya, R., Kurose, Y., Miyake, M., Takahashi, M., Nakagawa, A.,
Harada, T., Hamamoto, R.: Decomposing normal and abnormal features of medical
images for content-based image retrieval of glioma imaging. Medical image analysis
74, 102227 (2021)
21. Rodríguez, M.A., AlMarzouqi, H., Liatsis, P.: Multi-label retinal disease classi-
fication using transformers. IEEE Journal of Biomedical and Health Informatics
(2022)
22. Sowrirajan, H., Yang, J., Ng, A.Y., Rajpurkar, P.: Moco-cxr: Moco pretraining
improves representation and transferability of chest x-ray models (2021)
23. Taher, M.R.H., Haghighi, F., Gotway, M.B., Liang, J.: Caid: Context-aware in-
stance discrimination for self-supervised learning in medical imaging. In: Interna-
tional Conference on Medical Imaging with Deep Learning. pp. 535–551. PMLR
(2022)
24. Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning.
Advances in neural information processing systems 30 (2017)
25. Vu, Y.N.T., Wang, R., Balachandar, N., Liu, C., Ng, A.Y., Rajpurkar, P.: Medaug:
Contrastive learning leveraging patient metadata improves representations for
chest x-ray interpretation (2021)
26. Wang, J., Han, X.H., Xu, Y., Lin, L., Hu, H., Jin, C., Chen, Y.W., et al.: Sparse
codebook model of local structures for retrieval of focal liver lesions using multi-
phase medical images. International journal of biomedical imaging 2017 (2017)
27. Wang, J., Zeng, Z., Chen, B., Dai, T., Xia, S.T.: Contrastive quantization with code
memory for unsupervised image retrieval. In: Proceedings of the AAAI Conference
on Artificial Intelligence. vol. 36, pp. 2468–2476 (2022)
28. Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M.: Chestx-ray8:
Hospital-scale chest x-ray database and benchmarks on weakly-supervised classi-
fication and localization of common thorax diseases. In: Proceedings of the IEEE
conference on computer vision and pattern recognition. pp. 2097–2106 (2017)
29. You, Y., Gitman, I., Ginsburg, B.: Large batch training of convolutional networks.
arXiv preprint arXiv:1708.03888 (2017)
30. Zbontar, J., Jing, L., Misra, I., LeCun, Y., Deny, S.: Barlow twins: Self-supervised
learning via redundancy reduction (2021)
31. Zhang, Y., Sun, K., Liu, Y., Ou, Z., Shen, D.: Vector quantized multi-modal guid-
ance for alzheimer’s disease diagnosis based on feature imputation. In: International
Workshop on Machine Learning in Medical Imaging. pp. 403–412. Springer (2023)
32. Zheng, C., Vedaldi, A.: Online clustered codebook. In: Proceedings of the
IEEE/CVF International Conference on Computer Vision. pp. 22798–22807 (2023)
33. Zhou, H.Y., Lu, C., Chen, C., Yang, S., Yu, Y.: A unified visual information preser-
vation framework for self-supervised pre-training in medical image analysis. IEEE
Transactions on Pattern Analysis and Machine Intelligence (2023)
34. Zhou, H.Y., Lu, C., Yang, S., Han, X., Yu, Y.: Preservational learning improves
self-supervised medical image models by reconstructing diverse contexts. In: Pro-
ceedings of the IEEE/CVF International Conference on Computer Vision. pp.
3499–3509 (2021)
35. Zhou, Y., Wang, B., Huang, L., Cui, S., Shao, L.: A benchmark for studying di-
abetic retinopathy: segmentation, grading, and transferability. IEEE Transactions
on Medical Imaging 40(3), 818–828 (2020)