VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis (2402.17300v2)
Abstract: Self-Supervised Learning (SSL) has demonstrated promising results in 3D medical image analysis. However, the lack of high-level semantics in pre-training still heavily hinders the performance of downstream tasks. We observe that 3D medical images contain relatively consistent contextual position information, i.e., consistent geometric relations between different organs, which offers a potential way to learn consistent semantic representations during pre-training. In this paper, we propose a simple-yet-effective Volume Contrast (VoCo) framework that leverages these contextual position priors for pre-training. Specifically, we first generate a group of base crops from different regions while enforcing feature discrepancy among them, and use these base crops as class assignments for their regions. Then, we randomly crop sub-volumes and predict which class each one belongs to (i.e., in which region it is located) by contrasting its similarity to the different base crops, which can be seen as predicting the contextual positions of different sub-volumes. Through this pretext task, VoCo implicitly encodes the contextual position priors into model representations without the guidance of annotations, enabling us to effectively improve the performance of downstream tasks that require high-level semantics. Extensive experimental results on six downstream tasks demonstrate the superior effectiveness of VoCo. Code will be available at https://github.com/Luffy03/VoCo.
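The pretext task described in the abstract can be illustrated with a minimal, stdlib-only sketch. It is not the authors' implementation, and several details are assumptions made for illustration: the x-y plane of the volume is tiled into an n x n grid of base crops that each span the full depth; the soft "which region" target for a random sub-volume is the fraction of its volume that overlaps each base crop; features are plain vectors standing in for encoder outputs; negative cosine similarities are clipped to zero so that logits and targets share the range [0, 1]; and the mean absolute error (prediction) and mean absolute pairwise cosine (feature discrepancy) are stand-in losses of our choosing. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Axis-aligned box (x0, y0, z0, x1, y1, z1); upper bounds are exclusive.
Box = Tuple[int, int, int, int, int, int]


def make_base_crops(size: Tuple[int, int, int], n: int) -> List[Box]:
    """Assumption: tile the x-y plane into an n x n grid, each crop spanning full depth."""
    sx, sy, sz = size
    wx, wy = sx // n, sy // n
    return [(i * wx, j * wy, 0, (i + 1) * wx, (j + 1) * wy, sz)
            for i in range(n) for j in range(n)]


def overlap_fraction(crop: Box, region: Box) -> float:
    """Fraction of the crop's volume that falls inside the region."""
    inter, vol = 1, 1
    for k in range(3):
        lo, hi = max(crop[k], region[k]), min(crop[k + 3], region[k + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo
        vol *= crop[k + 3] - crop[k]
    return inter / vol


def position_labels(crop: Box, bases: Sequence[Box]) -> List[float]:
    """Hypothetical soft target: overlap fraction of the crop with every base crop."""
    return [overlap_fraction(crop, b) for b in bases]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def position_logits(crop_feat: Sequence[float],
                    base_feats: Sequence[Sequence[float]]) -> List[float]:
    """Contrast the crop feature with each base crop feature; clip negatives to 0."""
    return [max(0.0, cosine(crop_feat, f)) for f in base_feats]


def prediction_loss(logits: Sequence[float], labels: Sequence[float]) -> float:
    """Stand-in loss: mean absolute error between predicted and target positions."""
    return sum(abs(p - y) for p, y in zip(logits, labels)) / len(labels)


def discrepancy_loss(base_feats: Sequence[Sequence[float]]) -> float:
    """Stand-in penalty: mean |cosine| over distinct pairs (0 means fully distinct)."""
    n = len(base_feats)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(abs(cosine(base_feats[i], base_feats[j])) for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    bases = make_base_crops((64, 64, 32), n=2)
    crop = (8, 0, 0, 40, 32, 32)  # shifted 8 voxels along x from the first base crop
    print(position_labels(crop, bases))  # [0.75, 0.0, 0.25, 0.0]
```

In this example the sub-volume lies mostly in the first region and partly in the third, so its target is split 0.75/0.25 rather than being a hard one-hot class. A crop aligned exactly with one base crop recovers a one-hot label, which matches the view of base crops as class assignments.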