
MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained on a Large-Scale Unannotated Dataset (2306.16925v1)

Published 29 Jun 2023 in cs.CV

Abstract: Pretraining with large-scale 3D volumes has a potential for improving the segmentation performance on a target medical image dataset where the training images and annotations are limited. Due to the high cost of acquiring pixel-level segmentation annotations on the large-scale pretraining dataset, pretraining with unannotated images is highly desirable. In this work, we propose a novel self-supervised learning strategy named Volume Fusion (VF) for pretraining 3D segmentation models. It fuses several random patches from a foreground sub-volume to a background sub-volume based on a predefined set of discrete fusion coefficients, and forces the model to predict the fusion coefficient of each voxel, which is formulated as a self-supervised segmentation task without manual annotations. Additionally, we propose a novel network architecture based on parallel convolution and transformer blocks that is suitable to be transferred to different downstream segmentation tasks with various scales of organs and lesions. The proposed model was pretrained with 110k unannotated 3D CT volumes, and experiments with different downstream segmentation targets including head and neck organs, thoracic/abdominal organs showed that our pretrained model largely outperformed training from scratch and several state-of-the-art self-supervised training methods and segmentation models. The code and pretrained model are available at https://github.com/openmedlab/MIS-FM.


Summary

  • The paper introduces a novel self-supervised pretraining strategy, Volume Fusion (VF), which pretrains 3D segmentation models on 110K unannotated CT volumes.
  • It combines convolutional and transformer blocks in the Parallel Convolution and Transformer Network (PCT-Net) to enhance local and global feature extraction.
  • Empirical results show significant improvements in Dice similarity and reduced surface distances, outperforming models trained from scratch and other methods.

Insights into MIS-FM: Utilization of Foundation Models for 3D Medical Image Segmentation

The paper investigates leveraging large-scale unannotated datasets to pretrain foundation models designed specifically for 3D medical image segmentation. It addresses a pertinent challenge in medical imaging: the high cost and difficulty of acquiring and annotating the large datasets needed to train effective segmentation models.

The authors introduce a novel self-supervised learning strategy termed Volume Fusion (VF), which enables the utilization of unannotated 3D medical images in pretraining segmentation models. This approach generates training tasks designed to enhance the model's ability to perceive and segment images, effectively mimicking the segmentation process without requiring manual annotations. VF operates by fusing patches from a foreground sub-volume into a background sub-volume using predefined discrete fusion coefficients. The model is then tasked with predicting these coefficients for each voxel, thereby framing the problem as a pseudo-segmentation task.
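A minimal PyTorch sketch of this idea follows. The function name, patch counts, and patch sizes are illustrative assumptions, and the sketch fuses each foreground patch in place for brevity, whereas the paper samples foreground and background patch positions independently:

```python
import torch

def volume_fusion(background, foreground, num_classes=5, num_patches=40,
                  patch_range=(8, 32)):
    """Fuse random foreground patches into a background sub-volume with
    discrete coefficients; the coefficient-index map is the pseudo-label.
    Assumes both volumes share shape (D, H, W) and exceed the max patch size."""
    fused = background.clone()
    label = torch.zeros_like(background, dtype=torch.long)  # 0 = pure background
    D, H, W = background.shape
    for _ in range(num_patches):
        d, h, w = (int(torch.randint(patch_range[0], patch_range[1], (1,)))
                   for _ in range(3))
        z = int(torch.randint(0, D - d, (1,)))
        y = int(torch.randint(0, H - h, (1,)))
        x = int(torch.randint(0, W - w, (1,)))
        k = int(torch.randint(1, num_classes, (1,)))  # discrete coefficient index
        alpha = k / (num_classes - 1)                 # alpha in {1/(K-1), ..., 1}
        fused[z:z+d, y:y+h, x:x+w] = (
            alpha * foreground[z:z+d, y:y+h, x:x+w]
            + (1 - alpha) * background[z:z+d, y:y+h, x:x+w]
        )
        label[z:z+d, y:y+h, x:x+w] = k
    return fused, label
```

A segmentation network can then be trained to predict `label` from `fused` with an ordinary supervised loss (e.g., cross-entropy plus Dice), with no manual annotation involved.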

In addition to the pretraining strategy, the paper outlines a new network architecture combining convolutional and transformer blocks. Dubbed the Parallel Convolution and Transformer Network (PCT-Net), it merges the local feature extraction of CNNs with the global context modeling afforded by transformers. This hybrid architecture demonstrates significant adaptability and effectiveness across the varied scales of anatomical structures and lesions found in medical images.
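As a rough sketch of this design, the block below runs a 3D convolutional path and a self-attention path in parallel over the same feature map and sums their outputs. The normalization choices, attention configuration, and fusion by addition are assumptions for exposition, not the published PCT-Net definition:

```python
import torch
import torch.nn as nn

class ParallelConvTransformerBlock(nn.Module):
    """Convolution (local features) and self-attention (global context)
    applied side by side to the same input, then merged by addition."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.conv_path = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm3d(channels),
            nn.GELU(),
        )
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):                                 # x: (B, C, D, H, W)
        local = self.conv_path(x)
        B, C, D, H, W = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, D*H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        global_ctx = attn_out.transpose(1, 2).reshape(B, C, D, H, W)
        return local + global_ctx
```

In practice, full-resolution 3D attention is expensive, so such blocks are typically applied at coarser encoder levels where the token count is manageable.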

The model is pretrained on a substantial dataset of 110,000 unannotated 3D CT volumes. The empirical evaluation spans multiple downstream segmentation tasks covering anatomical regions such as head and neck, thoracic, and abdominal organs. Across these experiments, the pretrained model consistently outperformed both models trained from scratch and those using other state-of-the-art self-supervised training methods, with higher Dice similarity coefficients and lower Average Symmetric Surface Distances highlighting the efficacy of the proposed pretraining approach.
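For reference, the Dice similarity coefficient is the standard overlap measure DSC = 2|P ∩ G| / (|P| + |G|) between a predicted mask P and ground truth G; a minimal implementation for binary 3D masks (not the authors' evaluation code) might look like:

```python
import torch

def dice_coefficient(pred, target, eps=1e-6):
    """Dice similarity between two binary masks of the same shape."""
    pred, target = pred.bool(), target.bool()
    intersection = (pred & target).sum().float()
    return float(2 * intersection / (pred.sum() + target.sum() + eps))
```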

The implications of this research are manifold. Practically, the proposed VF strategy offers a pathway to efficiently harness large volumes of unannotated medical image data, substantially lowering the barrier to developing high-performing medical image segmentation models. Theoretically, the paper opens avenues to further investigate how self-supervised learning can be optimized to bridge the gap between pretext and downstream tasks, especially in domains requiring intricate spatial understanding, such as radiology.

Looking forward, the authors hint at exploring the adaptability of their pretrained model and strategies across various imaging modalities and segmentation tasks, potentially extending to lesions beyond organs. Such developments could vastly improve diagnostic capabilities and precision in clinical practice by providing robust and generalizable image analysis tools.

The released code and pretrained models establish a foundation that can be built upon by researchers aiming to explore self-supervised learning applications in medical imaging. Continued exploration in this direction holds promise for creating more sophisticated models, potentially revolutionizing automated analysis and segmentation in diverse medical imaging domains.
