MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained on a Large-Scale Unannotated Dataset (2306.16925v1)
Abstract: Pretraining with large-scale 3D volumes has the potential to improve segmentation performance on a target medical image dataset where training images and annotations are limited. Because acquiring pixel-level segmentation annotations for a large-scale pretraining dataset is costly, pretraining with unannotated images is highly desirable. In this work, we propose a novel self-supervised learning strategy named Volume Fusion (VF) for pretraining 3D segmentation models. It fuses several random patches from a foreground sub-volume into a background sub-volume based on a predefined set of discrete fusion coefficients, and forces the model to predict the fusion coefficient of each voxel. This is formulated as a self-supervised segmentation task that requires no manual annotations. Additionally, we propose a novel network architecture based on parallel convolution and transformer blocks that is suitable for transfer to different downstream segmentation tasks with various scales of organs and lesions. The proposed model was pretrained with 110k unannotated 3D CT volumes. Experiments with different downstream segmentation targets, including head and neck organs and thoracic/abdominal organs, showed that our pretrained model substantially outperformed training from scratch as well as several state-of-the-art self-supervised training methods and segmentation models. The code and pretrained model are available at https://github.com/openmedlab/MIS-FM.
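The Volume Fusion pretext task described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). The linear blend `alpha * foreground + (1 - alpha) * background`, the coefficient set `{0, 1/K, ..., 1}`, the box-shaped patches, and every name and default value below (`volume_fusion`, `num_levels`, `num_patches`, `max_patch`) are assumptions made for illustration only.

```python
import numpy as np


def volume_fusion(foreground, background, num_levels=4, num_patches=5,
                  max_patch=(8, 8, 8), rng=None):
    """Blend random foreground patches into a background sub-volume.

    Returns the fused volume and a per-voxel integer label k in [0, num_levels],
    where the assumed fusion coefficient is alpha = k / num_levels.
    All names and defaults here are hypothetical, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    if foreground.shape != background.shape:
        raise ValueError("foreground and background must have the same shape")

    # Label 0 means alpha = 0, i.e. the voxel is pure background.
    labels = np.zeros(foreground.shape, dtype=np.int64)

    for _ in range(num_patches):
        # Random box size (at least 1 voxel per axis) and random position.
        size = [int(rng.integers(1, min(m, s) + 1))
                for m, s in zip(max_patch, foreground.shape)]
        start = [int(rng.integers(0, s - p + 1))
                 for s, p in zip(foreground.shape, size)]
        region = tuple(slice(a, a + p) for a, p in zip(start, size))
        # Each patch gets one non-zero discrete coefficient index.
        labels[region] = int(rng.integers(1, num_levels + 1))

    alpha = labels.astype(np.float32) / num_levels
    fused = alpha * foreground + (1.0 - alpha) * background
    return fused.astype(np.float32), labels


# Example: two random 16x16x16 sub-volumes standing in for CT crops.
rng = np.random.default_rng(0)
fg = rng.normal(size=(16, 16, 16)).astype(np.float32)
bg = rng.normal(size=(16, 16, 16)).astype(np.float32)
fused, labels = volume_fusion(fg, bg, num_levels=4, rng=rng)
```

Under these assumptions, `fused` would be the network input and `labels` the target of an ordinary `num_levels + 1`-class segmentation loss, which is what makes the pretext task a segmentation problem without manual annotations.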