
MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained on a Large-Scale Unannotated Dataset (2306.16925v1)

Published 29 Jun 2023 in cs.CV

Abstract: Pretraining with large-scale 3D volumes has a potential for improving the segmentation performance on a target medical image dataset where the training images and annotations are limited. Due to the high cost of acquiring pixel-level segmentation annotations on the large-scale pretraining dataset, pretraining with unannotated images is highly desirable. In this work, we propose a novel self-supervised learning strategy named Volume Fusion (VF) for pretraining 3D segmentation models. It fuses several random patches from a foreground sub-volume to a background sub-volume based on a predefined set of discrete fusion coefficients, and forces the model to predict the fusion coefficient of each voxel, which is formulated as a self-supervised segmentation task without manual annotations. Additionally, we propose a novel network architecture based on parallel convolution and transformer blocks that is suitable to be transferred to different downstream segmentation tasks with various scales of organs and lesions. The proposed model was pretrained with 110k unannotated 3D CT volumes, and experiments with different downstream segmentation targets including head and neck organs, thoracic/abdominal organs showed that our pretrained model largely outperformed training from scratch and several state-of-the-art self-supervised training methods and segmentation models. The code and pretrained model are available at https://github.com/openmedlab/MIS-FM.


Summary

  • The paper introduces a novel self-supervised pretraining strategy, Volume Fusion (VF), which pretrains 3D segmentation models on 110K unannotated CT volumes.
  • It combines convolutional and transformer blocks in the Parallel Convolution and Transformer Network (PCT-Net) to enhance local and global feature extraction.
  • Empirical results show significant improvements in Dice similarity and reduced surface distances, outperforming models trained from scratch and other methods.

Insights into MIS-FM: Utilization of Foundation Models for 3D Medical Image Segmentation

The paper investigates leveraging large-scale unannotated datasets to pretrain foundation models designed specifically for 3D medical image segmentation. It addresses a pertinent challenge in medical imaging: the high cost and difficulty of acquiring and annotating the large datasets needed to train effective segmentation models.

The authors introduce a novel self-supervised learning strategy termed Volume Fusion (VF), which enables the utilization of unannotated 3D medical images in pretraining segmentation models. This approach generates training tasks designed to enhance the model's ability to perceive and segment images, effectively mimicking the segmentation process without requiring manual annotations. VF operates by fusing patches from a foreground sub-volume into a background sub-volume using predefined discrete fusion coefficients. The model is then tasked with predicting these coefficients for each voxel, thereby framing the problem as a pseudo-segmentation task.
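A minimal PyTorch sketch of this idea follows. The function name, patch counts, and patch sizes are illustrative assumptions, and the sketch fuses each foreground patch in place for brevity, whereas the paper samples foreground and background patch positions independently:

```python
import torch

def volume_fusion(background, foreground, num_classes=5, num_patches=40,
                  patch_range=(8, 32)):
    """Fuse random foreground patches into a background sub-volume with
    discrete coefficients; the coefficient-index map is the pseudo-label.
    Assumes both volumes share shape (D, H, W) and exceed the max patch size."""
    fused = background.clone()
    label = torch.zeros_like(background, dtype=torch.long)  # 0 = pure background
    D, H, W = background.shape
    for _ in range(num_patches):
        d, h, w = (int(torch.randint(patch_range[0], patch_range[1], (1,)))
                   for _ in range(3))
        z = int(torch.randint(0, D - d, (1,)))
        y = int(torch.randint(0, H - h, (1,)))
        x = int(torch.randint(0, W - w, (1,)))
        k = int(torch.randint(1, num_classes, (1,)))  # discrete coefficient index
        alpha = k / (num_classes - 1)                 # alpha in {1/(K-1), ..., 1}
        fused[z:z+d, y:y+h, x:x+w] = (
            alpha * foreground[z:z+d, y:y+h, x:x+w]
            + (1 - alpha) * background[z:z+d, y:y+h, x:x+w]
        )
        label[z:z+d, y:y+h, x:x+w] = k
    return fused, label
```

A segmentation network can then be trained to predict `label` from `fused` with an ordinary supervised loss (e.g., cross-entropy plus Dice), with no manual annotation involved.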

In addition to the pretraining strategy, the paper outlines a new network architecture combining convolutional and transformer blocks. Dubbed the Parallel Convolution and Transformer Network (PCT-Net), it merges the local feature extraction of CNNs with the global context modeling afforded by transformers. This hybrid architecture demonstrates significant adaptability and effectiveness across the varied scales of anatomical structures and lesions found in medical images.
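As a rough sketch of this design, the block below runs a 3D convolutional path and a self-attention path in parallel over the same feature map and sums their outputs. The normalization choices, attention configuration, and fusion by addition are assumptions for exposition, not the published PCT-Net definition:

```python
import torch
import torch.nn as nn

class ParallelConvTransformerBlock(nn.Module):
    """Convolution (local features) and self-attention (global context)
    applied side by side to the same input, then merged by addition."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.conv_path = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm3d(channels),
            nn.GELU(),
        )
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):                                 # x: (B, C, D, H, W)
        local = self.conv_path(x)
        B, C, D, H, W = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, D*H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        global_ctx = attn_out.transpose(1, 2).reshape(B, C, D, H, W)
        return local + global_ctx
```

In practice, full-resolution 3D attention is expensive, so such blocks are typically applied at coarser encoder levels where the token count is manageable.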

The model is pretrained on a substantial dataset of 110,000 unannotated 3D CT volumes. The empirical evaluation spans multiple downstream segmentation tasks covering anatomical regions such as head and neck, thoracic, and abdominal organs. Across these experiments, the pretrained model consistently outperformed both models trained from scratch and those using other state-of-the-art self-supervised training methods, with higher Dice similarity coefficients and lower Average Symmetric Surface Distances highlighting the efficacy of the proposed pretraining approach.
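For reference, the Dice similarity coefficient is the standard overlap measure DSC = 2|P ∩ G| / (|P| + |G|) between a predicted mask P and ground truth G; a minimal implementation for binary 3D masks (not the authors' evaluation code) might look like:

```python
import torch

def dice_coefficient(pred, target, eps=1e-6):
    """Dice similarity between two binary masks of the same shape."""
    pred, target = pred.bool(), target.bool()
    intersection = (pred & target).sum().float()
    return float(2 * intersection / (pred.sum() + target.sum() + eps))
```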

The implications of this research are manifold. Practically, the proposed VF strategy offers a pathway to efficiently harness large volumes of unannotated medical image data, substantially lowering the barrier to developing high-performing medical image segmentation models. Theoretically, the paper opens avenues to further investigate how self-supervised learning can be optimized to bridge the gap between pretext and downstream tasks, especially in domains requiring intricate spatial understanding, such as radiology.

Looking forward, the authors hint at exploring the adaptability of their pretrained model and strategies across various imaging modalities and segmentation tasks, potentially extending to lesions beyond organs. Such developments could vastly improve diagnostic capabilities and precision in clinical practice by providing robust and generalizable image analysis tools.

The released code and pretrained models establish a foundation that can be built upon by researchers aiming to explore self-supervised learning applications in medical imaging. Continued exploration in this direction holds promise for creating more sophisticated models, potentially revolutionizing automated analysis and segmentation in diverse medical imaging domains.
