A Comprehensive Examination of SegVol: Universal and Interactive Volumetric Medical Image Segmentation
The paper introduces SegVol, a 3D foundation model for universal and interactive volumetric medical image segmentation. It addresses key limitations of task-specific segmentation networks by supporting more than 200 anatomical categories and by accepting both semantic and spatial prompts. This matters because volumetric segmentation, such as delineating organs and lesions in CT and MRI scans, spans a broad and heterogeneous set of targets.
Methodology and Model Architecture
SegVol integrates four key components: an image encoder based on the ViT (Vision Transformer) architecture, a text encoder derived from the CLIP model, a prompt encoder that handles spatial inputs, and a mask decoder that produces the segmentation mask. The design is deliberately lightweight, which makes the model practical to run in real medical settings.
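To make this layout concrete, here is a minimal PyTorch sketch of the four-part design. The class name `SegVolSketch`, the stand-in modules, tensor shapes, and hyperparameters are all illustrative assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SegVolSketch(nn.Module):
    """Toy four-component layout (assumed, not the paper's code)."""
    def __init__(self, embed_dim=768):
        super().__init__()
        # 3D ViT-style image encoder: patchify the volume into tokens.
        self.image_encoder = nn.Sequential(
            nn.Conv3d(1, embed_dim, kernel_size=16, stride=16),  # patch embedding
            nn.Flatten(2),                                       # -> (B, C, N)
        )
        # Text encoder (a frozen CLIP model in the paper); stand-in projection here.
        self.text_encoder = nn.Linear(512, embed_dim)
        # Prompt encoder maps spatial prompts (e.g. a 3D box) to embeddings.
        self.prompt_encoder = nn.Linear(6, embed_dim)
        # Lightweight mask decoder fuses prompt tokens with image tokens.
        self.mask_decoder = nn.TransformerDecoderLayer(
            embed_dim, nhead=8, batch_first=True)

    def forward(self, volume, box, text_feat):
        img = self.image_encoder(volume).transpose(1, 2)        # (B, N, C)
        prompts = torch.stack(
            [self.prompt_encoder(box), self.text_encoder(text_feat)], dim=1)
        fused = self.mask_decoder(prompts, img)                 # cross-attend to image
        # Mask logits via dot product between prompt tokens and image tokens.
        return torch.einsum("bpc,bnc->bpn", fused, img)

vol = torch.randn(1, 1, 64, 64, 64)        # toy CT sub-volume
box = torch.rand(1, 6)                      # normalized 3D bounding box
txt = torch.randn(1, 512)                   # CLIP-style text feature
print(SegVolSketch()(vol, box, txt).shape)  # torch.Size([1, 2, 64])
```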
A notable aspect of SegVol is its support for three prompt types: bounding boxes, points, and text. This multimodal interface lets users steer the segmentation interactively, and the authors emphasize that combining spatial and semantic prompts synergistically yields higher-precision results than any single prompt type alone.
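As a small illustration, the three prompt types might be constructed as follows. The tensor layouts and the `(x1, y1, z1, x2, y2, z2)` box convention are assumptions for exposition, not SegVol's actual interface.

```python
import torch

# Point prompt: voxel coordinates with labels (1 = foreground, 0 = background).
point_coords = torch.tensor([[[30.0, 42.0, 18.0], [55.0, 10.0, 60.0]]])  # (B, P, 3)
point_labels = torch.tensor([[1, 0]])                                     # (B, P)

# Box prompt: one 3D bounding box per target, (x1, y1, z1, x2, y2, z2).
box = torch.tensor([[12.0, 20.0, 8.0, 48.0, 58.0, 40.0]])                 # (B, 6)

# Text prompt: an anatomical category name, embedded by the CLIP text encoder.
text = ["liver"]
```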
SegVol is pre-trained on 96,000 unlabeled CT volumes with the SimMIM masked-image-modeling algorithm, then fine-tuned with supervision on a joint dataset assembled from 25 volumetric medical image datasets. This two-stage regime underpins the model's robustness across a wide range of segmentation tasks.
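The SimMIM idea, masking a large fraction of patch tokens and reconstructing the underlying voxel intensities with an L1 loss on masked positions only, can be sketched as below. The stand-in encoder, decoder head, and 75% mask ratio are illustrative assumptions; in SimMIM proper the masked tokens pass through the full transformer encoder, which the conv stem here merely stands in for.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

patch, dim = 16, 768
encoder = nn.Conv3d(1, dim, kernel_size=patch, stride=patch)  # stand-in ViT patch stem
decoder = nn.Linear(dim, patch ** 3)                          # voxel-reconstruction head
mask_token = nn.Parameter(torch.zeros(dim))                   # learnable [MASK] embedding

volume = torch.randn(2, 1, 64, 64, 64)                        # unlabeled CT crops
tokens = encoder(volume).flatten(2).transpose(1, 2)           # (B, N, C) patch tokens
B, N, C = tokens.shape

mask = torch.rand(B, N) < 0.75                                # assumed 75% mask ratio
tokens = torch.where(mask[..., None], mask_token.expand(B, N, C), tokens)

pred = decoder(tokens)                                        # (B, N, patch^3) predictions
target = (volume                                              # ground-truth voxels per patch
          .unfold(2, patch, patch).unfold(3, patch, patch).unfold(4, patch, patch)
          .reshape(B, N, patch ** 3))
loss = F.l1_loss(pred[mask], target[mask])                    # L1 on masked patches only
print(loss.item())
```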
Experimental Paradigm
SegVol is evaluated under a robust framework that combines internal validation tasks with large-scale external validation. Internally, SegVol outperforms traditional task-specific models such as 3DUX-Net, SwinUNETR, and nnU-Net, with markedly higher Dice scores that reflect stronger spatial and semantic understanding.
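For reference, the Dice score underlying these comparisons is the standard overlap metric 2|A∩B| / (|A| + |B|) between a predicted and a ground-truth binary mask. A minimal self-contained version:

```python
import torch

def dice_score(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> float:
    """Dice coefficient between two binary masks of the same shape."""
    pred, target = pred.bool(), target.bool()
    inter = (pred & target).sum().item()
    return (2.0 * inter + eps) / (pred.sum().item() + target.sum().item() + eps)

a = torch.zeros(8, 8, 8, dtype=torch.bool); a[2:6, 2:6, 2:6] = True
b = torch.zeros(8, 8, 8, dtype=torch.bool); b[3:7, 3:7, 3:7] = True
print(round(dice_score(a, b), 3))  # 0.422 -- overlap of two shifted cubes
```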
Externally, SegVol is compared with interactive segmentation models such as MedSAM and SAM-MED3D, again achieving higher segmentation accuracy, particularly on complex anatomical structures. Ablation studies confirm the benefit of prompt integration: semantic and spatial prompts together produce better segmentations than either kind alone.
Results and Implications
The model's ability to generalize to unseen modalities is noteworthy: on MRI data from the CHAOS dataset, SegVol maintained consistent performance despite being trained on CT, reinforcing its adaptability and potential utility across diverse clinical environments.
Another distinctive contribution is the zoom-out-zoom-in technique, which balances computational efficiency against segmentation precision: a coarse pass over the downsampled volume first locates the region of interest, and a fine pass then segments that region at full resolution. This keeps inference fast while preserving detail, a critical requirement in clinical diagnostics.
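A minimal sketch of this coarse-to-fine idea follows, assuming `model` is any callable that maps a volume to a probability map of the same shape; the real pipeline additionally applies prompts and sliding-window inference at the zoom-in stage.

```python
import torch
import torch.nn.functional as F

def zoom_out_zoom_in(model, volume, coarse_size=(32, 32, 32)):
    # Zoom out: coarse pass over the whole volume at low resolution.
    small = F.interpolate(volume, size=coarse_size, mode="trilinear",
                          align_corners=False)
    coarse = model(small) > 0.5

    # Locate the region of interest implied by the coarse mask.
    idx = coarse[0, 0].nonzero()
    if idx.numel() == 0:
        return torch.zeros_like(volume, dtype=torch.bool)
    scale = torch.tensor(volume.shape[2:]) / torch.tensor(coarse_size)
    lo = (idx.min(0).values * scale).long()
    hi = ((idx.max(0).values + 1) * scale).long()

    # Zoom in: full-resolution pass restricted to the ROI crop.
    crop = volume[:, :, lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    out = torch.zeros_like(volume, dtype=torch.bool)
    out[:, :, lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = model(crop) > 0.5
    return out

# Shape check with a stand-in "segmenter" that just thresholds intensities.
toy = lambda v: torch.sigmoid(v)
print(zoom_out_zoom_in(toy, torch.randn(1, 1, 128, 128, 128)).shape)
```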
SegVol aims to be a versatile tool in medical imaging, with potential applications in tumor monitoring, surgical planning, and therapy optimization. Its capacity for fast, high-precision segmentation could significantly streamline diagnostic workflows.
Future Directions
While SegVol marks a substantial advance for the field, the authors identify directions for future work, including extending the model to complex referring-expression segmentation tasks and continuing to adapt it to new datasets through further training.
In conclusion, SegVol stands as a robust foundation model for medical image segmentation, setting a new reference point for future research and application in the domain. Its comprehensive design, extensive validation, and promising results underscore its potential as a key asset in medical image analysis.