Modality-Agnostic SAM Adaptation for 3D Medical Image Segmentation
The paper "MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation" introduces an innovative framework for adapting the Segment Anything Model (SAM) to medical image segmentation tasks, specifically targeting 3D medical data. SAM, originally developed for general-purpose image segmentation across a vast dataset of natural images, encounters significant challenges when directly applied to the medical domain due to substantial differences in texture and dimensional information. This paper presents a parameter-efficient adaptation of SAM, leveraging its pre-trained weights while injecting domain-specific knowledge through strategic modifications.
Enhancements and Methodology
The adaptation focuses on integrating the volumetric and temporal information crucial for analyzing 3D medical images, such as those from CT, MRI, and surgical videos. The core methodology is a parameter-efficient fine-tuning strategy based on FacT (Factor-Tuning), which applies tensor decomposition to minimize the number of updated parameters. The technique learns only lightweight factorized weight increments while keeping the vast majority of SAM's pre-trained weights frozen, enabling effective adaptation at a fraction of the cost of full fine-tuning.
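To make this concrete, here is a minimal PyTorch sketch of the low-rank weight-increment idea underlying FacT, applied to a single frozen linear layer. The class name, rank, and scaling factor are illustrative assumptions; the actual FacT method shares decomposed factors across all of SAM's transformer layers rather than learning an independent increment per layer.

```python
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank weight increment.

    Sketches the idea behind FacT-style parameter-efficient tuning: the
    pre-trained weight W is frozen and only a factorized increment
    delta_W = scale * (U @ V) is learned. The cross-layer factor sharing
    of the real FacT decomposition is omitted for brevity.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, scale: float = 0.1):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep SAM's pre-trained weights intact
        self.U = nn.Parameter(torch.zeros(base.out_features, rank))
        self.V = nn.Parameter(torch.randn(rank, base.in_features) * 0.02)
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the learned factorized update.
        return self.base(x) + self.scale * (x @ self.V.T @ self.U.T)
```

Because U is initialized to zero, the increment starts as a no-op and training begins from exactly SAM's pre-trained behavior, a common initialization choice for adapter-style methods.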
Furthermore, the framework inserts 3D adapters into SAM's transformer blocks. These adapters extract through-plane (third-dimension) information from the imaging data, bridging the gap between SAM's 2D pre-training and the requirements of 3D medical image analysis, and combining SAM's general-purpose representations with the volumetric context that medical imaging demands.
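Below is a minimal sketch of how such an adapter might look: a bottleneck that reshapes the 2D encoder's slice-wise tokens to recover the depth axis, then mixes them with a depth-only 3D convolution. The module name, bottleneck width, and kernel shape are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class Adapter3D(nn.Module):
    """Bottleneck adapter injecting through-plane (depth) context into a
    2D ViT block. Shapes and sizes here are illustrative assumptions.
    """

    def __init__(self, dim: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        # Depth-only 3D convolution mixes features across adjacent slices.
        self.conv3d = nn.Conv3d(bottleneck, bottleneck,
                                kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor, depth: int) -> torch.Tensor:
        # x: (B * depth, H, W, C) tokens from the 2D image encoder.
        bd, h, w, _ = x.shape
        z = self.act(self.down(x))                  # (B*D, H, W, k)
        z = z.view(bd // depth, depth, h, w, -1)    # recover the depth axis
        z = z.permute(0, 4, 1, 2, 3)                # (B, k, D, H, W)
        z = self.act(self.conv3d(z))                # mix across slices
        z = z.permute(0, 2, 3, 4, 1).reshape(bd, h, w, -1)
        return x + self.up(z)                       # residual injection
```

The residual form lets the adapter refine, rather than replace, the frozen 2D features, so the pre-trained encoder remains the backbone of the representation.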
Another crucial enhancement adapts SAM's mask decoder with a progressive up-sampling mechanism that restores predictions to the original image resolution. This is especially relevant in medical imaging, where high resolution is vital for delineating small, intricate anatomical structures and lesions.
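A rough sketch of such a head appears below: a stack of small transposed-convolution stages, each doubling spatial resolution, in place of a single large interpolation step. Channel widths, normalization choices, and the number of stages are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ProgressiveUpsampler(nn.Module):
    """Progressive up-sampling head: several small transposed-convolution
    stages instead of one large interpolation. Sizes are assumptions.
    """

    def __init__(self, in_ch: int = 256, num_classes: int = 9, stages: int = 2):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(stages):
            # Each stage doubles spatial resolution and halves channels.
            layers += [nn.ConvTranspose2d(ch, ch // 2, kernel_size=2, stride=2),
                       nn.GroupNorm(8, ch // 2),
                       nn.GELU()]
            ch //= 2
        self.stages = nn.Sequential(*layers)
        self.head = nn.Conv2d(ch, num_classes, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, in_ch, H/4, W/4) decoder features -> (B, classes, H, W)
        return self.head(self.stages(feats))
```

Upsampling gradually with learned filters tends to preserve fine boundaries better than a single bilinear jump, which matters for small lesions and thin anatomical structures.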
Comparative Evaluation and Results
The paper evaluates MA-SAM across multiple medical image segmentation tasks on diverse datasets: abdominal multi-organ segmentation in CT scans, prostate segmentation across multi-site MRI data, and surgical scene segmentation in video sequences. The framework consistently outperforms state-of-the-art models, including nnU-Net, and achieves a 9.9% improvement in Dice score for surgical scene segmentation, demonstrating superior segmentation quality and adaptability across modalities, all without the use of prompts.
MA-SAM also exhibits robust generalization, a noteworthy trait for medical applications where data availability and consistency vary significantly. Its strong few-shot performance further suggests it can be adapted to new datasets with minimal labeled data, highlighting its potential for broad deployment in medical imaging.
Practical and Theoretical Implications
This research demonstrates the potential of adapting large foundation models like SAM to the medical field, showing that extensive pre-training can be leveraged effectively when paired with domain-specific adaptations for multidimensional analysis. Practically, it suggests that similar architectures might be adapted to other specialized domains with comparably tailored modifications.
Theoretically, the approach illustrates the efficacy of parameter-efficient transfer learning (PETL) methods like FacT in transferring foundation-model capabilities across domains, and it points to future research on hybrid models that marry extensive pre-trained knowledge with specialized adaptations. Such hybrid models could extend foundation-model capabilities well beyond the domains they were originally trained on.
Conclusion
The MA-SAM framework marks a significant advancement in adapting a generalized foundation model, SAM, to the intricacies of 3D medical image segmentation. It sets a precedent for pairing foundation models with strategic adaptations for specialized applications, an approach poised to influence future research in automated, adaptable AI systems across diverse data landscapes. As development continues, models like MA-SAM could transform medical imaging analysis by combining deep learning's generalizing power with precise medical insight.