Overview of DeSAM: Decoupling Segment Anything Model for Generalizable Medical Image Segmentation
The paper "DeSAM: Decoupling Segment Anything Model for Generalizable Medical Image Segmentation" addresses a significant challenge in medical image segmentation. Deep learning models perform well within the domain they were trained on, but their accuracy deteriorates markedly on unseen domains due to domain shift. The authors explore adapting the Segment Anything Model (SAM), which excels at natural image segmentation, by decoupling its architecture to improve cross-domain robustness in medical contexts.
Decoupling Strategy
The central innovation of this work is the Decoupling Segment Anything Model (DeSAM), which separates mask generation from prompt embeddings. This matters because of SAM's coupling effect: poor prompts (for example, points lying outside the target or boxes that are overly large) degrade the predicted mask, which limits SAM's applicability in fully automated medical scenarios. DeSAM modifies the architecture by introducing a Prompt-Invariant Mask Module (PIMM) and a Prompt-Relevant IoU Module (PRIM), thereby decoupling mask generation from prompt influence.
Technical Approach and Architecture
DeSAM uses SAM's pre-trained Vision Transformer (ViT-H) as its image encoder and keeps it frozen, avoiding the computational overhead of fine-tuning it. Because the encoder never changes, image embeddings can be precomputed once per image, which reduces GPU memory requirements during training.
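The frozen-encoder idea above can be sketched as follows. This is a minimal illustration, not the authors' code: `TinyEncoder` is a stand-in for SAM's much larger ViT-H image encoder, and the caching loop shows why freezing allows embeddings to be computed once, offline, before decoder training begins.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for SAM's ViT-H image encoder (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=4, stride=4)

    def forward(self, x):
        return self.conv(x)

encoder = TinyEncoder()
for p in encoder.parameters():   # freeze: no gradients, never updated
    p.requires_grad = False
encoder.eval()

images = [torch.randn(1, 3, 64, 64) for _ in range(3)]
with torch.no_grad():            # precompute embeddings once, offline
    cache = [encoder(img) for img in images]

# During decoder training only the cached tensors are loaded,
# so the heavy encoder never occupies GPU memory in the training loop.
# cache[0] has shape (1, 8, 16, 16) here.
```

In the real setting the cached embeddings would be written to disk and streamed during training; the principle, compute the expensive frozen part once, is the same.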
- PIMM: Inspired by U-Net and UNETR architectures, PIMM generates masks from multi-scale image embeddings alone, independent of prompt embeddings, allowing it to learn prompt-invariant features.
- PRIM: Retaining a transformer-based structure similar to SAM's mask decoder, PRIM isolates prompt influence in a separate pathway: an IoU prediction head assesses how relevant a given prompt is to the target, which supports quality estimation even in fully automated use cases.
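The two-module split described above can be sketched as a toy forward pass. This is an assumption-laden illustration, not the paper's implementation: the class names mirror the paper's module names, but the internals are deliberately simplified. The key structural point it demonstrates is that PIMM's mask head receives no prompt input at all, while the prompt embedding only feeds the IoU branch.

```python
import torch
import torch.nn as nn

class PIMM(nn.Module):
    """Prompt-invariant mask module (toy): mask from image features only."""
    def __init__(self, c=8):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(c, c, 2, stride=2), nn.ReLU(),
            nn.Conv2d(c, 1, 1))          # single-channel mask logits

    def forward(self, img_emb):
        return self.up(img_emb)          # no prompt argument at all

class PRIM(nn.Module):
    """Prompt-relevant IoU module (toy): prompt feeds only the IoU head."""
    def __init__(self, c=8, d=16):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(c + d, 32), nn.ReLU(),
                                  nn.Linear(32, 1))

    def forward(self, img_emb, prompt_emb):
        pooled = img_emb.mean(dim=(2, 3))          # global image summary
        return self.head(torch.cat([pooled, prompt_emb], dim=1))

img_emb = torch.randn(2, 8, 16, 16)      # would come from the frozen encoder
prompt_emb = torch.randn(2, 16)          # hypothetical prompt embedding
mask_logits = PIMM()(img_emb)            # shape (2, 1, 32, 32), prompt-free
iou_score = PRIM()(img_emb, prompt_emb)  # shape (2, 1), prompt-dependent
```

Because a bad prompt can only perturb `iou_score` and never `mask_logits`, this wiring captures, in miniature, why decoupling shields the mask from poor prompts.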
Results and Performance
The empirical evaluation of DeSAM was conducted on a multi-site prostate dataset. DeSAM improved the Dice score by an average of 8.96%, outperforming baseline methods and other state-of-the-art domain generalization techniques. It showed consistent gains across six distinct domains, demonstrating robustness to the distribution shifts encountered in real-world clinical applications.
The decoupling approach proved particularly effective at reducing the false-positive predictions commonly observed in the grid-points mode of fully automatic segmentation, underscoring the value of separating mask generation from prompt embeddings. DeSAM is also efficient: it can be trained on consumer-grade hardware without compromising performance.
Discussion and Implications
The introduction of DeSAM represents a significant advancement in addressing cross-domain generalizability, a critical barrier in the deployment of automated diagnostic tools in clinical settings. The methodological decoupling introduced here is not only computationally efficient but also scalable, providing a pathway to leverage pre-trained models in versatile medical imaging tasks without the burden of complex data augmentation pipelines or exorbitant computational demands.
Future research could explore combining DeSAM with other domain generalization strategies to further improve robustness across diverse medical imaging modalities. This work sets a precedent for investigating decoupling mechanisms as a broader strategy for enhancing the adaptability and resilience of machine learning models in the medical domain.