Overview of SAMUS: Ultrasound Image Segmentation Enhancement
The paper introduces SAMUS, a model that adapts the Segment Anything Model (SAM) to ultrasound image segmentation, with a focus on clinical applicability and robust generalization. While SAM has demonstrated strong segmentation performance across natural image domains, its accuracy deteriorates on medical images due to domain-specific challenges such as low contrast and complex object shapes. SAMUS is designed to address these limitations.
Objectives and Methodology
SAMUS augments SAM's Vision Transformer (ViT) image encoder with a parallel CNN branch that injects local detail into the ViT's global features. This hybrid design matters in medical imaging, where small, low-contrast structures carry significant information. A cross-branch attention module lets the CNN branch, which captures low-level local features, interact with the ViT branch, which models global dependencies. This interaction sharpens boundary delineation and object identification, which is critical for recognizing small and complex medical targets.
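To make the fusion concrete, below is a minimal PyTorch sketch of this kind of cross-branch attention, with CNN tokens as queries attending to ViT tokens as keys and values. Module names, channel sizes, and the residual fusion are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class CrossBranchAttention(nn.Module):
    """Sketch of SAMUS-style cross-branch attention: queries come from the
    CNN branch, keys/values from the ViT branch (illustrative, not the
    paper's exact module)."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cnn_tokens, vit_tokens):
        # cnn_tokens, vit_tokens: (B, N, dim) token sequences from each branch
        fused, _ = self.attn(query=cnn_tokens, key=vit_tokens, value=vit_tokens)
        return self.norm(cnn_tokens + fused)  # residual fusion of local and global cues

class ParallelCNNBranch(nn.Module):
    """Small convolutional stem producing local-detail tokens on the same
    16x16 spatial grid as a ViT patch embedding of a 256x256 input."""
    def __init__(self, dim: int):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(1, dim // 4, 3, stride=2, padding=1), nn.GELU(),  # 256 -> 128
            nn.Conv2d(dim // 4, dim, 3, stride=8, padding=1),           # 128 -> 16
        )

    def forward(self, x):
        f = self.stem(x)                     # (B, dim, 16, 16)
        return f.flatten(2).transpose(1, 2)  # (B, 256, dim) tokens
```

In this arrangement the CNN tokens decide where to look, so fine local evidence is enriched with global context rather than being averaged away.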
To facilitate domain adaptation, SAMUS employs a feature adapter that fine-tunes the ViT on medical images and a position adapter that accommodates the shift from SAM's native 1024×1024 input to 256×256 ultrasound inputs. The reduced resolution permits deployment on entry-level GPUs, lowering computational cost and improving accessibility in clinical settings.
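The sketch below illustrates both adaptation ideas in PyTorch: a bottleneck feature adapter added to otherwise frozen ViT blocks, and positional-embedding resampling from SAM's 64×64 patch grid (1024×1024 input) to the 16×16 grid of a 256×256 input. Bicubic interpolation is used as a common stand-in; the paper's position adapter is a learned module, so treat the specifics as assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAdapter(nn.Module):
    """Bottleneck adapter inserted into a frozen ViT block; only these few
    parameters are trained on the medical domain (illustrative sketch)."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.down = nn.Linear(dim, dim // reduction)
        self.up = nn.Linear(dim // reduction, dim)

    def forward(self, x):
        return x + self.up(F.gelu(self.down(x)))  # residual adapter update

def adapt_position_embedding(pos_embed, new_grid: int = 16):
    """Resample a SAM-style positional embedding of shape (1, 64, 64, dim)
    to the patch grid of a 256x256 input. Bicubic interpolation is an
    assumption standing in for the paper's learned position adapter."""
    pe = pos_embed.permute(0, 3, 1, 2)                 # (1, dim, 64, 64)
    pe = F.interpolate(pe, size=(new_grid, new_grid),
                       mode="bicubic", align_corners=False)
    return pe.permute(0, 2, 3, 1)                      # (1, 16, 16, dim)
```

Because only the adapters and a few task-specific layers are updated, the frozen SAM weights retain their general-purpose features while the model specializes cheaply to ultrasound.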
Dataset and Performance Evaluation
SAMUS was validated on US30K, a large ultrasound dataset of roughly 30,000 images spanning six object categories. Results show that SAMUS outperforms state-of-the-art task-specific models and universal foundation models, both in-domain and when generalizing to unseen domains. Specifically, SAMUS achieved higher Dice scores and lower Hausdorff distances across datasets including TN3K, BUSI, CAMUS-LV, CAMUS-MYO, and CAMUS-LA, underscoring both its region-overlap accuracy and its boundary quality.
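For reference, the two reported metrics can be computed on binary masks as follows. This sketch applies SciPy's directed_hausdorff to all foreground pixel coordinates for brevity, whereas evaluations typically measure the distance between mask boundaries; the function names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|) on binary masks; higher is better."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def hausdorff_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Hausdorff distance (lower is better), computed here over
    all foreground pixel coordinates rather than extracted contours."""
    p = np.argwhere(pred)  # (num_points, 2) foreground coordinates
    g = np.argwhere(gt)
    return max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])
```

Dice summarizes overall region overlap, while the Hausdorff distance penalizes the worst boundary error, which is why the two are reported together.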
Key Results and Implications
The paper reports that SAMUS achieves these segmentation gains while cutting GPU memory use to roughly 28% of SAM's and accelerating inference, making it well suited to routine clinical deployment. Its adaptation techniques also translate into strong generalization on unseen datasets, a substantial improvement over earlier SAM adaptations such as MedSAM and SAMed.
Future Directions
SAMUS represents a significant step toward universal models in medical imaging. By reducing resource requirements and integrating effective domain-adaptation techniques, it opens a path to deploying advanced models in less well-equipped clinical environments. Its combination of CNNs and transformers may also inspire further research into hybrid architectures for medical imaging tasks. In addition, the release of the US30K dataset provides a valuable resource for continued exploration and benchmarking in ultrasound image segmentation.
Conclusion
SAMUS exemplifies the effective adaptation of a universal segmentation model to the complexity and constraints of medical imaging. Its demonstrated ability to deliver high performance with reduced resource demands highlights its potential for real-world clinical use. Future research can build on this foundation to extend model applicability and efficacy across diverse medical imaging modalities.