
Medical SAM 2: Segment medical images as video via Segment Anything Model 2 (2408.00874v2)

Published 1 Aug 2024 in cs.CV

Abstract: Medical image segmentation plays a pivotal role in clinical diagnostics and treatment planning, yet existing models often face challenges in generalization and in handling both 2D and 3D data uniformly. In this paper, we introduce Medical SAM 2 (MedSAM-2), a generalized auto-tracking model for universal 2D and 3D medical image segmentation. The core concept is to leverage the Segment Anything Model 2 (SAM2) pipeline to treat all 2D and 3D medical segmentation tasks as a video object tracking problem. To put it into practice, we propose a novel self-sorting memory bank mechanism that dynamically selects informative embeddings based on confidence and dissimilarity, regardless of temporal order. This mechanism not only significantly improves performance in 3D medical image segmentation but also unlocks a One-Prompt Segmentation capability for 2D images, allowing segmentation across multiple images from a single prompt without temporal relationships. We evaluated MedSAM-2 on five 2D tasks and nine 3D tasks, including white blood cells, optic cups, retinal vessels, mandibles, coronary arteries, kidney tumors, liver tumors, breast cancer, nasopharynx cancer, vestibular schwannoma, mediastinal lymph nodules, cerebral artery, inferior alveolar nerve, and abdominal organs, comparing it against state-of-the-art (SOTA) models in task-tailored, general and interactive segmentation settings. Our findings demonstrate that MedSAM-2 surpasses a wide range of existing models and updates new SOTA on several benchmarks. The code is released on the project page: https://supermedintel.github.io/Medical-SAM2/.

Citations (29)

Summary

  • The paper introduces MedSAM-2, presented as the first SAM 2-based model that segments both 2D and 3D medical images by treating them as video sequences.
  • It uses One-prompt Segmentation and a confidence memory bank to reduce user interaction and improve segmentation accuracy.
  • Evaluations across 15 benchmarks show MedSAM-2 outperforming state-of-the-art models, including an average Dice score of 88.57% on BTCV multi-organ segmentation.

Overview of Medical SAM 2: Segment Medical Images as Video via Segment Anything Model 2

The paper "Medical SAM 2: Segment Medical Images as Video via Segment Anything Model 2", authored by Jiayuan Zhu, Yunli Qi, and Junde Wu from the University of Oxford, presents Medical SAM 2 (MedSAM-2), which extends the Segment Anything Model 2 (SAM 2) to both 2D and 3D medical image segmentation tasks. The central idea is to treat medical images as video sequences, so that SAM 2's ability to handle difficult video segmentation scenarios, including object motion and occlusion, carries over to medical data.

Key Contributions

  1. Novel Segmentation Model: MedSAM-2 is the first medical image segmentation model built on the SAM 2 framework, addressing unique challenges in medical imaging.
  2. Medical-Images-as-Videos Philosophy: This approach enables MedSAM-2 to tackle 3D medical images effectively and introduces the One-prompt Segmentation capability, which significantly reduces the interaction required from users.
  3. Unique Modules and Pipelines: MedSAM-2 incorporates novel components such as a confidence memory bank and a weighted pick-up strategy to enhance its functionality.
  4. Benchmark Evaluations: Extensive evaluation across 15 different benchmarks with 26 distinct tasks demonstrates MedSAM-2's superior performance and state-of-the-art results in both 2D and 3D medical image segmentation.

Methodological Advancements

Adapting SAM 2 for 3D Medical Images

Many medical imaging modalities, such as CT, MRI, and ultrasound, produce 3D volumes. Conventional deep learning models designed primarily for 2D images face limitations when applied to such data. The authors address this by treating a 3D medical image as an ordered sequence of 2D slices. SAM 2, which is built for video segmentation, then handles the continuity between contiguous slices as it would the temporal association between video frames. This helps with challenges such as blurry boundaries caused by patient or device motion and supports accurate segmentation.
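The slice-as-frame idea can be illustrated with a minimal sketch. This is not the authors' preprocessing code: the function name `volume_to_frames`, the nested-list volume layout, and the intensity window of -160 to 240 are all assumptions chosen for illustration. The sketch only shows how a volume might be unrolled along its depth axis into 8-bit frames that a video model could consume.

```python
def volume_to_frames(volume, lo, hi):
    """Turn a 3D volume (a list of 2D slices) into a list of 8-bit frames.

    Intensities are clipped to [lo, hi] and rescaled to 0..255.
    The window bounds are illustrative assumptions, not values from the paper.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scale = 255.0 / (hi - lo)
    frames = []
    for slice_2d in volume:  # walk the depth axis in order, one slice per frame
        frame = [
            [int(round((min(max(v, lo), hi) - lo) * scale)) for v in row]
            for row in slice_2d
        ]
        frames.append(frame)
    return frames


# A toy "volume" of three 2x2 slices with CT-like intensities.
volume = [
    [[-1000, 0], [40, 400]],
    [[-200, 40], [80, 1000]],
    [[0, 0], [0, 0]],
]
frames = volume_to_frames(volume, lo=-160, hi=240)
```

Each entry of `frames` plays the role of one video frame, and the slice order stands in for time. A real pipeline would more likely read the volume with an imaging library and use array operations, but the ordering logic would be the same.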

One-prompt Segmentation for 2D Images

For 2D medical images without temporal connections, MedSAM-2 unlocks a One-prompt Segmentation capability. By treating a collection of these images as a 'medical image flow', MedSAM-2 can autonomously segment similar objects across different images from a single initial prompt. This reduces the need for user inputs, enhancing efficiency, especially in clinical settings.
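Because the images in such a flow have no temporal order, the memory that carries information from one image to the next cannot simply keep the most recent frames. The abstract describes a memory bank that selects embeddings by confidence and dissimilarity. The sketch below is one possible reading of that idea and not the paper's implementation: the class name `MemoryBank`, the confidence threshold, and the score `confidence * (1 - max cosine similarity)` are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class MemoryBank:
    """Fixed-size bank that keeps confident, mutually dissimilar embeddings.

    Hypothetical sketch: the scoring rule is an assumption, not the paper's.
    """

    def __init__(self, capacity, min_confidence=0.5):
        self.capacity = capacity
        self.min_confidence = min_confidence
        self.entries = []  # list of (confidence, embedding) pairs

    def _score(self, idx):
        # High confidence and low similarity to every other entry score best.
        conf, emb = self.entries[idx]
        others = [e for j, (_, e) in enumerate(self.entries) if j != idx]
        max_sim = max((cosine(emb, o) for o in others), default=0.0)
        return conf * (1.0 - max_sim)

    def add(self, embedding, confidence):
        if confidence < self.min_confidence:
            return  # skip low-confidence predictions entirely
        self.entries.append((confidence, embedding))
        if len(self.entries) > self.capacity:
            # Evict the lowest-scoring entry, regardless of arrival order.
            worst = min(range(len(self.entries)), key=self._score)
            self.entries.pop(worst)


bank = MemoryBank(capacity=2)
bank.add([1.0, 0.0], 0.9)
bank.add([0.99, 0.1], 0.8)  # near-duplicate of the first entry
bank.add([0.0, 1.0], 0.7)   # dissimilar to both
bank.add([0.5, 0.5], 0.2)   # below the confidence threshold, ignored
kept = [emb for _, emb in bank.entries]
```

In this toy run the near-duplicate is evicted even though it arrived before the dissimilar entry, which is the order-independent behavior the abstract attributes to the memory bank.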

Results and Comparisons

MedSAM-2 was compared against a range of state-of-the-art segmentation methods, with improvements reported in both the 3D and 2D settings:

  • 3D Medical Image Segmentation: Evaluations on the BTCV multi-organ segmentation dataset show that MedSAM-2 achieves an average Dice score of 88.57%, surpassing both fully supervised models like MedSegDiff and interactive models like MedSAM.
  • 2D Medical Image Segmentation: MedSAM-2 consistently outperforms other methods on tasks involving optic discs, brain tumors, thyroid nodules, and skin lesions. This indicates its strong generalization across different medical imaging modalities.
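For readers unfamiliar with the metric behind these numbers, the Dice score measures overlap between a predicted mask P and a ground-truth mask T as 2|P ∩ T| / (|P| + |T|). The following is a minimal sketch for binary masks stored as nested lists. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the paper's evaluation code.

```python
def dice_score(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks given as 2D lists of 0/1."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same shape")
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    total = sum(1 for a in p if a) + sum(1 for b in t if b)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
```

A multi-organ average such as the one quoted for BTCV would typically be obtained by computing this score per organ and averaging, although the exact aggregation used in the paper is not stated here.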

Notably, MedSAM-2 also excels under the One-prompt Segmentation setting, outperforming few-shot and one-shot models by substantial margins across various tasks, demonstrating its robust cross-task generalization capabilities.

Implications and Future Directions

Practical Implications: The ability to reduce user interaction while maintaining high accuracy is particularly beneficial for clinicians, as it simplifies the segmentation process and allows for more efficient workflows. This has direct implications for diagnostic procedures, treatment planning, and image-guided surgeries.

Theoretical Implications: The novel approach of treating medical images as videos opens new pathways for leveraging video processing techniques in medical image analysis. The successful integration of components like the confidence memory bank further highlights the potential for enhancing model robustness and accuracy through innovative memory mechanisms.

Future Developments: Future research could explore real-time processing enhancements to further improve the applicability of MedSAM-2 in clinical settings. Additionally, adapting this model to other segmentation scenarios beyond medical imaging could reveal broader applications of the video segmentation framework introduced.

Overall, the contributions and findings of this paper signify a noteworthy step forward in medical image segmentation, demonstrating substantial improvements in both accuracy and user interaction efficiency.
