
Medical SAM 2: Segment medical images as video via Segment Anything Model 2

Published 1 Aug 2024 in cs.CV | (2408.00874v2)

Abstract: Medical image segmentation plays a pivotal role in clinical diagnostics and treatment planning, yet existing models often face challenges in generalization and in handling both 2D and 3D data uniformly. In this paper, we introduce Medical SAM 2 (MedSAM-2), a generalized auto-tracking model for universal 2D and 3D medical image segmentation. The core concept is to leverage the Segment Anything Model 2 (SAM2) pipeline to treat all 2D and 3D medical segmentation tasks as a video object tracking problem. To put it into practice, we propose a novel self-sorting memory bank mechanism that dynamically selects informative embeddings based on confidence and dissimilarity, regardless of temporal order. This mechanism not only significantly improves performance in 3D medical image segmentation but also unlocks a One-Prompt Segmentation capability for 2D images, allowing segmentation across multiple images from a single prompt without temporal relationships. We evaluated MedSAM-2 on five 2D tasks and nine 3D tasks, including white blood cells, optic cups, retinal vessels, mandibles, coronary arteries, kidney tumors, liver tumors, breast cancer, nasopharynx cancer, vestibular schwannoma, mediastinal lymph nodules, cerebral artery, inferior alveolar nerve, and abdominal organs, comparing it against state-of-the-art (SOTA) models in task-tailored, general and interactive segmentation settings. Our findings demonstrate that MedSAM-2 surpasses a wide range of existing models and updates new SOTA on several benchmarks. The code is released on the project page: https://supermedintel.github.io/Medical-SAM2/.

Citations (29)

Summary

  • The paper introduces MedSAM-2, which segments 3D medical images as video using sequential slice processing for improved accuracy.
  • It incorporates a Confidence Memory Bank and Weighted Pick-up strategy to enhance segmentation without continuous user intervention.
  • MedSAM-2 achieves superior Dice scores across 15 diverse medical datasets and outperforms few-shot methods in one-prompt segmentation.

Introduction

The paper "Medical SAM 2: Segment medical images as video via Segment Anything Model 2" introduces MedSAM-2, an advancement in the domain of medical image segmentation. Building on the capabilities of the Segment Anything Model 2 (SAM 2), MedSAM-2 leverages the concept of treating medical images as video sequences to enhance segmentation performance on both 2D and 3D datasets.

Methodology

SAM 2 Application on 3D Medical Images

MedSAM-2 applies SAM 2, originally designed for video segmentation, to 3D medical images by treating each volume as a video sequence whose frames are the individual slices. The spatial continuity between adjacent slices plays the role that temporal continuity plays between video frames, so the model can use associations between neighboring slices to improve segmentation. Slices are processed sequentially with a memory system: the embedding of each slice is conditioned on information stored from previously processed slices before its mask is predicted.

Figure 1: An illustration of MedSAM-2's capability, depicting segmentation of the temporally associated frames (slices) of a 3D volume from a single prompt.
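The slice-as-frame idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the paper's implementation: `volume_to_frames`, `propagate`, and `copy_last_mask` are hypothetical names, the min-max normalization to 8-bit RGB is an assumed preprocessing step, and `segment_fn` stands in for a real memory-conditioned model.

```python
from collections import deque

import numpy as np


def volume_to_frames(volume, axis=0):
    """Slice a 3D volume along `axis` into a list of uint8 RGB frames."""
    vol = np.moveaxis(np.asarray(volume, dtype=np.float32), axis, 0)
    lo, hi = float(vol.min()), float(vol.max())
    # Min-max normalize to [0, 255]; a constant volume maps to all zeros.
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    vol8 = np.round((vol - lo) * scale).astype(np.uint8)
    # Repeat the intensity channel three times (assumed RGB input format).
    return [np.stack([s, s, s], axis=-1) for s in vol8]


def propagate(frames, prompt_mask, segment_fn, memory_size=3):
    """Segment frames in order, conditioning each on a FIFO memory of past results."""
    # The first frame is assumed to carry the user prompt and its mask.
    memory = deque([(frames[0], prompt_mask)], maxlen=memory_size)
    masks = [prompt_mask]
    for frame in frames[1:]:
        mask = segment_fn(frame, list(memory))  # hypothetical model call
        memory.append((frame, mask))  # oldest entry is dropped when full
        masks.append(mask)
    return masks


def copy_last_mask(frame, memory):
    """Stand-in for a real model: reuse the most recent mask in memory."""
    return memory[-1][1].copy()


# Toy usage: a 2-slice volume and an all-foreground prompt mask.
volume = np.arange(2 * 3 * 4).reshape(2, 3, 4)
frames = volume_to_frames(volume)
masks = propagate(frames, np.ones((3, 4), dtype=bool), copy_last_mask)
```

The plain first-in, first-out memory is a simplification chosen for brevity; it evicts the prompted frame once the memory is full, which a real system would likely avoid.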

One-prompt Segmentation for 2D Medical Images

MedSAM-2's notable feature is its ability to perform One-prompt Segmentation on 2D medical images, which involves segmenting similar targets across unrelated frames using a single prompt. By treating sets of 2D medical images as video sequences, MedSAM-2 surpasses the generalization capabilities of existing models, allowing clinicians to efficiently manage segmentation tasks with minimal user input.

Framework Components

MedSAM-2 incorporates two key mechanisms: the Confidence Memory Bank and the Weighted Pick-up strategy. These components facilitate memory-enhanced segmentation without continuous user intervention. The Confidence Memory Bank selectively retains high-confidence predictions, while the Weighted Pick-up strategy optimizes embedding merging to improve task generalization.

Figure 2: MedSAM-2 Framework highlighting its approach to treating medical images as videos with the addition of specialized memory components and strategies.
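As a rough illustration of how these two components could fit together, the sketch below keeps only the highest-confidence embeddings in a fixed-size bank and then merges them with similarity-based weights. Everything specific here is an assumption made for the example rather than a detail taken from the paper: the class and function names, the `capacity` and `min_confidence` parameters, and the use of cosine similarity with a softmax to compute the weights.

```python
import numpy as np


class ConfidenceMemoryBank:
    """Fixed-size store that keeps the highest-confidence embeddings (sketch)."""

    def __init__(self, capacity=4, min_confidence=0.5):
        self.capacity = capacity              # assumed hyperparameter
        self.min_confidence = min_confidence  # assumed hyperparameter
        self.entries = []                     # list of (confidence, embedding)

    def add(self, embedding, confidence):
        # Reject low-confidence predictions outright.
        if confidence < self.min_confidence:
            return
        self.entries.append(
            (float(confidence), np.asarray(embedding, dtype=np.float64))
        )
        # Keep entries sorted by confidence and drop the weakest overflow.
        self.entries.sort(key=lambda entry: entry[0], reverse=True)
        del self.entries[self.capacity:]


def weighted_pickup(query, bank, temperature=1.0):
    """Merge stored embeddings, weighting each by cosine similarity to `query`."""
    if not bank.entries:
        raise ValueError("memory bank is empty")
    query = np.asarray(query, dtype=np.float64)
    memory = np.stack([emb for _, emb in bank.entries])
    q = query / (np.linalg.norm(query) + 1e-8)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-8)
    sims = m @ q
    # Softmax over similarities (shifted by the max for numerical stability).
    weights = np.exp((sims - sims.max()) / temperature)
    weights /= weights.sum()
    return weights @ memory, weights


# Toy usage: four candidate embeddings, a bank that holds two.
bank = ConfidenceMemoryBank(capacity=2)
bank.add([1.0, 0.0], 0.9)
bank.add([0.0, 1.0], 0.3)   # below the threshold, ignored
bank.add([0.0, 1.0], 0.7)   # later pushed out by a stronger entry
bank.add([-1.0, 0.0], 0.8)
merged, weights = weighted_pickup([1.0, 0.0], bank)
```

Selecting by confidence rather than recency is what lets such a bank work on image sets with no natural ordering, since nothing in the selection depends on where an image sits in the sequence.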

MedSAM-2 Architecture

The architecture of MedSAM-2 extends SAM 2's capabilities, featuring an image encoder, a memory encoder, and attention mechanisms. This design allows segmentations to be refined in real time through user inputs and facilitates significant performance improvements in both universal and One-prompt Segmentation tasks.

Figure 3: A comparative analysis of MedSAM, MedSAM-2, and ground truth reflects MedSAM-2’s enhanced segmentation accuracy in sequential 3D medical image tasks.
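To make the role of attention over memory more concrete, here is a minimal NumPy sketch of cross-attention in which tokens from the current frame attend to tokens stored in memory. It is a generic scaled dot-product attention, not the actual MedSAM-2 or SAM 2 layer: the function names are hypothetical, and the learned query, key, and value projections of a real model are deliberately omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def memory_cross_attention(frame_tokens, memory_tokens):
    """Condition frame tokens (N, d) on memory tokens (M, d).

    Returns the conditioned tokens (N, d) and the attention weights (N, M).
    Learned projections are omitted for brevity (an assumption of this sketch).
    """
    dim = frame_tokens.shape[-1]
    scores = frame_tokens @ memory_tokens.T / np.sqrt(dim)
    weights = softmax(scores, axis=-1)
    # Residual connection: keep the frame features, add the memory readout.
    return frame_tokens + weights @ memory_tokens, weights


# Toy usage: two frame tokens attending over three memory tokens.
frame_tokens = np.eye(2)
memory_tokens = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
conditioned, attn = memory_cross_attention(frame_tokens, memory_tokens)
```

Each row of `attn` sums to one, so every frame token receives a convex combination of the memory tokens on top of its own features.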

Experimental Evaluation

Performance on Diverse Medical Image Datasets

MedSAM-2 was tested across 15 datasets encompassing various segmentation tasks, including abdominal, brain, thyroid, and skin imaging. The model consistently delivered higher Dice scores than state-of-the-art interactive and fully-supervised systems, indicating that it applies across multiple medical imaging contexts.
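For readers unfamiliar with the metric, the Dice score of a predicted mask A against a ground-truth mask B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (perfect overlap). The short function below is a standard way to compute it for binary masks; the name `dice_score` and the choice to return 1.0 when both masks are empty are conventions assumed for this example, not details from the paper.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    total = pred.sum() + target.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / total)


# Toy usage: one overlapping pixel, mask sizes 2 and 1.
score = dice_score([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```

In this toy case the masks share one pixel and contain three foreground pixels in total, so the score is 2 × 1 / 3.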

One-Prompt Segmentation Evaluation

In a challenging One-prompt Segmentation setting, MedSAM-2 outperformed existing few-shot and one-shot methods across ten datasets with different prompts. This result underscores MedSAM-2's strength in generalizing from minimal input and in handling image sets that have no temporal ordering.

Figure 4: MedSAM-2 versus Few/One-shot Models across different datasets under One-prompt Segmentation, highlighting MedSAM-2’s superior average scores.

Conclusion

MedSAM-2 presents a significant advancement in medical image segmentation, aiming to democratize access to highly accurate, low-interaction segmentation methods by leveraging video processing principles. This research proposes a paradigm shift in which a set of medical images can be segmented from a single prompt, simplifying clinical workflows and enhancing diagnostic accuracy across a spectrum of medical applications.

The framework promises flexibility and performance across diverse medical imaging tasks, setting a precedent for future adaptations in real-time segmentation models and automated diagnostic tools in clinical settings. Further exploration into its application outside medical domains could extend its utility, making MedSAM-2 a versatile tool in AI-aided analysis.
