- The paper introduces SAM2-3dMed, which adapts SAM2 for 3D medical image segmentation via a Slice Relative Position Prediction (SRPP) module and a Boundary Detection (BD) module.
- The paper employs a composite loss function combining Dice, MSE, and weighted binary cross-entropy to improve boundary delineation and volumetric accuracy.
- The paper demonstrates superior performance on MSD benchmark datasets, with the largest Dice gains on the Pancreas task.
SAM2-3dMed: Empowering SAM2 for 3D Medical Image Segmentation
Introduction
"Accurate segmentation of 3D medical images is imperative for applications such as disease diagnosis and treatment planning. The Segment Anything Model 2 (SAM2), renowned for its success in video object segmentation, faces challenges when applied to 3D medical images due to two domain discrepancies: bidirectional anatomical continuity and precise boundary delineation." To bridge these gaps, the paper introduces SAM2-3dMed as an adaptation of SAM2 specifically tailored for 3D medical imaging. The core contributions include a Slice Relative Position Prediction (SRPP) module and a Boundary Detection (BD) module, which respectively model inter-slice dependencies and enhance boundary precision.
Figure 1: Inter-frame dependencies in videos (a) vs. inter-slice dependencies in 3D medical images (b), and the importance of boundary segmentation in videos (c) vs. in medical images (d).
Methodology
SAM2-3dMed Architecture
SAM2-3dMed builds on the SAM2 backbone, incorporating innovations to adapt to 3D data:
- SAM2 Backbone: Utilizes SAM2’s pre-trained Image Encoder to extract features from each slice independently.
- SRPP Module: Employs a self-supervised task that predicts relative slice positions, encouraging the model to learn spatial context along the slice axis (sketched after this list).
- BD Module: Enhances boundary segmentation via a parallel decoding branch with boundary-focused attention mechanisms.
These modules collectively optimize 3D segmentation, leveraging SAM2’s feature representations while adapting its mechanisms to capture volumetric dependencies.
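The exact SRPP architecture is not detailed here; the sketch below illustrates the idea in PyTorch, assuming the head pools per-slice SAM2 encoder features and regresses a normalized relative offset between two slices. The feature dimension, pooling, and MLP widths are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class SRPPHead(nn.Module):
    """Illustrative Slice Relative Position Prediction head.

    Given SAM2 image-encoder features of two slices from the same volume,
    regresses their normalized relative position (index offset / depth).
    """
    def __init__(self, feat_dim: int = 256, hidden: int = 128):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # (B, C, H, W) -> (B, C, 1, 1)
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),             # scalar relative offset
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        a = self.pool(feat_a).flatten(1)
        b = self.pool(feat_b).flatten(1)
        return self.mlp(torch.cat([a, b], dim=1)).squeeze(1)

# Self-supervised target: normalized slice-index difference, trained with MSE.
head = SRPPHead()
feat_a = torch.randn(4, 256, 64, 64)               # features of slice i
feat_b = torch.randn(4, 256, 64, 64)               # features of slice j
target = torch.tensor([0.10, -0.25, 0.40, 0.05])   # (j - i) / depth
loss_srpp = nn.functional.mse_loss(head(feat_a, feat_b), target)
```

Because the target is a signed offset, the head must distinguish "slice j above slice i" from the reverse, which is precisely the bidirectional ordering cue that video pretraining does not provide.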
Figure 2: Overview of the proposed SAM2-3dMed Model.
Loss Function
The model employs a composite loss function:
$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{seg}} + \lambda_1 \mathcal{L}_{\text{srpp}} + \lambda_2 \mathcal{L}_{\text{bd}}$$
- $\mathcal{L}_{\text{seg}}$: Dice loss for the segmentation mask.
- $\mathcal{L}_{\text{srpp}}$: MSE loss for the SRPP relative-position regression.
- $\mathcal{L}_{\text{bd}}$: Weighted binary cross-entropy loss for boundary detection.
Here $\lambda_1$ and $\lambda_2$ are hyperparameters that weight the two auxiliary objectives.
Jointly optimizing these objectives encourages feature learning that improves both volumetric overlap and boundary accuracy; a minimal sketch of the composite loss follows.
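The sketch below implements the composite loss in PyTorch under stated assumptions: the boundary ground truth is derived from the segmentation mask as a morphological gradient (dilation minus erosion via max-pooling), and `pos_weight`, `lambda1`, and `lambda2` are illustrative placeholders, not the authors' settings.

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps: float = 1e-6):
    """Soft Dice loss over a binary mask."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum()
    return 1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps)

def boundary_target(mask, kernel: int = 3):
    """Boundary ground truth as the morphological gradient of the mask
    (dilation minus erosion via max-pooling); an illustrative choice."""
    pad = kernel // 2
    dil = F.max_pool2d(mask, kernel, stride=1, padding=pad)
    ero = -F.max_pool2d(-mask, kernel, stride=1, padding=pad)
    return (dil - ero).clamp(0, 1)

def total_loss(seg_logits, seg_gt, srpp_pred, srpp_gt, bd_logits,
               lambda1: float = 0.1, lambda2: float = 0.5,
               pos_weight: float = 5.0):
    l_seg = dice_loss(seg_logits, seg_gt)
    l_srpp = F.mse_loss(srpp_pred, srpp_gt)
    bd_gt = boundary_target(seg_gt)
    # Weighted BCE: up-weight the sparse boundary pixels relative to background.
    l_bd = F.binary_cross_entropy_with_logits(
        bd_logits, bd_gt, pos_weight=torch.tensor(pos_weight))
    return l_seg + lambda1 * l_srpp + lambda2 * l_bd
```

Up-weighting positive boundary pixels compensates for their extreme sparsity: in a typical organ mask, boundary pixels are a small fraction of the volume, so an unweighted BCE would be dominated by easy background predictions.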
Experiments
Datasets and Evaluation
Experiments used the Lung, Spleen, and Pancreas tasks from the Medical Segmentation Decathlon (MSD) to validate SAM2-3dMed against state-of-the-art baselines. Performance was measured with Dice, IoU, the 95th-percentile Hausdorff distance (HD95), and normalized surface Dice (NSD), capturing both volumetric overlap and boundary accuracy.
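For concreteness, here is a minimal sketch of the two overlap metrics on binary volumes; HD95 and NSD are surface-distance metrics that are typically computed with dedicated libraries (e.g., MONAI's metrics module) rather than reimplemented by hand.

```python
import torch

def dice_and_iou(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-6):
    """Overlap metrics for binary volumes of shape (D, H, W)."""
    pred, gt = pred.bool(), gt.bool()
    inter = (pred & gt).sum().float()
    union = (pred | gt).sum().float()
    dice = (2 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return dice.item(), iou.item()
```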
Results
SAM2-3dMed outperformed both traditional CNN-based and newer Transformer-based baselines across the benchmarks, with notable Dice and NSD gains, particularly on the Pancreas task; the improvements are attributed to better boundary detection and inter-slice spatial consistency.
Ablation Studies
Comprehensive ablation studies validated each module's contribution to overall performance.
Conclusion
SAM2-3dMed represents a significant step toward adapting video segmentation models for 3D medical imaging. By modeling inter-slice context and boundary precision, it achieves state-of-the-art performance and establishes a versatile framework for adapting video-centric models to volumetric medical data, which is especially valuable in clinical settings where annotated data are limited.