SAM-Brain3D: Brain 3D Segmentation & Diagnosis
- SAM-Brain3D is a brain-specific 3D segmentation framework that integrates diverse MRI modalities and clinical data for robust neuroimaging analysis.
- It employs a volumetrically pretrained vision transformer with a Hypergraph Dynamic Adapter to fuse multi-scale, patient-specific features for improved segmentation and classification.
- The framework sets new benchmarks in both zero-shot segmentation and brain disease diagnosis by effectively handling heterogeneous and multi-modal data.
SAM-Brain3D is a brain-specific 3D segmentation and foundation modeling framework developed to address the heterogeneous and multi-modal demands of neuroimaging and brain disease analysis. It unifies a volumetrically pretrained vision transformer architecture with advanced dynamic adapters for downstream segmentation and classification tasks, integrating MRI, PET, and clinical data. The model is trained on a large-scale multi-dataset, multi-modality corpus, establishing new benchmarks for flexible, precise brain structural and pathology segmentation across a range of clinical and research applications (Deng et al., 1 May 2025).
1. Model Architecture and Pretraining Data
SAM-Brain3D inherits the 3D convolutional ViT-based encoder–decoder design of SAM-Med3D. The core model architecture consists of:
- A 3D ViT-style image encoder operating on volumetric MRI.
- A prompt encoder mapping user-provided inputs (points, boxes, masks) into embedding space.
- A decoder that fuses image and prompt embeddings via lightweight feature–mask cross-attention, producing per-voxel segmentations.
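The feature–mask cross-attention in the decoder can be sketched with a minimal single-head attention step, where flattened volumetric patch tokens attend to prompt embeddings. This is an illustrative numpy sketch, not the paper's implementation; the token counts, embedding dimension, and residual update are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(image_tokens, prompt_tokens):
    """Single-head cross-attention: image tokens attend to prompt embeddings.
    image_tokens: (N_img, d), prompt_tokens: (N_prompt, d)."""
    d = image_tokens.shape[-1]
    scores = image_tokens @ prompt_tokens.T / np.sqrt(d)   # (N_img, N_prompt)
    attn = softmax(scores, axis=-1)
    return image_tokens + attn @ prompt_tokens             # residual update

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 16))   # 8 flattened volumetric patch tokens, dim 16
prm = rng.standard_normal((2, 16))   # e.g. one point prompt + one box prompt
fused = cross_attention(img, prm)
# a per-voxel mask logit would then come from a lightweight head on `fused`
```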
The model is fully 3D and is trained on 4,451 unique brain subjects, yielding over 66,000 image–label pairs from 9 public brain MRI segmentation datasets covering 14 distinct MRI sub-modalities. These include classical T1-weighted, FLAIR, and T2 contrasts (OASIS, ISLES22, ATLASR2), adult/pediatric/sub-Saharan BraTS cohorts, 12-channel multi-contrast data (UCSF-PDGM), and meningioma and metastasis datasets. By exposing the encoder to diverse contrasts, the model learns strong modality priors that accommodate high anatomical and acquisition variability (Deng et al., 1 May 2025).
2. Training Objectives and Optimization
All parameters are trainable, initialized from a SAM-Med3D-Turbo checkpoint. The model is optimized with Adam at an initial learning rate of 8×10⁻⁵ and batch size 12 for 200 epochs, reducing the learning rate tenfold at epochs 120 and 180. Standard Dice or cross-entropy segmentation losses are used, as in SAM-Med3D (Deng et al., 1 May 2025).
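The reported step schedule (tenfold decay at epochs 120 and 180) can be written as a small helper; in a PyTorch training loop this would correspond to a `MultiStepLR`-style scheduler. The function below is a plain-Python sketch of the stated schedule, not code from the paper.

```python
def learning_rate(epoch, base_lr=8e-5, milestones=(120, 180), gamma=0.1):
    """Step schedule as described: tenfold decay at each milestone epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# epochs 0-119: 8e-5; epochs 120-179: 8e-6; epochs 180-199: 8e-7
```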
Targets span up to 35 structures/class labels, including fine and gross anatomical regions, stroke lesions, glioma (across populations), meningioma, metastases, and multiple tissue classes depending on the dataset.
3. Hypergraph Dynamic Adapter (HyDA) for Downstream Adaptation
HyDA is a lightweight, patient-specific adapter enabling multi-modal, multi-scale, and classification tasks on top of the frozen SAM-Brain3D encoder. The principal mechanisms are:
- Multi-modal Hypergraph Construction: Each imaging (MRI, PET) or tabular modality yields per-subject embeddings. Vertices represent subjects, and per-vertex k-nearest neighbor connections form the edges of a modality-specific hypergraph. Incidence matrices from all modalities are concatenated.
- Hypergraph Convolution and Fusion: A spatial HGNN⁺ layer passes messages to form updated embeddings f. A hypergraph classifier head produces class probabilities p_g.
- Dynamic Kernel Generation: For each subject, the hypergraph features are reshaped to form 3×3×3 convolutional kernels, which are then used to convolve low-level 3D feature maps from the encoder, producing multi-scale, subject-specific fused feature maps.
- Squeeze-and-Excitation and Residual Addition: This semantic fusion output is scaled by channel attention (squeeze-and-excitation), flattened, and added residually to the pooled modality embeddings.
- Dual-Head Classification: Both the hypergraph branch (p_g) and a standard discriminative MLP classifier (p_d) yield predictions, averaged (p = (p_g + p_d)/2). The composite loss is the sum of cross-entropy and focal loss from both heads.
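The modality-specific hypergraph construction above can be sketched as follows: each subject defines one hyperedge containing itself and its k nearest neighbors in that modality's embedding space, and the per-modality incidence matrices are concatenated. This is a simplified numpy sketch; the distance metric (Euclidean here), self-inclusion, and embedding dimensions are illustrative assumptions.

```python
import numpy as np

def knn_incidence(embeddings, k):
    """Per-modality hypergraph: one hyperedge per subject, containing the
    subject itself and its k nearest neighbors (Euclidean distance).
    Returns an incidence matrix H of shape (n_subjects, n_hyperedges)."""
    n = embeddings.shape[0]
    d2 = ((embeddings[:, None, :] - embeddings[None, :, :]) ** 2).sum(-1)
    H = np.zeros((n, n))
    for j in range(n):
        members = np.argsort(d2[j])[: k + 1]   # includes subject j (distance 0)
        H[members, j] = 1.0
    return H

rng = np.random.default_rng(0)
mri = rng.standard_normal((10, 32))    # hypothetical per-subject MRI embeddings
pet = rng.standard_normal((10, 16))    # hypothetical per-subject PET embeddings
H = np.concatenate([knn_incidence(mri, k=3), knn_incidence(pet, k=3)], axis=1)
# H: (10 subjects, 20 hyperedges) -- input to an HGNN+ convolution layer
```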
HyDA introduces only ≈2.8M parameters, with the 3D transformer backbone remaining frozen at adaptation time (Deng et al., 1 May 2025).
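The dynamic kernel generation step can be illustrated by reshaping a per-subject feature vector into a 3×3×3 kernel and convolving a low-level 3D feature map with it. The sketch below is single-channel and uses 'valid' padding for brevity; channel counts, padding, and the feature-vector length are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dynamic_conv3d(feature_map, subject_features):
    """Subject-specific dynamic convolution (sketch): reshape a 27-dim
    hypergraph feature vector into a 3x3x3 kernel and convolve the
    low-level 3D feature map with it."""
    kernel = subject_features[:27].reshape(3, 3, 3)
    windows = sliding_window_view(feature_map, (3, 3, 3))  # (D-2, H-2, W-2, 3, 3, 3)
    return np.einsum('dhwijk,ijk->dhw', windows, kernel)

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 8, 8))   # low-level encoder feature map (one channel)
subj = rng.standard_normal(27)          # per-subject hypergraph features
fused = dynamic_conv3d(fmap, subj)      # (6, 6, 6) subject-specific response
```

Because the kernel weights come from the hypergraph branch rather than fixed learned parameters, the same convolution produces a different fused feature map for each subject.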
4. Quantitative Performance and Benchmarks
Zero-shot 3D Brain Segmentation
- BraTS21 (seen): average Dice ≈ 43.8% (SAM-Brain3D), a +1.4-point improvement over SAM-Med3D-Turbo.
- iSeg19 (unseen, infant): average Dice ≈ 30.3% vs. 19.6% for SAM-Med3D (+10.7 pts). This suggests that pretraining specifically on brain datasets significantly improves both in-domain and out-of-domain generalization.
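The Dice scores above are overlap ratios between predicted and reference masks. For reference, a minimal Dice coefficient for binary volumetric masks (the small epsilon guarding against empty masks is an implementation convenience, not from the paper):

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice coefficient between binary volumetric masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

a = np.zeros((4, 4, 4)); a[1:3, 1:3, 1:3] = 1
b = np.zeros((4, 4, 4)); b[1:3, 1:3, 2:4] = 1
# overlap is 2x2x1 = 4 voxels; each mask has 8 voxels -> Dice = 2*4/(8+8) = 0.5
```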
Brain Disease Diagnosis
- Alzheimer’s Disease (pMCI/sMCI, MRI+PET+clinical): with HyDA, F1-score = 71.70%, AUC = 84.29%, outperforming prior multimodal methods (e.g., Multimodal CNN, VAP-Former, MMSDL).
- MGMT methylation (glioblastoma, 4 MRI): mean AUC = 64.4% (lowest cross-fold variance among CNN/Transformer/hypergraph baselines).
Ablation Studies
- HGNN⁺ outperforms alternative dynamic hypergraph or static graph approaches.
- All-modality fusion (MRI, PET, clinical) is optimal; absence of any modality decreases accuracy by 2–3 points.
- A hypergraph neighbor count k in the range 20–28 performs best; accuracy remains stable for k between 4 and 40.
Clinical Robustness
- Requires all modalities; does not handle missing data natively.
- Class imbalance can affect sensitivity and AUC for rare conditions.
5. Comparison With Prior and Contemporary 3D SAM Variants
SAM-Brain3D represents a substantial departure from earlier 2D or slice-wise adaptations of SAM for medical imaging, where architectures typically stack 2D predictions or prompts (Chenna et al., 2024, Zhang et al., 2023). Instead, it operates fully volumetrically, with prompt encoders and decoders implemented in true 3D.
Dynamic adapters, such as HyDA, distinguish the framework from parameter-efficient low-rank adaptation/fine-tuning protocols used in GBT-SAM and MedSAM (Diana-Albelda et al., 6 Mar 2025), or signed-distance map fusion strategies (Moore, 22 Nov 2025). The design avoids retraining or domain-specific adapter installation for each new task, enabling seamless migration between segmentation and classification endpoints.
6. Integration of Multimodal and Patient-Specific Representations
SAM-Brain3D with HyDA is designed for patient-specific adaptation:
- Patient-wise dynamic convolutional kernels are generated in the adapter from hypergraph features, fusing global, semantic, and local details.
- The approach supports arbitrary combinations of volumetric imaging (multi-contrast MRI, PET) and clinical data.
- Downstream tasks—segmentation, progression prediction, molecular subtype classification—are consistently handled by a frozen foundation model plus lightweight adapters.
This design achieves robust segmentation and diagnostic performance in heterogeneous clinical settings, as well as sample-efficient transfer to previously unseen tasks and modalities.
7. Limitations, Clinical Implications, and Future Directions
Limitations include the requirement for complete modality presence at inference time, and the susceptibility to class imbalance in rare disease subgroups. Proposed improvements are robust missing-modality handling via partial hypergraphs or dropout strategies, integrating imbalance-aware losses, and extending to continuous regression tasks.
Clinically, SAM-Brain3D enables a single inference pipeline for segmentation and diagnosis, with patient-tailored fusion supporting complex, multi-scale brain-disease patterns. A plausible implication is that this approach could standardize cross-center and cross-dataset neuroimaging analyses without retraining for new diagnostic targets.
References
- "Brain Foundation Models with Hypergraph Dynamic Adapter for Brain Disease Analysis" (Deng et al., 1 May 2025)