FeTal-SAM: Fetal Imaging Segmentation
- FeTal-SAM is a segmentation framework that adapts the Segment Anything Model (SAM) for fetal imaging by integrating domain-specific prompt engineering and data augmentation techniques.
- It employs a diffusion model with LoRA for fetal head ultrasound segmentation, enabling high-precision outcomes even with limited annotated data.
- For fetal brain MRI, the system leverages atlas-based priors and dense prompt generation to achieve accurate multi-structure segmentation across diverse cohorts.
FeTal-SAM refers to two distinct but conceptually aligned segmentation frameworks in medical imaging. Both use the Segment Anything Model (SAM) as a foundation and augment it for fetal imaging tasks: (1) a diffusion model-based data augmentation pipeline for fetal head segmentation in ultrasound (Wang et al., 30 Jun 2025), and (2) an atlas-guided foundation model approach for multi-structure fetal brain segmentation in MRI (Zeng et al., 22 Jan 2026). These systems address core challenges in fetal neuroimaging and prenatal screening, including the extreme scarcity of annotated data and the need for clinically adaptable, user-driven segmentation workflows.
1. Foundational Overview
The ultrasound (Wang et al., 30 Jun 2025) and MRI (Zeng et al., 22 Jan 2026) variants of FeTal-SAM share one foundational philosophy: extending the generalist SAM to the fetal imaging domain through domain-specific prompt engineering, data augmentation, and architectural modifications. Both pipelines aim to reduce reliance on large, bespoke annotated datasets and to enable rapid adaptation to varied anatomical or clinical requirements.
The ultrasound-oriented FeTal-SAM employs synthetic data generation with generative diffusion models to expand limited real-world datasets and fine-tune the SAM mask decoder, thereby attaining high-precision fetal head segmentation even with minimal labeled data. Conversely, the MRI-tailored FeTal-SAM fuses multi-atlas registration and prompt generation with SAM’s zero-shot segmentation capacity, allowing structure-specific mask prediction across diverse label schemes and anatomical definitions without retraining.
2. System Architectures and Methodological Distinctions
FeTal-SAM for Fetal Head Ultrasound Segmentation
The pipeline architecture entails a three-stage process:
- Mask-guided fine-tuning of Stable Diffusion (SD) using LoRA: Training employs a small set of real fetal head ultrasound images and associated masks, formulated as three-channel tensors where the mask is explicitly injected as a channel.
- Synthetic sample generation: The fine-tuned SD model generates matched image-mask pairs for data augmentation. The synthetic mask is extracted from its channel by thresholding (with optional manual correction).
- Supervised SAM mask decoder fine-tuning: SAM's vision transformer (ViT-B) backbone and prompt encoder remain frozen; only the mask decoder is trained, using a combined real and synthetic dataset (fixed training size of 500 samples per setting) with bounding box prompts derived from ground truth masks.
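The two data-preparation steps in this list, mask injection as a channel and box prompts derived from ground truth masks, can be illustrated with a minimal NumPy sketch. The channel layout `(image, image, mask)` and the function names `compose_training_tensor` and `box_from_mask` are assumptions made for illustration; the source does not specify them.

```python
import numpy as np

def compose_training_tensor(image, mask):
    """Stack a grayscale image and its binary mask into a 3-channel array.

    The (image, image, mask) channel order is an assumption for this sketch.
    """
    image = np.asarray(image, dtype=np.float32)
    mask = (np.asarray(mask) > 0).astype(np.float32)
    if image.shape != mask.shape:
        raise ValueError("image and mask must have the same shape")
    return np.stack([image, image, mask], axis=0)

def box_from_mask(mask):
    """Return the tight box (x_min, y_min, x_max, y_max) around the foreground."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask has no foreground pixels")
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy example: a random "ultrasound" image with a rectangular head mask.
image = np.random.default_rng(0).random((64, 64))
mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:40, 10:50] = 1

tensor = compose_training_tensor(image, mask)
box = box_from_mask(mask)
print(tensor.shape, box)
```

In practice a box prompt is often jittered by a few pixels during training so the decoder does not rely on a perfectly tight box; whether the original pipeline does this is not stated.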
FeTal-SAM for Fetal Brain MRI Segmentation
The MRI pipeline integrates atlas-based priors into the SAM workflow:
- Multi-atlas rigid and affine registration: Each subject’s fetal brain MRI volume is registered to multiple atlases, aligning both intensity and ground-truth label maps.
- Atlas-derived prompt generation: Dense spatial prompts are constructed by encoding warped atlas labels using a dedicated U-Net and fusing channel-attended atlas image features. Bounding box prompts are computed by averaging structure extents across atlases.
- Decoder modification and structure-wise prediction: The Med-SAM ViT-B image encoder is frozen; the decoder is adapted to accept image features, fused dense prompts, and box prompts. A structure-specific loop yields binary masks per 2D slice, subsequently fused in 3D using STAPLE.
- Multi-orientation fusion and full-volume reconstruction: Inference is repeated across axial, coronal, and sagittal planes, with STAPLE providing robust label fusion to construct the final multi-label volume.
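To make the multi-orientation fusion step concrete, the sketch below fuses three label volumes by per-voxel majority voting. This is a deliberately simpler stand-in, not STAPLE: STAPLE additionally estimates a per-rater performance model with expectation-maximization. The function name `majority_vote` and the toy volumes are hypothetical.

```python
import numpy as np

def majority_vote(label_maps):
    """Fuse integer label volumes of identical shape by per-voxel majority vote.

    Simplified stand-in for STAPLE; ties resolve to the lowest label index.
    """
    stacked = np.stack(label_maps, axis=0)            # (n_predictions, D, H, W)
    n_labels = int(stacked.max()) + 1
    # Count, for every voxel, how many predictions chose each label.
    votes = np.stack([(stacked == k).sum(axis=0) for k in range(n_labels)], axis=0)
    return votes.argmax(axis=0)

# Toy predictions from axial, coronal, and sagittal inference passes.
axial = np.array([[[0, 1], [2, 2]]])
coronal = np.array([[[0, 1], [1, 2]]])
sagittal = np.array([[[1, 1], [2, 0]]])

fused = majority_vote([axial, coronal, sagittal])
print(fused)
```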
3. Diffusion- and Atlas-Based Data/Prompt Generation
Diffusion-Based Synthetic Data for Ultrasound
The system utilizes denoising diffusion probabilistic models implemented in latent SD space, with LoRA facilitating low-rank adaptation for efficient parameterization. Training images are augmented via in-channel mask injection and conditioned via trimester-specific text prompts. Sampling produces three-channel outputs from which masks are thresholded and grayscale images are derived.
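The post-sampling step described above can be sketched as follows, assuming each generated sample is a `(3, H, W)` array scaled to `[0, 1]` whose last channel carries the mask and whose first two channels carry the image. Both the layout and the `0.5` threshold are assumptions for illustration, not values from the source.

```python
import numpy as np

def split_generated_sample(sample, threshold=0.5):
    """Split a generated 3-channel sample into a grayscale image and a binary mask.

    Assumes channels 0-1 hold image intensities and channel 2 holds the mask.
    """
    sample = np.asarray(sample, dtype=np.float32)
    if sample.ndim != 3 or sample.shape[0] != 3:
        raise ValueError("expected an array of shape (3, H, W)")
    grayscale = sample[:2].mean(axis=0)                  # average the image channels
    mask = (sample[2] >= threshold).astype(np.uint8)     # hard-threshold the mask channel
    return grayscale, mask

# Toy sample: constant image channels and a 2x2 bright patch in the mask channel.
sample = np.zeros((3, 4, 4), dtype=np.float32)
sample[0] = 0.2
sample[1] = 0.4
sample[2, 1:3, 1:3] = 0.9

grayscale, mask = split_generated_sample(sample)
print(grayscale.shape, int(mask.sum()))
```

Generated mask channels are rarely perfectly binary, which is why a threshold (and, per the source, optional manual correction) is needed.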
Key attributes:
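- LoRA fine-tuning per trimester on a small set of real images, producing well-structured, variable synthetic data.
- For a given experiment, synthetic data is generated adaptively to keep the training set size fixed across all real-sample regimes, so that the real and synthetic sample counts always sum to 500: $N_{\text{real}} + N_{\text{syn}} = 500$.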
- Mask-guided generative augmentation enables stable convergence and low variance in downstream SAM fine-tuning.
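The fixed-size training set amounts to simple bookkeeping: given a fixed total of 500 samples per setting, the number of synthetic samples is whatever the real samples leave unfilled. The helper name `synthetic_budget` and the real-sample counts in the loop are hypothetical, chosen only to illustrate the arithmetic.

```python
def synthetic_budget(n_real, total=500):
    """Number of synthetic samples needed to reach a fixed training set size."""
    if not 0 <= n_real <= total:
        raise ValueError("n_real must lie between 0 and total")
    return total - n_real

# Hypothetical real-sample regimes; the source does not list the values used.
for n_real in (5, 50, 500):
    print(n_real, synthetic_budget(n_real))
```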
Atlas-Prompting for MRI
Multi-atlas registration yields spatially aligned label templates and image priors. The label encoder maps these templates into dense prompts, which are fused with features from registered atlas images using channel attention. This methodology distinguishes between anatomical regions with high image contrast (where the SAM decoder is image-driven) and regions predominantly defined by spatial priors (where performance is capped by input contrast and registration fidelity).
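One common way to realize channel attention is a squeeze-and-excitation style gate, sketched below in NumPy. This is an assumed design, not the published architecture: the function `channel_attention_fuse`, the two-layer gate, the additive fusion, and the random weights `w1` and `w2` are all hypothetical.

```python
import numpy as np

def channel_attention_fuse(dense_prompt, atlas_feats, w1, w2):
    """Fuse atlas image features into a dense prompt with per-channel gating.

    dense_prompt, atlas_feats: arrays of shape (C, H, W).
    w1: (C // r, C) and w2: (C, C // r) are gate weights (hypothetical).
    """
    pooled = atlas_feats.mean(axis=(1, 2))             # squeeze: one value per channel
    hidden = np.maximum(w1 @ pooled, 0.0)              # bottleneck with ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))        # sigmoid weights in (0, 1)
    return dense_prompt + gate[:, None, None] * atlas_feats

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
dense_prompt = rng.standard_normal((C, H, W))
atlas_feats = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))

fused = channel_attention_fuse(dense_prompt, atlas_feats, w1, w2)
print(fused.shape)
```

The gate lets the model down-weight atlas image channels that carry little information for a given structure, which is consistent with the image-driven versus prior-driven distinction described above.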
Dense and box prompts enable user-specified binary segmentation without retraining, supporting arbitrary label definitions or parcellation schemes across institutions and research studies.
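As an illustration of how a box prompt might be derived from several registered atlases, the sketch below averages the per-atlas bounding box of one structure on one 2D slice. Skipping atlases in which the structure is absent and rounding to the nearest pixel are assumptions; the names `box_from_mask` and `average_box` are hypothetical.

```python
import numpy as np

def box_from_mask(mask):
    """Tight box (x_min, y_min, x_max, y_max), or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return xs.min(), ys.min(), xs.max(), ys.max()

def average_box(atlas_masks):
    """Average the structure's box extents across warped atlas label masks."""
    boxes = [box_from_mask(m) for m in atlas_masks]
    boxes = [b for b in boxes if b is not None]        # ignore atlases lacking the structure
    if not boxes:
        return None
    mean_box = np.rint(np.mean(np.array(boxes, dtype=np.float64), axis=0))
    return tuple(int(v) for v in mean_box)

# Two toy warped atlas masks for the same structure on the same slice.
atlas_a = np.zeros((12, 12), dtype=np.uint8)
atlas_a[2:6, 2:6] = 1
atlas_b = np.zeros((12, 12), dtype=np.uint8)
atlas_b[4:8, 4:10] = 1

prompt_box = average_box([atlas_a, atlas_b])
print(prompt_box)
```

Swapping in atlas masks that follow a different label definition changes the prompt without touching any network weights, which is the mechanism behind the retraining-free flexibility described above.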
4. Empirical Performance and Comparative Analysis
Ultrasound Results
- Few-shot generalization: On out-of-distribution evaluation (HC18→ES, HC18→AF) with a small number of real samples, Dice scores reach 94.66% (ES) and 94.38% (AF).
- Ablation against classic augmentations: Across the real-sample regimes tested, diffusion-augmented FeTal-SAM consistently outperforms both Weak Augmentation (WA) and Strong Augmentation (SA), including on AF.
MRI Results
- dHCP and CRL cohort evaluation: Med-SAM without fetal fine-tuning achieves DSC ≈ $0.25$ (dHCP) and $0.39$ (CRL), improving to $0.60$ and $0.49$, respectively, after mask decoder fine-tuning. FeTal-SAM achieves DSC ≈ $0.88$ (dHCP) and $0.80$ (CRL), with mean 95th percentile Hausdorff distances near $1$ mm.
- Comparison with 3D models: On well-contrasted structures, FeTal-SAM trails 3D U-Net and Swin-UNETR by only a small margin in DSC and Jaccard. For low-contrast anatomy (e.g., amygdala, hippocampus), FeTal-SAM underperforms by 10–15% DSC due to limited image contrast and the absence of spatial priors in the learned weights.
- Prompting flexibility: Segmentation of user-specified anatomy is supported simply by substituting atlas templates, obviating retraining cycles.
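For reference, the two overlap metrics reported above, the Dice similarity coefficient (DSC) and the Jaccard index, can be computed for a pair of binary masks as in this minimal sketch. The function name `dice_and_jaccard` and the convention of returning 1.0 when both masks are empty are choices made here, not taken from the source.

```python
import numpy as np

def dice_and_jaccard(pred, target):
    """Return (Dice, Jaccard) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    intersection = int(np.logical_and(pred, target).sum())
    total = int(pred.sum()) + int(target.sum())
    if total == 0:
        return 1.0, 1.0                      # both masks empty: treat as perfect agreement
    union = total - intersection
    return 2.0 * intersection / total, intersection / union

pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
dice, jaccard = dice_and_jaccard(pred, target)
print(round(dice, 4), round(jaccard, 4))
```

The two metrics are monotonically related by Dice = 2J / (1 + J), so they rank methods identically on a single structure even though their absolute values differ.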
5. Technical Limitations and Prospects
Documented Constraints
- Ultrasound system: Manual mask correction may be needed after thresholding; fine-tuning a separate SD model for each trimester adds procedural overhead; and the pipeline is specialized to fetal head segmentation, with generalization to other organs not yet studied.
- MRI system: Slice-wise 2D inference limits true 3D context modeling, which is crucial for small or ambiguous structures. Atlas registration errors (under/over-prompting) directly affect segmentation accuracy. Performance on low-contrast regions is fundamentally limited by both imaging physics and inter-expert annotation ambiguity.
Prospective Extensions
- Automated thresholded mask extraction (e.g., learned postprocessing) for synthetic images in ultrasound.
- Enhanced atlas registration through deep deformable networks and 3D prompt fusion for MRI.
- Application to additional structures (e.g., fetal heart, diverse brain parcellations), alternative imaging modalities, and integration of multi-condition prompt architectures (e.g., incorporating gestational age, acquisition plane).
- Uncertainty estimation and predictive confidence measures to flag ambiguous segmentations and guide expert review.
6. Clinical and Research Applications
FeTal-SAM approaches facilitate scalable, reproducible, and flexible segmentation workflows necessary for large-cohort studies, clinical screening, and longitudinal analysis in perinatal research:
- Ultrasound: Enables high-precision fetal head measurements and potentially facilitates automated screening for growth anomalies in datasets with minimal expert labeling (Wang et al., 30 Jun 2025).
- MRI: Supports volumetric analysis and growth trajectory quantification for diverse anatomical structures, adapting rapidly to novel or cohort-specific parcellation schemes without network retraining. This capacity is particularly relevant for cross-cohort and multi-institutional neurodevelopmental research, where label sets and annotation protocols differ (Zeng et al., 22 Jan 2026).
A plausible implication is that foundation model-based segmentation pipelines, when coupled with carefully engineered generative or atlas-prior-based prompting, will catalyze the development of clinically adaptable, annotation-efficient analysis tools for prenatal imaging.
7. Summary Table: FeTal-SAM Variants
| Modality/Task | Prompt/Conditioning | Data Augmentation | Performance (Dice) |
|---|---|---|---|
| Fetal head ultrasound (Wang et al., 30 Jun 2025) | Bounding box, real mask | Diffusion-based, LoRA | 94.66% (ES), 94.38% (AF) |
| Fetal brain MRI (Zeng et al., 22 Jan 2026) | Atlas label, dense prompt | Atlas-based priors | 0.88 (dHCP), 0.80 (CRL) |
Both frameworks exemplify the extension of large, prompt-driven segmentation models to specialized, annotation-sparse domains in medical imaging, with empirical validation across independent cohorts and anatomical targets.