
AtlasSegFM: One-Shot Segmentation Adaptation

Updated 27 December 2025
  • AtlasSegFM is a one-shot segmentation framework that leverages a single annotated atlas to adapt pretrained models to new clinical scenarios.
  • It fuses deformable atlas registration for prompt generation with a test-time learnable fusion adapter that refines segmentation outputs.
  • Experimental evaluations show significant Dice score improvements, especially for small or underrepresented structures across diverse imaging modalities.

AtlasSegFM is a one-shot customization framework designed to adapt segmentation foundation models to novel clinical contexts utilizing a single annotated atlas. By fusing global anatomical priors from atlas registration with the refinement capabilities of pretrained segmentation foundation models, AtlasSegFM achieves robust, generalizable performance across diverse medical imaging modalities and anatomical targets, especially excelling on small or underrepresented structures. Central to AtlasSegFM are: (1) context-aware prompt generation via deformable atlas registration, and (2) a test-time, learnable fusion adapter that combines atlas and model predictions. These components are trained per-case, with the core segmentation model remaining entirely frozen, requiring only one annotated support image for adaptation. The framework is validated across public and in-house datasets and integrates seamlessly into existing clinical inference pipelines (Zhang et al., 20 Dec 2025).

1. Registration-Driven Contextual Prompt Generation

AtlasSegFM initiates adaptation through a two-stage test-time registration aligning a pre-segmented atlas $(X_\mathrm{atlas}, Y_\mathrm{atlas})$ to a new query volume $X_\mathrm{query}$. The registration seeks a spatial transform $T: \mathbb{R}^3 \to \mathbb{R}^3$ minimizing

$$T^\star = \arg\min_T \mathcal{D}(X_\mathrm{atlas} \circ T, X_\mathrm{query}) + \lambda\, \mathcal{R}(T),$$

where $\mathcal{D}$ denotes an image-similarity measure (SSD, NCC, or MI) and $\mathcal{R}(T)$ enforces transformation smoothness. Practically, registration comprises rigid and affine steps, followed by a test-time optimized VoxelMorph-style U-Net, converging within ≈1.5 min on RTX 4090 hardware.
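The registration objective above can be illustrated with a deliberately minimal stand-in: an exhaustive search over integer translations minimizing an SSD similarity term. This is only a sketch of the form $\arg\min_T \mathcal{D}(X_\mathrm{atlas} \circ T, X_\mathrm{query})$; the paper's actual method uses rigid/affine stages plus a gradient-optimized VoxelMorph-style deformable U-Net, and the smoothness term $\lambda\,\mathcal{R}(T)$ is omitted here since a pure translation is already smooth.

```python
import numpy as np

def ssd(a, b):
    """Sum-of-squared-differences similarity term D."""
    return float(np.sum((a - b) ** 2))

def register_translation(atlas, query, search=2):
    """Toy stand-in for the registration objective: exhaustively search
    integer 2D translations T minimizing D(atlas o T, query).
    The paper instead optimizes rigid, affine, and deformable stages."""
    best_shift, best_cost = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            warped = np.roll(atlas, shift=(dy, dx), axis=(0, 1))
            cost = ssd(warped, query)
            if cost < best_cost:
                best_cost, best_shift = cost, (dy, dx)
    return best_shift, best_cost
```

For a query that is an exact shifted copy of the atlas, the search recovers the shift with zero residual cost.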

Once $T^\star$ is computed, atlas labels are warped to the query ($M_\mathrm{atlas} = Y_\mathrm{atlas} \circ T^\star$), delivering a coarse, globally consistent segmentation. This mask supplies predefined prompt types to the downstream foundation model: click-prompts (centroid of the largest connected component), box-prompts (minimal axis-aligned bounding box), or full mask-prompts, as dictated by the prompt preference of the specific foundation model (such as nnInteractive or MedSAM2).
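The prompt-extraction step can be sketched as follows. This is a simplified illustration assuming a single foreground component (the paper takes the centroid of the *largest* connected component for clicks); the dictionary keys are hypothetical names, not the paper's API.

```python
import numpy as np

def prompts_from_mask(mask):
    """Derive foundation-model prompts from a warped atlas mask M_atlas.
    Returns a click point (rounded voxel centroid), a minimal
    axis-aligned bounding box (min_corner, max_corner), and the full
    mask, covering the three prompt types described in the text."""
    coords = np.argwhere(mask > 0.5)
    click = tuple(np.round(coords.mean(axis=0)).astype(int))
    box = (tuple(coords.min(axis=0)), tuple(coords.max(axis=0)))
    return {"click": click, "box": box, "mask": mask}
```

Which of the three entries is actually passed downstream depends on the foundation model's preferred prompt type.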

2. Test-Time Fusion Adapter: Architecture and Training

Segmentation outputs from the foundation model $f_\mathrm{FM}$ using atlas-derived prompts ($M_\mathrm{fm} = f_\mathrm{FM}(X_\mathrm{query}, \mathrm{Prompt}_i)$) may lack satisfactory global context or miss fine details. AtlasSegFM introduces a lightweight fusion module employing a Kalman-filter-style update:

$$M_\mathrm{final} = (1 - K)\, M_\mathrm{fm} + K\, M_\mathrm{atlas},$$

with gain field $K = \sigma(g([M_\mathrm{atlas}, M_\mathrm{fm}]))$, predicted by a small 3D network $g$ acting on the concatenated atlas and model probability maps. The network $g$ consists of parallel 3D max-pool paths (kernels 3, 5, 7), channel-wise fusion, several 3D convolutions, and a $1 \times 1$ convolution, with a sigmoid activation to ensure $K(x) \in [0,1]$ per voxel.
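The gated update itself is simple; the learned part is only the gain. The sketch below substitutes a precomputed `gain_logits` array for the output of the small 3D network $g$ (which is not reproduced here), keeping just the Kalman-style combination and the sigmoid squashing of the gain.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kalman_fuse(m_atlas, m_fm, gain_logits):
    """Kalman-filter-style fusion M_final = (1-K)*M_fm + K*M_atlas.
    `gain_logits` stands in for the output of the small 3D network g
    applied to the concatenated probability maps; the sigmoid keeps
    the per-voxel gain K in [0, 1]."""
    K = sigmoid(gain_logits)
    return (1.0 - K) * m_fm + K * m_atlas
```

With zero logits the gain is 0.5 everywhere, i.e. an even blend of the atlas and foundation-model probabilities; large positive logits defer to the atlas, large negative logits to the model.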

The fusion adapter $g$ is trained at test time (using the single available support atlas and its label) to minimize a supervised Dice loss:

$$\mathcal{L}_\mathrm{fuse} = 1 - \frac{2 \sum_x M_\mathrm{final}(x)\, Y_\mathrm{support}(x)}{\sum_x M_\mathrm{final}(x) + \sum_x Y_\mathrm{support}(x)}.$$

This optimization (≈0.3 min, ≈0.1M parameters) leaves the backbone $f_\mathrm{FM}$ frozen.
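The loss itself is the standard soft Dice formulation and translates directly to code; the small `eps` guard against empty masks is an implementation convenience added here, not stated in the source.

```python
import numpy as np

def dice_loss(m_final, y_support, eps=1e-8):
    """Supervised soft Dice loss L_fuse used to tune the fusion adapter.
    `eps` avoids division by zero for empty masks (our addition)."""
    num = 2.0 * np.sum(m_final * y_support)
    den = np.sum(m_final) + np.sum(y_support) + eps
    return 1.0 - num / den
```

A perfect prediction yields a loss near 0; a completely disjoint one yields 1.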

3. One-Shot Customization Dynamics

AtlasSegFM’s adaptation pipeline is uniquely defined by per-query, per-context test-time optimization on a single annotated atlas. The operational sequence is:

a) Optimize the registration network on $(X_\mathrm{atlas}, X_\mathrm{query})$ to minimize $\mathcal{D} + \lambda \mathcal{R}$;

b) Generate compatible prompts from the warped atlas labels $M_\mathrm{atlas}$;

c) Run the frozen foundation model $f_\mathrm{FM}$ to obtain $M_\mathrm{fm}$;

d) Tune only the fusion network $g$ via the Dice loss using the support label $Y_\mathrm{support}$.

No further backbone fine-tuning or multi-shot data are required; both test-time modules (registration, fusion) are derived from the provided single atlas. The “one-shot” mechanism enables customization to new tasks with minimal annotation or retraining overhead.
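Steps a)–d) can be strung together as a single per-query routine. The sketch below is purely structural: the three callables are placeholders for the registration optimizer, the frozen foundation model, and the fusion-adapter fitting procedure described above, and the mask-prompt variant is assumed for simplicity.

```python
def one_shot_adapt(x_atlas, y_atlas, x_query, y_support,
                   register, foundation_model, fit_fusion):
    """Hedged sketch of AtlasSegFM's per-query pipeline (steps a-d).
    `register(x_atlas, x_query)` returns a transform callable;
    `foundation_model` stays frozen throughout; `fit_fusion` performs
    the test-time Dice-loss tuning of the fusion network g and
    returns the fitted fusion callable."""
    transform = register(x_atlas, x_query)       # (a) test-time registration
    m_atlas = transform(y_atlas)                 # warp atlas labels to query
    prompt = m_atlas                             # (b) mask-prompt variant
    m_fm = foundation_model(x_query, prompt)     # (c) frozen backbone inference
    fuse = fit_fusion(m_atlas, m_fm, y_support)  # (d) tune fusion adapter g
    return fuse(m_atlas, m_fm)
```

Because only `register` and `fit_fusion` involve optimization, swapping in a new clinical context means supplying a new annotated atlas, nothing more.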

4. Experimental Evaluation Across Datasets and Tasks

AtlasSegFM underwent evaluation on six datasets, covering CT and MRI, and encompassing both large organs and small, intricate structures:

Dataset Imaging Modality Key Structures
Abd-CT CT Liver, kidneys, spleen
Abd-MR MRI (T2-SPIR) Liver, kidneys, spleen
AVT CTA Aortic vessel tree
Fe-MRA MRA 12 arteries/veins
OASIS MRI Whole brain
BrainRT CT/MRI Brain, organs-at-risk

Reported metrics include Dice, Hausdorff 95, Normalized Surface Dice (NSD), and clDice (continuity-aware for vessels).

Key findings:

  • AtlasSegFM outperformed foundation models without customization (nnInteractive, vesselFM), 2D slice-wise ICL baselines (UniverSeg, Tyche, Iris), and a supervised nnU-Net trained on small datasets.
  • On Abd-CT, mean Dice increased from ~67% (best ICL) to 72.9% (+5.4%); on BrainRT “organs-at-risk,” from 39% to 77.1%.
  • Pre-registration improved atlas Dice from ~44% to 70% (Abd-MR).
  • Prompting $f_\mathrm{FM}$ with $M_\mathrm{atlas}$ raised Dice by 17% versus no prompt.
  • Adaptive fusion contributed an additional 12% Dice enhancement.
  • Gains were most evident for small, fine structures (e.g., optic nerves, thin vessels).

5. Computational Efficiency and Deployment Considerations

Only two compact test-time modules are learned: registration (≈1.3M parameters, 5-layer U-Net) and fusion adapter (≈0.1M). The foundation model backbone remains entirely frozen.

Empirical runtime (for a $256^3$ volume, RTX 4090):

  • Registration: ≈1.5 min
  • Foundation model inference: ≈0.01 min
  • Fusion adaptation: ≈0.3 min
  • Total: ≈1.8 min per query

Peak GPU memory usage (~10 GB) is driven by the 3D U-Nets. Integration is straightforward, requiring only a single support atlas per new segmentation context. No offline retraining or multi-shot fine-tuning is necessary, enabling direct deployment in clinical radiology and radiotherapy workflows.

6. Implications and Comparative Context

AtlasSegFM advances a paradigm where global anatomical context—instantiated via registration and transformed prompts—supplements the locality and prompt-dependence of current segmentation foundation models. This architecture particularly addresses shortcomings in contexts underrepresented in foundation model pretraining, as well as the limitations of precise prompting for challenging anatomies.

By decoupling test-time adaptation from backbone fine-tuning and reducing support requirements to a single annotated atlas, AtlasSegFM is positioned as a lightweight, flexible solution for real-world deployment, facilitating rapid customization for novel clinical tasks with minimal annotation cost (Zhang et al., 20 Dec 2025).
