nnSAM: Plug-and-Play SAM Extension

Updated 12 November 2025

One line of nnSAM work integrates a frozen SAM encoder with nnUNet and adds level-set and curvature losses to improve medical image segmentation.
A second line refines point prompts with a reinforcement learning agent trained against an adversary, improving robustness to poor or adversarial prompts.
Both frameworks are plug-and-play; the medical variant adds domain-specific regularization and shows its largest gains in few-shot settings.

nnSAM (Plug-and-play Segment Anything Model) refers to a family of methods that extend the Segment Anything Model (SAM) with domain adaptation, robustness, or additional capabilities, typically in a modular, training-optional, or prompt-based fashion. Notably, the term "nnSAM" appears in distinct but related research lines: (1) a model for medical image segmentation combining SAM and nnUNet with level-set/curvature supervision (Li et al., 2023), and (2) a plug-in adversarial agent for prompt optimization to boost SAM’s robustness (Liu et al., 23 Sep 2025). This article surveys the core technical frameworks, mathematical foundations, training paradigms, and benchmarking of “nnSAM” in both senses.

1. Architectural Foundations

The two principal variants of nnSAM target distinct settings but share a focus on plug-and-play enhancement of SAM via feature-fusion, auxiliary losses, or external optimization agents.

nnSAM for Medical Segmentation (Li et al., 2023)

Backbone integration: nnSAM attaches a frozen SAM encoder (pre-trained ViT on SA-1B) in parallel to an nnUNet encoder. SAM’s output embeddings (originally $64\times64$ from a $1024\times1024$ input) are bilinearly resized to match each nnUNet encoder level, then concatenated channel-wise.
Fused decoder: The nnUNet decoder splits into (a) a segmentation head for per-pixel class probabilities $\{p_j(a,b)\}$, and (b) a regression head for predicting signed distance maps (level sets) $\phi'(a,b)$, from which boundary curvature is derived.
Training: Only nnUNet parameters (encoder–decoder, segmentation, regression heads) are updated. The SAM encoder’s weights remain frozen.
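
The fusion step (resize the SAM embedding to an nnUNet feature grid, then concatenate along the channel axis) can be sketched in plain Python on nested lists. This is a minimal illustration, not the authors' implementation: the names `bilinear_resize` and `fuse` are hypothetical, the corner-aligned sampling convention is an assumption, and a real model would use a tensor library.

```python
def bilinear_resize(channel, out_h, out_w):
    """Resize one 2D feature channel (list of rows) with bilinear interpolation.

    Assumption: corner-aligned sampling, so the four corner values are preserved.
    """
    in_h, in_w = len(channel), len(channel[0])
    out = []
    for i in range(out_h):
        # Map the output row index to a fractional source row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            top = (1 - wx) * channel[y0][x0] + wx * channel[y0][x1]
            bottom = (1 - wx) * channel[y1][x0] + wx * channel[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out


def fuse(unet_feats, sam_feats):
    """Resize SAM channels to the nnUNet grid and concatenate channel-wise.

    Both inputs are lists of 2D channels (channels-first layout).
    """
    h, w = len(unet_feats[0]), len(unet_feats[0][0])
    resized = [bilinear_resize(c, h, w) for c in sam_feats]
    return unet_feats + resized


# Toy example: one nnUNet channel at 3x3, two SAM channels at 2x2.
unet = [[[0.0] * 3 for _ in range(3)]]
sam = [[[0.0, 2.0], [4.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]]
fused = fuse(unet, sam)  # three channels, each 3x3
```

Because the SAM channels are only resized and appended, the frozen encoder contributes features without receiving gradients; only the layers consuming `fused` would be trained.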

nnSAM as an Adversarial Prompt Optimizer (Liu et al., 23 Sep 2025)

Agent wrapping: SAM is wrapped at inference time by a “defender” agent trained to refine point prompt sets; a paired “attacker” agent synthesizes worst-case prompts during training.
Prompt environment: Each image is represented as a dual-space graph $G=(V,E)$, with nodes as candidate prompts/patches. Node features combine DINOv2 semantic embeddings and 2D coordinates.
Plug-in operation: The defender agent is the only run-time addition; SAM’s architecture and weights remain unchanged.
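
To make the prompt environment concrete, the sketch below builds a small graph whose node features concatenate a semantic embedding with 2D coordinates. The edge rule is an assumption made for illustration (the union of k-nearest neighbours in coordinate space and in embedding space); the source only states that the graph is "dual-space". The function names and toy embeddings are hypothetical stand-ins for DINOv2 features.

```python
import math


def knn(points, k, dist):
    """For each point index, return the indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        others = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in others[:k]])
    return neighbours


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def build_dual_space_graph(coords, embeddings, k=1):
    """Node features = embedding + coordinates; edges = spatial kNN union semantic kNN."""
    features = [list(e) + list(c) for e, c in zip(embeddings, coords)]
    edges = set()
    for space, dist in ((coords, math.dist), (embeddings, cosine_distance)):
        for i, nbrs in enumerate(knn(space, k, dist)):
            for j in nbrs:
                edges.add((min(i, j), max(i, j)))  # store as an undirected edge
    return features, edges


# Four candidate prompts: two spatial clusters, two semantic groups.
coords = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
embeddings = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.1), (0.0, 1.0)]
features, edges = build_dual_space_graph(coords, embeddings, k=1)
```

In this toy case the spatial edges link nearby prompts, while the semantic edges link distant prompts with similar embeddings, which is the kind of long-range connection a purely spatial graph would miss.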

2. Mathematical Formulation and Optimization

Level-set and Curvature Supervision (Li et al., 2023)

To enforce anatomical priors, nnSAM augments the segmentation loss with boundary-shape losses using level sets and curvature:

Signed distance function loss:

$\phi(a,b)= \begin{cases} -d(a,b) & \text{inside} \\ 0 & \text{boundary} \\ +d(a,b) & \text{outside} \end{cases}$

where $d(a,b)$ is the Euclidean distance to the ground-truth boundary.

$\mathrm{Loss}_l = \frac{1}{HWC}\sum_{a,b,j}(\phi_j(a,b)-\phi'_j(a,b))^2$
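
The signed distance map and its regression loss can be illustrated with a brute-force, single-class sketch in plain Python. Several choices here are assumptions for illustration only: boundary pixels are taken to be foreground pixels with a background (or out-of-image) 4-neighbour, the mask must contain at least one foreground pixel, and the names `signed_distance` and `level_set_loss` are hypothetical.

```python
import math


def signed_distance(mask):
    """Signed distance map: negative inside, zero on the boundary, positive outside."""
    h, w = len(mask), len(mask[0])

    def is_boundary(a, b):
        if not mask[a][b]:
            return False
        for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            na, nb = a + da, b + db
            if not (0 <= na < h and 0 <= nb < w) or not mask[na][nb]:
                return True
        return False

    boundary = [(a, b) for a in range(h) for b in range(w) if is_boundary(a, b)]
    phi = []
    for a in range(h):
        row = []
        for b in range(w):
            # Euclidean distance to the nearest ground-truth boundary pixel.
            d = min(math.hypot(a - p, b - q) for p, q in boundary)
            row.append(-d if mask[a][b] and d > 0 else d)
        phi.append(row)
    return phi


def level_set_loss(phi_true, phi_pred):
    """Mean squared error between two signed distance maps (C = 1 here)."""
    h, w = len(phi_true), len(phi_true[0])
    total = sum((phi_true[a][b] - phi_pred[a][b]) ** 2
                for a in range(h) for b in range(w))
    return total / (h * w)


# Toy 5x5 mask with a 3x3 square of foreground in the centre.
mask = [[1 if 1 <= a <= 3 and 1 <= b <= 3 else 0 for b in range(5)] for a in range(5)]
phi = signed_distance(mask)
phi_shifted = [[v + 0.5 for v in row] for row in phi]  # a uniformly biased prediction
loss = level_set_loss(phi, phi_shifted)  # approximately 0.25
```

The brute-force search is quadratic in the number of pixels; it is meant to show the definition, not to be used on full-resolution images.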

Curvature loss: The level set $\phi$ is sharpened by $\hat\phi = \sigma(-1000\,\phi)$, where $\sigma$ denotes the sigmoid function, and the local curvature is

$\kappa_{\hat\phi} = \frac{|(1+\hat\phi_a^2)\hat\phi_{bb} + (1+\hat\phi_b^2)\hat\phi_{aa} - 2\hat\phi_a\hat\phi_b\hat\phi_{ab}|}{2(1+\hat\phi_a^2+\hat\phi_b^2)^{3/2}}$

where subscripts on $\hat\phi$ denote partial derivatives with respect to $a$ and $b$. Curvature discrepancy is penalized as:

$\mathrm{Loss}_c = \frac{1}{HWC}\sum_{a,b,j}|\kappa_{\hat\phi_j}(a,b) - \kappa_{\hat\phi'_j}(a,b)|$
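
A minimal sketch of the sharpening step, the curvature formula, and the curvature loss follows. It assumes central finite differences with unit pixel spacing and sets the curvature of border pixels to zero; neither choice is specified in the source, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sharpen(phi):
    """phi_hat = sigmoid(-1000 * phi): close to 1 inside, 0.5 on the boundary, 0 outside."""
    return [[sigmoid(-1000.0 * v) for v in row] for row in phi]


def curvature(f):
    """Curvature of a 2D map at interior pixels via central differences."""
    h, w = len(f), len(f[0])
    kappa = [[0.0] * w for _ in range(h)]  # border pixels stay 0 (assumption)
    for a in range(1, h - 1):
        for b in range(1, w - 1):
            fa = (f[a + 1][b] - f[a - 1][b]) / 2.0
            fb = (f[a][b + 1] - f[a][b - 1]) / 2.0
            faa = f[a + 1][b] - 2.0 * f[a][b] + f[a - 1][b]
            fbb = f[a][b + 1] - 2.0 * f[a][b] + f[a][b - 1]
            fab = (f[a + 1][b + 1] - f[a + 1][b - 1]
                   - f[a - 1][b + 1] + f[a - 1][b - 1]) / 4.0
            num = abs((1 + fa * fa) * fbb + (1 + fb * fb) * faa - 2 * fa * fb * fab)
            den = 2.0 * (1 + fa * fa + fb * fb) ** 1.5
            kappa[a][b] = num / den
    return kappa


def curvature_loss(kappa_true, kappa_pred):
    """Mean absolute curvature difference (C = 1 here)."""
    h, w = len(kappa_true), len(kappa_true[0])
    total = sum(abs(kappa_true[a][b] - kappa_pred[a][b])
                for a in range(h) for b in range(w))
    return total / (h * w)


# Sanity check on a surface that is quadratic in a and constant in b: f(a, b) = a^2.
f = [[float(a * a) for _ in range(3)] for a in range(3)]
kappa = curvature(f)
```

For the quadratic test surface the centre pixel has first derivative 2 and second derivative 2 along $a$, so the formula gives $2 / (2 \cdot 5^{3/2}) = 5^{-3/2}$.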

Total loss:

$\mathrm{Loss} = \lambda_1 \mathrm{Loss}_s + \lambda_2 \mathrm{Loss}_l + \lambda_3 \mathrm{Loss}_c$

where $\mathrm{Loss}_s$ is the segmentation loss, with $\lambda_1 = 1$, $\lambda_2 = 0.1$, $\lambda_3 = 10^{-4}$.

Adversarial DQN for Prompt Optimization (Liu et al., 23 Sep 2025)

Prompt refinement is framed as a two-player reinforcement learning game:

State: $(G, \sigma_t)$, where $\sigma_t\in\{0,1\}^n$ encodes active prompts.
Attacker action: Activates nodes (adds prompts), maximizing decrease in mask quality.
Defender action: Deactivates nodes (removes prompts), maximizing recovery of mask quality.
Reward:
- Attacker: $r_t^{\mathrm{atk}} = S(P_{t-1}) - S(P_t)$
- Defender: $r_t^{\mathrm{def}} = S(P_t) - S(P_{t-1})$
Here $S(P_t)$ is the quality (IoU or Dice) of the mask SAM produces from the prompt set $P_t$.
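
The two rewards are mirror images of each other: whatever mask quality one agent's move costs, that agent is penalized or rewarded by the same amount. The sketch below illustrates this with IoU on toy masks stored as sets of pixels; the masks, the helper names, and the idea that the masks come from SAM calls are all stand-ins for illustration.

```python
def iou(pred, target):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = pred | target
    return len(pred & target) / len(union) if union else 1.0


def step_rewards(score_prev, score_curr):
    """Return (attacker reward, defender reward) for a change in mask quality."""
    r_atk = score_prev - score_curr  # attacker gains when quality drops
    r_def = score_curr - score_prev  # defender gains when quality recovers
    return r_atk, r_def


target = {(0, 0), (0, 1), (1, 0), (1, 1)}
clean_mask = set(target)                                        # before the attack
attacked_mask = target | {(3, 3), (3, 4), (4, 3), (4, 4)}       # a bad prompt adds a spurious region

s_clean = iou(clean_mask, target)        # 1.0
s_attacked = iou(attacked_mask, target)  # 0.5

atk_reward, _ = step_rewards(s_clean, s_attacked)   # attacker's move: +0.5 for the attacker
_, def_reward = step_rewards(s_attacked, s_clean)   # defender's move restores quality: +0.5
```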

Both agents are Deep Q-Networks built on two-layer graph convolutional networks (GCNs) of width 128. $Q$-learning uses experience replay and the temporal-difference loss

$\mathcal L = \left( r_t + \gamma \max_{a'} Q_{\theta^-}(s_{t+1},a') - Q_\theta(s_t,a_t) \right)^2$

where $Q_{\theta^-}$ is the target network. Only the defender agent is needed at inference.
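
As a worked instance of the temporal-difference loss, the following sketch evaluates it for a single transition. The Q-values are made-up numbers standing in for network outputs, and `td_loss` is a hypothetical helper name.

```python
def td_loss(reward, gamma, next_q_target, q_value):
    """Squared TD error for one transition.

    next_q_target: target-network Q-values for every action in the next state.
    q_value: online-network Q-value of the action actually taken.
    """
    target = reward + gamma * max(next_q_target)
    return (target - q_value) ** 2


# One transition: reward 0.5, discount 0.99, two possible next actions.
loss = td_loss(reward=0.5, gamma=0.99, next_q_target=[0.2, 1.0], q_value=1.0)
# target = 0.5 + 0.99 * 1.0 = 1.49, so the loss is about 0.49 ** 2 = 0.2401
```

In training, this quantity would be averaged over a minibatch sampled from the replay buffer and minimized with respect to the online parameters only.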

3. Training Paradigms and Implementation

Data and Preprocessing (Li et al., 2023)

Medical datasets: MR brain white-matter, CT heart, chest X-ray lungs, CT liver.
Few-shot regimes: Experiments performed with as few as 5–20 labeled images.
nnUNet preprocessing: Intensity normalization, geometric augmentations.
SAM preprocessing: Images resized to $1024\times1024$ for SAM, then embeddings up/downsampled as needed.

nnSAM Adversarial Agent Training (Liu et al., 23 Sep 2025)

Prompt graph construction: Grid or feature-matched candidate prompts with DINOv2 embeddings; adjacency encodes spatial + semantic proximity.
Episodes: Each step alternates attacker/defender actions, updating the prompt set and querying SAM for mask quality.
Optimization: Adam, learning rate $1\times10^{-4}$, $\gamma=0.99$, $\epsilon$-greedy exploration annealed.
Stability: Gradient clipping and double-DQN to avoid overestimation.
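
The exploration and stability settings can be sketched as follows. The linear annealing schedule and its start, end, and duration values are placeholders, since the source does not state them; the double-DQN target uses the standard construction in which the online network picks the next action and the target network scores it. All function names are hypothetical.

```python
import random


def epsilon_at(step, start=1.0, end=0.05, decay_steps=1000):
    """Linearly anneal epsilon from start to end (placeholder values)."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)


def select_action(q_values, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def double_dqn_target(reward, gamma, next_q_online, next_q_target):
    """Online network selects the next action; target network evaluates it."""
    best = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best]


rng = random.Random(0)
greedy_action = select_action([0.1, 0.7, 0.3], epsilon=0.0, rng=rng)  # always index 1

# The online net prefers action 1, which the target net values at only 0.2,
# so the bootstrap term is 0.2 rather than the target net's own maximum of 0.5.
y = double_dqn_target(1.0, 0.99, next_q_online=[0.1, 0.9], next_q_target=[0.5, 0.2])
```

Decoupling action selection from action evaluation is what reduces the upward bias that a plain `max` over noisy target Q-values introduces.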

Runtime Usage

Medical nnSAM: Operates as a single end-to-end U-Net model with frozen SAM embedding; suitable for batch inference and training on small medical datasets.
Prompt optimizer nnSAM: Pure PyTorch wrapper; performs $\sim$50 defender steps ($\sim$0.1 s overhead), then calls SAM with the refined prompt set; SAM weights are never updated.
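
The defender's run-time role can be sketched as a greedy pruning loop over the active-prompt vector. This is an illustration under stated assumptions, not the published procedure: `q_fn` is a hypothetical stand-in for the trained defender network, and the stopping rule (stop when no removal has positive Q-value, or when one prompt remains) is an assumption.

```python
def refine_prompts(active, q_fn, max_steps=50):
    """Greedily deactivate prompts while the defender expects an improvement.

    active: list of 0/1 flags, one per candidate prompt.
    q_fn:   callable mapping the current flags to one Q-value per node
            (hypothetical stand-in for the trained defender network).
    """
    active = list(active)
    for _ in range(max_steps):
        candidates = [i for i, on in enumerate(active) if on]
        if len(candidates) <= 1:
            break  # keep at least one prompt for SAM
        q = q_fn(active)
        best = max(candidates, key=lambda i: q[i])
        if q[best] <= 0.0:
            break  # assumed stopping rule: no removal is expected to help
        active[best] = 0
    return active


# Toy Q-function: prompts 1 and 3 are outliers whose removal is expected to help.
def toy_q(active):
    return [-1.0, 0.8, -0.5, 0.3]


refined = refine_prompts([1, 1, 1, 1], toy_q)  # -> [1, 0, 1, 0]
```

After the loop, SAM would be called once with the prompts whose flag is still 1; the loop itself never touches SAM's weights.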

4. Comparative Quantitative Performance

Medical Segmentation Benchmarks (Li et al., 2023)

Task	Method	Dice (%)	ASD (mm)
MR brain WM	nnSAM	82.77 ± 10.12	1.14 ± 1.03
MR brain WM	nnUNet	79.25 ± 17.24	1.36 ± 1.63
MR brain WM	AutoSAM	77.44 ± 14.69	1.69 ± 1.55
CT heart substructures	nnSAM	94.19 ± 1.51	1.36 ± 0.42
CT heart substructures	nnUNet	93.76 ± 2.95	1.48 ± 0.65
CT liver	nnSAM	85.24 ± 23.74	6.18 ± 16.02
CT liver	nnUNet	83.69 ± 26.32	6.70 ± 15.66
Chest X-ray lungs	nnSAM	93.63 ± 1.49	1.47 ± 0.42
Chest X-ray lungs	nnUNet	93.01 ± 2.41	1.63 ± 0.57

Few-shot: On brain WM with 5 samples, nnSAM yields $+6.3\%$ Dice absolute gain over nnUNet; improvement persists with 10–20 samples.
Interpretation: The largest benefit is observed under severe annotation scarcity, attributed to combined SAM features and level-set/curvature regularization.
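
For readers unfamiliar with the metric, the Dice score and an absolute gain in percentage points can be computed as in the sketch below. The masks are toy examples, not data from the benchmarks, and the helper name `dice` is hypothetical.

```python
def dice(pred, target):
    """Dice coefficient of two masks given as sets of (row, col) pixels."""
    total = len(pred) + len(target)
    return 2.0 * len(pred & target) / total if total else 1.0


target = {(0, 0), (0, 1), (1, 0), (1, 1)}
pred_a = {(0, 0), (0, 1), (1, 0), (2, 2)}  # 3 of 4 pixels correct, 1 false positive
pred_b = {(0, 0), (0, 1)}                  # 2 of 4 pixels found

dice_a = dice(pred_a, target)  # 2 * 3 / 8 = 0.75
dice_b = dice(pred_b, target)  # 2 * 2 / 6, about 0.667

# Absolute gain in percentage points, i.e. a difference of Dice percentages.
absolute_gain = 100.0 * (dice_a - dice_b)
```

An absolute gain is a difference between two Dice percentages, not a relative improvement, which is how the few-shot figure above is reported.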

Prompt Optimization Benchmarks (Liu et al., 23 Sep 2025)

Dataset	mIoU Gain over Grid/Feature Prompts (%)
PASCAL VOC	+25.5
ISIC	+9.2
Kvasir	+23.4

Ablation: Without attacker training, generalization degrades by $\sim$10% mIoU. Dual-space graphs improve mIoU by $\sim$5% over single-space graphs (semantic-only or spatial-only).
Robustness: The defender agent tightens segmentation boundaries and prunes outliers, especially when facing noisy or adversarial prompt initializations.

5. Domain Adaptation, Scalability, and Extension

Domain-agnostic design: The medical nnSAM exploits SAM’s domain-agnostic embeddings while learning medical priors via the nnUNet+curvature losses, enabling superior performance in limited-data domains.
Plug-and-play inference: The prompt optimizer nnSAM acts as a drop-in front-end; no SAM re-training is required. Evaluation on natural, medical, and aerial imagery demonstrates generalization without retraining.
Sample efficiency: Both lines of work highlight significant gains in limited annotation settings (e.g., 5–20 training samples in medical, 1-shot segmentation in prompt optimization).
Computational profile: The prompt optimizer adds about 0.1 s for the defender steps, with most inference time spent in the SAM call, so its overhead is negligible in practice; the medical nnSAM runs as a single end-to-end segmentation network.

6. Limitations and Future Directions

Level-set/curvature head: Assumes regular object shapes; may underperform for irregular structures such as tumors (Li et al., 2023), and currently limited to 2D.
Prompt optimizer agent: Defender success depends on the quality of initial prompts; with semantically irrelevant or poorly distributed prompts, recovery is limited (Liu et al., 23 Sep 2025).
3D extension: Fusion of 3D SAM embeddings with 3D nnUNet architectures remains an unresolved technical challenge.
Annotation minimization: Although nnSAM methods work in few-shot settings, manual labels are still needed; one-shot and zero-shot segmentation remain important next steps.
Semantic generalization: Integration of prompt-free SAM finetuning or unsupervised shape priors, and further development of unsupervised or adaptive prompt generation methods, are posited as promising directions.

7. Significance and Broader Impact

The nnSAM frameworks are modular, plug-and-play strategies for bridging foundation models (e.g., SAM) and domain-specific segmentation (e.g., medical imaging or robust prompt optimization). By combining frozen foundation model features, automated configuration, and domain-specific regularization, the medical nnSAM outperforms nnUNet on the small, heterogeneous datasets benchmarked above with minimal human tuning. The adversarial prompt agent variant shows that attack-for-defense RL training can yield robust, generalizable segmentation without modifying or retraining SAM’s backbone, facilitating practical deployment across domains.

References (2)

Li et al. (2023). nnSAM: Plug-and-play Segment Anything Model Improves nnUNet Performance.

Liu et al. (2025). Attack for Defense: Adversarial Agents for Point Prompt Optimization Empowering Segment Anything Model.
