SAGE-UNet: Adaptive Segmentation Architecture
- SAGE-UNet is an advanced, adaptive architecture that combines static CNN–Transformer backbones with dynamic, sparsely gated expert routing for precise medical segmentation.
- It incorporates a Shape-Adapting Hub to harmonize heterogeneous modules, enabling local-global reasoning while reducing redundant computations.
- The model achieves state-of-the-art Dice scores in colonoscopic and histopathological tasks, with dynamic gating yielding significant performance gains over conventional approaches.
SAGE-UNet is an input-adaptive, dynamically routed neural architecture for medical image segmentation, particularly targeting the challenges of cellular heterogeneity in whole slide images (WSIs) and colonoscopic lesion analysis. It operationalizes the Shape-Adapting Gated Experts (SAGE) framework, converting a static CNN–Transformer hybrid backbone (e.g., U-Net) into a sparsely gated mixture-of-experts model. SAGE-UNet features a dual-path design with hierarchical gating and a Shape-Adapting Hub (SA-Hub) for harmonizing architectural diversity between CNN and Transformer modules. Its adaptive computation paradigm reduces redundancy, enables local-global reasoning, and achieves state-of-the-art (SOTA) segmentation accuracy across multiple medical benchmarks (Thai et al., 23 Nov 2025).
1. SAGE-UNet Architecture and Dynamic Routing
SAGE-UNet generalizes traditional UNet architectures by introducing two parallel computational streams at every network layer:
- Main Path: Preserves the operations of the original pretrained backbone, ensuring representational continuity.
- Expert Path: Selectively activates a sparse set (Top-$k$) of experts, either shared or domain-specialized, using a multi-level gating mechanism.
Let $X_\ell$ denote the input feature map to the $\ell$-th layer. The main path computes $Y_\ell^{\mathrm{main}} = F_\ell(X_\ell)$, while the expert path extracts a global embedding $z_\ell$ and applies two gating stages:
- Shared Expert Gate: Computes $g_\ell = \sigma(f_g(z_\ell))$, where $\sigma$ is a logistic sigmoid and $f_g$ a learned gating head, to distribute probability mass between shared and fine-grained experts.
- Semantic Affinity Routing (SAR): For each candidate expert $i$, computes a score $s_{\ell,i}$ using query-key similarity and adaptive noise, then composes the final routing logits by augmenting scores with $\log g_\ell$ (for shared) or $\log(1-g_\ell)$ (for specialized) experts.
The model selects the Top-$k$ experts per layer, and their normalized gated outputs form the expert-path output:

$$Y_\ell^{\mathrm{exp}} = \sum_{i \in \mathcal{T}_\ell} \tilde{g}_{\ell,i}\, E_i(X_\ell),$$

where $\mathcal{T}_\ell$ denotes the indices of the $k$ most relevant experts and $\tilde{g}_{\ell,i}$ are the routing weights renormalized over $\mathcal{T}_\ell$.
A learnable scalar $\alpha_\ell$ fuses the main and expert paths:

$$Y_\ell = Y_\ell^{\mathrm{main}} + \alpha_\ell\, Y_\ell^{\mathrm{exp}}.$$
This design enables dynamic, input-dependent routing and adaptive capacity allocation.
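The hierarchical gating described above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the additive $\log g$ / $\log(1-g)$ logit bias, the score values, and the function names are assumptions made for the sketch.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(expert_scores, shared_gate_logit, num_shared, top_k):
    """Hierarchical gating sketch: a sigmoid gate g splits probability
    mass between shared and fine-grained experts, then Top-k selection
    keeps only the strongest experts (illustrative, not the paper's code)."""
    g = sigmoid(shared_gate_logit)  # shared-expert gate in (0, 1)
    # Bias each expert's logit by log(g) (shared) or log(1 - g) (specialized),
    # which multiplies the resulting probabilities by g or (1 - g).
    biased = [
        s + (math.log(g) if i < num_shared else math.log(1.0 - g))
        for i, s in enumerate(expert_scores)
    ]
    # Keep only the Top-k logits; mask the rest out before normalizing,
    # so non-selected experts receive exactly zero weight.
    top = sorted(range(len(biased)), key=lambda i: biased[i], reverse=True)[:top_k]
    masked = [b if i in top else float("-inf") for i, b in enumerate(biased)]
    weights = softmax(masked)
    return top, weights

# Five candidate experts, the first two shared; select the Top-2.
top, weights = route(expert_scores=[0.2, 1.5, -0.3, 0.9, 0.1],
                     shared_gate_logit=0.4, num_shared=2, top_k=2)
```

The masked softmax mirrors the "normalized gated outputs" step: only the selected experts contribute, and their weights sum to one before the weighted combination of expert outputs.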
2. Shape-Adapting Hub (SA-Hub) and Heterogeneous Expert Integration
The SA-Hub mediates architectural mismatch between the CNN and Transformer experts:
- Input Adapter $A_{\mathrm{in}}$: Transforms a 2D CNN feature map into the target format (e.g., token sequence) required by the expert module.
- Output Adapter $A_{\mathrm{out}}$: Projects the expert's output back into the CNN-style spatial and channel dimensions, enabling seamless path fusion.
These adapters allow shared use of heterogeneous expert backbones (e.g., ConvNeXt layers, ViT transformer blocks) within the same stage, achieving both semantic consistency and architectural interoperability.
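A minimal sketch of the two adapters, showing only the shape bookkeeping between a CNN feature map (C, H, W) and a Transformer token sequence (H*W, C). The learned projection layers of the real SA-Hub are omitted, and the function names are illustrative.

```python
def cnn_to_tokens(fmap):
    """Input-adapter sketch: flatten a CNN feature map (C, H, W) into a
    token sequence (H*W, C) for a Transformer-style expert."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][h][w] for c in range(C)] for h in range(H) for w in range(W)]

def tokens_to_cnn(tokens, H, W):
    """Output-adapter sketch: fold the token sequence back to (C, H, W)
    so the expert output can be fused with the CNN main path."""
    C = len(tokens[0])
    return [[[tokens[h * W + w][c] for w in range(W)] for h in range(H)]
            for c in range(C)]

fmap = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # C=2, H=2, W=2
tokens = cnn_to_tokens(fmap)
assert tokens_to_cnn(tokens, 2, 2) == fmap   # round trip preserves the map
```

The round-trip invariant is the key property: whatever a heterogeneous expert does in token space, its output can always be mapped back onto the spatial grid for dual-path fusion.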
3. Architectural Specification and Implementation
SAGE-UNet is instantiated atop the TransUNet architecture augmented by SAGE mechanisms. Specific configuration parameters are:
- Expert Pool: Total of 20 experts per layer (4 shared, 16 fine-grained).
- Expert Selection: Top-$k$ experts activated per layer.
- Expert Injection: Experts are integrated at every encoder and decoder block, following the stage structure of ConvNeXt (4 stages) and ViT (16 transformer blocks).
- Gating and Routing: Hierarchical logit modulation (Eq. (5)–(7)), with expert selection affecting final decoder logits.
A summary of the modular design:
| Component | Function | Associated Model Elements |
|---|---|---|
| Main Path | Preserves original backbone operations | $F_\ell(X_\ell)$ |
| Expert Path | Dynamic, Top-$k$ expert selection | Experts $E_i$, gate $g_\ell$, scores $s_{\ell,i}$ |
| SA-Hub | Adapts between CNN and Transformer formats | $A_{\mathrm{in}}$, $A_{\mathrm{out}}$ |
| Dual-Path Fusion | Learns balance between main and expert paths | $\alpha_\ell$ |
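The layer-count configuration above can be captured in a small config object. This is a sketch: the field names are assumptions, and the Top-$k$ value is deliberately left out because it is not restated in this summary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SAGEUNetConfig:
    """Illustrative configuration holder for the reported SAGE-UNet setup."""
    num_shared_experts: int = 4   # shared experts per layer
    num_fine_experts: int = 16    # fine-grained experts per layer
    convnext_stages: int = 4      # encoder stage count (ConvNeXt)
    vit_blocks: int = 16          # Transformer block count (ViT)

    @property
    def experts_per_layer(self) -> int:
        # 4 shared + 16 fine-grained = 20 experts available per layer
        return self.num_shared_experts + self.num_fine_experts

cfg = SAGEUNetConfig()
```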
4. Training Procedure and Loss Functions
SAGE-UNet is trained end-to-end with a composite loss function per minibatch:

$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda_{\mathrm{Dice}}\,\mathcal{L}_{\mathrm{Dice}} + \lambda_{\mathrm{LB}}\,\mathcal{L}_{\mathrm{LB}},$$

where:
- $\mathcal{L}_{\mathrm{CE}}$ is the cross-entropy loss,
- $\mathcal{L}_{\mathrm{Dice}}$ is the Dice loss,
- $\mathcal{L}_{\mathrm{LB}}$ is the load-balancing loss, encouraging even utilization of the expert pool.

The loss weights $\lambda_{\mathrm{Dice}}$ and $\lambda_{\mathrm{LB}}$ are fixed hyperparameters. Optimization uses AdamW with a two-stage learning rate schedule.
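The composite loss can be sketched as follows. Each term uses one common textbook form; in particular, the squared-deviation load-balancing penalty and the weight values are assumptions for illustration, not the paper's exact formulation.

```python
import math

def cross_entropy(p_true):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p) for p in p_true) / len(p_true)

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss over flattened probability/label vectors."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def load_balancing_loss(expert_fractions):
    """Penalize uneven expert utilization: scaled squared deviation of each
    expert's routing fraction from the uniform share (one common form)."""
    n = len(expert_fractions)
    return n * sum((f - 1.0 / n) ** 2 for f in expert_fractions)

def total_loss(p_true, pred, target, fractions, lam_dice=1.0, lam_lb=0.01):
    # lam_dice / lam_lb are placeholder weights, not the paper's values
    return (cross_entropy(p_true)
            + lam_dice * dice_loss(pred, target)
            + lam_lb * load_balancing_loss(fractions))
```

With perfect predictions and uniform expert utilization, all three terms vanish; uneven routing fractions raise the load-balancing term, pushing the gates toward even use of the expert pool.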
5. Quantitative Results and Ablation Studies
SAGE-UNet achieves new SOTA Dice scores across multiple colorectal histopathology datasets:
| Dataset (subset) | Dice Score (%) | Dice Gain over Baseline (%) |
|---|---|---|
| EBHI (Adenocarcinoma) | 95.57 | +3.3 |
| DigestPath (colon patch) | 95.16 | +1.7 |
| GlaS (A+B) | 94.17 | +2.65 (approx) |
These results surpass ConvNeXt-UNet, SegFormer, and EViT-UNet benchmarks. Ablations demonstrate:
- Sigmoid (vs. softmax) gating increases EBHI Dice from 95.05% to 95.57%.
- Increasing Top-$k$ from 1→4 yields a +5.4% Dice improvement.
- Scaling shared experts from 1→4 adds +0.47% Dice.
6. Domain Generalization Performance
On GlaS Test B (designed to assess domain shift), SAGE-UNet achieves 94.67% Dice, surpassing EViT-UNet by +1.4% and UNet++ by +2.74%. Qualitative evaluations indicate robust boundary delineation under morphological shifts. Learned shared-gate scalars differ systematically between stage types, with CNN stages favoring shared experts, evidencing context-dependent dispatch that may underlie improved adaptability (Thai et al., 23 Nov 2025).
7. Significance, Limitations, and Future Directions
SAGE-UNet demonstrates several empirical and architectural advantages:
- Adaptive Computation: Reduces redundant expert evaluation for simple regions, concentrating capacity on complex instances.
- Hierarchical Expert Routing: Enables flexible balance between general and specialized processing, with improved interpretability and segmentation accuracy.
- Heterogeneous Module Fusion: SA-Hub allows seamless collaboration between architectures tailored for distinct representational granularities.
Identified limitations include increased implementation complexity and slower per-layer inference due to multi-expert evaluation, despite overall sparsity. The optimal design of the expert pool (depth and width) remains open; automated architecture search represents a plausible avenue for improving the performance-efficiency trade-off. Extension to 3D or multi-modal medical data is identified as a promising future direction (Thai et al., 23 Nov 2025).
SAGE-UNet establishes dynamic expert routing with shape-adapting fusion as a scalable and accurate paradigm for complex medical segmentation challenges.