SAGE-UNet: Adaptive Segmentation Architecture
- SAGE-UNet is an advanced, adaptive architecture that combines static CNN–Transformer backbones with dynamic, sparsely gated expert routing for precise medical segmentation.
- It incorporates a Shape-Adapting Hub to harmonize heterogeneous modules, enabling local-global reasoning while reducing redundant computations.
- The model achieves state-of-the-art Dice scores in colonoscopic and histopathological tasks, with dynamic gating yielding significant performance gains over conventional approaches.
SAGE-UNet is an input-adaptive, dynamically routed neural architecture for medical image segmentation, particularly targeting the challenges of cellular heterogeneity in whole slide images (WSIs) and colonoscopic lesion analysis. It operationalizes the Shape-Adapting Gated Experts (SAGE) framework, converting a static CNN–Transformer hybrid backbone (e.g., U-Net) into a sparsely gated mixture-of-experts model. SAGE-UNet features a dual-path design with hierarchical gating and a Shape-Adapting Hub (SA-Hub) for harmonizing architectural diversity between CNN and Transformer modules. Its adaptive computation paradigm reduces redundancy, enables local-global reasoning, and achieves state-of-the-art (SOTA) segmentation accuracy across multiple medical benchmarks (Thai et al., 23 Nov 2025).
1. SAGE-UNet Architecture and Dynamic Routing
SAGE-UNet generalizes traditional UNet architectures by introducing two parallel computational streams at every network layer:
- Main Path: Preserves the operations of the original pretrained backbone, ensuring representational continuity.
- Expert Path: Selectively activates a sparse set (Top-$k$) of experts, either shared or domain-specialized, using a multi-level gating mechanism.
Let $X_\ell$ denote the input feature map to the $\ell$-th layer. The main path computes $Y_\ell^{\mathrm{main}} = F_\ell(X_\ell)$, while the expert path extracts a global embedding $z_\ell$ and applies two gating stages:
- Shared Expert Gate: Computes $g_\ell = \sigma(f_g(z_\ell))$, where $\sigma$ is a logistic sigmoid and $f_g$ a learned gating head, to distribute probability mass between shared and fine-grained experts.
- Semantic Affinity Routing (SAR): For each candidate expert $i$, computes a score $s_{\ell,i}$ using query-key similarity and adaptive noise, then composes the final routing logits by augmenting scores with $\log g_\ell$ (for shared) or $\log(1-g_\ell)$ (for specialized) experts.
The model selects the Top-$k$ experts per layer, and their normalized gated outputs form the expert-path output:

$$Y_\ell^{\mathrm{exp}} = \sum_{i \in \mathcal{T}_\ell} \tilde{g}_{\ell,i}\, E_i(X_\ell),$$

where $\mathcal{T}_\ell$ denotes the indices of the $k$ most relevant experts and $\tilde{g}_{\ell,i}$ are the routing weights renormalized over $\mathcal{T}_\ell$.
A learnable scalar $\alpha_\ell$ fuses the main and expert paths:

$$Y_\ell = Y_\ell^{\mathrm{main}} + \alpha_\ell\, Y_\ell^{\mathrm{exp}}.$$
This design enables dynamic, input-dependent routing and adaptive capacity allocation.
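The hierarchical gating described above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the additive $\log g$ / $\log(1-g)$ logit bias, the score values, and the function names are assumptions made for the sketch.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(expert_scores, shared_gate_logit, num_shared, top_k):
    """Hierarchical gating sketch: a sigmoid gate g splits probability
    mass between shared and fine-grained experts, then Top-k selection
    keeps only the strongest experts (illustrative, not the paper's code)."""
    g = sigmoid(shared_gate_logit)  # shared-expert gate in (0, 1)
    # Bias each expert's logit by log(g) (shared) or log(1 - g) (specialized),
    # which multiplies the resulting probabilities by g or (1 - g).
    biased = [
        s + (math.log(g) if i < num_shared else math.log(1.0 - g))
        for i, s in enumerate(expert_scores)
    ]
    # Keep only the Top-k logits; mask the rest out before normalizing,
    # so non-selected experts receive exactly zero weight.
    top = sorted(range(len(biased)), key=lambda i: biased[i], reverse=True)[:top_k]
    masked = [b if i in top else float("-inf") for i, b in enumerate(biased)]
    weights = softmax(masked)
    return top, weights

# Five candidate experts, the first two shared; select the Top-2.
top, weights = route(expert_scores=[0.2, 1.5, -0.3, 0.9, 0.1],
                     shared_gate_logit=0.4, num_shared=2, top_k=2)
```

The masked softmax mirrors the "normalized gated outputs" step: only the selected experts contribute, and their weights sum to one before the weighted combination of expert outputs.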
2. Shape-Adapting Hub (SA-Hub) and Heterogeneous Expert Integration
The SA-Hub mediates architectural mismatch between the CNN and Transformer experts:
- Input Adapter $A_{\mathrm{in}}$: Transforms a 2D CNN feature map into the target format (e.g., token sequence) required by the expert module.
- Output Adapter $A_{\mathrm{out}}$: Projects the expert's output back into the CNN-style spatial and channel dimensions, enabling seamless path fusion.
These adapters allow shared use of heterogeneous expert backbones (e.g., ConvNeXt layers, ViT transformer blocks) within the same stage, achieving both semantic consistency and architectural interoperability.
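A minimal sketch of the two adapters, showing only the shape bookkeeping between a CNN feature map (C, H, W) and a Transformer token sequence (H*W, C). The learned projection layers of the real SA-Hub are omitted, and the function names are illustrative.

```python
def cnn_to_tokens(fmap):
    """Input-adapter sketch: flatten a CNN feature map (C, H, W) into a
    token sequence (H*W, C) for a Transformer-style expert."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][h][w] for c in range(C)] for h in range(H) for w in range(W)]

def tokens_to_cnn(tokens, H, W):
    """Output-adapter sketch: fold the token sequence back to (C, H, W)
    so the expert output can be fused with the CNN main path."""
    C = len(tokens[0])
    return [[[tokens[h * W + w][c] for w in range(W)] for h in range(H)]
            for c in range(C)]

fmap = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # C=2, H=2, W=2
tokens = cnn_to_tokens(fmap)
assert tokens_to_cnn(tokens, 2, 2) == fmap   # round trip preserves the map
```

The round-trip invariant is the key property: whatever a heterogeneous expert does in token space, its output can always be mapped back onto the spatial grid for dual-path fusion.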
3. Architectural Specification and Implementation
SAGE-UNet is instantiated atop the TransUNet architecture augmented by SAGE mechanisms. Specific configuration parameters are:
- Expert Pool: Total of 20 experts per layer (4 shared, 16 fine-grained).
- Expert Selection: Top-$k$ experts activated per layer.
- Expert Injection: Experts are integrated at every encoder and decoder block, following the stage structure of ConvNeXt (4 stages) and ViT (16 transformer blocks).
- Gating and Routing: Hierarchical logit modulation (Eq. (5)–(7)), with expert selection affecting final decoder logits.
A summary of the modular design:
| Component | Function | Associated Model Elements |
|---|---|---|
| Main Path | Preserves original backbone operations | $F_\ell(X_\ell)$ |
| Expert Path | Dynamic, Top-$k$ expert selection | Experts $E_i$, gate $g_\ell$, scores $s_{\ell,i}$ |
| SA-Hub | Adapts between CNN and Transformer formats | $A_{\mathrm{in}}$, $A_{\mathrm{out}}$ |
| Dual-Path Fusion | Learns balance between main and expert paths | $\alpha_\ell$ |
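The layer-count configuration above can be captured in a small config object. This is a sketch: the field names are assumptions, and the Top-$k$ value is deliberately left out because it is not restated in this summary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SAGEUNetConfig:
    """Illustrative configuration holder for the reported SAGE-UNet setup."""
    num_shared_experts: int = 4   # shared experts per layer
    num_fine_experts: int = 16    # fine-grained experts per layer
    convnext_stages: int = 4      # encoder stage count (ConvNeXt)
    vit_blocks: int = 16          # Transformer block count (ViT)

    @property
    def experts_per_layer(self) -> int:
        # 4 shared + 16 fine-grained = 20 experts available per layer
        return self.num_shared_experts + self.num_fine_experts

cfg = SAGEUNetConfig()
```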
4. Training Procedure and Loss Functions
SAGE-UNet is trained end-to-end with a composite loss function per minibatch:

$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda_{\mathrm{Dice}}\,\mathcal{L}_{\mathrm{Dice}} + \lambda_{\mathrm{LB}}\,\mathcal{L}_{\mathrm{LB}},$$

where:
- $\mathcal{L}_{\mathrm{CE}}$ is the cross-entropy loss,
- $\mathcal{L}_{\mathrm{Dice}}$ is the Dice loss,
- $\mathcal{L}_{\mathrm{LB}}$ is the load-balancing loss, encouraging even utilization of the expert pool.

The loss weights $\lambda_{\mathrm{Dice}}$ and $\lambda_{\mathrm{LB}}$ are fixed hyperparameters. Optimization uses AdamW with a two-stage learning rate schedule.
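The composite loss can be sketched as follows. Each term uses one common textbook form; in particular, the squared-deviation load-balancing penalty and the weight values are assumptions for illustration, not the paper's exact formulation.

```python
import math

def cross_entropy(p_true):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p) for p in p_true) / len(p_true)

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss over flattened probability/label vectors."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def load_balancing_loss(expert_fractions):
    """Penalize uneven expert utilization: scaled squared deviation of each
    expert's routing fraction from the uniform share (one common form)."""
    n = len(expert_fractions)
    return n * sum((f - 1.0 / n) ** 2 for f in expert_fractions)

def total_loss(p_true, pred, target, fractions, lam_dice=1.0, lam_lb=0.01):
    # lam_dice / lam_lb are placeholder weights, not the paper's values
    return (cross_entropy(p_true)
            + lam_dice * dice_loss(pred, target)
            + lam_lb * load_balancing_loss(fractions))
```

With perfect predictions and uniform expert utilization, all three terms vanish; uneven routing fractions raise the load-balancing term, pushing the gates toward even use of the expert pool.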
5. Quantitative Results and Ablation Studies
SAGE-UNet achieves new SOTA Dice scores across multiple colorectal histopathology datasets:
| Dataset (subset) | Dice Score (%) | Dice Gain over Baseline (%) |
|---|---|---|
| EBHI (Adenocarcinoma) | 95.57 | +3.3 |
| DigestPath (colon patch) | 95.16 | +1.7 |
| GlaS (A+B) | 94.17 | +2.65 (approx) |
These results surpass ConvNeXt-UNet, SegFormer, and EViT-UNet benchmarks. Ablations demonstrate:
- Sigmoid (vs. softmax) gating increases EBHI Dice from 95.05% to 95.57%.
- Increasing Top-$k$ from 1→4 yields a +5.4% Dice improvement.
- Scaling shared experts from 1→4 adds +0.47% Dice.
6. Domain Generalization Performance
On GlaS Test B (designed to assess domain shift), SAGE-UNet achieves 94.67% Dice, surpassing EViT-UNet by +1.4% and UNet++ by +2.74%. Qualitative evaluations indicate robust boundary delineation under morphological shifts. Learned shared-gate scalars differ systematically between stage types, with CNN stages favoring shared experts, evidencing context-dependent dispatch that may underlie improved adaptability (Thai et al., 23 Nov 2025).
7. Significance, Limitations, and Future Directions
SAGE-UNet demonstrates several empirical and architectural advantages:
- Adaptive Computation: Reduces redundant expert evaluation for simple regions, concentrating capacity on complex instances.
- Hierarchical Expert Routing: Enables flexible balance between general and specialized processing, with improved interpretability and segmentation accuracy.
- Heterogeneous Module Fusion: SA-Hub allows seamless collaboration between architectures tailored for distinct representational granularities.
Identified limitations include increased implementation complexity and slower per-layer inference due to multi-expert evaluation, despite overall sparsity. The optimal design of the expert pool (depth and width) remains open; automated architecture search represents a plausible avenue for improving the performance-efficiency trade-off. Extension to 3D or multi-modal medical data is identified as a promising future direction (Thai et al., 23 Nov 2025).
SAGE-UNet establishes dynamic expert routing with shape-adapting fusion as a scalable and accurate paradigm for complex medical segmentation challenges.