RDTE-UNet: Hybrid Medical Segmentation
- RDTE-UNet is a neural architecture for precise medical image segmentation that integrates local convolutions with global transformer modules for enhanced edge detection and fine anatomical detail preservation.
- It introduces three key modules: ASBE for adaptive edge sharpening, HVDA for directional feature attention, and EulerFF for effective feature fusion.
- Evaluated on the Synapse CT and BUSI ultrasound datasets, RDTE-UNet achieves higher Dice scores and lower boundary errors (HD95) than Trans-UNet, Swin-UNet, MT-UNet, and RWKV-UNet.
RDTE-UNet is a neural architecture designed for precise medical image segmentation, targeting enhanced boundary accuracy and preservation of fine anatomical details. It achieves this by integrating local convolutional modeling with global Transformer-based context awareness. The architecture introduces three principal innovations: the Adaptive Shape-aware Boundary Enhancement (ASBE) module for edge sharpening, the Horizontal-Vertical Detail Attention (HVDA) block for fine-grained directional feature modeling, and the Euler Feature Fusion (EulerFF) module for direction- and channel-sensitive skip connection fusion. Evaluated on Synapse multi-organ CT and BUSI breast ultrasound datasets, RDTE-UNet demonstrates advanced segmentation performance in both quantitative and qualitative terms (Qu et al., 3 Nov 2025).
1. Architectural Overview
RDTE-UNet adopts a U-shaped encoder–decoder topology, comprising five encoder and five decoder stages connected via classical skip-links. The encoder uses standard ResBlock stages (1–3) followed by "Details Transformer" stages (4–5). The architecture is organized as follows:
- Front-end ASBE Module: Replaces the initial convolution with an edge-aware block. ASBE utilizes Adaptive Rectangular Convolution for shape-dependent kernel learning and a boundary-difference operator for edge sharpening.
- Hybrid Encoder Backbone: Stages 1–3 are implemented as ResBlocks for strong local representation. Stages 4–5 transition to a Details Transformer block comprising HVDA—emphasizing horizontal and vertical features using directional “StairConv” mechanisms—followed by a channel MLP and Pre-LN residual structures for global modeling.
- Decoder Path and Fusion: At each upsampling stage, corresponding encoder features are fused with the decoder stream via the EulerFF module, allowing dynamic weighting of horizontal, vertical, and channel responses through a complex-valued Eulerian formulation.
- Output Head: A final convolution maps integrated features to class logits for segmentation.
2. Module Formulations
The three chief modules of RDTE-UNet are summarized below.
2.1 Adaptive Shape-Aware Boundary Enhancement (ASBE)
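Given the input image I, ASBE proceeds in three steps: it computes the feature branches, applies boundary-difference enhancement to sharpen edges, and fuses the resulting channels.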
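The idea of a boundary-difference operator can be illustrated with a minimal sketch. This is not the ASBE formulation itself, which is not reproduced in this text; it assumes a simple unsharp-style operator in which the difference between a feature map and its local mean is added back to the map. The names `box_blur`, `boundary_difference`, and the gain `alpha` are hypothetical.

```python
def box_blur(feat, radius=1):
    """Local mean over a (2r+1)x(2r+1) window, truncated at the borders."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [
                feat[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = sum(vals) / len(vals)
    return out


def boundary_difference(feat, alpha=1.0):
    """Assumed operator: feat + alpha * (feat - local_mean).

    Flat regions are left unchanged (the difference is zero there),
    while values on either side of an edge are pushed further apart.
    """
    smooth = box_blur(feat)
    return [
        [v + alpha * (v - s) for v, s in zip(feat_row, smooth_row)]
        for feat_row, smooth_row in zip(feat, smooth)
    ]


# A 4x4 map with a vertical step edge between columns 1 and 2.
step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
sharpened = boundary_difference(step)
```

In this toy case the contrast across the edge grows while a constant map would pass through unchanged, which is the qualitative behavior an edge-sharpening front end aims for.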
2.2 Horizontal-Vertical Detail Attention (HVDA)
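For an input feature map, HVDA extracts horizontal and vertical directional features, concatenates and fuses them, and passes the fused result to a self-attention module.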
StairConv operations use asymmetric, multi-scale padded convolutions for directional feature extraction.
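To make the notion of asymmetric directional convolutions concrete, the following sketch applies a 1×k kernel along rows and a k×1 kernel along columns with zero padding. It is only an illustration of directional filtering under assumed kernels; it does not reproduce StairConv or its multi-scale padding scheme, and `conv_1d_along` and `directional_features` are hypothetical names.

```python
def conv_1d_along(feat, kernel, axis):
    """Zero-padded 'same' correlation with a 1-D kernel.

    axis=1 slides the kernel along each row (a 1xk kernel),
    axis=0 slides it along each column (a kx1 kernel).
    """
    h, w = len(feat), len(feat[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for k, weight in enumerate(kernel):
                yy, xx = (y + k - r, x) if axis == 0 else (y, x + k - r)
                if 0 <= yy < h and 0 <= xx < w:  # outside the map counts as zero
                    acc += weight * feat[yy][xx]
            out[y][x] = acc
    return out


def directional_features(feat, kernel=(-1.0, 0.0, 1.0)):
    """Return (horizontal, vertical) responses for an assumed difference kernel."""
    horizontal = conv_1d_along(feat, kernel, axis=1)  # responds to change along x
    vertical = conv_1d_along(feat, kernel, axis=0)    # responds to change along y
    return horizontal, vertical


# Intensity changes only along x, so the interior vertical response is zero.
step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
horizontal, vertical = directional_features(step)
```

Separating the two orientations in this way lets a later attention or fusion step weight horizontal and vertical evidence independently.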
2.3 Euler Feature Fusion (EulerFF)
Given the skip features and the decoder features, EulerFF projects both into a complex Eulerian domain. Group convolutions are then applied separately to the real and imaginary parts for the horizontal and vertical branches, and the results are aggregated channel-wise and fused.
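The complex-valued idea can be sketched with Euler's formula, A·e^{iθ} = A·cos θ + i·A·sin θ. The example below is a simplified illustration, not the EulerFF module: it omits the group convolutions and learned weighting, and it assumes each feature element already comes with an amplitude and a phase. The function names are hypothetical.

```python
import math


def to_euler(amplitude, phase):
    """Split A * exp(i * phase) into its real and imaginary parts."""
    return amplitude * math.cos(phase), amplitude * math.sin(phase)


def fuse(skip, dec):
    """Add skip and decoder elements in the complex plane.

    skip, dec: equal-length lists of (amplitude, phase) pairs.
    Returns the magnitude of each complex sum.
    """
    fused = []
    for (a_skip, p_skip), (a_dec, p_dec) in zip(skip, dec):
        re_skip, im_skip = to_euler(a_skip, p_skip)
        re_dec, im_dec = to_euler(a_dec, p_dec)
        fused.append(math.hypot(re_skip + re_dec, im_skip + im_dec))
    return fused


aligned = fuse([(1.0, 0.0)], [(1.0, 0.0)])          # same phase: reinforce
opposed = fuse([(1.0, 0.0)], [(1.0, math.pi)])      # opposite phase: cancel
orthogonal = fuse([(1.0, 0.0)], [(1.0, math.pi / 2)])
```

The phase therefore acts as a gate on how strongly two responses combine, which is one way to read the "dynamic weighting" attributed to the Eulerian formulation.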
3. Network Assembly and Layerwise Pipeline
The layerwise pipeline is as follows:
```
f0 = ASBE(I)                        # → R^{H×W×C'}
e1 = ResBlock(f0)                   # H/2, W/2, 2C'
e2 = ResBlock(e1)                   # H/4, W/4, 4C'
e3 = ResBlock(e2)                   # H/8, W/8, 8C'
e4 = DetailsTransformer(e3)         # H/16, W/16, 16C'
e5 = DetailsTransformer(e4)         # H/32, W/32, 32C'
d5 = Deconv(e5)                     # H/16, W/16, 16C'
d5 = EulerFF(skip=e4, dec=d5)
d4 = Deconv(d5)                     # H/8, W/8, 8C'
d4 = EulerFF(skip=e3, dec=d4)
d3 = Deconv(d4)                     # H/4, W/4, 4C'
d3 = EulerFF(skip=e2, dec=d3)
d2 = Deconv(d3)                     # H/2, W/2, 2C'
d2 = EulerFF(skip=e1, dec=d2)
d1 = Deconv(d2)                     # H, W, C'
d1 = EulerFF(skip=f0, dec=d1)
out = Conv_{1×1}(d1)                # H×W×N_classes
return out
```
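The shape bookkeeping in this pipeline can be checked with a small trace. The sketch below only follows the halving and doubling pattern stated in the pipeline comments; the input size of 224×224, the base width C' = 32, and the class count of 9 are hypothetical values chosen for illustration.

```python
def trace_shapes(height, width, base_channels, num_classes):
    """Return the (H, W, C) shape of every tensor named in the pipeline."""
    if height % 32 or width % 32:
        raise ValueError("height and width must be divisible by 32")
    shapes = {"f0": (height, width, base_channels)}
    h, w, c = height, width, base_channels
    for name in ("e1", "e2", "e3", "e4", "e5"):  # each encoder stage halves H, W and doubles C
        h, w, c = h // 2, w // 2, c * 2
        shapes[name] = (h, w, c)
    for name in ("d5", "d4", "d3", "d2", "d1"):  # each Deconv doubles H, W and halves C
        h, w, c = h * 2, w * 2, c // 2
        shapes[name] = (h, w, c)
    shapes["out"] = (h, w, num_classes)          # 1x1 convolution to class logits
    return shapes


# Skip pairs taken from the EulerFF calls in the pipeline.
SKIP_PAIRS = {"d5": "e4", "d4": "e3", "d3": "e2", "d2": "e1", "d1": "f0"}

shapes = trace_shapes(224, 224, 32, 9)
skips_match = all(shapes[dec] == shapes[enc] for dec, enc in SKIP_PAIRS.items())
```

Every decoder tensor matches the shape of its skip partner, which is what allows EulerFF to fuse the two streams element by element.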
4. Training Protocols and Hyperparameters
- Loss Function: Combination of Dice loss and cross-entropy.
- Optimizer: Adam, configured with an initial learning rate and weight decay.
- Learning Rate Scheduling: Cosine annealing with warm restarts (SGDR), training for 200 epochs.
- Batch Sizes: 8 (Synapse), 16 (BUSI).
- Data Augmentation: Random rotations (±15°), flips, intensity scaling.
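The loss and schedule listed above can be sketched as follows. This is a minimal binary-case illustration under stated assumptions: the convex weighting `ce_weight`, the cycle length `cycle_len`, and the learning rate used in the example are hypothetical, since the corresponding values are not given in this text. The schedule uses the standard SGDR cosine form with fixed-length cycles.

```python
import math


def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss for flat lists of foreground probabilities and 0/1 labels."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


def cross_entropy_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clamped away from 0 and 1."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def combined_loss(probs, targets, ce_weight=0.5):
    """Assumed convex combination of the Dice and cross-entropy terms."""
    return ((1.0 - ce_weight) * dice_loss(probs, targets)
            + ce_weight * cross_entropy_loss(probs, targets))


def sgdr_lr(epoch, base_lr, min_lr=0.0, cycle_len=50):
    """Cosine annealing with warm restarts, using fixed-length cycles."""
    t_cur = epoch % cycle_len  # position inside the current cycle
    cosine = 1.0 + math.cos(math.pi * t_cur / cycle_len)
    return min_lr + 0.5 * (base_lr - min_lr) * cosine


good = combined_loss([0.9, 0.1], [1, 0])
bad = combined_loss([0.1, 0.9], [1, 0])
schedule = [sgdr_lr(epoch, base_lr=1e-3) for epoch in range(200)]
```

With a 50-epoch cycle, the 200-epoch run restarts at epochs 50, 100, and 150, each time returning to the base learning rate.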
5. Quantitative and Qualitative Results
On Synapse CT (multi-organ) and BUSI breast ultrasound benchmarks, RDTE-UNet achieves the following:
Synapse Dataset (60/40 split)
| Method | DSC (%) ↑ | HD95 (mm) ↓ |
|---|---|---|
| Trans-UNet | 79.15 | 28.47 |
| Swin-UNet | 81.03 | 19.54 |
| MT-UNet | 80.72 | 22.48 |
| RWKV-UNet | 85.62 | 14.83 |
| RDTE-UNet | 86.63 ★ | 11.69 ★ |
BUSI Dataset (70/30 split)
| Method | DSC (%) ↑ | HD95 (mm) ↓ |
|---|---|---|
| Trans-UNet | 60.42 | 32.78 |
| Swin-UNet | 62.91 | 30.67 |
| MT-UNet | 62.13 | 39.08 |
| RWKV-UNet | 64.85 | 29.57 |
| RDTE-UNet | 66.31 ★ | 27.73 ★ |
Qualitatively, the improved boundary quality and structural consistency appear as sharper edges and fewer false positives, particularly around complex morphologies such as the pancreas and fine-grained vasculature.
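For readers unfamiliar with the two metrics in the tables, the sketch below computes them on small binary masks represented as sets of pixel coordinates. It is a simplified variant, not the evaluation code behind the reported numbers: distances are measured between all foreground pixels rather than extracted surfaces, both directions are pooled before taking a nearest-rank 95th percentile, and `spacing` is a hypothetical isotropic pixel size in millimetres.

```python
import math


def dice_score(pred, target):
    """Dice similarity coefficient between two sets of foreground pixels."""
    total = len(pred) + len(target)
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * len(pred & target) / total


def _nearest_distances(src, dst):
    """Distance from every pixel in src to its nearest pixel in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def hd95(pred, target, spacing=1.0):
    """95th-percentile symmetric distance (nearest-rank), scaled by spacing."""
    if not pred or not target:
        raise ValueError("both masks must contain at least one foreground pixel")
    distances = sorted(_nearest_distances(pred, target)
                       + _nearest_distances(target, pred))
    rank = max(1, math.ceil(0.95 * len(distances)))
    return spacing * distances[rank - 1]


pred = {(0, 0), (0, 1)}
target = {(0, 1), (0, 2)}
dsc = dice_score(pred, target)
boundary_error = hd95(pred, target)
```

Higher DSC indicates better region overlap, while lower HD95 indicates that predicted boundaries lie closer to the reference boundaries.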
6. Context and Comparative Significance
RDTE-UNet improves on the compared baselines by coupling explicit boundary enhancement (ASBE), refined directionality (HVDA), and mathematically principled fusion (EulerFF). Relative to the strongest baseline, RWKV-UNet, HD95 falls from 14.83 mm to 11.69 mm on Synapse and from 29.57 mm to 27.73 mm on BUSI, while DSC rises by 1.01 and 1.46 percentage points, respectively. Its hybrid convolution-transformer backbone is distinguished from pure CNN or typical hybrid architectures by its two-stream encoder and its detail- and shape-aware auxiliary branches. This suggests a benefit to modeling interactions between local and long-range detail, especially when segmenting anatomically complex or ambiguous regions.
A plausible implication is that future segmentation methods in computational medicine may increasingly incorporate direction- and boundary-aware modules alongside global context mechanisms to simultaneously address fine structure delineation and region-level accuracy.