
RDTE-UNet: Hybrid Medical Segmentation

Updated 10 November 2025
  • RDTE-UNet is a neural architecture for precise medical image segmentation that integrates local convolutions with global transformer modules for enhanced edge detection and fine anatomical detail preservation.
  • It introduces three key modules: ASBE for adaptive edge sharpening, HVDA for directional feature attention, and EulerFF for effective feature fusion.
  • Evaluated on the Synapse CT and BUSI ultrasound datasets, RDTE-UNet achieves higher Dice scores and lower HD95 boundary errors than the compared baselines (Trans-UNet, Swin-UNet, MT-UNet, RWKV-UNet).

RDTE-UNet is a neural architecture designed for precise medical image segmentation, targeting enhanced boundary accuracy and preservation of fine anatomical details. It achieves this by integrating local convolutional modeling with global Transformer-based context awareness. The architecture introduces three principal innovations: the Adaptive Shape-aware Boundary Enhancement (ASBE) module for edge sharpening, the Horizontal-Vertical Detail Attention (HVDA) block for fine-grained directional feature modeling, and the Euler Feature Fusion (EulerFF) module for direction- and channel-sensitive skip connection fusion. Evaluated on Synapse multi-organ CT and BUSI breast ultrasound datasets, RDTE-UNet demonstrates advanced segmentation performance in both quantitative and qualitative terms (Qu et al., 3 Nov 2025).

1. Architectural Overview

RDTE-UNet adopts a U-shaped encoder–decoder topology with five encoder and five decoder stages connected by skip links. The encoder uses standard ResBlocks in stages 1–3 and switches to "Details Transformer" blocks in stages 4–5. The architecture is organized as follows:

  • Front-end ASBE Module: Replaces the initial convolution with an edge-aware block. ASBE utilizes Adaptive Rectangular Convolution for shape-dependent kernel learning and a boundary-difference operator for edge sharpening.
  • Hybrid Encoder Backbone: Stages 1–3 are implemented as ResBlocks for strong local representation. Stages 4–5 transition to a Details Transformer block comprising HVDA—emphasizing horizontal and vertical features using directional “StairConv” mechanisms—followed by a channel MLP and Pre-LN residual structures for global modeling.
  • Decoder Path and Fusion: At each upsampling stage, corresponding encoder features are fused with the decoder stream via the EulerFF module, allowing dynamic weighting of horizontal, vertical, and channel responses through a complex-valued Eulerian formulation.
  • Output Head: A final $1\times1$ convolution maps the integrated features to class logits for segmentation.

2. Module Formulations

The three main modules of RDTE-UNet are defined as follows:

2.1 Adaptive Shape-Aware Boundary Enhancement (ASBE)

Given $x\in\mathbb{R}^{H\times W\times C}$,

$$f_0 = \mathrm{Conv}_{1\times1}(x)$$

Feature branches:

$$f_{\mathrm{pool}} = \mathrm{AvgPool}_{s}(f_0), \qquad f_{\mathrm{ar}} = \mathrm{ARConv}(f_0)$$

Boundary-difference enhancement:

$$\Delta f = f_{\mathrm{pool}} - f_{\mathrm{ar}}, \qquad f_e = \sigma(\alpha\, \Delta f) \odot f_{\mathrm{ar}}$$

Channels fused:

$$f_{\mathrm{out}} = \mathrm{Conv}_{1\times1}\left(\left[f_e,\, f_0\right]\right)$$
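The boundary-difference step can be illustrated with a small sketch in plain Python. It is not the authors' implementation: it assumes that $\sigma$ is the logistic sigmoid, that the pooling uses stride 1 so both branches keep the same size, and it replaces the learned ARConv branch with the identity. The function names and the value of `alpha` are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def avg_pool_same(f, k=3):
    """k x k average pooling with stride 1; border windows are clipped to the map."""
    h, w = len(f), len(f[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [f[a][b]
                    for a in range(max(0, i - r), min(h, i + r + 1))
                    for b in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out

def boundary_gate(f0, alpha=4.0, k=3):
    """f_e = sigmoid(alpha * (f_pool - f_ar)) * f_ar for a single-channel map."""
    f_ar = f0  # assumption: identity stands in for the learned ARConv branch
    f_pool = avg_pool_same(f0, k)
    return [[sigmoid(alpha * (f_pool[i][j] - f_ar[i][j])) * f_ar[i][j]
             for j in range(len(f0[0]))]
            for i in range(len(f0))]

# A 4 x 6 map with a vertical step edge between columns 2 and 3.
f0 = [[1.0, 1.0, 1.0, 3.0, 3.0, 3.0] for _ in range(4)]
f_e = boundary_gate(f0)

# Gate value sigma(alpha * delta_f) along one row.
gate = [f_e[1][j] / f0[1][j] for j in range(6)]
```

In flat regions the pooled and unpooled values agree, so $\Delta f = 0$ and the gate sits at 0.5. Only the two columns next to the step move away from that value, which is the sense in which the difference term responds to boundaries.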

2.2 Horizontal-Vertical Detail Attention (HVDA)

For $x_\mathrm{in}\in\mathbb{R}^{h\times w\times d}$:

$$x_{hd} = \mathrm{StairConv}_h(x_\mathrm{in}), \qquad x_{vd} = \mathrm{StairConv}_v(x_\mathrm{in})$$

Concatenation and fusion:

$$x_{\mathrm{cat}} = \mathrm{ReLU}\left(\mathrm{BN}(\mathrm{Conv}_{1\times1}([x_{hd}, x_{vd}]))\right) + x_{hd} + x_{vd}$$

$$x_{\mathrm{fuse}} = \mathrm{ReLU}\left(\mathrm{BN}(\mathrm{Conv}_{3\times3}(x_{\mathrm{cat}})) + x_{\mathrm{cat}}\right) + x_{hd} + x_{vd}$$

Self-attention module: [two equations missing from the source]

StairConv operations use asymmetric, multi-scale padded convolutions for directional feature extraction.
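The directional idea behind the two branches can be sketched with fixed 1D kernels. This is only an illustration under stated assumptions: the real StairConv weights are learned and its exact kernel layout is not given here, so `directional_branch` uses mean filters of lengths 3 and 5 as a hypothetical stand-in, and the BN, ReLU, $1\times1$ fusion, and self-attention steps are left out.

```python
def conv_1d_along(x, kernel, axis):
    """Zero-padded 'same' correlation of a 2D map with a 1D kernel.

    axis=1 slides along each row (a 1 x k kernel);
    axis=0 slides along each column (a k x 1 kernel).
    """
    h, w = len(x), len(x[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for t, wt in enumerate(kernel):
                a, b = (i, j + t - r) if axis == 1 else (i + t - r, j)
                if 0 <= a < h and 0 <= b < w:
                    acc += wt * x[a][b]
            out[i][j] = acc
    return out

def directional_branch(x, axis, sizes=(3, 5)):
    """Average of mean filters at several lengths along one axis.

    Hypothetical stand-in for StairConv_h (axis=1) or StairConv_v (axis=0).
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for k in sizes:
        y = conv_1d_along(x, [1.0 / k] * k, axis)
        for i in range(h):
            for j in range(w):
                out[i][j] += y[i][j] / len(sizes)
    return out

# A 7 x 7 map containing one thin horizontal structure.
x = [[0.0] * 7 for _ in range(7)]
x[3] = [1.0] * 7

x_hd = directional_branch(x, axis=1)  # horizontal branch
x_vd = directional_branch(x, axis=0)  # vertical branch
```

At the centre of the line the horizontal branch keeps the response at full strength, while the vertical branch spreads it across neighbouring rows and attenuates it. Keeping both responses, as the residual terms $x_{hd} + x_{vd}$ do, preserves structures aligned with either axis.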

2.3 Euler Feature Fusion (EulerFF)

Given a skip feature and a decoder feature, EulerFF projects both into a complex Eulerian domain, applies group convolutions separately to the real and imaginary parts for the horizontal and vertical branches, aggregates channel-wise, and fuses the result. [The symbols and equations for these steps are missing from the source.]
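As background for the complex-domain projection, the sketch below shows only Euler's formula, $A e^{i\theta} = A\cos\theta + iA\sin\theta$, which splits a value with amplitude $A$ and phase $\theta$ into the real and imaginary parts that are then processed separately. How RDTE-UNet derives amplitude and phase from the skip and decoder features is not stated here, so the sample amplitudes and phases are hypothetical.

```python
import cmath
import math

def to_euler(amplitude, phase):
    """Return (real, imag) of amplitude * e^{i * phase} via Euler's formula."""
    return amplitude * math.cos(phase), amplitude * math.sin(phase)

# Hypothetical per-channel amplitudes and phases, for illustration only.
amplitudes = [0.5, 1.0, 2.0]
phases = [0.0, math.pi / 2, math.pi]

parts = [to_euler(a, p) for a, p in zip(amplitudes, phases)]
real = [re for re, _ in parts]
imag = [im for _, im in parts]

# The amplitude is recoverable from the two parts: sqrt(re^2 + im^2) = A.
recovered = [math.hypot(re, im) for re, im in parts]

# Cross-check against the standard library's polar-to-rectangular conversion.
reference = [cmath.rect(a, p) for a, p in zip(amplitudes, phases)]
```

Because the real and imaginary parts carry the same amplitude with different phase weighting, convolving them separately lets a fusion step reweight the responses without discarding magnitude information.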

3. Network Assembly and Layerwise Pipeline

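The layerwise pipeline, following the stage order given in Section 1, is: input → ASBE → ResBlock stages 1–3 → Details Transformer stages 4–5 → five decoder stages, each fusing the corresponding encoder feature through EulerFF → $1\times1$ convolution → class logits. This schema reflects the mixed convolutional-transformer design, edge-aware enhancement, and direction-sensitive fusion prior to segmentation output.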

4. Training Protocols and Hyperparameters

  • Loss Function: Weighted combination of Dice loss and cross-entropy loss [equation and weight values missing from the source].
  • Optimizer: Adam [parameter values, initial learning rate, and weight decay missing from the source].
  • Learning Rate Scheduling: Cosine annealing with warm restarts (SGDR), training for 200 epochs.
  • Batch Sizes: 8 (Synapse), 16 (BUSI).
  • Data Augmentation: Random rotations (±15°), flips, intensity scaling.
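The loss and schedule named above can be sketched as follows. This is a minimal illustration, not the training code: it uses the binary form of both losses for brevity (multi-organ Synapse training would need the multi-class versions), and the mixing weight `lam`, the restart period `t0`, and the multiplier `t_mult` are hypothetical because the source values are not available. The schedule follows the standard SGDR formula $\eta_t = \eta_{\min} + \tfrac{1}{2}(\eta_{\max} - \eta_{\min})(1 + \cos(\pi T_{\mathrm{cur}} / T_i))$.

```python
import math

def dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss over flattened foreground probabilities and 0/1 labels."""
    inter = sum(p * g for p, g in zip(probs, target))
    return 1.0 - (2.0 * inter + eps) / (sum(probs) + sum(target) + eps)

def bce_loss(probs, target, eps=1e-12):
    """Mean binary cross-entropy; eps guards against log(0)."""
    total = sum(g * math.log(p + eps) + (1 - g) * math.log(1 - p + eps)
                for p, g in zip(probs, target))
    return -total / len(probs)

def combined_loss(probs, target, lam=0.5):
    """lam * Dice + (1 - lam) * cross-entropy; lam is a hypothetical weight."""
    return lam * dice_loss(probs, target) + (1.0 - lam) * bce_loss(probs, target)

def sgdr_lr(epoch, lr_max, lr_min=0.0, t0=50, t_mult=1):
    """Cosine annealing with warm restarts; cycle i lasts t0 * t_mult**i epochs."""
    t_i, t_cur = t0, epoch
    while t_cur >= t_i:  # skip past completed cycles
        t_cur -= t_i
        t_i *= t_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t_cur / t_i))

target = [1, 0, 1, 0]
good = combined_loss([0.9, 0.1, 0.9, 0.1], target)
poor = combined_loss([0.6, 0.4, 0.6, 0.4], target)

# Learning rate as a fraction of lr_max over a 200-epoch run.
schedule = [sgdr_lr(e, lr_max=1.0) for e in range(200)]
```

With these illustrative settings the rate decays along a cosine within each 50-epoch cycle and jumps back to its maximum at epochs 50, 100, and 150.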

5. Quantitative and Qualitative Results

On Synapse CT (multi-organ) and BUSI breast ultrasound benchmarks, RDTE-UNet achieves the following:

Synapse Dataset (60/40 split)

| Method | DSC (%) ↑ | HD95 (mm) ↓ |
|---|---|---|
| Trans-UNet | 79.15 | 28.47 |
| Swin-UNet | 81.03 | 19.54 |
| MT-UNet | 80.72 | 22.48 |
| RWKV-UNet | 85.62 | 14.83 |
| RDTE-UNet | 86.63 ★ | 11.69 ★ |

BUSI Dataset (70/30 split)

| Method | DSC (%) ↑ | HD95 (mm) ↓ |
|---|---|---|
| Trans-UNet | 60.42 | 32.78 |
| Swin-UNet | 62.91 | 30.67 |
| MT-UNet | 62.13 | 39.08 |
| RWKV-UNet | 64.85 | 29.57 |
| RDTE-UNet | 66.31 ★ | 27.73 ★ |

Qualitatively, RDTE-UNet's predictions show sharper edges and fewer false positives, supporting the improved boundary quality and structural consistency, particularly around complex morphologies such as the pancreas and fine-grained vasculature.
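For readers unfamiliar with the two reported metrics, the sketch below computes a Dice similarity coefficient on binary masks and a 95th-percentile Hausdorff distance on boundary point sets. It is an illustrative version under assumptions: HD95 definitions vary between toolkits (here it is the larger of the two directed 95th percentiles with linear interpolation), and the `spacing` argument that converts pixels to millimetres is a hypothetical parameter. It requires Python 3.8 or later for `math.dist`.

```python
import math

def dice_score(pred, gt):
    """DSC = 2|P ∩ G| / (|P| + |G|) for binary masks given as nested 0/1 lists."""
    inter = sum(p * g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    total = sum(map(sum, pred)) + sum(map(sum, gt))
    return 2.0 * inter / total if total else 1.0

def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def hd95(points_a, points_b, spacing=1.0):
    """Larger of the two directed 95th-percentile nearest-neighbour distances."""
    def directed(src, dst):
        return [min(math.dist(p, q) for q in dst) * spacing for p in src]
    return max(percentile(directed(points_a, points_b), 95),
               percentile(directed(points_b, points_a), 95))

pred = [[0, 1, 1, 0],
        [0, 1, 1, 0]]
gt = [[0, 1, 1, 1],
      [0, 1, 1, 0]]
dsc = dice_score(pred, gt)  # 2 * 4 / (4 + 5)

# Boundary points as (row, col) tuples; identical sets give a distance of zero.
same = hd95([(0, 0), (0, 1)], [(0, 0), (0, 1)])
apart = hd95([(0, 0)], [(3, 4)])
```

Dice rewards region overlap, so a single extra foreground pixel lowers it only slightly, whereas HD95 tracks how far the predicted boundary strays from the reference, which is why the two are reported together.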

6. Context and Comparative Significance

RDTE-UNet improves on the compared methods by coupling explicit boundary enhancement (ASBE), refined directionality (HVDA), and mathematically principled fusion (EulerFF). Relative to the strongest baseline, RWKV-UNet, DSC rises by 1.01 points on Synapse (85.62 to 86.63) and by 1.46 points on BUSI (64.85 to 66.31), while HD95 falls by 3.14 mm (14.83 to 11.69) and by 1.84 mm (29.57 to 27.73), respectively. Its hybrid convolution-transformer backbone distinguishes it from pure CNN or typical hybrid architectures by its two-stream encoder and detail- and shape-aware auxiliary branches. This suggests the benefit of modeling interactions between local and long-range detail, especially when segmenting anatomically complex or ambiguous regions.

A plausible implication is that future segmentation methods in computational medicine may increasingly incorporate direction- and boundary-aware modules alongside global context mechanisms to simultaneously address fine structure delineation and region-level accuracy.
