RDTE-UNet: Hybrid Medical Segmentation

Updated 10 November 2025
  • RDTE-UNet is a neural architecture for precise medical image segmentation that integrates local convolutions with global transformer modules for enhanced edge detection and fine anatomical detail preservation.
  • It introduces three key modules: ASBE for adaptive edge sharpening, HVDA for directional feature attention, and EulerFF for effective feature fusion.
  • Evaluated on the Synapse multi-organ CT and BUSI breast ultrasound datasets, RDTE-UNet achieves higher Dice scores and lower boundary errors (HD95) than prior transformer-based baselines.

RDTE-UNet is a neural architecture designed for precise medical image segmentation, targeting enhanced boundary accuracy and preservation of fine anatomical detail. It achieves this by integrating local convolutional modeling with global Transformer-based context awareness. The architecture introduces three principal innovations: the Adaptive Shape-aware Boundary Enhancement (ASBE) module for edge sharpening, the Horizontal-Vertical Detail Attention (HVDA) block for fine-grained directional feature modeling, and the Euler Feature Fusion (EulerFF) module for direction- and channel-sensitive skip-connection fusion. Evaluated on the Synapse multi-organ CT and BUSI breast ultrasound datasets, RDTE-UNet demonstrates improved segmentation performance both quantitatively and qualitatively (Qu et al., 3 Nov 2025).

1. Architectural Overview

RDTE-UNet adopts a U-shaped encoder–decoder topology, comprising five encoder and five decoder stages connected via classical skip links. The encoder uses standard ResBlock stages (1–3) followed by "Details Transformer" stages (4–5). The architecture is organized as follows:

  • Front-end ASBE Module: Replaces the initial convolution with an edge-aware block. ASBE utilizes Adaptive Rectangular Convolution for shape-dependent kernel learning and a boundary-difference operator for edge sharpening.
  • Hybrid Encoder Backbone: Stages 1–3 are implemented as ResBlocks for strong local representation. Stages 4–5 transition to a Details Transformer block comprising HVDA—emphasizing horizontal and vertical features using directional “StairConv” mechanisms—followed by a channel MLP and Pre-LN residual structures for global modeling.
  • Decoder Path and Fusion: At each upsampling stage, corresponding encoder features are fused with the decoder stream via the EulerFF module, allowing dynamic weighting of horizontal, vertical, and channel responses through a complex-valued Eulerian formulation.
  • Output Head: A final 1×1 convolution maps the integrated features to class logits for segmentation.

2. Module Formulations

RDTE-UNet's three principal modules are defined as follows:

2.1 Adaptive Shape-Aware Boundary Enhancement (ASBE)

Given $x \in \mathbb{R}^{H\times W\times C}$,

$$f_0 = \mathrm{Conv}_{1\times1}(x)$$

Feature branches:

$$f_{\mathrm{pool}} = \mathrm{AvgPool}_{s}(f_0), \qquad f_{\mathrm{ar}} = \mathrm{ARConv}(f_0)$$

Boundary-difference enhancement:

$$\Delta f = f_{\mathrm{pool}} - f_{\mathrm{ar}}, \qquad f_e = \sigma(\alpha\, \Delta f) \odot f_{\mathrm{ar}}$$

Channel fusion:

$$f_{\mathrm{out}} = \mathrm{Conv}_{1\times1}\!\left(\left[f_e,\, f_0\right]\right)$$
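A minimal PyTorch sketch of these ASBE equations is given below. It is an illustration under stated assumptions: the paper's Adaptive Rectangular Convolution (ARConv) is stood in for by an ordinary 3×3 convolution, and the pooling window, projection width, and learnable gain α are illustrative choices rather than values reported in the paper.

import torch
import torch.nn as nn

class ASBESketch(nn.Module):
    """Sketch of the ASBE equations above.

    ARConv is stood in for by a plain 3x3 convolution; pool_size, c_out,
    and the learnable gain alpha are illustrative assumptions, not values
    reported in the paper.
    """

    def __init__(self, c_in: int, c_out: int, pool_size: int = 3):
        super().__init__()
        self.proj = nn.Conv2d(c_in, c_out, kernel_size=1)                 # f0 = Conv1x1(x)
        self.pool = nn.AvgPool2d(pool_size, stride=1, padding=pool_size // 2)
        self.arconv = nn.Conv2d(c_out, c_out, kernel_size=3, padding=1)   # ARConv placeholder
        self.alpha = nn.Parameter(torch.ones(1))                          # sharpening gain
        self.fuse = nn.Conv2d(2 * c_out, c_out, kernel_size=1)            # fout = Conv1x1([fe, f0])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f0 = self.proj(x)
        f_pool = self.pool(f0)                          # smoothed branch
        f_ar = self.arconv(f0)                          # shape-aware branch (placeholder)
        delta = f_pool - f_ar                           # boundary-difference map
        f_e = torch.sigmoid(self.alpha * delta) * f_ar  # sigma(alpha * delta) ⊙ f_ar
        return self.fuse(torch.cat([f_e, f0], dim=1))

# ASBESketch(3, 32)(torch.randn(1, 3, 224, 224)).shape -> torch.Size([1, 32, 224, 224])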

2.2 Horizontal-Vertical Detail Attention (HVDA)

For $x_{\mathrm{in}} \in \mathbb{R}^{h\times w\times d}$, directional branches are extracted:

$$x_{hd} = \mathrm{StairConv}_h(x_{\mathrm{in}}), \qquad x_{vd} = \mathrm{StairConv}_v(x_{\mathrm{in}})$$

Concatenation and fusion:

$$x_{\mathrm{cat}} = \mathrm{ReLU}\!\left(\mathrm{BN}(\mathrm{Conv}_{1\times1}([x_{hd},\, x_{vd}]))\right) + x_{hd} + x_{vd}$$

$$x_{\mathrm{fuse}} = \mathrm{ReLU}\!\left(\mathrm{BN}(\mathrm{Conv}_{3\times3}(x_{\mathrm{cat}})) + x_{\mathrm{cat}}\right) + x_{hd} + x_{vd}$$

Self-attention over the fused features:

$$Q = W_Q\, x_{\mathrm{fuse}}, \quad K = W_K\, x_{\mathrm{fuse}}, \quad V = W_V\, x_{\mathrm{fuse}}$$

$$B = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right), \qquad \mathrm{HVDA}(x_{\mathrm{in}}) = BV$$

StairConv operations use asymmetric, multi-scale padded convolutions for directional feature extraction.
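A hedged PyTorch sketch of this block follows. The directional StairConv operators are approximated here by single asymmetric convolutions (1×k horizontal, k×1 vertical) rather than the multi-scale padded kernels of the paper, and the attention is a single head over flattened spatial positions; the kernel size and widths are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HVDASketch(nn.Module):
    """Sketch of the HVDA equations above.

    StairConv_h / StairConv_v are approximated by single asymmetric
    convolutions; the attention is single-head with d_k = dim. Kernel
    size k is an illustrative assumption.
    """

    def __init__(self, dim: int, k: int = 5):
        super().__init__()
        self.stair_h = nn.Conv2d(dim, dim, (1, k), padding=(0, k // 2))   # horizontal branch
        self.stair_v = nn.Conv2d(dim, dim, (k, 1), padding=(k // 2, 0))   # vertical branch
        self.cat_fuse = nn.Sequential(nn.Conv2d(2 * dim, dim, 1),
                                      nn.BatchNorm2d(dim), nn.ReLU())     # ReLU(BN(Conv1x1(.)))
        self.conv3 = nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1),
                                   nn.BatchNorm2d(dim))                   # BN(Conv3x3(.))
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)                    # W_Q, W_K, W_V

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_hd, x_vd = self.stair_h(x), self.stair_v(x)
        x_cat = self.cat_fuse(torch.cat([x_hd, x_vd], dim=1)) + x_hd + x_vd
        x_fuse = F.relu(self.conv3(x_cat) + x_cat) + x_hd + x_vd
        n, c, h, w = x_fuse.shape
        tokens = x_fuse.flatten(2).transpose(1, 2)                         # (N, HW, C)
        q, k_, v = self.qkv(tokens).chunk(3, dim=-1)
        attn = torch.softmax(q @ k_.transpose(-2, -1) / c ** 0.5, dim=-1)  # B = softmax(QK^T/sqrt(d_k))
        return (attn @ v).transpose(1, 2).reshape(n, c, h, w)              # HVDA(x) = BV

# HVDASketch(64)(torch.randn(1, 64, 14, 14)).shape -> torch.Size([1, 64, 14, 14])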

2.3 Euler Feature Fusion (EulerFF)

Given skip features $x_s$ and decoder features $x_d$, both are projected into a complex Eulerian domain:

$$\mathcal{F}^{(h)} = A_h \cos(\theta_h) + j\,A_h \sin(\theta_h)$$

$$\mathcal{F}^{(v)} = A_v \cos(\theta_v) + j\,A_v \sin(\theta_v)$$

Group convolutions are applied separately to the real and imaginary parts of the horizontal and vertical branches, the responses are aggregated channel-wise, and the streams are fused:

$$\mathcal{F}_{\mathrm{out}} = \mathrm{Conv}_{1\times1}\!\left(\left[x_s,\, x_d,\, \widetilde{\mathcal{T}}_h,\, \widetilde{\mathcal{T}}_v,\, \widetilde{\mathcal{T}}_c\right]\right)$$
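The sketch below illustrates this fusion in PyTorch. How the amplitude A and phase θ are computed, the group count, and the channel-aggregation branch are assumptions made for illustration; only the Euler-form real/imaginary split, the per-direction group convolutions, and the final 1×1 fusion follow the formulas above.

import torch
import torch.nn as nn

class EulerFFSketch(nn.Module):
    """Sketch of the EulerFF fusion above.

    The amplitude/phase projections, group count, and channel branch are
    illustrative assumptions; the Euler-form split, directional group
    convolutions, and final 1x1 fusion follow the formulas above.
    """

    def __init__(self, dim: int, groups: int = 4):
        super().__init__()
        self.amp = nn.Conv2d(2 * dim, dim, 1)       # A from [x_s, x_d] (assumption)
        self.phase = nn.Conv2d(2 * dim, dim, 1)     # theta from [x_s, x_d] (assumption)
        self.conv_h = nn.Conv2d(2 * dim, dim, (1, 3), padding=(0, 1), groups=groups)  # T~_h
        self.conv_v = nn.Conv2d(2 * dim, dim, (3, 1), padding=(1, 0), groups=groups)  # T~_v
        self.conv_c = nn.Conv2d(2 * dim, dim, 1, groups=groups)                       # T~_c
        self.fuse = nn.Conv2d(5 * dim, dim, 1)      # Conv1x1([x_s, x_d, T~_h, T~_v, T~_c])

    def forward(self, x_s: torch.Tensor, x_d: torch.Tensor) -> torch.Tensor:
        z = torch.cat([x_s, x_d], dim=1)
        a, theta = self.amp(z), self.phase(z)
        real, imag = a * torch.cos(theta), a * torch.sin(theta)   # A cos(theta) + j A sin(theta)
        ri = torch.cat([real, imag], dim=1)
        t_h, t_v, t_c = self.conv_h(ri), self.conv_v(ri), self.conv_c(ri)
        return self.fuse(torch.cat([x_s, x_d, t_h, t_v, t_c], dim=1))

# EulerFFSketch(64)(torch.randn(1, 64, 28, 28), torch.randn(1, 64, 28, 28)).shape
# -> torch.Size([1, 64, 28, 28])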

3. Network Assembly and Layerwise Pipeline

The layerwise pipeline is as follows:

f0 = ASBE(I)                   # → R^{H×W×C'}
e1 = ResBlock(f0)              # H/2, W/2, 2C'
e2 = ResBlock(e1)              # H/4, W/4, 4C'
e3 = ResBlock(e2)              # H/8, W/8, 8C'
e4 = DetailsTransformer(e3)    # H/16, W/16, 16C'
e5 = DetailsTransformer(e4)    # H/32, W/32, 32C'

d5 = Deconv(e5)                # H/16, W/16, 16C'
d5 = EulerFF(skip=e4, dec=d5)
d4 = Deconv(d5)                # H/8, W/8, 8C'
d4 = EulerFF(skip=e3, dec=d4)
d3 = Deconv(d4)                # H/4, W/4, 4C'
d3 = EulerFF(skip=e2, dec=d3)
d2 = Deconv(d3)                # H/2, W/2, 2C'
d2 = EulerFF(skip=e1, dec=d2)
d1 = Deconv(d2)                # H, W, C'
d1 = EulerFF(skip=f0, dec=d1)

out = Conv_{1×1}(d1)           # H×W×N_classes
return out
This schema reflects the mixed convolutional-transformer design, edge-aware enhancement, and direction-sensitive post-processing prior to segmentation output.
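As a sanity check of this ladder, the stand-alone sketch below traces the resolution and channel schedule with generic stride-2 convolution and transposed-convolution stand-ins. ASBE, ResBlock, DetailsTransformer, and EulerFF are deliberately replaced by placeholder layers, and the base width C' = 32, single-channel input, and 9 output classes are assumptions rather than paper settings.

import torch
import torch.nn as nn

# Stand-in blocks only: ASBE, ResBlock, DetailsTransformer, and EulerFF are
# replaced by generic layers so the resolution/channel ladder above can be
# traced end to end.
def down(c_in, c_out):   # stride-2 encoder stage stand-in
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1), nn.ReLU())

def up(c_in, c_out):     # Deconv stand-in
    return nn.ConvTranspose2d(c_in, c_out, 2, stride=2)

def fuse(c):             # EulerFF stand-in: concat skip + decoder, project back
    return nn.Conv2d(2 * c, c, 1)

C = 32
asbe = nn.Conv2d(1, C, 3, padding=1)                              # ASBE stand-in
enc = nn.ModuleList(down(C * 2 ** i, C * 2 ** (i + 1)) for i in range(5))
dec = nn.ModuleList(up(C * 2 ** (i + 1), C * 2 ** i) for i in reversed(range(5)))
ff = nn.ModuleList(fuse(C * 2 ** i) for i in reversed(range(5)))
head = nn.Conv2d(C, 9, 1)                                         # 9 classes assumed

x = torch.randn(1, 1, 224, 224)
f0 = asbe(x)
skips, e = [f0], f0
for stage in enc:                                                 # e1 ... e5
    e = stage(e)
    skips.append(e)
d = skips.pop()                                                   # deepest features e5
for stage, f in zip(dec, ff):
    d = stage(d)                                                  # upsample
    d = f(torch.cat([skips.pop(), d], dim=1))                     # skip fusion
print(head(d).shape)                                              # torch.Size([1, 9, 224, 224])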

4. Training Protocols and Hyperparameters

  • Loss Function: Combination of Dice loss and cross-entropy (a minimal sketch of this training configuration follows the list)

$$\mathcal{L} = \lambda_{\mathrm{Dice}}\,(1 - \mathrm{Dice}(P,T)) + \lambda_{\mathrm{CE}}\,\mathrm{CE}(P,T)$$

with $\lambda_{\mathrm{Dice}} = \lambda_{\mathrm{CE}} = 0.5$.

  • Optimizer: Adam with $(\beta_1 = 0.9,\ \beta_2 = 0.999)$, initial learning rate $1\times10^{-4}$, weight decay $1\times10^{-5}$.
  • Learning Rate Scheduling: Cosine annealing with warm restarts (SGDR), training for 200 epochs.
  • Batch Sizes: 8 (Synapse), 16 (BUSI).
  • Data Augmentation: Random rotations (±15°), flips, intensity scaling.
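The following sketch wires these settings together in PyTorch. The soft-Dice implementation, the placeholder model, and the SGDR restart period T_0 are assumptions made for illustration; the loss weights, Adam settings, cosine schedule with warm restarts, and 200-epoch budget follow the list above.

import torch
import torch.nn as nn

class DiceCELoss(nn.Module):
    """Weighted sum of soft Dice loss and cross-entropy, as above."""

    def __init__(self, w_dice=0.5, w_ce=0.5, eps=1e-6):
        super().__init__()
        self.w_dice, self.w_ce, self.eps = w_dice, w_ce, eps
        self.ce = nn.CrossEntropyLoss()

    def forward(self, logits, target):
        probs = torch.softmax(logits, dim=1)
        onehot = nn.functional.one_hot(target, logits.shape[1]).permute(0, 3, 1, 2).float()
        inter = (probs * onehot).sum(dim=(0, 2, 3))
        union = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
        dice = ((2 * inter + self.eps) / (union + self.eps)).mean()
        return self.w_dice * (1 - dice) + self.w_ce * self.ce(logits, target)

model = nn.Conv2d(1, 9, 1)   # placeholder for the RDTE-UNet network
criterion = DiceCELoss()     # lambda_Dice = lambda_CE = 0.5
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=50)

for epoch in range(200):
    # dummy batch: images (B, 1, H, W), labels (B, H, W) integer class maps
    images, labels = torch.randn(2, 1, 64, 64), torch.randint(0, 9, (2, 64, 64))
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()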

5. Quantitative and Qualitative Results

On Synapse CT (multi-organ) and BUSI breast ultrasound benchmarks, RDTE-UNet achieves the following:

Synapse Dataset (60/40 split)

Method       DSC (%) ↑   HD95 (mm) ↓
Trans-UNet   79.15       28.47
Swin-UNet    81.03       19.54
MT-UNet      80.72       22.48
RWKV-UNet    85.62       14.83
RDTE-UNet    86.63 ★     11.69 ★

BUSI Dataset (70/30 split)

Method       DSC (%) ↑   HD95 (mm) ↓
Trans-UNet   60.42       32.78
Swin-UNet    62.91       30.67
MT-UNet      62.13       39.08
RWKV-UNet    64.85       29.57
RDTE-UNet    66.31 ★     27.73 ★

The superior boundary quality and structural consistency are substantiated by sharper edges and fewer false positives, particularly around complex morphologies such as the pancreas and fine-grained vasculature.

6. Context and Comparative Significance

RDTE-UNet advances segmentation performance over previously reported methods by coupling explicit boundary enhancement (ASBE), refined directional modeling (HVDA), and mathematically principled fusion (EulerFF). The improvements are most pronounced in the boundary-sensitive metric (HD95) and the Dice similarity coefficient (DSC). Its hybrid convolution–Transformer backbone distinguishes it from pure CNN or typical hybrid architectures through its two-stream encoder and its detail- and shape-aware auxiliary branches. This suggests the benefit of modeling interactions between local and long-range detail, especially when segmenting anatomically complex or ambiguous regions.

A plausible implication is that future segmentation methods in computational medicine may increasingly incorporate direction- and boundary-aware modules alongside global context mechanisms to simultaneously address fine structure delineation and region-level accuracy.
