
RDTE-UNet: Hybrid Medical Segmentation

Updated 10 November 2025
  • RDTE-UNet is a neural architecture for precise medical image segmentation that integrates local convolutions with global transformer modules for enhanced edge detection and fine anatomical detail preservation.
  • It introduces three key modules: ASBE for adaptive edge sharpening, HVDA for directional feature attention, and EulerFF for effective feature fusion.
  • Evaluated on Synapse CT and BUSI ultrasound datasets, RDTE-UNet achieves higher Dice scores and lower boundary errors, demonstrating significant performance improvements.

RDTE-UNet is a neural architecture designed for precise medical image segmentation, targeting enhanced boundary accuracy and preservation of fine anatomical details. It achieves this by integrating local convolutional modeling with global Transformer-based context awareness. The architecture introduces three principal innovations: the Adaptive Shape-aware Boundary Enhancement (ASBE) module for edge sharpening, the Horizontal-Vertical Detail Attention (HVDA) block for fine-grained directional feature modeling, and the Euler Feature Fusion (EulerFF) module for direction- and channel-sensitive skip connection fusion. Evaluated on Synapse multi-organ CT and BUSI breast ultrasound datasets, RDTE-UNet demonstrates advanced segmentation performance in both quantitative and qualitative terms (Qu et al., 3 Nov 2025).

1. Architectural Overview

RDTE-UNet adopts a U-shaped encoder–decoder topology comprising five encoder and five decoder stages connected via classical skip links. The encoder combines standard ResBlock stages (1–3) with "Details Transformer" stages (4–5). The architecture is organized as follows:

  • Front-end ASBE Module: Replaces the initial convolution with an edge-aware block. ASBE utilizes Adaptive Rectangular Convolution for shape-dependent kernel learning and a boundary-difference operator for edge sharpening.
  • Hybrid Encoder Backbone: Stages 1–3 are implemented as ResBlocks for strong local representation. Stages 4–5 transition to a Details Transformer block comprising HVDA—emphasizing horizontal and vertical features using directional “StairConv” mechanisms—followed by a channel MLP and Pre-LN residual structures for global modeling.
  • Decoder Path and Fusion: At each upsampling stage, corresponding encoder features are fused with the decoder stream via the EulerFF module, allowing dynamic weighting of horizontal, vertical, and channel responses through a complex-valued Eulerian formulation.
  • Output Head: A final 1×1 convolution maps integrated features to class logits for segmentation.

2. Module Formulations

RDTE-UNet's three principal modules are defined as follows:

2.1 Adaptive Shape-Aware Boundary Enhancement (ASBE)

Given $x \in \mathbb{R}^{H\times W\times C}$:

$$f_0 = \mathrm{Conv}_{1\times1}(x)$$

Feature branches:

$$f_{\mathrm{pool}} = \mathrm{AvgPool}_{s}(f_0), \qquad f_{\mathrm{ar}} = \mathrm{ARConv}(f_0)$$

Boundary-difference enhancement:

$$\Delta f = f_{\mathrm{pool}} - f_{\mathrm{ar}}, \qquad f_e = \sigma(\alpha\,\Delta f) \odot f_{\mathrm{ar}}$$

Channel fusion:

$$f_{\mathrm{out}} = \mathrm{Conv}_{1\times1}\left([f_e,\, f_0]\right)$$
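The ASBE computation can be sketched in NumPy as follows. This is an illustrative reconstruction, not the reference implementation: the Adaptive Rectangular Convolution (ARConv) is stood in for by a plain learned 1×1 projection, since its shape-dependent kernel learning is beyond a short sketch.

```python
import numpy as np

def conv1x1(x, w):
    # 1x1 convolution = per-pixel channel mixing: (H, W, Cin) @ (Cin, Cout)
    return x @ w

def avg_pool_same(x, k=3):
    # k x k average pooling, stride 1, zero padding (spatial size preserved)
    H, W, _ = x.shape
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + H, dx:dx + W]
    return out / (k * k)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def asbe(x, w_in, w_ar, w_out, alpha=1.0):
    """Sketch of Adaptive Shape-aware Boundary Enhancement.
    w_ar is a placeholder projection standing in for ARConv."""
    f0 = conv1x1(x, w_in)                # f0 = Conv_{1x1}(x)
    f_pool = avg_pool_same(f0)           # f_pool = AvgPool_s(f0)
    f_ar = conv1x1(f0, w_ar)             # placeholder for ARConv(f0)
    delta = f_pool - f_ar                # boundary-difference signal Δf
    f_e = sigmoid(alpha * delta) * f_ar  # σ(α·Δf) ⊙ f_ar
    # channel concatenation followed by 1x1 fusion
    return conv1x1(np.concatenate([f_e, f0], axis=-1), w_out)
```

With random weights, an 8×8×3 input and a 4-channel intermediate width produce an 8×8×5 output, matching the concatenate-then-project structure of the formulation.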

2.2 Horizontal-Vertical Detail Attention (HVDA)

For $x_{\mathrm{in}} \in \mathbb{R}^{h\times w\times d}$, directional features are extracted first:

$$x_{hd} = \mathrm{StairConv}_h(x_{\mathrm{in}}), \qquad x_{vd} = \mathrm{StairConv}_v(x_{\mathrm{in}})$$

Concatenation and fusion:

$$x_{\mathrm{cat}} = \mathrm{ReLU}\left(\mathrm{BN}(\mathrm{Conv}_{1\times1}([x_{hd},\, x_{vd}]))\right) + x_{hd} + x_{vd}$$

$$x_{\mathrm{fuse}} = \mathrm{ReLU}\left(\mathrm{BN}(\mathrm{Conv}_{3\times3}(x_{\mathrm{cat}})) + x_{\mathrm{cat}}\right) + x_{hd} + x_{vd}$$

Self-attention:

$$Q = W_Q x_{\mathrm{fuse}}, \qquad K = W_K x_{\mathrm{fuse}}, \qquad V = W_V x_{\mathrm{fuse}}$$

$$B = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right), \qquad \mathrm{HVDA}(x_{\mathrm{in}}) = BV$$

StairConv operations use asymmetric, multi-scale padded convolutions for directional feature extraction.
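A minimal NumPy sketch of the HVDA flow, under simplifying assumptions: StairConv_h and StairConv_v are approximated by fixed 1×3 and 3×1 averaging filters (the learned asymmetric, multi-scale padded kernels of the actual StairConv are omitted), and the BN/Conv fusion is reduced to addition.

```python
import numpy as np

def dir_avg(x, horizontal=True):
    # crude stand-in for StairConv: 1x3 (horizontal) or 3x1 (vertical)
    # averaging with zero padding, preserving spatial size
    H, W, _ = x.shape
    pad = ((0, 0), (1, 1), (0, 0)) if horizontal else ((1, 1), (0, 0), (0, 0))
    xp = np.pad(x, pad)
    if horizontal:
        return (xp[:, :W] + xp[:, 1:W + 1] + xp[:, 2:W + 2]) / 3.0
    return (xp[:H] + xp[1:H + 1] + xp[2:H + 2]) / 3.0

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hvda(x_in, Wq, Wk, Wv):
    """HVDA sketch: directional features, additive fusion,
    then scaled dot-product self-attention over flattened tokens."""
    x_hd = dir_avg(x_in, horizontal=True)   # StairConv_h stand-in
    x_vd = dir_avg(x_in, horizontal=False)  # StairConv_v stand-in
    x_fuse = x_hd + x_vd                    # fusion (BN/Conv omitted)
    h, w, d = x_fuse.shape
    tokens = x_fuse.reshape(h * w, d)
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    B = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # attention map
    return (B @ V).reshape(h, w, -1)             # HVDA(x_in) = BV
```

The point of the sketch is the dataflow: direction-specific feature extraction feeds a standard attention computation, so the attention operates on tokens that already emphasize horizontal and vertical structure.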

2.3 Euler Feature Fusion (EulerFF)

Given skip features $x_s$ and decoder features $x_d$, project both into a complex Eulerian domain:

$$\mathcal{F}^{(h)} = A_h \cos(\theta_h) + jA_h \sin(\theta_h)$$

$$\mathcal{F}^{(v)} = A_v \cos(\theta_v) + jA_v \sin(\theta_v)$$

Group convolutions are applied separately to the real and imaginary parts of the horizontal and vertical branches, aggregated channel-wise, and fused:

$$\mathcal{F}_{\mathrm{out}} = \mathrm{Conv}_{1\times1}\left([x_s,\, x_d,\, \widetilde{\mathcal{T}}_h,\, \widetilde{\mathcal{T}}_v,\, \widetilde{\mathcal{T}}_c]\right)$$
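The projection rests on Euler's formula, $A\cos\theta + jA\sin\theta = Ae^{j\theta}$. A minimal sketch, assuming toy amplitude/phase assignments from the two inputs and collapsing the group convolutions into identity branches (the real design learns these):

```python
import numpy as np

def euler_project(amplitude, phase):
    # A·cos(θ) + jA·sin(θ) = A·e^{jθ}  (Euler's formula)
    return amplitude * np.exp(1j * phase)

def eulerff(x_s, x_d, w_fuse):
    """EulerFF sketch. Amplitude/phase are derived here directly from the
    two inputs for illustration; the group convolutions over real/imag
    parts are omitted."""
    f_h = euler_project(x_s, x_d)    # horizontal-branch projection (toy)
    f_v = euler_project(x_d, x_s)    # vertical-branch projection (toy)
    # aggregate real and imaginary responses channel-wise
    t_h = f_h.real + f_h.imag
    t_v = f_v.real + f_v.imag
    t_c = 0.5 * (t_h + t_v)          # channel branch (toy aggregate)
    cat = np.concatenate([x_s, x_d, t_h, t_v, t_c], axis=-1)
    return cat @ w_fuse              # 1x1 fusion convolution
```

The final concatenate-and-project mirrors the fusion formula above: skip features, decoder features, and the three directional/channel responses enter a single 1×1 convolution.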

3. Network Assembly and Layerwise Pipeline

The layerwise pipeline is as follows:

f0 = ASBE(I)                   # → R^{H×W×C'}
e1 = ResBlock(f0)              # H/2, W/2, 2C'
e2 = ResBlock(e1)              # H/4, W/4, 4C'
e3 = ResBlock(e2)              # H/8, W/8, 8C'
e4 = DetailsTransformer(e3)    # H/16, W/16, 16C'
e5 = DetailsTransformer(e4)    # H/32, W/32, 32C'

d5 = Deconv(e5)                # H/16, W/16, 16C'
d5 = EulerFF(skip=e4, dec=d5)
d4 = Deconv(d5)                # H/8, W/8, 8C'
d4 = EulerFF(skip=e3, dec=d4)
d3 = Deconv(d4)                # H/4, W/4, 4C'
d3 = EulerFF(skip=e2, dec=d3)
d2 = Deconv(d3)                # H/2, W/2, 2C'
d2 = EulerFF(skip=e1, dec=d2)
d1 = Deconv(d2)                # H, W, C'
d1 = EulerFF(skip=f0, dec=d1)

out = Conv_{1×1}(d1)           # H×W×N_classes
return out
This schema reflects the mixed convolutional-transformer design, edge-aware enhancement, and direction-sensitive post-processing prior to segmentation output.

4. Training Protocols and Hyperparameters

  • Loss Function: combination of Dice loss and cross-entropy,

$$\mathcal{L} = \lambda_{\mathrm{Dice}}\,(1 - \mathrm{Dice}(P,T)) + \lambda_{\mathrm{CE}}\,\mathrm{CE}(P,T)$$

with $\lambda_{\mathrm{Dice}} = \lambda_{\mathrm{CE}} = 0.5$.

  • Optimizer: Adam ($\beta_1 = 0.9$, $\beta_2 = 0.999$), initial learning rate $1\times10^{-4}$, weight decay $1\times10^{-5}$.
  • Learning Rate Scheduling: Cosine annealing with warm restarts (SGDR), training for 200 epochs.
  • Batch Sizes: 8 (Synapse), 16 (BUSI).
  • Data Augmentation: Random rotations (±15°), flips, intensity scaling.
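The composite loss and the SGDR schedule can be sketched as follows. This is an illustrative reconstruction: the restart period `T0` and multiplier `T_mult` are assumed values, not parameters reported for RDTE-UNet.

```python
import math
import numpy as np

def dice_coef(p, t, eps=1e-6):
    # soft Dice between predicted probabilities p and binary targets t
    inter = (p * t).sum()
    return (2.0 * inter + eps) / (p.sum() + t.sum() + eps)

def combined_loss(p, t, lam_dice=0.5, lam_ce=0.5, eps=1e-12):
    # L = λ_Dice (1 - Dice(P,T)) + λ_CE CE(P,T), with λ_Dice = λ_CE = 0.5
    ce = -(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps)).mean()
    return lam_dice * (1.0 - dice_coef(p, t)) + lam_ce * ce

def sgdr_lr(epoch, lr_max=1e-4, lr_min=1e-6, T0=50, T_mult=2):
    # cosine annealing with warm restarts (SGDR); T0/T_mult are assumptions
    t, T = epoch, T0
    while t >= T:
        t -= T        # consume completed cycles
        T *= T_mult   # each cycle is T_mult times longer
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / T))
```

A perfect prediction drives the combined loss to (approximately) zero, and the schedule decays from `lr_max` within each cycle before jumping back to `lr_max` at a restart.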

5. Quantitative and Qualitative Results

On Synapse CT (multi-organ) and BUSI breast ultrasound benchmarks, RDTE-UNet achieves the following:

Synapse Dataset (60/40 split)

Method       DSC (%) ↑   HD95 (mm) ↓
Trans-UNet   79.15       28.47
Swin-UNet    81.03       19.54
MT-UNet      80.72       22.48
RWKV-UNet    85.62       14.83
RDTE-UNet    86.63 ★     11.69 ★

BUSI Dataset (70/30 split)

Method       DSC (%) ↑   HD95 (mm) ↓
Trans-UNet   60.42       32.78
Swin-UNet    62.91       30.67
MT-UNet      62.13       39.08
RWKV-UNet    64.85       29.57
RDTE-UNet    66.31 ★     27.73 ★

The superior boundary quality and structural consistency are substantiated by sharper edges and fewer false positives, particularly around complex morphologies such as the pancreas and fine-grained vasculature.

6. Context and Comparative Significance

RDTE-UNet advances segmentation performance over previously reported methods by coupling explicit boundary enhancement (ASBE), refined directional modeling (HVDA), and mathematically principled fusion (EulerFF). The improvements are most pronounced in the boundary-sensitive HD95 metric and the Dice similarity coefficient (DSC). Its hybrid convolution–transformer backbone distinguishes it from pure CNN and typical hybrid architectures through its two-stream encoder and detail- and shape-aware auxiliary branches. This suggests a benefit from modeling interactions between local and long-range detail, especially when segmenting anatomically complex or ambiguous regions.

A plausible implication is that future segmentation methods in computational medicine may increasingly incorporate direction- and boundary-aware modules alongside global context mechanisms to simultaneously address fine structure delineation and region-level accuracy.
