
RDTE-UNet: Hybrid Medical Segmentation

Updated 10 November 2025
  • RDTE-UNet is a neural architecture for precise medical image segmentation that integrates local convolutions with global transformer modules for enhanced edge detection and fine anatomical detail preservation.
  • It introduces three key modules: ASBE for adaptive edge sharpening, HVDA for directional feature attention, and EulerFF for effective feature fusion.
  • Evaluated on Synapse CT and BUSI ultrasound datasets, RDTE-UNet achieves higher Dice scores and lower boundary errors, demonstrating significant performance improvements.

RDTE-UNet is a neural architecture designed for precise medical image segmentation, targeting enhanced boundary accuracy and preservation of fine anatomical details. It achieves this by integrating local convolutional modeling with global Transformer-based context awareness. The architecture introduces three principal innovations: the Adaptive Shape-aware Boundary Enhancement (ASBE) module for edge sharpening, the Horizontal-Vertical Detail Attention (HVDA) block for fine-grained directional feature modeling, and the Euler Feature Fusion (EulerFF) module for direction- and channel-sensitive skip connection fusion. Evaluated on Synapse multi-organ CT and BUSI breast ultrasound datasets, RDTE-UNet demonstrates advanced segmentation performance in both quantitative and qualitative terms (Qu et al., 3 Nov 2025).

1. Architectural Overview

RDTE-UNet adopts a U-shaped encoder–decoder topology comprising five encoder and five decoder stages connected via classical skip links. The encoder combines standard ResBlock stages (1–3) with "Details Transformer" stages (4–5). The architecture is organized as follows:

  • Front-end ASBE Module: Replaces the initial convolution with an edge-aware block. ASBE utilizes Adaptive Rectangular Convolution for shape-dependent kernel learning and a boundary-difference operator for edge sharpening.
  • Hybrid Encoder Backbone: Stages 1–3 are implemented as ResBlocks for strong local representation. Stages 4–5 transition to a Details Transformer block comprising HVDA—emphasizing horizontal and vertical features using directional “StairConv” mechanisms—followed by a channel MLP and Pre-LN residual structures for global modeling.
  • Decoder Path and Fusion: At each upsampling stage, corresponding encoder features are fused with the decoder stream via the EulerFF module, allowing dynamic weighting of horizontal, vertical, and channel responses through a complex-valued Eulerian formulation.
  • Output Head: A final 1×1 convolution maps integrated features to class logits for segmentation.

2. Module Formulations

RDTE-UNet's three principal modules are defined as follows:

2.1 Adaptive Shape-Aware Boundary Enhancement (ASBE)

Given $x \in \mathbb{R}^{H\times W\times C}$:

$$f_0 = \mathrm{Conv}_{1\times1}(x)$$

Feature branches:

$$f_{\mathrm{pool}} = \mathrm{AvgPool}_{s}(f_0), \qquad f_{\mathrm{ar}} = \mathrm{ARConv}(f_0)$$

Boundary-difference enhancement:

$$\Delta f = f_{\mathrm{pool}} - f_{\mathrm{ar}}, \qquad f_e = \sigma(\alpha\,\Delta f) \odot f_{\mathrm{ar}}$$

Channel fusion:

$$f_{\mathrm{out}} = \mathrm{Conv}_{1\times1}\left([f_e,\, f_0]\right)$$
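The ASBE computation can be sketched in NumPy as follows. This is an illustrative reconstruction, not the reference implementation: the Adaptive Rectangular Convolution (ARConv) is stood in for by a plain learned 1×1 projection, since its shape-dependent kernel learning is beyond a short sketch.

```python
import numpy as np

def conv1x1(x, w):
    # 1x1 convolution = per-pixel channel mixing: (H, W, Cin) @ (Cin, Cout)
    return x @ w

def avg_pool_same(x, k=3):
    # k x k average pooling, stride 1, zero padding (spatial size preserved)
    H, W, _ = x.shape
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + H, dx:dx + W]
    return out / (k * k)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def asbe(x, w_in, w_ar, w_out, alpha=1.0):
    """Sketch of Adaptive Shape-aware Boundary Enhancement.
    w_ar is a placeholder projection standing in for ARConv."""
    f0 = conv1x1(x, w_in)                # f0 = Conv_{1x1}(x)
    f_pool = avg_pool_same(f0)           # f_pool = AvgPool_s(f0)
    f_ar = conv1x1(f0, w_ar)             # placeholder for ARConv(f0)
    delta = f_pool - f_ar                # boundary-difference signal Δf
    f_e = sigmoid(alpha * delta) * f_ar  # σ(α·Δf) ⊙ f_ar
    # channel concatenation followed by 1x1 fusion
    return conv1x1(np.concatenate([f_e, f0], axis=-1), w_out)
```

With random weights, an 8×8×3 input and a 4-channel intermediate width produce an 8×8×5 output, matching the concatenate-then-project structure of the formulation.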

2.2 Horizontal-Vertical Detail Attention (HVDA)

For $x_{\mathrm{in}} \in \mathbb{R}^{h\times w\times d}$, directional features are extracted first:

$$x_{hd} = \mathrm{StairConv}_h(x_{\mathrm{in}}), \qquad x_{vd} = \mathrm{StairConv}_v(x_{\mathrm{in}})$$

Concatenation and fusion:

$$x_{\mathrm{cat}} = \mathrm{ReLU}\left(\mathrm{BN}(\mathrm{Conv}_{1\times1}([x_{hd},\, x_{vd}]))\right) + x_{hd} + x_{vd}$$

$$x_{\mathrm{fuse}} = \mathrm{ReLU}\left(\mathrm{BN}(\mathrm{Conv}_{3\times3}(x_{\mathrm{cat}})) + x_{\mathrm{cat}}\right) + x_{hd} + x_{vd}$$

Self-attention:

$$Q = W_Q x_{\mathrm{fuse}}, \qquad K = W_K x_{\mathrm{fuse}}, \qquad V = W_V x_{\mathrm{fuse}}$$

$$B = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right), \qquad \mathrm{HVDA}(x_{\mathrm{in}}) = BV$$

StairConv operations use asymmetric, multi-scale padded convolutions for directional feature extraction.
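A minimal NumPy sketch of the HVDA flow, under simplifying assumptions: StairConv_h and StairConv_v are approximated by fixed 1×3 and 3×1 averaging filters (the learned asymmetric, multi-scale padded kernels of the actual StairConv are omitted), and the BN/Conv fusion is reduced to addition.

```python
import numpy as np

def dir_avg(x, horizontal=True):
    # crude stand-in for StairConv: 1x3 (horizontal) or 3x1 (vertical)
    # averaging with zero padding, preserving spatial size
    H, W, _ = x.shape
    pad = ((0, 0), (1, 1), (0, 0)) if horizontal else ((1, 1), (0, 0), (0, 0))
    xp = np.pad(x, pad)
    if horizontal:
        return (xp[:, :W] + xp[:, 1:W + 1] + xp[:, 2:W + 2]) / 3.0
    return (xp[:H] + xp[1:H + 1] + xp[2:H + 2]) / 3.0

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hvda(x_in, Wq, Wk, Wv):
    """HVDA sketch: directional features, additive fusion,
    then scaled dot-product self-attention over flattened tokens."""
    x_hd = dir_avg(x_in, horizontal=True)   # StairConv_h stand-in
    x_vd = dir_avg(x_in, horizontal=False)  # StairConv_v stand-in
    x_fuse = x_hd + x_vd                    # fusion (BN/Conv omitted)
    h, w, d = x_fuse.shape
    tokens = x_fuse.reshape(h * w, d)
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    B = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # attention map
    return (B @ V).reshape(h, w, -1)             # HVDA(x_in) = BV
```

The point of the sketch is the dataflow: direction-specific feature extraction feeds a standard attention computation, so the attention operates on tokens that already emphasize horizontal and vertical structure.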

2.3 Euler Feature Fusion (EulerFF)

Given skip features $x_s$ and decoder features $x_d$, project both into a complex Eulerian domain:

$$\mathcal{F}^{(h)} = A_h \cos(\theta_h) + jA_h \sin(\theta_h)$$

$$\mathcal{F}^{(v)} = A_v \cos(\theta_v) + jA_v \sin(\theta_v)$$

Group convolutions are applied separately to the real and imaginary parts of the horizontal and vertical branches, aggregated channel-wise, and fused:

$$\mathcal{F}_{\mathrm{out}} = \mathrm{Conv}_{1\times1}\left([x_s,\, x_d,\, \widetilde{\mathcal{T}}_h,\, \widetilde{\mathcal{T}}_v,\, \widetilde{\mathcal{T}}_c]\right)$$
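The projection rests on Euler's formula, $A\cos\theta + jA\sin\theta = Ae^{j\theta}$. A minimal sketch, assuming toy amplitude/phase assignments from the two inputs and collapsing the group convolutions into identity branches (the real design learns these):

```python
import numpy as np

def euler_project(amplitude, phase):
    # A·cos(θ) + jA·sin(θ) = A·e^{jθ}  (Euler's formula)
    return amplitude * np.exp(1j * phase)

def eulerff(x_s, x_d, w_fuse):
    """EulerFF sketch. Amplitude/phase are derived here directly from the
    two inputs for illustration; the group convolutions over real/imag
    parts are omitted."""
    f_h = euler_project(x_s, x_d)    # horizontal-branch projection (toy)
    f_v = euler_project(x_d, x_s)    # vertical-branch projection (toy)
    # aggregate real and imaginary responses channel-wise
    t_h = f_h.real + f_h.imag
    t_v = f_v.real + f_v.imag
    t_c = 0.5 * (t_h + t_v)          # channel branch (toy aggregate)
    cat = np.concatenate([x_s, x_d, t_h, t_v, t_c], axis=-1)
    return cat @ w_fuse              # 1x1 fusion convolution
```

The final concatenate-and-project mirrors the fusion formula above: skip features, decoder features, and the three directional/channel responses enter a single 1×1 convolution.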

3. Network Assembly and Layerwise Pipeline

The layerwise pipeline is as follows:

f0 = ASBE(I)                   # → R^{H×W×C'}
e1 = ResBlock(f0)              # H/2, W/2, 2C'
e2 = ResBlock(e1)              # H/4, W/4, 4C'
e3 = ResBlock(e2)              # H/8, W/8, 8C'
e4 = DetailsTransformer(e3)    # H/16, W/16, 16C'
e5 = DetailsTransformer(e4)    # H/32, W/32, 32C'

d5 = Deconv(e5)                # H/16, W/16, 16C'
d5 = EulerFF(skip=e4, dec=d5)
d4 = Deconv(d5)                # H/8, W/8, 8C'
d4 = EulerFF(skip=e3, dec=d4)
d3 = Deconv(d4)                # H/4, W/4, 4C'
d3 = EulerFF(skip=e2, dec=d3)
d2 = Deconv(d3)                # H/2, W/2, 2C'
d2 = EulerFF(skip=e1, dec=d2)
d1 = Deconv(d2)                # H, W, C'
d1 = EulerFF(skip=f0, dec=d1)

out = Conv_{1×1}(d1)           # H×W×N_classes
return out
This schema reflects the mixed convolutional-transformer design, edge-aware enhancement, and direction-sensitive post-processing prior to segmentation output.

4. Training Protocols and Hyperparameters

  • Loss Function: combination of Dice loss and cross-entropy,

$$\mathcal{L} = \lambda_{\mathrm{Dice}}\,(1 - \mathrm{Dice}(P,T)) + \lambda_{\mathrm{CE}}\,\mathrm{CE}(P,T)$$

with $\lambda_{\mathrm{Dice}} = \lambda_{\mathrm{CE}} = 0.5$.

  • Optimizer: Adam ($\beta_1 = 0.9$, $\beta_2 = 0.999$), initial learning rate $1\times10^{-4}$, weight decay $1\times10^{-5}$.
  • Learning Rate Scheduling: Cosine annealing with warm restarts (SGDR), training for 200 epochs.
  • Batch Sizes: 8 (Synapse), 16 (BUSI).
  • Data Augmentation: Random rotations (±15°), flips, intensity scaling.
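The composite loss and the SGDR schedule can be sketched as follows. This is an illustrative reconstruction: the restart period `T0` and multiplier `T_mult` are assumed values, not parameters reported for RDTE-UNet.

```python
import math
import numpy as np

def dice_coef(p, t, eps=1e-6):
    # soft Dice between predicted probabilities p and binary targets t
    inter = (p * t).sum()
    return (2.0 * inter + eps) / (p.sum() + t.sum() + eps)

def combined_loss(p, t, lam_dice=0.5, lam_ce=0.5, eps=1e-12):
    # L = λ_Dice (1 - Dice(P,T)) + λ_CE CE(P,T), with λ_Dice = λ_CE = 0.5
    ce = -(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps)).mean()
    return lam_dice * (1.0 - dice_coef(p, t)) + lam_ce * ce

def sgdr_lr(epoch, lr_max=1e-4, lr_min=1e-6, T0=50, T_mult=2):
    # cosine annealing with warm restarts (SGDR); T0/T_mult are assumptions
    t, T = epoch, T0
    while t >= T:
        t -= T        # consume completed cycles
        T *= T_mult   # each cycle is T_mult times longer
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / T))
```

A perfect prediction drives the combined loss to (approximately) zero, and the schedule decays from `lr_max` within each cycle before jumping back to `lr_max` at a restart.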

5. Quantitative and Qualitative Results

On Synapse CT (multi-organ) and BUSI breast ultrasound benchmarks, RDTE-UNet achieves the following:

Synapse Dataset (60/40 split)

Method       DSC (%) ↑   HD95 (mm) ↓
Trans-UNet   79.15       28.47
Swin-UNet    81.03       19.54
MT-UNet      80.72       22.48
RWKV-UNet    85.62       14.83
RDTE-UNet    86.63 ★     11.69 ★

BUSI Dataset (70/30 split)

Method       DSC (%) ↑   HD95 (mm) ↓
Trans-UNet   60.42       32.78
Swin-UNet    62.91       30.67
MT-UNet      62.13       39.08
RWKV-UNet    64.85       29.57
RDTE-UNet    66.31 ★     27.73 ★

The superior boundary quality and structural consistency are substantiated by sharper edges and fewer false positives, particularly around complex morphologies such as the pancreas and fine-grained vasculature.

6. Context and Comparative Significance

RDTE-UNet advances segmentation performance over previously reported methods by coupling explicit boundary enhancement (ASBE), refined directional modeling (HVDA), and mathematically principled fusion (EulerFF). The improvements are most pronounced in the boundary-sensitive HD95 metric and the Dice similarity coefficient (DSC). Its hybrid convolution–transformer backbone distinguishes it from pure CNN and typical hybrid architectures through its two-stream encoder and detail- and shape-aware auxiliary branches. This suggests a benefit from modeling interactions between local and long-range detail, especially when segmenting anatomically complex or ambiguous regions.

A plausible implication is that future segmentation methods in computational medicine may increasingly incorporate direction- and boundary-aware modules alongside global context mechanisms to simultaneously address fine structure delineation and region-level accuracy.
