Adaptive DB-KAUNet for Retinal Vessel Segmentation
- The paper introduces a dual-branch architecture that combines CNN and Transformer modules for effective local and global feature extraction in retinal vessel segmentation.
- It employs specialized fusion modules, including a Linear Deformable Convolution (LDConv) framework, to adaptively enhance features along vessel trajectories, achieving a 1.52% absolute F1-score gain over an SFE-only baseline.
- The integration of attention mechanisms and geometry-aware fusion significantly improves robustness and precision compared to traditional UNet models.
The Adaptive Dual Branch Kolmogorov-Arnold UNet (DB-KAUNet) is a specialized architecture for retinal vessel segmentation, integrating dual-branch encoding, attention-based fusion, and geometrically adaptive filtering to address the challenge of accurately segmenting thin, tortuous vascular structures in complex backgrounds. DB-KAUNet employs a Heterogeneous Dual-Branch Encoder (HDBE) coupling parallel CNN and Transformer branches, alternates hybrid and geometry-aware fusion blocks, and introduces a Linear Deformable Convolution (LDConv) framework. In conjunction with novel attention and interaction modules, this design enables DB-KAUNet to model both local and global dependencies and adaptively enhance features along vessel trajectories, yielding superior segmentation performance and robustness on benchmark datasets (Xu et al., 1 Dec 2025).
1. Motivations and Design Principles
Traditional UNet and CNN-based vascular segmentation networks are limited by their reliance on fixed receptive fields, making them vulnerable to background noise and unable to capture the high-order, nonlinear structure of elongated and branching vessel morphology. Transformers improve long-range context modeling but are less effective at encoding spatially fine, local vessel details. DB-KAUNet mitigates these limitations through several central innovations:
- Heterogeneous Dual-Branch Encoder (HDBE): Parallel branches process features via CNNs for local detail sensitivity and Transformers for global context. This duality enables comprehensive feature representation.
- Specialized Fusion Modules: The network alternates between standard Spatial Feature Enhancement (SFE) modules and advanced Spatial Feature Enhancement with Geometrically Adaptive Fusion (SFE-GAF) blocks, providing a controlled balance between general spatial coherence and geometry-aware focus.
- Attention and Interaction Mechanisms: Cross-Branch Channel Interaction (CCI) and Position Attention Module (PAM) increase channelwise exchange and spatial relevance reinforcement, setting the stage for adaptive fusion.
The explicit alternation of SFE and SFE-GAF within the encoder is designed to maximize anatomical alignment and feature robustness, passing fused skip connections to the decoder for accurate multi-scale reconstruction (Xu et al., 1 Dec 2025).
2. Network Architecture and Module Integration
The core pipeline of DB-KAUNet is characterized by:
- Parallel Pathways: Within the HDBE, the CNN branch (for local structure) and Transformer branch (for global dependency) process the input in parallel.
- KANConv and KAT Blocks: These Kolmogorov–Arnold-based convolutional and Transformer blocks are interleaved to introduce additional nonlinear modeling capacity and improve expressivity.
- Cross-Branch Channel Interaction (CCI): Embeds channel-wise fusion between branches, ensuring mutual reinforcement and feature richness prior to spatial fusion.
- SFE and SFE-GAF Modules:
- SFE: Applies PAM to both branch outputs, followed by 5×5 convolution and aggregation via summation.
- SFE-GAF: Replaces the 5×5 convolution with LDConv, whose dynamic offset grid adapts to local vessel geometry.
The SFE modules process outputs in standard encoder blocks (modules 2 and 4), while SFE-GAF is deployed in Kolmogorov–Arnold modules (modules 3 and 5). The fused features (Xₛₖᵢₚ) from these modules are transmitted to the symmetric decoder, following conventional UNet-like skip-connection and upsampling (Xu et al., 1 Dec 2025).
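The stage-wise alternation and skip-feature collection described above can be sketched structurally. The following toy Python sketch uses simplified stand-ins for PAM, the fusion modules, and the branch computations (none of these are the paper's actual implementations); it only illustrates how fused features flow from parallel branches into per-stage skip connections:

```python
import numpy as np

def pam(x):
    # Toy position attention: softmax-normalized spatial affinity (a simplified
    # stand-in for the paper's PAM, not its actual implementation).
    flat = x.reshape(x.shape[0], -1)                    # (C, H*W)
    aff = flat.T @ flat                                 # (H*W, H*W) affinity
    aff = np.exp(aff - aff.max(axis=1, keepdims=True))
    aff /= aff.sum(axis=1, keepdims=True)
    return (flat @ aff.T).reshape(x.shape)

def fuse(l_feat, g_feat):
    # SFE-style fusion: attention on both branches, then additive aggregation.
    # (SFE-GAF would apply LDConv between attention and summation.)
    return pam(l_feat) + pam(g_feat)

def encoder(x, n_stages=4):
    # Alternate fusion modules across encoder stages (SFE in stages 2 and 4,
    # SFE-GAF in 3 and 5), collecting fused skip features for the decoder.
    skips = []
    l_feat, g_feat = x.copy(), x.copy()   # placeholder CNN / Transformer branches
    for _ in range(n_stages):
        l_feat, g_feat = l_feat * 0.9, g_feat * 1.1   # placeholder branch updates
        skips.append(fuse(l_feat, g_feat))
    return skips
```

Each entry of `skips` would feed the symmetric decoder stage at the matching resolution, following the UNet skip-connection convention.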
3. Geometrically Adaptive Fusion: SFE-GAF Module
The SFE-GAF module is designed to align fusion with the nonrigid, curved structure of retinal vessels, improving feature selection and reducing the inclusion of incoherent background. The process involves:
- Input: Feature maps Lₑ (CNN) and Gₑ (Transformer) after CCI.
- Attention: Each undergoes PAM, yielding attention-refined maps L̂ and Ĝ with unchanged dimensions.
- LDConv: Both outputs are processed using an LDConv initialized with a fixed "X-shaped" sampling grid of N points. For each point pₖ, LDConv learns a continuous offset Δpₖ, causing the kernel to dynamically align with the vessel's direction and curvature.
- Sampling and Weighting: At each spatial position p, the output is y(p) = Σₖ wₖ · x(p + pₖ + Δpₖ), with wₖ denoting learnable weights and bilinear sampling providing subpixel accuracy.
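The per-point sampling rule can be illustrated numerically. This sketch (with hypothetical helper names `bilinear` and `ldconv_point`, assumed for illustration) applies one offset grid and weight set at a single spatial position:

```python
import numpy as np

def bilinear(x, py, px):
    # Bilinearly sample a 2-D map at a continuous (py, px) location,
    # giving the subpixel accuracy required by fractional offsets.
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = py - y0, px - x0
    return ((1 - dy) * (1 - dx) * x[y0, x0] + (1 - dy) * dx * x[y0, x1]
            + dy * (1 - dx) * x[y1, x0] + dy * dx * x[y1, x1])

def ldconv_point(x, p, grid, offsets, weights):
    # y(p) = sum_k w_k * x(p + p_k + Δp_k), sampled bilinearly.
    py, px = p
    return sum(w * bilinear(x, py + gy + oy, px + gx + ox)
               for (gy, gx), (oy, ox), w in zip(grid, offsets, weights))

# Example: one grid point at the kernel center, offset by (0.5, 0.5).
x = np.arange(25, dtype=float).reshape(5, 5)
y = ldconv_point(x, (2, 2), [(0, 0)], [(0.5, 0.5)], [1.0])  # → 15.0
```

With a zero offset the sample would land exactly on x[2, 2] = 12.0; the learned half-pixel shift instead averages the four neighbors around (2.5, 2.5), which is how the kernel drifts toward vessel trajectories.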
- Fusion: The two adapted feature outputs are combined via element-wise summation, Xₛₖᵢₚ = L_ld + G_ld.
- Integration: Xₛₖᵢₚ is concatenated with the decoder’s upsampled features.
The alternation of standard SFE and SFE-GAF within the encoder ensures both general and vessel-specific spatial coherence (Xu et al., 1 Dec 2025).
4. Mathematical Formulation and Algorithm
A structured pseudocode for the SFE-GAF module, as implemented in DB-KAUNet, encapsulates the mechanism:
```
function SFE_GAF(L_fuse, G_fuse):
    # Position attention
    L_pa = PAM(L_fuse)
    G_pa = PAM(G_fuse)
    # Geometry-aware deformable convolution (LDConv)
    Δp_L, w_L = offset_weight_branch(L_pa)
    Δp_G, w_G = offset_weight_branch(G_pa)
    L_ld = deform_conv(L_pa, Δp_L, w_L)
    G_ld = deform_conv(G_pa, Δp_G, w_G)
    # Fusion
    X_skip = L_ld + G_ld
    return X_skip
```
The initial offset is zero (centering sampling on the X-pattern), and training employs gradient clipping (max norm 5.0) to stabilize learning. All channel reductions are managed so that input and output shapes remain unchanged (Xu et al., 1 Dec 2025).
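The max-norm clipping used to stabilize offset learning can be sketched as a generic global-L2-norm clip (a standard technique, not the paper's training code):

```python
import numpy as np

def clip_grad_norm(grads, max_norm=5.0):
    # Rescale all gradients so their combined L2 norm does not exceed max_norm,
    # preventing large updates from destabilizing the learned LDConv offsets.
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads], total

# Example: two gradient tensors with combined norm sqrt(72) ≈ 8.49 > 5.0.
grads = [np.full(4, 3.0), np.full(9, 2.0)]
clipped, pre_clip_norm = clip_grad_norm(grads)  # clipped norm ≈ 5.0
```

In a PyTorch implementation the equivalent operation is `torch.nn.utils.clip_grad_norm_(params, max_norm=5.0)` applied after the backward pass.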
5. Performance Analysis and Ablation Studies
Comprehensive experiments on the DRIVE, STARE, and CHASE_DB1 datasets demonstrate the effectiveness of DB-KAUNet. The ablation study quantifies the benefit of SFE-GAF:
| Model Variant | F1 | SE | SP | ACC |
|---|---|---|---|---|
| Baseline (HDBE+CCI+SFE) | 0.8812 | 0.8826 | 0.9827 | 0.9701 |
| + SFE-GAF (HDBE+CCI+SFE+GAF+KAN-Dec) | 0.8964 | 0.8985 | 0.9848 | 0.9739 |
Replacing standard SFE with SFE-GAF yields a 1.52% absolute F1 gain, with similar improvements in sensitivity, specificity, and accuracy. These results confirm that geometry-aware fusion provides measurable advantages for capturing delicate vessel structures, especially under challenging visual conditions (Xu et al., 1 Dec 2025).
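The reported gain can be checked directly against the table values:

```python
# Check the reported absolute F1 gain against the ablation table.
baseline_f1 = 0.8812   # HDBE+CCI+SFE
gaf_f1 = 0.8964        # + SFE-GAF
gain = gaf_f1 - baseline_f1   # ≈ 0.0152, i.e. a 1.52% absolute gain
```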
6. Relation to Broader Geometric and Adaptive Fusion Methods
The approach underlying SFE-GAF aligns DB-KAUNet with a wider class of geometry-aware adaptive fusion strategies. Notably, SFE-GAF-like constructs have been applied outside the biomedical domain, for example, to Digital Surface Model (DSM) fusion in remote sensing (Albanwan, 27 Apr 2024). In this context, spatial and semantic adaptivity is encoded via per-class bandwidths in the fusion weights, with geometry-awareness expressed through terms relating to surface normal and local context.
DB-KAUNet’s use of adaptive sampling, deformable convolutions, and attention-driven fusion exemplifies a broad paradigm of hybrid models that seek to balance local detail preservation with semantic and spatial adaptivity. This alignment with geometric and class-adaptive filtering suggests applicability to other structured segmentation problems where classical kernels fail to respect irregular and context-dependent object boundaries (Xu et al., 1 Dec 2025, Albanwan, 27 Apr 2024).
7. Implementation and Hyperparameter Details
The main implementation factors for DB-KAUNet are:
- Sampling Points: N points drawn from a 10×10 cross-diagonal ("X-shaped") grid.
- Offset Branch: 3×3 convolution with padding, producing 2N offset channels for LDConv.
- Weight Branch: 1×1 convolution yielding per-position sampling weights.
- PAM: Single-head, with spatial affinity computation.
- Activations: BatchNorm followed by GELU after LDConv.
- Gradient Clipping: Max norm of 5.0 for offset stability.
- Initialization: Offsets initialized to zero, weights from Kaiming normal.
- Interleaving Pattern: SFE modules appear in encoder modules 2 and 4; SFE-GAF is deployed in modules 3 and 5.
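The exact layout of the cross-diagonal grid is not fully specified here; one plausible construction (an assumption for illustration, not the paper's definition) places sampling points along the two diagonals of a 10×10 window, centered on the kernel origin:

```python
import numpy as np

def x_shaped_grid(size=10):
    # Hypothetical "X-shaped" pattern: the two diagonals of a size x size
    # window, re-centered so offsets are relative to the kernel origin.
    pts = [(i, j) for i in range(size) for j in range(size)
           if i == j or i + j == size - 1]
    c = (size - 1) / 2.0
    return np.array([(i - c, j - c) for i, j in pts])
```

Under this assumed construction the two diagonals of an even-sized 10×10 window do not intersect, giving N = 20 symmetric sampling points before the learned offsets Δpₖ are applied.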
The module design ensures minimal computational overhead compared to conventional convolutions, while providing substantial performance returns in both recall and precision, particularly for micro-vessels (Xu et al., 1 Dec 2025).
DB-KAUNet and its SFE-GAF module combine dual-branch encoding, cross-branch fusion, and dynamic geometry-aware adaptive filtering, substantially advancing the state of the art in retinal vessel segmentation. The method’s architecture introduces general principles for leveraging local-global duality and explicit spatial adaptivity, with outcomes validated through rigorous ablation and cross-dataset evaluation (Xu et al., 1 Dec 2025).