
UKANFormer: KAN & Transformer Segmentation

Updated 26 October 2025
  • UKANFormer is a semantic segmentation framework that combines learnable KAN nonlinearities with transformer-based global-local fusion, enhancing detail recognition.
  • It employs a U-shaped encoder–decoder with a GL-Trans block that blends local convolutions and global attention to preserve fine details under noisy labels.
  • The model achieves state-of-the-art performance in coral reef mapping by mitigating supervision noise and demonstrating scalability for diverse applications.

UKANFormer is a semantic segmentation framework that combines Kolmogorov–Arnold Network (KAN) modules with transformer-based mechanisms, designed for robust fine-grained mapping under imperfect supervision. Its primary application in the literature is large-scale coral reef mapping with noisy labels, although variants of the architecture have been proposed for medical imaging and other domains. The following sections detail the model’s theoretical foundation, architectural components, algorithmic advances, experimental findings, and ecological significance.

1. Theoretical Foundation and KAN Principles

UKANFormer is grounded in the Kolmogorov–Arnold representation theorem, which asserts that any multivariate continuous function can be represented as a finite superposition of continuous univariate functions and addition. In KAN-based neural networks, this principle is instantiated by replacing conventional fixed activation functions and weight matrices with learnable spline-based activation operators. The KAN block can be formulated as:

$$\mathbf{z}_{k+1} = \operatorname{LN}\bigl(\mathbf{z}_k + \operatorname{DwConv}(\operatorname{KAN}(\mathbf{z}_k))\bigr)$$

where:

  • $\mathbf{z}_k$ is the token representation at the $k$-th block,
  • $\operatorname{KAN}(\cdot)$ performs the nonlinear transformation using learnable edge functions,
  • $\operatorname{DwConv}(\cdot)$ is a depthwise convolution for computational efficiency,
  • $\operatorname{LN}$ is layer normalization.

The advantages are greater model expressiveness for complex nonlinear spatial patterns, more robust handling of label ambiguity, and increased interpretability relative to transformer- or MLP-based alternatives (Li et al., 5 Jun 2024).
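As a concrete, simplified illustration of the block equation above, the following NumPy sketch replaces the B-spline edge functions of a full KAN with learnable piecewise-linear interpolants and stands in for the depthwise convolution with a per-channel scale. All class names, shapes, and initializations here are illustrative assumptions, not the published implementation.

```python
import numpy as np

class PiecewiseLinearEdge:
    """A learnable univariate function phi: R -> R on a fixed knot grid
    (a stand-in for the B-spline edge functions used by real KAN layers)."""
    def __init__(self, n_knots=8, x_min=-2.0, x_max=2.0, rng=None):
        rng = rng if rng is not None else np.random.default_rng(0)
        self.knots = np.linspace(x_min, x_max, n_knots)
        # Learnable values of phi at the knots; initialized near identity.
        self.values = self.knots + 0.1 * rng.standard_normal(n_knots)

    def __call__(self, x):
        # Linear interpolation between knot values (clamped outside range).
        return np.interp(x, self.knots, self.values)

def kan_block(z, edges, dwconv_weight, eps=1e-5):
    """z_{k+1} = LN(z_k + DwConv(KAN(z_k))) for tokens z of shape (L, D).

    `edges` holds one learnable univariate function per channel; a full KAN
    layer uses one per (input, output) edge and sums over inputs."""
    # KAN: apply each channel's learnable nonlinearity elementwise.
    h = np.stack([edges[d](z[:, d]) for d in range(z.shape[1])], axis=1)
    # Depthwise-convolution stand-in: per-channel scaling (kernel size 1).
    h = h * dwconv_weight
    # Residual connection followed by layer normalization over channels.
    out = z + h
    mu = out.mean(axis=1, keepdims=True)
    var = out.var(axis=1, keepdims=True)
    return (out - mu) / np.sqrt(var + eps)

L, D = 16, 4  # 16 tokens with 4 channels each
z = np.random.default_rng(1).standard_normal((L, D))
edges = [PiecewiseLinearEdge() for _ in range(D)]
out = kan_block(z, edges, dwconv_weight=np.ones(D))
print(out.shape)  # (16, 4)
```

Because the knot values are ordinary parameters, each edge function can be trained by gradient descent just like a weight matrix, which is the source of the extra per-edge expressiveness noted above.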

2. Model Architecture and the GL-Trans Decoder

UKANFormer extends the U-shaped encoder–decoder paradigm by introducing a Global-Local Transformer (GL-Trans) block within the decoder. Its major components are:

  • CNN Backbone Encoder: Extracts hierarchical feature maps from input imagery (e.g., remote sensing or biomedical data).
  • Tok-KAN Bottleneck: Tokenizes (vectorizes) intermediate features and subjects them to KAN modules for nonlinear adaptation.
  • GL-Trans Block in Decoder: This hybrid module is divided into:

    • Local Branch: Utilizes parallel $1 \times 1$ and $3 \times 3$ convolutions (with batch normalization) to emphasize fine edges and textures:

      $$F_{\text{local}} = F_{1 \times 1}(X) + F_{3 \times 3}(X)$$

    • Global Branch: Flattens the feature tensor $X \in \mathbb{R}^{C \times H \times W}$ to $X_{\text{seq}} \in \mathbb{R}^{L \times D}$ and computes:

      $$Q = W_Q X_{\text{seq}}, \quad K = W_K X_{\text{seq}}, \quad V = W_V X_{\text{seq}}$$

      $$A = \operatorname{Softmax}\!\left(\frac{Q K^\top}{\sqrt{D}}\right), \quad F_{\text{global}} = \operatorname{Reshape}(A V)$$

    • Fusion: Local and global outputs are combined via a depthwise separable convolution and a $1 \times 1$ batch-normalized projection:

      $$F_{\text{Dw}} = \operatorname{DWConv}(F_{\text{local}} + F_{\text{global}})$$

      $$F_{\text{out}} = \operatorname{BatchNorm}(W_{1 \times 1} F_{\text{Dw}})$$

This design enables simultaneous propagation of global semantic context and preservation of local boundary details, essential for fine-grained segmentation in noise-prone tasks (Dou et al., 19 Oct 2025).
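The data flow through the GL-Trans block can be sketched in NumPy as follows. This is a deliberately reduced version: single-head attention with $D = C$, the local branch collapsed to a per-channel $1 \times 1$ scaling (omitting the $3 \times 3$ path), and the depthwise convolution replaced by a per-channel scale; all weight names and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gl_trans(X, Wq, Wk, Wv, w_local, w_dw, w_proj):
    """X: feature map of shape (C, H, W); returns fused features (C, H, W)."""
    C, H, W = X.shape
    # Global branch: flatten to a sequence of L = H*W tokens with D = C dims.
    X_seq = X.reshape(C, H * W).T                  # (L, D)
    Q, K, V = X_seq @ Wq, X_seq @ Wk, X_seq @ Wv   # (L, D) each
    A = softmax(Q @ K.T / np.sqrt(C))              # (L, L) attention weights
    F_global = (A @ V).T.reshape(C, H, W)
    # Local branch stand-in: per-channel 1x1 scaling (the real block adds a
    # parallel 3x3 convolution path to sharpen edges and textures).
    F_local = X * w_local[:, None, None]
    # Fusion: depthwise conv (per-channel scale here) then 1x1 projection.
    F_dw = (F_local + F_global) * w_dw[:, None, None]
    F_out = np.einsum('oc,chw->ohw', w_proj, F_dw)
    return F_out

rng = np.random.default_rng(0)
C, H, W = 4, 5, 5
X = rng.standard_normal((C, H, W))
eye = np.eye(C)
out = gl_trans(X, eye, eye, eye, np.ones(C), np.ones(C), eye)
print(out.shape)  # (4, 5, 5)
```

The key structural point survives the simplification: every output pixel mixes an attention-weighted summary of all $H \times W$ positions (global context) with a convolutionally filtered copy of its own neighborhood (local detail) before the shared projection.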

3. Algorithmic Advances for Noise-Robust Segmentation

A principal challenge in ecological mapping is the prevalence of noisy or coarse supervision, as exemplified by globally available resources such as the Allen Coral Atlas. UKANFormer addresses this with:

  • KAN Nonlinearities: These operators can smooth label inconsistencies and recover boundary fidelity by approximating subtle transitions and correcting misaligned or ambiguous ground truth.
  • GL-Trans Fusion: By structurally blending local and global signals, the network avoids sparse or spatially fragmented mask predictions typical of models trained with noisy labels. This enables visually and structurally superior outputs compared to the labels themselves.

This suggests that architectural innovations can lessen the detrimental effect of label noise on segmentation performance, contrary to conventional assumptions about supervision quality being the limiting factor (Dou et al., 19 Oct 2025).

4. Empirical Performance and Benchmark Results

In comparative studies against coral mapping baselines (UNet, UKAN, UNetFormer), UKANFormer achieves:

| Model      | Coral-class IoU (%) | Pixel Accuracy (%) |
|------------|---------------------|--------------------|
| UNet       | <67.00              | <83.98             |
| UKAN       | <67.00              | <83.98             |
| UNetFormer | <67.00              | <83.98             |
| UKANFormer | 67.00               | 83.98              |

These metrics indicate state-of-the-art fine-grained segmentation in challenging regions, overcoming the spatial imprecision and semantic inconsistency of noisy training data (Dou et al., 19 Oct 2025). The model consistently retrieves more detailed reef boundaries and preserves large continuous coral regions, attributes crucial for downstream ecological analyses.

5. Domain Extensions and Ecological Significance

Beyond coral reef mapping, UKANFormer’s core advances—KAN-empowered nonlinear modeling and transformer-based context fusion—have broader implications:

  • Scalable Mapping: Enables regionally adaptive, high-resolution ecosystem monitoring independent of expert supervision density.
  • Medical Imaging: Variants of UKANFormer (e.g., UKAN-EP, Implicit U-KAN 2.0) integrate attention modules, dynamic loss functions, and NODE-based continuous evolution for segmentation tasks in MRI, ultrasound, and digital pathology. These yield improvements in Dice scores, Hausdorff distance, and robustness to noise (Chen et al., 1 Aug 2024, Cheng et al., 5 Mar 2025).
  • Interpretability and Efficiency: The KAN paradigm facilitates “white-box” analysis of feature interactions and maintains parameter efficiency using depthwise convolutions and tokenization (Li et al., 5 Jun 2024).
  • Noise Mitigation: The model’s ability to surpass label quality challenges conventional expectations, supporting deployment in other domains characterized by imperfect annotations (e.g., remote sensing, surveillance).

A plausible implication is that UKANFormer-style frameworks may become foundational for automated, scalable segmentation tasks in both science and industry where supervision quality is an intrinsic bottleneck.

6. Technical Details and Implementation Considerations

The architectural choices in UKANFormer entail several considerations for applied use:

  • KAN Layers: Spline-based activation functions require careful initialization and regularization to balance expressiveness with stability.
  • GL-Trans Block: The dual-branch fusion benefits from channel-dimensional balancing and may require tuning for specific data resolutions.
  • Training Efficiency: Depthwise convolutions, tokenization, and fusion operations allow for computational efficiency (comparable or superior GFLOPs to transformer-heavy baselines), supporting deployment on resource-constrained hardware (Li et al., 5 Jun 2024).
  • Supervision Strategies: The model’s robustness to noisy labels makes it suitable for large-scale deployment in ecological and biomedical mapping, but post-hoc evaluation against expert annotations remains essential.
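The efficiency argument for depthwise separable convolutions can be verified with a quick parameter count. The channel and kernel sizes below are illustrative, not taken from the paper, and bias terms are omitted for simplicity.

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one kernel per input channel)
    followed by a 1x1 pointwise convolution (no bias)."""
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 256, 256, 3
std = conv_params(c_in, c_out, k)          # 256 * 256 * 9 = 589,824
sep = dw_separable_params(c_in, c_out, k)  # 256 * 9 + 256 * 256 = 67,840
print(f"standard: {std:,}  separable: {sep:,}  ratio: {std / sep:.1f}x")
```

At these sizes the separable factorization uses roughly 8.7x fewer parameters than the standard convolution, which is what makes the GL-Trans fusion affordable relative to transformer-heavy baselines.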

UKANFormer integrates KAN-based nonlinear feature adaptation and transformer-driven global-local context fusion, yielding empirically validated advances in segmentation fidelity and noise resistance across a range of challenging scientific domains.
