UKAST: KAN-Enhanced Swin Transformer

Updated 11 November 2025
  • UKAST is a novel neural architecture for medical image segmentation that unifies a Swin Transformer encoder with rational-function-based GR-KANs.
  • It employs a U-Net-style encoder-decoder design where GR-KAN blocks replace traditional MLPs to enhance long-range dependency modeling and data efficiency.
  • Empirical evaluations show UKAST achieves state-of-the-art or comparable Dice scores across multiple benchmarks with minimal computational overhead, with the largest gains in low-data scenarios.

UKAST (U-Net-KAN-Enhanced Swin Transformer) is a neural architecture for medical image segmentation that unifies a Swin Transformer encoder with rational-function-based Kolmogorov-Arnold Networks (KANs) in its feed-forward layers. The architecture integrates Group Rational KANs (GR-KANs) for expressive and data-efficient modeling, addressing the challenges of long-range dependency modeling, computational cost, and data efficiency in segmentation tasks with limited annotated data (Sapkota et al., 6 Nov 2025).

1. Architectural Overview

UKAST employs a U-Net-style encoder-decoder design tailored for dense segmentation prediction. The encoder consists of a four-stage Swin Transformer backbone incorporating shifted-windowed self-attention, residual convolutional (RC) projections, and GR-KANs. The input $\mathcal{X}\in\mathbb{R}^{C\times H\times W}$ is partitioned into non-overlapping patches, which are embedded into tokens via a learned linear projection. Each encoder stage operates at progressively reduced spatial resolution and increased channel depth.
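
As a brief illustration of the patch-embedding step, here is a minimal PyTorch sketch; the patch size and embedding dimension are illustrative assumptions, not values reported in the paper.

import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Non-overlapping patch partition plus learned linear projection,
    expressed as a strided convolution (illustrative values)."""
    def __init__(self, in_ch=3, embed_dim=96, patch=4):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=patch, stride=patch)

    def forward(self, x):                    # x: (B, C, H, W)
        x = self.proj(x)                     # (B, D, H/p, W/p)
        return x.flatten(2).transpose(1, 2)  # (B, N, D) token sequence

tokens = PatchEmbed()(torch.randn(1, 3, 320, 320))  # -> (1, 6400, 96)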

The encoder stage structure is as follows:

  • Residual convolution (RC) projection: $v^{(s)}_0 = \mathrm{RC}\bigl(z^{(s)}_{\text{in}}\bigr)$
  • Windowed multi-head self-attention (W-MSA) + residual: $\hat z^{(s)}_1 = \mathrm{W\!-\!MSA}\bigl(\mathrm{LN}(v^{(s)}_0)\bigr) + v^{(s)}_0$
  • First GR-KAN feed-forward + residual: $z^{(s)}_1 = \mathrm{GR\!-\!KAN}\bigl(\mathrm{LN}(\hat z^{(s)}_1)\bigr) + \hat z^{(s)}_1$
  • Shifted windowed MSA (SW-MSA) + residual: $\hat z^{(s)}_2 = \mathrm{SW\!-\!MSA}\bigl(\mathrm{LN}(z^{(s)}_1)\bigr) + z^{(s)}_1$
  • Second GR-KAN feed-forward + residual: $z^{(s)}_{\text{out}} = \mathrm{GR\!-\!KAN}\bigl(\mathrm{LN}(\hat z^{(s)}_2)\bigr) + \hat z^{(s)}_2$

Intermediate features $z^{(s)}_{\text{out}}$ are supplied via lateral skip connections to a symmetric CNN-based decoder. Each decoder stage performs deconvolution (upsampling), a Conv–BatchNorm–ReLU block, and concatenation with encoder-derived features. A final $1\times1$ convolution projects the output to segmentation logits.
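
As a 2D illustration of one decoder stage, the following is a minimal PyTorch sketch; the class name and channel sizes are assumptions for exposition, not the authors' code.

import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    """Deconvolution upsampling, skip concatenation, Conv-BatchNorm-ReLU fusion."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.fuse = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                    # deconvolution: 2x upsampling
        x = torch.cat([x, skip], dim=1)   # concatenate encoder skip feature
        return self.fuse(x)

head = nn.Conv2d(64, 2, kernel_size=1)    # final 1x1 conv to segmentation logits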

2. Rational-Function KANs and GR-KAN Integration

KANs serve as the feed-forward component in UKAST, replacing standard MLP blocks. Unlike conventional fixed activations (e.g., ReLU, GELU), UKAST leverages rational base functions of the form

$$\phi(x) = w\,F(x),\qquad F(x) = \frac{P(x)}{1 + |Q(x)|}$$

where $P(x) = a_0 + a_1 x + \cdots + a_m x^m$ and $Q(x) = b_1 x + \cdots + b_n x^n$ are polynomials with empirically chosen degrees $m=3$, $n=4$. The denominator ensures numerical stability, and the construction is termed the “Safe Padé Activation Unit”.
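
A minimal PyTorch sketch of this rational activation follows. The coefficient initialization here is an illustrative assumption; practical implementations typically initialize the coefficients so that $F$ approximates a standard activation such as GELU.

import torch
import torch.nn as nn

class SafePade(nn.Module):
    """F(x) = P(x) / (1 + |Q(x)|) with learnable coefficients (m=3, n=4)."""
    def __init__(self, m=3, n=4):
        super().__init__()
        self.a = nn.Parameter(torch.randn(m + 1) * 0.1)  # a_0 .. a_m of P
        self.b = nn.Parameter(torch.randn(n) * 0.1)      # b_1 .. b_n of Q

    def forward(self, x):
        P = sum(a_i * x**i for i, a_i in enumerate(self.a))
        Q = sum(b_j * x**(j + 1) for j, b_j in enumerate(self.b))
        return P / (1.0 + Q.abs())  # denominator >= 1 keeps the unit stable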

For computational tractability, the Group Rational KAN (GR-KAN) blocks partition the $d_{\text{in}}$ input channels into $g$ groups ($g=8$ in experiments), sharing the rational polynomial parameters $\{a_i, b_j\}$ within each group while maintaining independent scalar weights $w$ per edge. Formally,

$$\mathrm{GR\!-\!KAN}(\mathbf{x}) = W\left[F(\mathbf{x}_{(1)}) \oplus \cdots \oplus F(\mathbf{x}_{(g)})\right] + b,$$

with $\oplus$ denoting channel-group concatenation. This structure reduces the number of unique polynomial parameter sets from $d_{\text{in}} \times d_{\text{out}}$ to $g$, yielding lower FLOPs and memory cost.
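
A sketch of the grouped layer, reusing the SafePade module from the previous snippet, is shown below; the per-edge scalar weights $w$ and the bias correspond to the final linear map. Apart from $g=8$, the shapes are illustrative assumptions.

import torch
import torch.nn as nn

class GRKAN(nn.Module):
    """Split channels into g groups, apply one shared rational function per
    group, then a linear map providing the per-edge weights W and bias b."""
    def __init__(self, d_in, d_out, g=8):
        super().__init__()
        assert d_in % g == 0, "channels must divide evenly into groups"
        self.g = g
        self.rationals = nn.ModuleList(SafePade() for _ in range(g))
        self.linear = nn.Linear(d_in, d_out)

    def forward(self, x):                  # x: (..., d_in)
        groups = x.chunk(self.g, dim=-1)   # channel-group split
        y = torch.cat([f(c) for f, c in zip(self.rationals, groups)], dim=-1)
        return self.linear(y)              # W[F(x_(1)) (+) ... (+) F(x_(g))] + b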

3. Computational Characteristics

A quantitative assessment of UKAST’s efficiency relative to SwinUNETR (the immediate baseline) is given in the table below:

Model          | FFN    | RC? | GFLOPs | #Params
---------------|--------|-----|--------|----------
SwinUNETR      | MLP    | No  | 1.2500 | 6.302 M
UKAST          | GR-KAN | No  | 1.2467 | 6.3028 M
SwinUNETR + RC | MLP    | Yes | 1.4419 | 7.1835 M
UKAST + RC     | GR-KAN | Yes | 1.4386 | 7.1841 M

Replacing the MLP with GR-KAN in the feed-forward network reduces total GFLOPs by roughly 0.2–0.3% (e.g., 1.4419 → 1.4386 in the RC-augmented variants) and increases the parameter count by only ~600 parameters, leaving UKAST+RC at ~1.44 GFLOPs and 7.18 M parameters.
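
For intuition, a back-of-the-envelope estimate (ours, not a breakdown given in the paper): each group carries $(m+1)+n = 4+4 = 8$ shared rational coefficients, so a single GR-KAN block with $g=8$ adds $8 \times 8 = 64$ coefficients on top of its linear weights; summed over the GR-KAN blocks in the network, this is consistent in order of magnitude with the observed ~600-parameter increase.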

4. Empirical Evaluation on Medical Segmentation Benchmarks

UKAST was evaluated on four benchmarks: Kvasir-SEG and ISIC-2017 (2D datasets), and BCV (CT) and MMWHS (MRI) for 3D tasks. Dice scores, reported for both fully supervised and limited-data regimes, are as follows:

  • 100% Data (Dice Score)
    • Kvasir: SwinUNETR+RC 81.9 vs UKAST 81.7 (–0.2)
    • ISIC: 78.9 vs 79.9 (+1.0)
    • BCV: 68.9 vs 71.2 (+2.3)
    • MMWHS: 80.4 vs 80.8 (+0.4)
  • Limited Data Regimes (Dice gain of UKAST over SwinUNETR+RC)
    • ISIC (10%, 25%, 50%, 100%): +1.6 / +0.3 / +1.2 / +1.0
    • BCV: +3.8 / +4.9 / +4.1 / +2.3

Performance in low-data scenarios—especially on 3D volumes—demonstrates that KAN-enhanced Transformers deliver significant data-efficiency improvements over MLP-based counterparts.

5. Comparative Analysis with Other Architectures

The following table summarizes key results against contemporary CNN and Transformer baselines (parameter counts and Dice scores on Kvasir, ISIC, BCV, and MMWHS):

Model        | Params | Kvasir | ISIC | BCV  | MMWHS
-------------|--------|--------|------|------|------
U-Net        | 2.6 M  | 71.8   | 77.3 | 59.3 | 71.9
UNETR        | 8.3 M  | 63.7   | 74.7 | 52.7 | 70.3
SwinUNETR+RC | 7.2 M  | 81.9   | 78.9 | 68.9 | 80.4
UKAST (Ours) | 7.2 M  | 81.7   | 79.9 | 71.2 | 80.8

UKAST matches or surpasses these baselines on all listed tasks, with the added advantages of improved accuracy in data-scarce regimes and comparable computational load.

6. Implementation Details

UKAST is implemented in PyTorch with the MONAI imaging toolkit. Training uses AdamW (learning rate $2\times10^{-4}$, weight decay $1\times10^{-3}$) with cosine annealing over 400 epochs and batch size 24. Data augmentations include random $320\times320$ crops, horizontal/vertical flips, 90° rotations, and Gaussian noise, applied consistently to input images and masks. Testing employs overlapping patch-based sliding-window inference with 50% overlap.
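
A minimal sketch of this setup follows; UKAST, train_loader, loss_fn, val_volume, the ROI size, and the sliding-window batch size are placeholders or assumptions, while the optimizer settings, schedule length, and overlap come from the description above.

import torch
from monai.inferers import sliding_window_inference

model = UKAST()  # hypothetical model class standing in for the architecture
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=400)

for epoch in range(400):
    for images, masks in train_loader:        # placeholder dataloader
        optimizer.zero_grad()
        loss = loss_fn(model(images), masks)  # placeholder segmentation loss
        loss.backward()
        optimizer.step()
    scheduler.step()

with torch.no_grad():  # overlapping sliding-window inference, 50% overlap
    logits = sliding_window_inference(
        inputs=val_volume, roi_size=(320, 320),
        sw_batch_size=4, predictor=model, overlap=0.5,
    )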

The encoder pseudocode (one iteration per stage $s$) is:

skips = []
x = patch_embeddings                 # tokens from the patch-embedding layer
for s in range(1, 5):                # four encoder stages
    v0 = ResidualConv(x)             # RC projection
    a1 = W_MSA(LN(v0)) + v0          # windowed attention + residual
    b1 = GR_KAN(LN(a1)) + a1         # first GR-KAN feed-forward + residual
    a2 = SW_MSA(LN(b1)) + b1         # shifted-window attention + residual
    z_out = GR_KAN(LN(a2)) + a2      # second GR-KAN feed-forward + residual
    skips.append(z_out)              # lateral skip connection to the decoder
    x = Downsample(z_out)            # patch merging for the next stage

The decoder mirrors this structure, upsampling features and fusing them via concatenation and a Conv–BN–ReLU sequence before a final $1\times1$ convolution.

7. Significance and Outlook

UKAST establishes that integrating rational-function KANs as the feed-forward mechanism in hierarchical Swin Transformer encoders yields a model that is not only competitive with existing vision Transformers but also robust to scarce data scenarios, especially for 3D segmentation. The approach incurs negligible additional computational overhead, challenging the notion that expressivity increases must trade off against efficiency. This suggests broader applicability of KAN-augmented attention architectures for other data-efficient vision problems in the biomedical domain and beyond. Future research may further optimize group sizes within GR-KANs or investigate other rational-function parameterizations for even greater flexibility and compactness.
