Vision KAN (ViK): Efficient Nonlinear Vision Models

Updated 5 February 2026
  • Vision KAN (ViK) models are deep learning architectures that replace scalar weights with learnable univariate functions based on the Kolmogorov–Arnold theorem, enhancing nonlinear feature modeling.
  • They employ edgewise spline parametrization, innovative token mixing, and quantization strategies to achieve competitive performance on classification, detection, segmentation, and robotics tasks.
  • ViK architectures prioritize efficient memory management and hardware-compatible inference, using techniques like SHARe-KAN and static memory planning to reduce bandwidth and maintain high throughput.

Kolmogorov–Arnold Networks (KANs) have emerged as a functionally expressive architectural paradigm for deep learning, grounded in the Kolmogorov–Arnold representation theorem, which states that any continuous multivariate function can be represented as a finite composition of continuous univariate functions and addition. Vision KAN (ViK) designates the family of models that transplant this principle into structured visual domains by placing learnable univariate (typically spline- or basis-expanded) functions in lieu of scalar weights at each edge of a convolutional or linear operation. In vision, this unlocks high-capacity nonlinear feature transformations and novel token-mixing strategies, while raising acute challenges in memory footprint, hardware deployment, and robustness. ViK architectures have attained competitive or state-of-the-art results on core computer vision tasks, including classification, detection, segmentation, and robotics, while opening new possibilities for interpretable, efficient, and theoretically principled model design.

1. Theoretical Foundations and Motivation

The Kolmogorov–Arnold theorem states that any continuous $n$-variate function $f:[0,1]^n \to \mathbb{R}$ can be written as

$$f(x_1, \dots, x_n) = \sum_{j=1}^{2n+1} g_j\left(\sum_{i=1}^{n} \phi_{ij}(x_i)\right),$$

where $g_j$ and $\phi_{ij}$ are univariate continuous functions. In a KAN, these univariate functions are parameterized—commonly as B-splines, rational functions, or other basis expansions—with their coefficients learned via backpropagation.

The “edge-based nonlinearity” paradigm distinguishes KANs from classical MLPs and CNNs. Instead of fixed activations at nodes and linear weights at edges, every edge in KANs is endowed with a learnable univariate function, drastically increasing representational capacity and inducing a holographic information topology in which the model’s knowledge is stored in dense superpositions rather than localizable parameters. This is evidenced by rapid SVD decay in KAN coefficient matrices and catastrophic information loss upon pruning even modest fractions of edges (Smith, 10 Dec 2025).
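To make the edge-based parameterization concrete, the following minimal PyTorch sketch implements a single learnable univariate function $\phi(x)=\sum_k c_k\,\varphi_k(x)$ fitted by backpropagation, using a Gaussian RBF basis as an illustrative stand-in for B-splines; the class name, grid range, and bandwidth are assumptions of this sketch rather than details taken from any cited paper.

```python
import torch
import torch.nn as nn

class UnivariateRBFFunction(nn.Module):
    """A single learnable univariate function phi(x) = sum_k c_k * exp(-((x - t_k)/h)^2).

    Illustrative stand-in for the spline-parameterized edge functions in a KAN;
    grid range, bandwidth, and basis choice are assumptions of this sketch.
    """
    def __init__(self, grid_size: int = 8, x_min: float = -2.0, x_max: float = 2.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(x_min, x_max, grid_size))  # fixed t_k
        self.h = (x_max - x_min) / (grid_size - 1)                # bandwidth tied to grid spacing
        self.coeffs = nn.Parameter(torch.randn(grid_size) * 0.1)  # trainable c_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Evaluate all basis functions at x (any shape) and combine with the coefficients.
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.h) ** 2)
        return basis @ self.coeffs


# Fit the learnable function to a toy 1D target by backpropagation.
phi = UnivariateRBFFunction()
opt = torch.optim.Adam(phi.parameters(), lr=1e-2)
x = torch.linspace(-2, 2, 256)
target = torch.sin(3 * x)
for _ in range(200):
    loss = ((phi(x) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```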

2. Core Architectures and Mathematical Formalism

Vision KANs implement the Kolmogorov–Arnold decomposition at multiple architectural granularities:

  • Edgewise Spline Parametrization: Each weight between input node $i$ and output node $j$ is replaced by a function

$$\phi_{ij}(x) = \sum_{k=1}^{G} c_{ijk}\,\varphi_k(x),$$

where $\varphi_k$ is a basis function (e.g., B-spline, RBF) and the coefficients $c_{ijk}$ are trainable (a minimal code sketch appears after this list).

  • Convolutional KAN (CKAN): Spatial convolutions are followed by per-channel or per-location KAN blocks instead of standard pointwise nonlinearity (Cang et al., 2024, Dai et al., 23 Oct 2025).
  • KAN Transformer Block: Drop-in replacement for MLP heads/FFNs in ViTs, where the feed-forward sublayer becomes

$$f(\mathbf{x}) = \sum_{q=1}^{2d+1} \Phi_q\left(\sum_{p=1}^{d} \phi_{q,p}(x_p)\right)$$

for $\mathbf{x}\in\mathbb{R}^d$. Multiple bases (sine, B-spline, RBF, Fourier) and efficiency-oriented variants have been explored (S et al., 3 Mar 2025, Dey et al., 7 May 2025).

  • Nonlinear Token Mixing: In "MultiPatch-RBFKAN," self-attention is replaced by patchwise KAN nonlinearity, axis-separable mixing, and low-rank global projection, reducing the complexity from $O(N^2 C)$ to $O(NC)$ (Yang et al., 29 Jan 2026).
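
Building on the edgewise formulation in the first bullet above, the sketch below composes one learnable univariate function per (input, output) edge into a full layer, $y_j = \sum_i \phi_{ij}(x_i)$, which can stand in for the MLP/FFN sublayer of a ViT block. The Gaussian RBF basis, dimensions, and initialization are illustrative assumptions, not a reference implementation of any specific ViK variant.

```python
import torch
import torch.nn as nn

class KANLinear(nn.Module):
    """Edgewise KAN layer: y_j = sum_i phi_ij(x_i), with phi_ij(x) = sum_k c_ijk * basis_k(x).

    Uses a Gaussian RBF basis as an illustrative stand-in for B-splines; shapes and
    initialization are assumptions of this sketch.
    """
    def __init__(self, d_in: int, d_out: int, grid_size: int = 8,
                 x_min: float = -2.0, x_max: float = 2.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(x_min, x_max, grid_size))
        self.h = (x_max - x_min) / (grid_size - 1)
        # One coefficient vector of length grid_size per (input, output) edge.
        self.coeffs = nn.Parameter(torch.randn(d_in, d_out, grid_size) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., d_in) -> basis: (..., d_in, grid_size)
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.h) ** 2)
        # Sum over both the basis index k and the input index i.
        return torch.einsum("...ik,iok->...o", basis, self.coeffs)


# Drop-in usage as the feed-forward sublayer of a transformer block (illustrative shapes).
tokens = torch.randn(4, 196, 192)                 # (batch, tokens, channels)
ffn = nn.Sequential(KANLinear(192, 384), KANLinear(384, 192))
out = ffn(tokens)                                 # (4, 196, 192)
```

Note that the grid size $G$ multiplies the parameter count of an ordinary linear layer of the same shape, which is the origin of the memory pressure discussed in Section 3.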

Parametric efficiency and computational viability are enhanced via:

  • Gain-Shape-Bias Vector Quantization: Post-training compression that maps each spline coefficient vector to its nearest codebook shape, together with a per-edge gain and bias, minimizing storage with negligible accuracy drop (Smith, 10 Dec 2025).
  • Structured Pruning Avoidance: Pruning is destructive because information is stored holographically across edges, so low-rank quantization is applied instead.

3. Practical Implementation and Compression

KAN-based vision models pose a profound memory bandwidth challenge, especially as each edge/function pair requires a vector of spline coefficients (e.g., 55M parameters in detection heads), far outstripping scalar-weighted analogues (Smith, 10 Dec 2025). SHARe-KAN addresses this “memory wall” with:

  • Vector Quantization: After training, coefficient vectors are quantized by matching to prototypes in a compact shape codebook, with per-edge gain and bias (all stored as int8), yielding almost 88× bandwidth reduction (e.g., 1.13GB → 12.91MB for an SSD head on PASCAL VOC with <1% mAP drop); a minimal sketch of this quantization follows the list.
  • Static Memory Planning (LUTHAM): Layers are sized such that codebooks and workspace fit entirely within L2 cache (655KB/layer), and a compiler (ExecuTorch) orchestrates zero-alloc, zero-copy execution, ensuring a >90% L2 hit rate and enabling inference throughput above DRAM-limited baselines.
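
The sketch below illustrates the gain-shape-bias style of post-training vector quantization described above: each per-edge coefficient vector is decomposed into a mean (bias), a norm (gain), and a unit-norm shape matched to its nearest codebook prototype, so that only an index plus two scalars need be stored per edge. The random codebook and sizes are assumptions for brevity (in practice the codebook would be learned, e.g., by k-means over the trained coefficients), and this is not the exact SHARe-KAN procedure.

```python
import torch

def gain_shape_bias_quantize(coeffs: torch.Tensor, codebook: torch.Tensor):
    """Quantize each coefficient vector into (codebook index, gain, bias).

    coeffs:   (num_edges, grid_size) spline coefficient vectors
    codebook: (num_shapes, grid_size) unit-norm shape prototypes
    Returns indices, gains, biases, and the reconstructed vectors.
    """
    bias = coeffs.mean(dim=1, keepdim=True)              # per-edge bias
    centered = coeffs - bias
    gain = centered.norm(dim=1, keepdim=True) + 1e-8     # per-edge gain
    shape = centered / gain                               # unit-norm shape
    # Nearest unit-norm prototype (argmax cosine similarity == argmin L2 distance here).
    idx = (shape @ codebook.t()).argmax(dim=1)
    recon = gain * codebook[idx] + bias
    # A deployment would further quantize gain and bias (e.g., to int8); omitted here.
    return idx, gain.squeeze(1), bias.squeeze(1), recon


# Toy usage with an assumed random codebook of 256 shapes.
coeffs = torch.randn(10_000, 16)
codebook = torch.nn.functional.normalize(torch.randn(256, 16), dim=1)
idx, gain, bias, recon = gain_shape_bias_quantize(coeffs, codebook)
rel_err = (recon - coeffs).norm() / coeffs.norm()
```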

Holographic parameterization precludes sparsity-based compression, and instead, functional redundancy in spline superpositions is exploited for effective quantization (Smith, 10 Dec 2025).

4. Empirical Results Across Vision Tasks

Classification and Recognition:

  • On ImageNet-1K, ViK backbones using RBF-based patchwise KAN token mixers reach Top-1 accuracy of 80.3% at base scale, on par with DeiT-S/16 (79.8%) and outperforming attention-free MLP and localized CNN backbones for equivalent parameter count and complexity ($O(NC)$) (Yang et al., 29 Jan 2026).
  • KAN-based ViT blocks (“Eff-KAN” or Hyb-KAN ViT) yield 82.6–84.5% Top-1 on ImageNet-1K and improve mIoU by 3–4 points on ADE20K semantic segmentation when using wavelet-augmented spectral modules (“Wavelet-KAN”) (Dey et al., 7 May 2025).

Detection and Segmentation:

  • Vision KAN detection heads, compressed via SHARe-KAN, achieve 85.23% mAP (float32) and 84.74% (int8) on PASCAL VOC, closely matching ResNet-50 baselines (Smith, 10 Dec 2025).
  • 3D medical segmentation with 3D‐Group‐Rational KAN in “TK-Mamba” surpasses Mamba‐UNet and Swin‐UNETR on MSD and KiTS23, with overall Dice 59.28% (vs. 56.67% for SegMamba) (Yang et al., 24 May 2025).
  • In cross-modal 3D detection, ViK-based fusion outperforms LiDAR-only and MLP-fusion baselines by 9–11 mAP on TUMTraf Intersection, using KANConv layers in both camera and LiDAR encoders (Liu et al., 2024).

Continual Learning and Long-tail Classification:

  • ViK modules in ViTs provide local plasticity, restricting catastrophic forgetting and yielding superior incremental accuracy (by 1–2 points) in continual-learning protocols on MNIST and CIFAR-100 compared to standard MLP ViTs (Ullah et al., 5 Jul 2025).
  • As small collaborative models in large–small frameworks (“KCM”), ViK cuts large-model calls by ~40%, substantially increases accuracy on tail classes (by 7–12 points), and exhibits improved robustness to catastrophic forgetting and hallucinations, relative to MLP-based collaborators (Dai et al., 23 Oct 2025).

Robotic Manipulation and Flow Modeling:

  • “KAN-We-Flow” employs GroupKAN post-RWKV mixing for non-linear calibration, achieving an 86.8% parameter reduction vs. UNet-style policies and state-of-the-art success rates in Adroit and DexArt manipulation benchmarks (Chen et al., 1 Feb 2026).

5. Robustness, Regularization, and Limitations

KANs demonstrate enhanced function-fitting power, especially in high-data, noise-free regimes. However, unconstrained spline-based expansions are sensitive to label noise and overfitting, particularly in convolutional KANs (CKAN), which underperform when data are limited or noisy (Cang et al., 2024). To mitigate this:

  • Spline Smoothness Regularization: Penalizes second derivatives of the fitted spline to promote smoothness (see the sketch after this list).
  • Segment Deactivation: Dropout-like stochastic replacement of spline segments with their linear endpoints, regularizing over-complex curvature.
  • L1 Weight Regularization: Partially closes robustness gaps under data corruption.
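
As an illustration of the first item above, a smoothness penalty can be approximated by second-order finite differences of each edge's coefficients on a uniform grid, which act as a discrete proxy for the spline's second derivative; the penalty form and weighting below are assumptions of this sketch, not necessarily the regularizer used in the cited work.

```python
import torch

def spline_smoothness_penalty(coeffs: torch.Tensor) -> torch.Tensor:
    """Penalize curvature of edge functions via second differences of their coefficients.

    coeffs: (..., grid_size) spline/basis coefficients on a uniform grid.
    On a uniform grid, c[k+1] - 2*c[k] + c[k-1] approximates the second derivative,
    so its squared mean acts as a smoothness regularizer.
    """
    second_diff = coeffs[..., 2:] - 2 * coeffs[..., 1:-1] + coeffs[..., :-2]
    return (second_diff ** 2).mean()


# Usage inside a training step (illustrative weighting and placeholder task loss).
coeffs = torch.randn(64, 64, 16, requires_grad=True)   # e.g., (d_in, d_out, grid_size)
task_loss = torch.tensor(0.0)                            # placeholder for the real objective
loss = task_loss + 1e-3 * spline_smoothness_penalty(coeffs)
loss.backward()
```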

CKAN may be at a disadvantage compared to ViTs or CNNs in vision benchmarks where locality is critical (e.g., ResNet-18 still outperforms a pure KAN mixer on CIFAR-10/100 by a large margin) (Cheon, 2024).

6. Integration with Modern Vision Pipelines

ViK models serve as direct substitutes for conventional components: ViK's modularity enables use as a nonlinear calibration block atop state-space models (SSMs) such as Mamba or RWKV, as groupwise nonlinearity layers, or as parameter-efficient, spectrum-adaptive feature refiners (Yang et al., 24 May 2025, Chen et al., 1 Feb 2026).
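
As a concrete illustration of the groupwise usage, the sketch below applies a small bank of shared learnable univariate functions to channel groups of a token feature map, with a residual connection, as a lightweight calibration block one might place after an SSM or token-mixing layer; the group count, RBF basis, and residual placement are assumptions for illustration rather than the GroupKAN design of the cited papers.

```python
import torch
import torch.nn as nn

class GroupwiseKANCalibration(nn.Module):
    """Channels are split into groups; each group shares one learnable univariate function.

    Sharing functions across channels within a group keeps the parameter count small
    (num_groups * grid_size coefficients instead of one function per channel or edge).
    """
    def __init__(self, channels: int, num_groups: int = 8, grid_size: int = 8,
                 x_min: float = -3.0, x_max: float = 3.0):
        super().__init__()
        assert channels % num_groups == 0
        self.num_groups = num_groups
        self.register_buffer("centers", torch.linspace(x_min, x_max, grid_size))
        self.h = (x_max - x_min) / (grid_size - 1)
        self.coeffs = nn.Parameter(torch.randn(num_groups, grid_size) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, channels); fold channels into (groups, channels_per_group).
        b, n, c = x.shape
        xg = x.reshape(b, n, self.num_groups, c // self.num_groups)
        basis = torch.exp(-((xg.unsqueeze(-1) - self.centers) / self.h) ** 2)
        # Apply each group's shared function elementwise, plus a residual connection.
        return x + torch.einsum("bngck,gk->bngc", basis, self.coeffs).reshape(b, n, c)


# Illustrative placement after a token-mixing or SSM backbone output.
features = torch.randn(2, 196, 256)
calib = GroupwiseKANCalibration(channels=256, num_groups=8)
out = calib(features)        # (2, 196, 256)
```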

7. Implications, Use Cases, and Future Directions

ViK architectures unify functional, holographic nonlinear modeling, hardware-conscious deployment, and explicit spectral/spatial priors. Key implications and open lines include:

  • Memory-limited inference: SHARe-KAN and LUTHAM demonstrate fully cache-resident KAN deployment on modern accelerators, with 88× bandwidth reduction, enabling edge and embedded applications with strict latency/energy budgets (Smith, 10 Dec 2025).
  • Attention-free vision modeling: By exploiting KAN-based nonlinearities for token mixing and low-rank global propagation, ViK attains linear scaling, tractable high-resolution throughput, and interpretable bases (Yang et al., 29 Jan 2026).
  • Interpretability and parameter allocation: The ability to visualize spline and basis-function activations, and to allocate edgewise nonlinearity capacity, suits KANs to domain-adaptive or long-tail recognition settings (Dai et al., 23 Oct 2025).
  • Generalization and robustness: Regularization and architectural innovations (wavelet bases, rational expansions, group-sharing) are key for stability, especially as model sizes and data non-idealities increase (Cang et al., 2024, Yang et al., 24 May 2025).
  • Theory and hybridization: Exploring adaptive KAN order, convolution–KAN hybrids, parameter multiplexing, multimodal fusions, and combined spectral/attention methods represents an active area (Dey et al., 7 May 2025, Yang et al., 24 May 2025).

ViK provides a principled and empirically validated pathway for the integration of universal function approximation, efficient nonlinear mixing, and hardware-compatible inference, laying the groundwork for a new class of hybrid and interpretable vision systems.
