ConvKANs: Adaptive Nonlinear Convolution
- ConvKANs are neural architectures that replace fixed convolution kernels with learnable nonlinear functions based on spline activations to model complex spatial patterns.
- Their spline-based kernels provide higher expressivity per parameter, allowing shallower, parameter-efficient networks to reach competitive accuracy in tasks like image classification and medical imaging.
- Extensions incorporating input convexity, group equivariance, and complex-valued kernels improve robustness and adapt the model to domain-specific applications such as speech processing and scientific modeling.
ConvKANs (Convolutional Kolmogorov-Arnold Networks) are a class of neural architectures that generalize classical convolutional neural networks by replacing fixed-weight kernels with learnable, nonlinear function-based kernels, typically structured using the Kolmogorov-Arnold representation. This approach imbues the convolution operation with the adaptive, expressive power derived from univariate spline-based activations, enabling efficient modeling of complex spatial dependencies in both visual and non-visual domains. Recent developments extend ConvKANs with input convexity, complex-valued function support, group equivariance, and domain-specific adaptations for medical imaging, speech, and knowledge graphs.
1. Architectural Principles
ConvKANs modify the standard convolution operation by substituting each kernel element with a Kolmogorov-Arnold function—usually implemented as a learnable B-spline:

$$\mathrm{spline}(x) = \sum_{i} c_i \, B_i(x),$$

where $B_i$ is a spline basis function and the $c_i$ are trainable coefficients. The canonical kernel element in ConvKANs adopts the form:

$$\phi(x) = w_1\,\mathrm{spline}(x) + w_2\,\mathrm{SiLU}(x),$$

with weights $w_1$ and $w_2$ determining the contribution of the spline and SiLU nonlinearities, respectively (Bodner et al., 19 Jun 2024). The convolutional operation across an input image or feature map $a$ is thus generalized:

$$(\mathrm{Image} \ast \Phi)(i, j) = \sum_{k,l} \phi_{kl}\left(a_{i+k,\,j+l}\right),$$

where each $\phi_{kl}$ is a unique learnable function.
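To make the construction concrete, here is a minimal NumPy/SciPy sketch of a spline-plus-SiLU kernel element and a naive KAN convolution. Function names (`kan_kernel_element`, `kan_conv2d`) and all hyperparameter values are illustrative assumptions, and no attention is paid to efficiency or trainability:

```python
import numpy as np
from scipy.interpolate import BSpline

def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def kan_kernel_element(x, coeffs, knots, w_spline, w_silu, degree=3):
    """One learnable kernel element phi(x) = w1*spline(x) + w2*SiLU(x)."""
    spline = BSpline(knots, coeffs, degree, extrapolate=True)
    return w_spline * spline(x) + w_silu * silu(x)

def kan_conv2d(image, phis, k=3):
    """Naive KAN convolution: each of the k*k positions applies its own
    univariate function phi_kl to the input pixel, and results are summed."""
    H, W = image.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = sum(phis[a][b](image[i + a, j + b])
                            for a in range(k) for b in range(k))
    return out

# Toy usage: a 3x3 kernel of independent spline+SiLU elements (grid size 8).
rng = np.random.default_rng(0)
G, k, deg = 8, 3, 3
knots = np.linspace(-2.0, 2.0, G + deg + 1)   # len(coeffs) + degree + 1 knots
phis = [[(lambda x, c=rng.normal(size=G): kan_kernel_element(x, c, knots, 1.0, 1.0, deg))
         for _ in range(k)] for _ in range(k)]
out = kan_conv2d(rng.normal(size=(8, 8)), phis, k=k)   # shape (6, 6)
```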
This representation framework is borrowed from the Kolmogorov-Arnold theorem, which states that any multivariate continuous function can be written as a finite sum of compositions of univariate functions:

$$f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right).$$
Kernel-based extensions (CKNs) formally relate the convolution–activation–pooling pipeline to finite-dimensional projections of kernel maps via methods like Nyström approximation; techniques such as Ultimate Layer Reversal (ULR) and intertwined Newton updates for efficient gradients are then leveraged for scalable training (Jones et al., 2019).
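As a toy illustration of the Nyström step mentioned above (a generic Gaussian-kernel sketch, not the CKN training procedure of Jones et al.; `nystrom_feature_map` is an illustrative name):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_feature_map(Z, gamma=1.0, eps=1e-8):
    """Finite-dimensional projection of the kernel map onto the span of
    landmarks Z: psi(x) = K_ZZ^{-1/2} k(Z, x), so psi(x).psi(y) ~ k(x, y)."""
    Kzz = rbf_kernel(Z, Z, gamma)
    w, V = np.linalg.eigh(Kzz)                    # Kzz is symmetric PSD
    inv_sqrt = V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T
    return lambda X: rbf_kernel(X, Z, gamma) @ inv_sqrt

# psi(X) @ psi(X).T approximates the full kernel matrix rbf_kernel(X, X).
rng = np.random.default_rng(1)
X, Z = rng.normal(size=(100, 5)), rng.normal(size=(20, 5))
psi = nystrom_feature_map(Z)
approx = psi(X) @ psi(X).T
```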
2. Parameter Efficiency and Expressivity
By employing nonlinear spline activations within each kernel, ConvKANs typically require fewer layers and parameters to reach comparable or superior expressivity relative to standard convolutional architectures. For instance, a ConvKAN model (KKAN Small, 95k params) approaches the accuracy of a conventional CNN (Medium, 157k params) on Fashion-MNIST, usually within a 0.5% accuracy margin (Bodner et al., 19 Jun 2024). The parameter count per kernel element is $G + 2$ for grid size $G$, leading to a total parameterization of $K^2 (G + 2)$ per $K \times K$ kernel.
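For concreteness, a worked instance of this counting with illustrative values $K = 3$ and $G = 8$:

$$K^2\,(G + 2) = 9 \times 10 = 90$$

trainable parameters per kernel, versus $K^2 = 9$ for a standard $3 \times 3$ convolution; the higher per-kernel cost is traded against the reduced depth and kernel count that the spline expressivity permits.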
Spline-based activations allow adaptive modeling of richer input–output mappings during training, effectively reducing the need for network depth and supporting parameter-efficient representations. This property is crucial in resource-constrained environments (e.g., edge devices, medical imaging) and also facilitates architectural variants such as 3D ConvKANs for volumetric data (Patel et al., 24 Jul 2024).
3. Extensions: Convexity, Equivariance, and Complex-Valued Kernels
Recent ConvKAN extensions introduce properties tailored for specialized domains:
- Input-Convex ConvKANs (ICKANs) enforce convex or monotonic spline activations, guaranteeing polyconvexity in constitutive modeling tasks (e.g., hyperelasticity). Convexity is ensured by explicit constraints on the spline control points, e.g. nonnegative second differences on a uniform knot grid:

$$c_{i+1} - 2c_i + c_{i-1} \ge 0 \quad \text{for all interior } i,$$

yielding interpolants suitable for physically-admissible energy densities (Thakolkaran et al., 7 Mar 2025, Deschatre et al., 27 May 2025); a parameterization sketch follows this list.
- Equivariant ConvKANs (EKANs) integrate matrix group equivariance by combining gated spline basis functions and blockwise equivariant linear weights, preserving symmetry properties critical in particle physics and dynamical systems. These architectures benefit from reduced parameter counts and increased data efficiency—EKANs can halve the parameter requirement compared to alternative equivariant models while maintaining accuracy (Hu et al., 1 Oct 2024).
- Complex-Valued ConvKANs (CVKANs) adapt the spline basis to the complex domain by using complex-valued RBFs and batch normalization:

$$\phi(z) = \sum_i w_i \exp\!\left(-\frac{|z - g_i|^2}{2\sigma^2}\right), \qquad w_i \in \mathbb{C},\; g_i \in \mathbb{C},$$

enabling expressive modeling of signals and data naturally residing in $\mathbb{C}$, such as knot theory invariants. Such models demonstrate improved stability and parameter efficiency (Wolff et al., 4 Feb 2025); see the second sketch after this list.
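A minimal sketch of one way such control-point constraints can be enforced by construction, using softplus-parameterized second differences (uniform knots are assumed; `convex_spline_coeffs` is an illustrative name, not the ICKAN implementation):

```python
import numpy as np

def convex_spline_coeffs(theta):
    """Map unconstrained parameters to B-spline control points with
    nonnegative second differences, c[i+1] - 2c[i] + c[i-1] >= 0, a
    standard sufficient condition for a convex uniform-knot spline.

    theta[0]  : initial value, theta[1] : initial slope,
    theta[2:] : unconstrained curvatures, mapped through softplus.
    """
    c0, d0 = theta[0], theta[1]
    curv = np.log1p(np.exp(theta[2:]))                        # softplus >= 0
    slopes = d0 + np.concatenate(([0.0], np.cumsum(curv)))    # nondecreasing
    return c0 + np.concatenate(([0.0], np.cumsum(slopes)))

c = convex_spline_coeffs(np.random.default_rng(2).normal(size=10))
assert np.all(np.diff(c, 2) >= 0)   # second differences are nonnegative
```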
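And a corresponding sketch of a complex-valued RBF edge function (the function name and Gaussian width are illustrative assumptions; CVKANs additionally apply complex batch normalization, omitted here):

```python
import numpy as np

def cvkan_edge(z, weights, grid, sigma=0.5):
    """phi(z) = sum_i w_i * exp(-|z - g_i|^2 / (2 sigma^2)) with complex
    weights w_i and complex grid points g_i; |.| is the complex modulus."""
    d2 = np.abs(z[..., None] - grid) ** 2          # (..., n_grid), real
    return (np.exp(-d2 / (2.0 * sigma ** 2)) * weights).sum(-1)

# Toy usage on a 4x4 complex grid over [-1, 1]^2.
rng = np.random.default_rng(3)
re, im = np.meshgrid(np.linspace(-1, 1, 4), np.linspace(-1, 1, 4))
grid = (re + 1j * im).ravel()
w = rng.normal(size=grid.size) + 1j * rng.normal(size=grid.size)
vals = cvkan_edge(rng.normal(size=5) + 1j * rng.normal(size=5), w, grid)
```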
4. Applications Across Domains
ConvKANs have shown competitive or superior results in several fields:
- Medical Imaging: 3D ConvKANs achieve high AUROC (up to 0.99 in 2D, perfect classification in 3D on certain cohorts) for Parkinson's disease MRI-based detection, surpassing or matching CNN and GCN benchmarks in multicenter transfer scenarios (Patel et al., 24 Jul 2024).
- Speech and Language: Integrating KAN layers into dense blocks of SLU networks improves F1 and accuracy, with optimal performance when a KAN layer is placed between two linear layers (FKF configuration). KANs re-weight attention towards contextually relevant input segments, aligning more closely with semantic reasoning (Koudounas et al., 26 May 2025).
- Scientific Modeling: Input-convex ConvKANs/ICKANs model polyconvex hyperelastic constitutive laws, learning interpretable, compact strain energy densities from noisy field data and supporting symbolic regression extraction of closed-form expressions (Thakolkaran et al., 7 Mar 2025).
- Image Classification: ConvKANs and KAN-Mixers surpass both MLP-Mixers and MLP baselines on Fashion-MNIST and CIFAR-10, highlighting their refined feature extraction and interpretability (Canuto et al., 11 Mar 2025).
- Knowledge-Grounded Conversation: ConvKANs serve as a backbone for conversational knowledge graph QA systems, leveraging scalable corpus generation (KdConv, KGConv, ConvKGYarn) and memory modules to enhance multi-turn information grounding (Zhou et al., 2020, Brabant et al., 2023, Pradeep et al., 12 Aug 2024).
5. Model Compression and Scalability
The multiplicative parameter expansion inherent to ConvKANs—each connection parameterized by vector-valued basis coefficients—may be prohibitive for deployment. The MetaCluster framework addresses this by using a meta-learner to map low-dimensional embeddings to coefficient vectors, projecting the coefficient distribution onto a low-dimensional manifold (Raffel et al., 21 Oct 2025). K-means clustering replaces per-edge vectors with shared centroids; post-clustering, the meta-learner is discarded, and a compact codebook plus per-edge indices are retained. This leads to storage reductions by up to 80× with negligible accuracy loss on benchmarks. The compression pipeline consists of:
- Joint meta-learner and network training: a meta-learner $g_\theta$ maps each edge's low-dimensional embedding $z_e$ to its coefficient vector $c_e = g_\theta(z_e)$;
- K-means clustering on the resulting coefficient vectors $c_e$;
- Replacement of per-edge vectors by centroid lookups, followed by post-clustering finetuning (see the sketch below).
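A plain k-means sketch of the clustering-and-codebook step (not the reference MetaCluster implementation; the preceding meta-learner training is assumed to have produced `coeff_vectors`):

```python
import numpy as np

def cluster_coefficients(coeff_vectors, n_centroids=64, n_iter=50, seed=0):
    """Replace per-edge coefficient vectors with a shared codebook plus
    per-edge indices. coeff_vectors: (n_edges, dim) array."""
    rng = np.random.default_rng(seed)
    codebook = coeff_vectors[rng.choice(len(coeff_vectors), n_centroids,
                                        replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((coeff_vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        indices = d2.argmin(1)                     # nearest-centroid assignment
        for c in range(n_centroids):
            members = coeff_vectors[indices == c]
            if len(members):
                codebook[c] = members.mean(0)      # centroid update
    return codebook, indices

# 5000 edges with 10 coefficients each -> 64x10 codebook + 5000 indices.
rng = np.random.default_rng(4)
coeffs = rng.normal(size=(5000, 10))
codebook, idx = cluster_coefficients(coeffs)
reconstructed = codebook[idx]                      # per-edge centroid lookup
```

Storage falls from `n_edges * dim` floats to `n_centroids * dim` floats plus one small integer index per edge, which is the source of the reported compression; finetuning then recovers most of the accuracy lost in quantization.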
MetaCluster’s success suggests that expressive, function-based ConvKAN architectures can be deployed in memory-constrained, high-performance edge environments.
6. Mathematical Properties and Theoretical Guarantees
ConvKANs inherit universal approximation capabilities from Kolmogorov-Arnold networks. For convex versions (ICKANs), explicit universal approximation theorems guarantee that any Lipschitz convex function can be realized to arbitrary precision by composing sufficiently expressive piecewise-linear or spline layers (Deschatre et al., 27 May 2025). For equivariant extensions, the enforcement of $f(g \cdot x) = g \cdot f(x)$ for all $g$ in the symmetry group ensures strict equivariance throughout the architecture (Hu et al., 1 Oct 2024). Complex-valued variants maintain interpolation and stability by joint normalization of real and imaginary parts over compact grids (Wolff et al., 4 Feb 2025).
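A small diagnostic sketch of the equivariance condition for the concrete case of the $C_4$ rotation group acting on 2D feature maps (a numerical check, not part of the EKAN construction; `equivariance_gap` is an illustrative name):

```python
import numpy as np

def equivariance_gap(f, x, n_rotations=4):
    """Largest violation of f(g . x) = g . f(x) over the C4 rotation group,
    with g acting on 2D arrays by 90-degree rotations."""
    fx = f(x)
    return max(np.abs(f(np.rot90(x, r)) - np.rot90(fx, r)).max()
               for r in range(n_rotations))

# A mean-centering map commutes with rotations, so the gap is ~0.
f = lambda x: x - x.mean()
gap = equivariance_gap(f, np.random.default_rng(5).normal(size=(8, 8)))
```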
7. Future Directions and Open Challenges
ConvKANs represent a modular and extensible platform leveraging nonlinear function learning in convolutional settings. Current and future research directions include:
- Extending convexity constraints to higher-order spline activations for optimal transport and scientific inference.
- Systematic integration of group-equivariant ConvKANs in vision and language domains.
- Hybridization with kernel-based convolutional approaches, blending explicit kernel maps and learnable function representations (Jones et al., 2019).
- Scalably integrating ConvKANs into end-to-end conversational systems with knowledge graph augmentation.
- Further interpretability of learned nonlinear kernels via symbolic regression and visualization.
- Refinement of meta-learning compression strategies for deployment on hardware-limited devices.
ConvKANs continue to be extended to accommodate domain-specific priors—convexity for physical modeling, equivariance for scientific applications, and complex-valued computation for quantum, signal-processing, and knot-theoretic settings. A plausible implication is broad applicability in resource- and domain-constrained deployment contexts, with theoretical guarantees on expressivity, symmetry, and convexity.