Spline-based EfficientKAN
- The paper presents spline-based EfficientKAN, which systematically compresses redundant spline parameterizations via basis projection to achieve up to 80% parameter reduction.
- The approach leverages parameterized B-spline bases, entropy-driven regularization, and gravitational penalties to optimize sparsity, robustness, and computational efficiency.
- Practical applications in scientific modeling, time series forecasting, and feature selection demonstrate enhanced generalization and interpretability with reduced overfitting.
A spline-based EfficientKAN is a Kolmogorov–Arnold Network variant engineered to achieve interpretable, compact, and computationally efficient representations by leveraging parameterized B-spline bases for all learnable nonlinearities—specifically, by automatically regularizing, compressing, and projecting overcomplete spline parameterizations into low-dimensional, functionally concise bases. This approach mitigates redundancy and nuisance directions in spline edge parameterizations, systematically reduces parameter count (often by ≈80%), and substantially enhances generalization and robustness in both deterministic and probabilistic machine learning tasks.
1. Theoretical Foundations and Motivation
The Kolmogorov–Arnold representation theorem establishes that any continuous multivariate mapping can be constructed via compositions of univariate functions acting on affine projections of the input. Classical KAN architectures instantiate every edge as a learnable univariate spline with B-spline bases $\{B_i\}$, degree $k$, and $G+k$ coefficients $c_i$ per edge. However, many choices of the coefficient vector $\mathbf{c}$ yield identical functions due to the high redundancy of spline parameterizations, inflating the model's parameter space and giving rise to large nuisance subspaces in the training Jacobian $J$, where $J_{ij} = \partial \hat{y}_i / \partial \theta_j$ (Poole et al., 24 Sep 2025).
This redundancy causes overparameterization, increased susceptibility to overfitting, and poor robustness, particularly in scientific and interpretable modeling regimes.
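For reference, the classical representation (a standard statement of the theorem, included here for context) expresses any continuous $f:[0,1]^n \to \mathbb{R}$ as:

```latex
% Kolmogorov–Arnold representation: sums and compositions of univariate maps.
f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
% KANs generalize this by making the inner and outer univariate functions
% learnable (here, B-spline-parameterized) and stacking such layers in depth.
```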
2. Spline Parameterization and Edge Functional Representation
Each edge in a spline-based EfficientKAN implements a univariate function $\phi(x)$ parameterized as a B-spline expansion: $\phi(x) = \sum_{i=1}^{G+k} c_i\, B_i(x)$, where:
- $B_i(x)$ are fixed B-spline basis functions of degree $k$ over a specified knot sequence,
- $c_i \in \mathbb{R}$ are learnable coefficients, with $G$ the number of interior knots; typical choices are $k=3$ (cubic) and $G$ up to $16$ (Poole et al., 24 Sep 2025, Bodner et al., 19 Jun 2024, Aung et al., 26 Nov 2025).
Spline bases provide localized, high-order polynomial approximations with compact support. In practice, B-spline edge functions are often post-multiplied by scalar weights, optionally summed with a fixed "residual" base activation (e.g., SiLU), to enable efficient representation of both linear and nonlinear behaviors (Bodner et al., 19 Jun 2024, Qiu et al., 18 Dec 2024, Aung et al., 26 Nov 2025).
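A minimal NumPy sketch of one such edge function is shown below, assuming a uniform knot layout on $[-1, 1]$; the helper names (`bspline_basis`, `edge_function`, `w_base`, `w_spline`) are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def bspline_basis(x, knots, i, k):
    """Cox-de Boor recursion: value of the i-th B-spline of degree k at x."""
    if k == 0:
        return np.where((knots[i] <= x) & (x < knots[i + 1]), 1.0, 0.0)
    left_den = knots[i + k] - knots[i]
    right_den = knots[i + k + 1] - knots[i + 1]
    left = (x - knots[i]) / left_den * bspline_basis(x, knots, i, k - 1) if left_den > 0 else 0.0
    right = (knots[i + k + 1] - x) / right_den * bspline_basis(x, knots, i + 1, k - 1) if right_den > 0 else 0.0
    return left + right

def edge_function(x, coeffs, knots, k=3, w_base=1.0, w_spline=1.0):
    """phi(x) = w_base * SiLU(x) + w_spline * sum_i c_i B_i(x)."""
    spline = sum(c * bspline_basis(x, knots, i, k) for i, c in enumerate(coeffs))
    silu = x / (1.0 + np.exp(-x))  # fixed "residual" base activation
    return w_base * silu + w_spline * spline

# Example: G = 8 interior intervals on [-1, 1], cubic -> G + k = 11 coefficients.
G, k = 8, 3
grid = np.linspace(-1.0, 1.0, G + 1)
h = grid[1] - grid[0]
knots = np.concatenate([grid[0] - h * np.arange(k, 0, -1), grid,
                        grid[-1] + h * np.arange(1, k + 1)])
coeffs = 0.1 * np.random.randn(G + k)
x = np.linspace(-1.0, 0.99, 5)
print(edge_function(x, coeffs, knots, k))
```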
3. Entropy Minimization and Projective Compression
The core innovation of spline-based EfficientKAN is its entropy-driven compression via projection onto low-dimensional functional subspaces. The process is as follows (Poole et al., 24 Sep 2025):
- For each edge $e$, and for each candidate family of orthonormal bases $b$ with basis functions $\{\psi_j^{(b)}\}$ (e.g., Fourier, Chebyshev, Bessel), compute discrete projection coefficients on an evaluation grid $\{x_m\}$: $\alpha_{e,j}^{(b)} = \sum_m \phi_e(x_m)\,\psi_j^{(b)}(x_m)$.
- Normalize coefficient magnitudes to obtain discrete probability vectors $p_{e,j}^{(b)} = |\alpha_{e,j}^{(b)}| \big/ \sum_{j'} |\alpha_{e,j'}^{(b)}|$, and compute the basis entropy: $H_e^{(b)} = -\sum_j p_{e,j}^{(b)} \log p_{e,j}^{(b)}$.
- Add an entropy-based regularizer to the loss: $\mathcal{L}_{\text{ent}} = \sum_e \min_b H_e^{(b)}$.
- At periodic intervals, project $\phi_e$ onto the minimal-entropy basis $b_e^\star = \arg\min_b H_e^{(b)}$. Retain only the dominant coefficients and reconstruct a reduced target spline coefficient vector $\mathbf{c}_e^\star$ by (pseudo-)inverse of the B-spline design matrix.
- Apply a "gravitational" regularization term pulling the live coefficients toward the compressed target: $\mathcal{L}_{\text{grav}} = \sum_e \|\mathbf{c}_e - \mathbf{c}_e^\star\|_2^2$.
- Overall training objective: $\mathcal{L} = \mathcal{L}_{\text{task}} + \lambda_E\,\mathcal{L}_{\text{ent}} + \lambda_g\,\mathcal{L}_{\text{grav}}$.
This pipeline ensures that each edge's spline representation is compressed to the minimal description required for its target functional behavior, automatically discovering the optimal basis; different edges may select different bases (mixed-mode compression).
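A minimal NumPy sketch of one projection step is given below; the retention count, grid handling, and helper names are assumptions for illustration, not the authors' code:

```python
import numpy as np

def basis_entropy(alpha, eps=1e-12):
    """Shannon entropy of normalized projection-coefficient magnitudes."""
    p = np.abs(alpha) / (np.abs(alpha).sum() + eps)
    return -(p * np.log(p + eps)).sum()

def compress_edge(phi_vals, bases, B_design, r=2):
    """One projection step for a single edge.

    phi_vals : (M,) edge function sampled on the projection grid
    bases    : dict name -> (M, J) orthonormal basis sampled on the same grid
    B_design : (M, G+k) B-spline design matrix on that grid
    r        : number of dominant coefficients to retain
    """
    # 1. Project onto every candidate basis; score each by entropy.
    coeffs = {name: Psi.T @ phi_vals for name, Psi in bases.items()}
    best = min(coeffs, key=lambda name: basis_entropy(coeffs[name]))
    # 2. Keep only the r dominant coefficients of the minimal-entropy basis.
    alpha = coeffs[best]
    keep = np.argsort(np.abs(alpha))[-r:]
    alpha_sparse = np.zeros_like(alpha)
    alpha_sparse[keep] = alpha[keep]
    target_vals = bases[best] @ alpha_sparse
    # 3. Map the compressed target back to spline space (pseudo-inverse).
    c_star = np.linalg.pinv(B_design) @ target_vals
    return best, c_star

def gravitational_penalty(c, c_star):
    """L_grav contribution: ||c - c_star||^2, pulling coefficients to the target."""
    return float(np.sum((c - c_star) ** 2))
```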
4. Algorithmic and Implementation Aspects
Initialization and Hyperparameters:
- Spline coefficients initialized as $c_i \sim \mathcal{N}(0, \sigma^2)$ with small $\sigma$.
- Regularization weights: $\lambda_E$ for the entropy term (increase if basis selection is unstable) and $\lambda_g$ for the gravitational pull.
- Grid for projection: up to $500$ evaluation points per edge.
- Projection interval: up to $200$ training steps per edge (Poole et al., 24 Sep 2025).
Parameter Budget:
- Original spline per-edge dimension: $G+k$ (e.g., $G+3$ coefficients per edge for cubic B-splines).
- After projection to $r$ dominant basis functions with $r \ll G+k$, the per-edge parameter reduction is $1 - r/(G+k)$ (see the worked example after this list).
- Empirical studies demonstrate up to $\approx 80\%$ reduction in parameter count without loss in predictive accuracy on diverse tasks, and significantly increased robustness under noisy conditions (e.g., under $5$–$30$ dB additive noise, EfficientKAN maintains stable test MSE while the standard KAN's error grows sharply) (Poole et al., 24 Sep 2025).
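As a worked example under illustrative settings (the specific $G$, $k$, and $r$ below are assumptions consistent with the typical choices above, not reported configurations):

```python
# Per-edge parameter budget before and after projective compression.
G, k = 8, 3   # interior grid intervals, cubic splines (illustrative)
r = 2         # dominant coefficients retained after projection (illustrative)

full = G + k  # original spline dimension per edge: 11
reduction = 1 - r / full
print(f"{full} -> {r} parameters per edge ({reduction:.0%} reduction)")
# -> "11 -> 2 parameters per edge (82% reduction)", in line with the ~80% figure.
```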
Training Procedure:
- Alternate between standard gradient descent on the total loss and periodic basis-projection steps.
- For each edge, at fixed projection frequency, update target representation and basis selection, then apply soft "locking" via the gravitational penalty.
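A skeleton of this alternating schedule is sketched below in PyTorch; the model API (`entropy_regularizer`, `edge_coefficients`) and the `project_edge` hook are hypothetical names standing in for the pipeline of Section 3, not an actual library interface:

```python
import torch

def train(model, data_loader, project_edge, steps=10_000,
          project_every=200, lam_ent=1e-3, lam_grav=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    targets = {}  # edge id -> "locked" compressed coefficient target
    for step, (x, y) in enumerate(infinite(data_loader)):
        if step >= steps:
            break
        # Standard gradient step on the total loss L = L_task + L_ent + L_grav.
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss = loss + lam_ent * model.entropy_regularizer()
        for eid, c in model.edge_coefficients():
            if eid in targets:  # soft "locking" via the gravitational penalty
                loss = loss + lam_grav * ((c - targets[eid]) ** 2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Periodic basis-projection step: refresh each edge's compressed target.
        if (step + 1) % project_every == 0:
            with torch.no_grad():
                for eid, c in model.edge_coefficients():
                    targets[eid] = project_edge(eid, c).detach()

def infinite(loader):
    while True:
        yield from loader
```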
5. Practical Applications and Performance
Spline-based EfficientKAN variants have been applied across multiple domains:
- Scientific Machine Learning: Automated fiber placement prediction and deterministic regression tasks, where basis entropy minimization yields significant model compression and interpretability gains (Poole et al., 24 Sep 2025).
- Probabilistic Time Series Forecasting: Direct parameterization of predictive distributions for resource-efficient, uncertainty-aware satellite communication forecasting, using sparse spline basis expansions for each functional connection. EfficientKAN achieves lower MSE and CRPS and better calibration than MLP or vanilla KAN models, with <50% of the parameters of comparably accurate MLPs (Vaca-Rubio et al., 19 Oct 2025).
- Tabular Feature Selection: Interpretation-focused applications leverage the explicit access to spline weights for direct feature importance quantification, outperforming classical L1/L2 and impurity-based selectors in several benchmarks (Akazan et al., 27 Sep 2025).
- Convolutional and Graph Neural Networks: Spline-based EfficientKAN concepts extend to convolutional and graph architectures (KKAN, GKAN), where learned univariate B-spline filters replace fixed kernels or scalar edge weights, providing parameter-efficient adaptivity and improved test accuracy relative to parameter-matched classical baselines (Bodner et al., 19 Jun 2024, Kiamari et al., 10 Jun 2024).
6. Computational Complexity and Acceleration
The original spline-based KAN incurs increased training and inference cost due to recursive (Cox–de Boor) B-spline evaluation. EfficientKAN is amenable to several acceleration techniques:
- Matrix-based Parallelization: MatrixKAN replaces Cox–de Boor recursion with precomputed local basis matrices, reducing inference complexity from degree-dependent sequential recursion steps per evaluation to a constant matrix-multiply depth, enabling 40× empirical GPU speedups for realistic network sizes (Coffman et al., 11 Feb 2025); a minimal sketch of the matrix form appears after this list.
- PowerMLP-based Approximation: The PowerMLP architecture rewrites spline functions in terms of ReLU-power bases, allowing MLP-style evaluation efficiency while maintaining KAN expressivity over bounded domains (Qiu et al., 18 Dec 2024).
- Batch-Optimized GPU Libraries: EfficientKAN implementations leverage GPU kernels exploiting B-spline local support, batch layout strategies, and memory coalescence for high-throughput inference (Moradzadeh et al., 20 Aug 2024).
- Quantization: The QuantKAN framework applies quantization-aware training and post-training quantization to both base and spline branches, with per-branch fine-grained control of bit-width. Shallow EfficientKANs achieve near full-precision accuracy with 4-bit weights and activations, with deeper models stabilized by, e.g., DoReFa-based quantization protocols (Fuad et al., 24 Nov 2025).
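The matrix form for uniform cubic B-splines can be sketched as follows; this is a minimal NumPy illustration of the general idea behind MatrixKAN, not the library's actual API, and the grid layout and names are assumptions:

```python
import numpy as np

# Uniform cubic B-spline basis matrix: S(u) = [u^3, u^2, u, 1] @ M @ c_local.
M = (1.0 / 6.0) * np.array([
    [-1.0,  3.0, -3.0, 1.0],
    [ 3.0, -6.0,  3.0, 0.0],
    [-3.0,  0.0,  3.0, 0.0],
    [ 1.0,  4.0,  1.0, 0.0],
])

def eval_cubic_bspline(x, coeffs, a=-1.0, b=1.0):
    """Evaluate a uniform cubic B-spline with control coefficients `coeffs`
    on [a, b] via one gather plus one matrix multiply (no recursion)."""
    G = len(coeffs) - 3                                  # number of grid intervals
    h = (b - a) / G
    j = np.clip(((x - a) / h).astype(int), 0, G - 1)     # interval index per sample
    u = (x - a) / h - j                                  # local coordinate in [0, 1)
    powers = np.stack([u**3, u**2, u, np.ones_like(u)], axis=-1)  # (N, 4)
    local = coeffs[j[:, None] + np.arange(4)]                     # (N, 4) gather
    return np.einsum('np,pq,nq->n', powers, M, local)

# Example: 11 coefficients -> G = 8 intervals on [-1, 1], batched evaluation.
coeffs = 0.1 * np.random.randn(11)
x = np.linspace(-1.0, 1.0, 7)
print(eval_cubic_bspline(x, coeffs))
```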
7. Interpretability, Robustness, and Open Challenges
- Interpretability: The projective compression and explicit basis selection promote edgewise functional sparsity and human-parsable representations. Entropy-minimized basis expansions allow direct extraction of critical dynamical features or scientific relations (Poole et al., 24 Sep 2025, Pal et al., 18 Nov 2024).
- Robustness: EfficientKAN is resilient to substantial input and label noise, resisting the catastrophic MSE blowup typical in overparameterized spline models.
- Limitations and Open Directions:
- Proper tuning of projection interval, basis dictionary, and regularization weights is required for different data regimes.
- Extending projection and compression to non-uniform or free-knot splines, as in Free-Knots KAN, increases flexibility but introduces new challenges in basis identification (Zheng et al., 16 Jan 2025).
- Hybridization with other acceleration methods (e.g., MatrixKAN, PowerMLP) is a focus for ongoing research.
In summary, the spline-based EfficientKAN framework systematically transforms KANs into a form that is both sparse and highly expressive, enabling scalable, interpretable, and robust function approximation in both scientific and classical machine learning settings by leveraging entropy-minimized, projectively compressed spline parameterizations (Poole et al., 24 Sep 2025).