LeanKAN: Parameter-Efficient Neural Surrogates

Updated 12 April 2026

LeanKAN is a neural surrogate layer that employs combined additive and multiplicative interactions for enhanced expressiveness and efficient parameter use.
It leverages the Kolmogorov–Arnold theorem and RBF-based activations to achieve faster convergence, memory savings, and improved interpretability.
Empirical results show LeanKAN’s superior performance in neural ODEs, PDEs, adaptive control, and real-time diagnostics with significant error reductions.

LeanKAN is a parameter-efficient Kolmogorov–Arnold Network (KAN) layer that extends the expressiveness of neural surrogates by incorporating both additive and multiplicative interactions in a minimal-parameter architecture. Designed as a direct, modular replacement for traditional AddKAN and MultKAN layers, LeanKAN achieves increased memory efficiency, faster convergence, improved generalization, and superior interpretability across diverse scientific and engineering domains. With formal error guarantees, LeanKAN enables scalable modeling of complex nonlinear and dynamical systems in settings ranging from real-time diagnostics to hybrid physics-informed neural ODEs and adaptive control.

1. Mathematical Formulation and Architectural Principles

LeanKAN is rooted in the Kolmogorov–Arnold representation theorem, which states that any continuous multivariate function on a compact domain can be written as a finite superposition of continuous univariate functions and addition. A KAN with $L$ layers of widths $n_0, \ldots, n_L$ generalizes this representation layer by layer:

$f : \mathbb{R}^{n_0} \rightarrow \mathbb{R}^{n_L}, \quad x_{l+1, i} = \sum_{j=1}^{n_l} \phi_{l, i, j}(x_{l, j}), \quad i = 1, \ldots, n_{l+1}$

where the $\phi_{l,i,j}$ are learnable one-dimensional activations. Each $\phi$ is parameterized, for example, as a sum of $N$ radial basis functions (RBFs) with centers $c_m$ and width $h$, plus a weighted base nonlinearity $b(x)$:

$\phi_{l,i,j}(x) = \sum_{m=1}^N w^{\psi}_{l,i,j,m} \exp\Bigl(-\frac{|x-c_m|^2}{2h^2}\Bigr) + w^b_{l,i,j} b(x)$

The LeanKAN layer introduces a joint additive-multiplicative channel decomposition: for any layer $l$ with input $\mathbf{x}_l \in \mathbb{R}^{n_l}$ and output $\mathbf{z}_l \in \mathbb{R}^{n_{l+1}}$, a "multiplicative sub-dimension" $n_l^{\rm mu} \leq n_l$ is chosen. Each output $i$ then computes

$y_{l,i}^{\rm mult} = \prod_{j=1}^{n_l^{\rm mu}} \phi_{l,i,j}(x_{l,j}), \quad y_{l,i}^{\rm add} = \sum_{j=n_l^{\rm mu}+1}^{n_l} \phi_{l,i,j}(x_{l,j}), \quad z_{l,i} = y_{l,i}^{\rm mult} + y_{l,i}^{\rm add}$

This structure, with $n_l n_{l+1}$ univariate functions per layer and no dummy activations, generalizes AddKAN (recovered by setting $n_l^{\rm mu} = 0$ and dropping the then-empty multiplicative term) and eliminates the parameter inflation and restricted expressivity of previous MultKAN variants. The only new hyperparameter is $n_l^{\rm mu}$, which is chosen to balance expressivity against memory (Koenig et al., 25 Feb 2025).
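The layer equations above can be sketched in plain Python to make the additive/multiplicative split concrete. This is a minimal, unoptimized illustration, not the authors' implementation: the class name `LeanKANLayer`, the Swish choice for $b(x)$, equally spaced RBF centers on a fixed input range, a width $h$ equal to the grid spacing, and the random initialization are all assumptions made here.

```python
import math
import random


def swish(x):
    # Base nonlinearity b(x); Swish is an illustrative choice (assumption)
    return x / (1.0 + math.exp(-x))


def rbf_activation(x, weights, base_weight, centers, h):
    """phi(x) = sum_m w_m * exp(-(x - c_m)^2 / (2 h^2)) + w_b * b(x)."""
    rbf = sum(w * math.exp(-((x - c) ** 2) / (2.0 * h ** 2))
              for w, c in zip(weights, centers))
    return rbf + base_weight * swish(x)


class LeanKANLayer:
    """Hypothetical sketch of one LeanKAN layer with n_in * n_out activations."""

    def __init__(self, n_in, n_out, n_mu, grid_size, x_range=(-1.0, 1.0), seed=0):
        if not 0 <= n_mu <= n_in:
            raise ValueError("need 0 <= n_mu <= n_in")
        if grid_size < 2:
            raise ValueError("need at least two RBF centers")
        self.n_in, self.n_out, self.n_mu = n_in, n_out, n_mu
        lo, hi = x_range
        # N equally spaced centers; width h = grid spacing (assumption)
        self.h = (hi - lo) / (grid_size - 1)
        self.centers = [lo + m * self.h for m in range(grid_size)]
        rng = random.Random(seed)
        # One activation per (output i, input j): N RBF weights + 1 base weight
        self.w = [[[rng.gauss(0.0, 0.1) for _ in range(grid_size)]
                   for _ in range(n_in)] for _ in range(n_out)]
        self.wb = [[rng.gauss(0.0, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]

    def num_params(self):
        return self.n_in * self.n_out * (len(self.centers) + 1)

    def phi(self, i, j, x):
        return rbf_activation(x, self.w[i][j], self.wb[i][j], self.centers, self.h)

    def __call__(self, x):
        if len(x) != self.n_in:
            raise ValueError("input length must equal n_in")
        z = []
        for i in range(self.n_out):
            # Multiplicative channel over inputs 0..n_mu-1; dropped when
            # n_mu == 0 so that the layer reduces to an AddKAN sum
            mult = (math.prod(self.phi(i, j, x[j]) for j in range(self.n_mu))
                    if self.n_mu > 0 else 0.0)
            # Additive channel over the remaining inputs n_mu..n_in-1
            add = sum(self.phi(i, j, x[j]) for j in range(self.n_mu, self.n_in))
            z.append(mult + add)
        return z


layer = LeanKANLayer(n_in=3, n_out=2, n_mu=2, grid_size=5)
print(layer.num_params())  # 3 * 2 * (5 + 1) = 36
print(layer([0.1, -0.4, 0.7]))
```

With $n_l = 3$, $n_{l+1} = 2$, and $N = 5$, the layer holds the same $n_l n_{l+1}(N+1) = 36$ parameters an AddKAN layer would; only the way the activations are combined differs.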

2. Parameter Efficiency and Memory Improvements

LeanKAN architectures achieve strict parameter minimality for their given expressive capacity. Since each activation $\phi_{l,i,j}$ carries $N$ RBF weights and one base weight, each layer has

$P_l = n_l\, n_{l+1}\, (N + 1)$

parameters, where $N$ is the RBF grid size. MultKAN, by contrast, feeds each order-$k$ multiplicative output from $k$ separate subnodes, so with $n_l^{\rm mult}$ multiplicative outputs it incurs

$P_l^{\rm Mult} \geq n_l \bigl(n_{l+1} + (k-1)\, n_l^{\rm mult}\bigr)(N + 1)$

parameters per layer. LeanKAN achieves joint additive and multiplicative representation with exactly $n_l n_{l+1} (N+1)$ parameters, identical to AddKAN but with greater representational power. Empirical studies confirm 2–3× compression for equivalent test error in scientific regression, ODE, and PDE surrogate tasks (Koenig et al., 25 Feb 2025, Koenig et al., 17 Apr 2025).

LeanKAN's lack of dummy activations and its direct allocation of one learnable $\phi_{l,i,j}$ to each input-output pair ensure full memory efficiency, minimal forward-evaluation cost, and optimal utilization of learnable degrees of freedom.
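The parameter accounting above can be checked with a few lines of Python. The function names are hypothetical, and the MultKAN count is an assumption: it only adds the extra subnode activations needed by order-$k$ multiplicative outputs and ignores any other overhead, so it is a lower bound.

```python
def leankan_params(widths, grid_size):
    """Parameter count of a LeanKAN (or AddKAN) stack: each layer has
    n_l * n_{l+1} activations, each with grid_size RBF weights + 1 base weight."""
    return sum(a * b * (grid_size + 1) for a, b in zip(widths[:-1], widths[1:]))


def multkan_params_lower_bound(widths, n_mult, order, grid_size):
    """Assumed lower bound for MultKAN: every multiplicative output of the
    given order consumes `order` subnodes, so layer l needs
    n_l * (n_{l+1} + (order - 1) * n_mult[l]) activations.
    Any further overhead (e.g., dummy activations) is not counted."""
    total = 0
    for a, b, m in zip(widths[:-1], widths[1:], n_mult):
        subnodes = b + (order - 1) * m
        total += a * subnodes * (grid_size + 1)
    return total


widths = [2, 4, 2]
print(leankan_params(widths, grid_size=5))                        # (8 + 8) * 6 = 96
print(multkan_params_lower_bound(widths, [2, 0], 2, grid_size=5))  # (12 + 8) * 6 = 120
```

With no multiplicative outputs the two counts coincide, which mirrors the statement that LeanKAN matches AddKAN's parameter budget.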

3. Convergence, Expressivity, and Empirical Performance

In benchmark studies, LeanKAN outperforms AddKAN and MultKAN in convergence speed, generalization, and overfitting resistance. Empirical results include:

Neural ODEs (Lotka–Volterra): a 120-parameter LeanKAN reaches lower test MSE than a 156-parameter MultKAN (Koenig et al., 25 Feb 2025).
Complex-valued Schrödinger PDEs: For equal parameter counts, LeanKAN models yield 2–10× lower test MSE than AddKAN, scaling linearly with parameter count, and maintain stability in noisy or irregular sampling regimes.
Toy regression: LeanKAN's multiplicative outputs reach substantially lower MSE than MultKAN, which plateaus at a higher error level.
Chemistry modeling (ChemKAN): On hydrogen–air chemistry, a 344-parameter LeanKAN core achieves low MSE across hundreds of initial conditions and remains robust under 15% synthetic noise, whereas DeepONet degrades markedly (Koenig et al., 17 Apr 2025).
Real-time diagnostics (battery core temperature): A 105-parameter LeanKAN, trained on high-fidelity data, enables sub-ms inference for model-free core-temperature estimation (Ghosh et al., 24 Feb 2026).

Convergence is facilitated by RBF activation smoothness and channel normalization, with no need for weight decay or explicit regularization. Early-epoch training dynamics show LeanKAN models escaping high-loss regimes significantly faster than MultKAN.

4. Applications and Domain-specific Schemes

LeanKAN has demonstrated impact in several scientific domains:

Battery Thermal Diagnostics: The LeanKAN core provides real-time, model-free estimation of battery core temperature, interfaced with an online Koopman-based anomaly detector for rapid and reliable thermal fault identification, with analytical guarantees on false-alarm rate and detection latency. Simulation scenarios on commercial LiFePO$_4$ cells report up to 60% reductions in anomaly detection latency and sub-ms inference feasibility for embedded controllers (Ghosh et al., 24 Feb 2026).
Combustion Chemistry Surrogates ("ChemKAN"): LeanKAN-parameterized KAN-ODEs compress complex reaction networks into O(100) parameters while maintaining a 2× speedup over detailed solvers and order-of-magnitude MSE improvements over DeepONet, even under substantial noise and data sparsity (Koenig et al., 17 Apr 2025).
Adaptive Control: In Lyapunov-based adaptive control of nonlinear systems, LeanKAN designs enable explicit visualizable functional decompositions, real-time Jacobian-based weight updates, and formal stability with parameter convergence. Function approximation error is reduced by 18–20% relative to DNN/LSTM surrogates at equivalent tracking error levels (Shen et al., 24 Dec 2025).

Additional domains include learned PDE operators, parameter-efficient surrogates for stiff dynamical systems, and interpretable data-driven physical model extraction.

5. Theoretical Guarantees and Diagnostics

LeanKAN enables rigorous error and diagnostic guarantees. Its approximation properties are governed by:

$\bigl\| f(\mathbf{x}) - (\Phi^{G}_{L-1} \circ \cdots \circ \Phi^{G}_{0})(\mathbf{x}) \bigr\|_{C^m} \leq C\, G^{-k-1+m}, \quad 0 \leq m \leq k$

with $L$ the number of layers, $G$ the spline grid size, $k$ the spline order (for B-spline-parameterized $\phi$), $\Phi^{G}_{l}$ the $l$-th layer on a grid of size $G$, and $C$ a constant that depends on $f$ and its representation but not on $G$. Setting $m = 0$ gives the sup-norm bound $C\,G^{-k-1}$, which allows a priori selection of grid resolution to meet any prescribed sup-norm error tolerance, with a convergence rate independent of input dimensionality (Ghosh et al., 24 Feb 2026).
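Assuming the bound above holds with a known constant $C$, a priori grid selection reduces to finding the smallest integer $G$ with $C\,G^{-(k+1-m)} \le \epsilon$. The helper `min_grid_size` below is a hypothetical sketch; in practice $C$ is problem-dependent and would have to be estimated. No input dimension enters the calculation.

```python
import math


def min_grid_size(C, eps, k, m=0):
    """Smallest integer G with C * G**(-(k + 1 - m)) <= eps.

    C is the (problem-dependent, generally unknown) constant of the
    approximation bound; it must be supplied or estimated by the user.
    m = 0 corresponds to the sup-norm bound."""
    if not 0 <= m <= k:
        raise ValueError("need 0 <= m <= k")
    if C <= 0 or eps <= 0:
        raise ValueError("C and eps must be positive")
    rate = k + 1 - m
    G = max(1, math.ceil((C / eps) ** (1.0 / rate)))
    # Guard against floating-point round-off in the root
    while C * G ** (-rate) > eps:
        G += 1
    while G > 1 and C * (G - 1) ** (-rate) <= eps:
        G -= 1
    return G


print(min_grid_size(2.0, 1e-3, k=3))       # 7
print(min_grid_size(2.0, 1e-3, k=3, m=1))  # 13
```

Requesting derivative accuracy ($m = 1$) lowers the convergence rate by one order, so the required grid grows.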

In hierarchical diagnostic settings, such as the LeanKAN+Koopman battery anomaly detector, key theoretical results include:

No-anomaly residual upper bound: in the absence of anomalies, the residual is bounded by model error and spline resolution, so the false-alarm rate can be controlled by placing the detection threshold above this bound.
Reliable anomaly detection: all physically relevant anomalies whose effect on the residual exceeds a critical threshold (determined by the approximation error and the Koopman model's sensitivity) are provably detected with specified sensitivity and delay (Ghosh et al., 24 Feb 2026).
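The logic of these two guarantees can be illustrated with a simple residual-threshold rule: if the no-anomaly residual never exceeds a bound $\delta$, any threshold $\tau \ge \delta$ raises no false alarms on nominal data, and an anomaly is flagged once it pushes the residual above $\tau$. This is a generic sketch, not the LeanKAN+Koopman detector of Ghosh et al.; `detect_anomaly`, `residual_bound`, `margin`, and `persistence` are hypothetical names, and `predicted` stands in for the model's estimate.

```python
def detect_anomaly(measured, predicted, residual_bound, margin=0.0, persistence=1):
    """Return the index at which an alarm is raised, or None.

    An alarm is raised once |measured - predicted| has exceeded the
    threshold tau = residual_bound + margin for `persistence` consecutive
    samples. Choosing tau above the no-anomaly residual bound rules out
    false alarms on nominal data; larger `persistence` trades detection
    delay for robustness to isolated spikes."""
    tau = residual_bound + margin
    run = 0
    for t, (y, y_hat) in enumerate(zip(measured, predicted)):
        if abs(y - y_hat) > tau:
            run += 1
            if run >= persistence:
                return t
        else:
            run = 0
    return None


predicted = [25.0] * 10  # placeholder for a model-estimated temperature
measured = [25.1, 24.9, 25.05, 25.0, 27.0, 27.5, 28.0, 28.2, 28.5, 29.0]
print(detect_anomaly(measured, predicted, residual_bound=0.2))                 # 4
print(detect_anomaly(measured, predicted, residual_bound=0.2, persistence=3))  # 6
```

The two calls show the latency trade-off directly: requiring three consecutive exceedances delays the alarm by two samples.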

In Lyapunov-adaptive LeanKAN controllers, the reconstruction error's independence from input dimension avoids the curse of dimensionality, and Jacobian-based gradient updates guarantee global asymptotic tracking (Shen et al., 24 Dec 2025).

6. Practical Recommendations, Hyperparameters, and Limitations

LeanKAN can be deployed as a direct drop-in replacement for AddKAN/MultKAN layers without structural changes to the surrounding code. Recommendations include:

Hyperparameter defaults: a choice of $n_l^{\rm mu}$ that leaves both the additive and multiplicative channels nonempty is generally effective for balancing them. The RBF grid size $N$ should be chosen to trade off smoothness against complexity (4–10 recommended), and a suitable base activation (e.g., Swish) combined with normalization (e.g., layermax) stabilizes training (Koenig et al., 25 Feb 2025).
Single-layer model caveats: Multiplicative interactions apply only to the first $n_l^{\rm mu}$ inputs; stacking with AddKAN layers or randomizing input order mitigates this.
Multiparameter scaling: For large-scale problems (PDE surrogates, high-dimensional ODEs), parameter counts remain sublinear in output dimension due to channel-wise parameter sharing (Koenig et al., 17 Apr 2025).
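The input-order mitigation for single-layer models can be sketched as a random permutation applied before the layer, so that a different subset of variables lands in the first $n_l^{\rm mu}$ positions feeding the multiplicative channel. The helper names below are hypothetical, and using Python's `random.Random.shuffle` is an illustrative choice.

```python
import random


def random_input_order(n_in, seed=None):
    """Hypothetical helper: a random permutation of input indices."""
    order = list(range(n_in))
    random.Random(seed).shuffle(order)
    return order


def split_channels(x, order, n_mu):
    """Reorder inputs, then split them as a LeanKAN layer would:
    the first n_mu go to the multiplicative channel, the rest to the sum."""
    x_perm = [x[j] for j in order]
    return x_perm[:n_mu], x_perm[n_mu:]


order = random_input_order(4, seed=0)
mult_inputs, add_inputs = split_channels(["a", "b", "c", "d"], order, n_mu=2)
print(order, mult_inputs, add_inputs)
```

Fixing the seed keeps the permutation reproducible, so the same variables stay in the multiplicative channel across training and inference.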

Identified limitations include the absence of explicit symbolic guarantees (interpretability is approximate, and analytic forms require pruning), input-splitting effects on representational coverage in shallow stacks, and limited empirical benefit from hybrid MultKAN–LeanKAN cascades.

7. Outlook and Research Trajectories

The LeanKAN paradigm combines high expressivity, parameter efficiency, and interpretability for scientific-ML applications. Open directions include:

Scalability: Further improvements in parameter compression and domain-specific architectural bias (e.g., physics-preserving inductive splits) to extend application to multi-dimensional operator learning and long-range time-accurate surrogates (Koenig et al., 17 Apr 2025).
Hardware acceleration: Compiler optimizations and specialized hardware kernels for LeanKAN evaluation, promising further improvements beyond the current 2× speedups over detailed solvers.
Diagnostics with guarantees: Extended formal convergence properties for hybrid LeanKAN-operator pipelines in real-time monitoring, with provable safety margins in critical control and diagnostics domains (Ghosh et al., 24 Feb 2026, Shen et al., 24 Dec 2025).
Analytic extraction: Automated methods for symbolic simplification and downstream physical insight extraction from LeanKAN-trained functional surrogates.

LeanKAN’s continued incorporation into neural ODE, adaptive control, and operator learning frameworks is expected to drive future advances in trustworthy, high-speed, low-memory scientific computation.