KAN-ODEs: Kolmogorov–Arnold Neural ODEs

Updated 12 April 2026

KAN-ODEs are neural dynamical system models that use the Kolmogorov–Arnold representation theorem to provide interpretable and parameter-efficient representations.
They replace conventional MLPs with univariate functions like RBFs or spline-based activations, delivering superior accuracy in systems such as Lotka–Volterra and Burgers’ equations.
KAN-ODEs employ numerical solvers and adjoint-sensitivity methods in training, with variants like LeanKAN reducing parameters and improving convergence.

Kolmogorov–Arnold Network Ordinary Differential Equations (KAN-ODEs) constitute a class of neural dynamical system models that synergize the universal approximation power of Kolmogorov–Arnold Networks (KANs) with the continuous-time framework of neural ordinary differential equations. KAN-ODEs generalize and improve upon conventional neural ODEs by replacing standard multilayer perceptrons (MLPs) with functionally richer, more interpretable, and parameter-efficient architectures grounded in the Kolmogorov–Arnold representation theorem (Koenig et al., 2024).

1. Mathematical and Architectural Foundations

KAN-ODEs are founded on the Kolmogorov–Arnold representation theorem, which guarantees that any continuous multivariate function $F:[0,1]^n \to \mathbb{R}$ can be decomposed into a finite superposition of univariate functions:

$F(x_1,\dots,x_n) = \sum_{q=1}^{2n+1} \Phi_q \left(\sum_{p=1}^{n} \phi_{q,p}(x_p) \right)$

Here, each $\phi_{q,p}$ and $\Phi_q$ is a univariate continuous function, often parameterized within KANs as learnable radial basis functions (RBFs) or spline-based activations. In KAN-ODEs, this construction replaces the black-box vector field

$\frac{dx}{dt} = f\left(x(t), t; \theta \right)$

in standard neural ODEs (Chen et al., 2019) with

$\frac{dx}{dt} = \mathrm{KAN}\left(x(t); \theta_{\mathrm{KAN}}\right)$

enabling flexible and interpretable representations of dynamical systems (Koenig et al., 2024, Koenig et al., 25 Feb 2025, Pal et al., 2024).
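As a small worked instance of the superposition form (an illustration, not an example from the cited papers), the product $x_1 x_2$, which is the interaction term in Lotka–Volterra dynamics, can be written with two outer functions via the identity $x_1 x_2 = \frac{(x_1+x_2)^2}{4} - \frac{(x_1-x_2)^2}{4}$; the remaining outer functions allowed by the theorem can be taken as zero. The function names below are hypothetical.

```python
# Worked instance of the Kolmogorov-Arnold form for F(x1, x2) = x1 * x2.
# phi, Phi, and product_via_superposition are illustrative names only.

def phi(q, p, x):
    """Inner univariate functions phi_{q,p}."""
    if q == 2 and p == 2:
        return -x          # phi_{2,2}(x) = -x
    return x               # every other inner function is the identity

def Phi(q, s):
    """Outer univariate functions Phi_q."""
    return s * s / 4.0 if q == 1 else -s * s / 4.0

def product_via_superposition(x1, x2):
    """Evaluate sum_q Phi_q(sum_p phi_{q,p}(x_p)) for q, p in {1, 2}."""
    xs = (x1, x2)
    return sum(
        Phi(q, sum(phi(q, p, xs[p - 1]) for p in (1, 2)))
        for q in (1, 2)
    )
```

Every function involved takes a single scalar argument; the only multivariate operation is addition, which is the structural property KAN layers exploit.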

In practical deep KAN architectures, the forward propagation through layer $\ell$ maps the activation vector $z^{(\ell)}$ as

$z^{(\ell+1)} = \Phi^{(\ell)}\left(\phi^{(\ell)}(z^{(\ell)})\right)$

with each $\phi^{(\ell)}$ acting independently on each coordinate of $z^{(\ell)}$, and $\Phi^{(\ell)}$ aggregating these contributions (Koenig et al., 2024).
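The layer map can be sketched in a few lines. This is a minimal illustration that assumes Gaussian RBFs on a fixed grid of centers and plain summation as the aggregation step; the names `rbf_basis` and `kan_layer`, the width value, and the weight layout are assumptions made for this sketch, not the cited implementation.

```python
import math

def rbf_basis(x, centers, width):
    """Gaussian radial basis functions evaluated at a scalar x."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]

def kan_layer(z, weights, centers, width=0.5):
    """One KAN layer: out[j] = sum_i phi_{j,i}(z[i]).

    weights[j][i][k] is the coefficient of the k-th basis function in the
    univariate activation phi_{j,i} that connects input i to output j.
    """
    out = []
    for w_j in weights:                      # loop over output nodes j
        total = 0.0
        for z_i, w_ji in zip(z, w_j):        # loop over input coordinates i
            basis = rbf_basis(z_i, centers, width)
            total += sum(w * b for w, b in zip(w_ji, basis))
        out.append(total)
    return out

centers = [0.0, 0.5, 1.0]
# 2 outputs, 2 inputs, 3 basis coefficients per univariate activation
weights = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
           [[0.0, 0.0, 1.0], [0.5, 0.5, 0.5]]]
z_next = kan_layer([0.0, 0.5], weights, centers)
```

Stacking such layers and using the final output as $dx/dt$ gives the KAN vector field; the learnable parameters are the entries of `weights` (and, optionally, the centers).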

Recent architectural developments, such as LeanKAN, introduce parameter-lean variants that mix additive and multiplicative univariate activations per output node, further reducing the parameterization and improving convergence behavior relative to earlier AddKAN and MultKAN layers (Koenig et al., 25 Feb 2025).

2. Training Paradigm and Implementation

KAN-ODEs are trained by integrating their parameterized ODEs with a numerical solver (e.g., Tsit5, Rodas5) and minimizing a data-fitting loss, typically mean squared error:

$\mathcal{L}(\theta_{\mathrm{KAN}}) = \frac{1}{N}\sum_{i=1}^{N} \left\| x_{\mathrm{pred}}(t_i; \theta_{\mathrm{KAN}}) - x_{\mathrm{obs}}(t_i) \right\|^2$

where $x_{\mathrm{pred}}$ is the solver output and $x_{\mathrm{obs}}$ the observed trajectory at the $N$ sample times $t_i$. The gradient of this loss with respect to network parameters is computed by the adjoint-sensitivity method, which backpropagates through the ODE solve by integrating an adjoint system backward in time and thereby keeps memory usage low (Koenig et al., 2024).
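The forward half of this training loop (solve, then score against data) can be sketched as follows. The sketch swaps in a fixed-step classical RK4 integrator for the adaptive solvers named above, uses a one-parameter linear decay as a stand-in for the KAN vector field, and averages the squared error over all time points and state components; all function names are hypothetical, and the adjoint gradient computation is not shown.

```python
import math

def rk4_solve(rhs, x0, t_grid, theta):
    """Integrate dx/dt = rhs(x, theta) with classical RK4 on a fixed grid."""
    xs = [list(x0)]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        h, x = t1 - t0, xs[-1]
        k1 = rhs(x, theta)
        k2 = rhs([xi + 0.5 * h * k for xi, k in zip(x, k1)], theta)
        k3 = rhs([xi + 0.5 * h * k for xi, k in zip(x, k2)], theta)
        k4 = rhs([xi + h * k for xi, k in zip(x, k3)], theta)
        xs.append([xi + h / 6.0 * (a + 2 * b + 2 * c + d)
                   for xi, a, b, c, d in zip(x, k1, k2, k3, k4)])
    return xs

def mse_loss(theta, rhs, x0, t_grid, x_obs):
    """Mean squared error between the solved trajectory and observations."""
    pred = rk4_solve(rhs, x0, t_grid, theta)
    sq = [(p - o) ** 2 for xp, xo in zip(pred, x_obs) for p, o in zip(xp, xo)]
    return sum(sq) / len(sq)

def rhs(x, theta):
    """Stand-in for KAN(x; theta): linear decay with rate theta[0]."""
    return [-theta[0] * xi for xi in x]

t_grid = [0.1 * i for i in range(11)]
x_obs = [[math.exp(-2.0 * t)] for t in t_grid]   # synthetic observations
loss_true = mse_loss([2.0], rhs, [1.0], t_grid, x_obs)
loss_off = mse_loss([1.0], rhs, [1.0], t_grid, x_obs)
```

An optimizer would repeatedly evaluate this loss and its gradient with respect to `theta`; with the generating rate the loss is near zero, while a wrong rate gives a clearly larger value.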

KAN activation parameters, such as RBF centers and coefficients, are learned via stochastic optimization (e.g., Adam), and normalization procedures are frequently used to stabilize training (Koenig et al., 2024, Koenig et al., 25 Feb 2025). No regularization or physics-informed penalties are required in data-driven applications, but physics-informed loss terms (such as element conservation) can be incorporated as in ChemKANs for physical systems modeling (Koenig et al., 17 Apr 2025).
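One way an element-conservation term could enter the loss is as a penalty on the rate of change of each elemental total. The sketch below is a generic illustration under that assumption and is not the ChemKAN formulation; `conservation_penalty`, the toy species list, and the element matrix are all hypothetical.

```python
def conservation_penalty(dxdt, element_matrix):
    """Mean squared rate of change of each conserved elemental total.

    element_matrix[e][s] counts atoms of element e in species s, so the
    conserved totals are element_matrix @ x, and their time derivative
    element_matrix @ dxdt should vanish.
    """
    residuals = [sum(a * r for a, r in zip(row, dxdt)) for row in element_matrix]
    return sum(r * r for r in residuals) / len(residuals)

# Toy system with species [H2, O2, H2O]; rows count H and O atoms.
E = [[2, 0, 2],
     [0, 2, 1]]
balanced = [-2.0, -1.0, 2.0]   # rates consistent with 2 H2 + O2 -> 2 H2O
leaky = [-2.0, -1.0, 1.5]      # rates that violate both element balances

penalty_balanced = conservation_penalty(balanced, E)
penalty_leaky = conservation_penalty(leaky, E)
```

Adding a weighted multiple of this penalty, evaluated on the model's predicted $dx/dt$, to the data-fitting loss would discourage vector fields that create or destroy atoms.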

3. Empirical Performance and Comparative Studies

KAN-ODEs demonstrate superior parameter efficiency, improved scaling, and faster convergence compared to MLP-based neural ODEs across a range of test problems. Detailed studies include:

Lotka–Volterra system: KAN-ODE achieves two orders of magnitude lower mean squared error than a competing MLP-ODE at comparable parameter counts. The KAN-ODE MSE scales with parameter count at the rate expected from third-order B-spline theory, and this rate is compared against the one measured for MLPs (Koenig et al., 2024).
Fisher–KPP equation (symbolic source-term discovery): KAN-ODEs can recover the correct symbolic reaction term $r\,u(1-u)$ with sub-percent error when used as interpretable surrogates (Koenig et al., 2024).
Burgers’ equation (shock formation): KAN-ODEs reproduce full-field, time-evolving solutions including accurate shock and boundary layer behavior (Koenig et al., 2024).
Schrödinger and Allen–Cahn equations: Extensions to complex-valued and phase-separating PDEs demonstrate robust generalization and grid flexibility (Koenig et al., 2024).
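A scaling exponent of the kind discussed for the Lotka–Volterra study is usually estimated as the slope of a straight-line fit in log-log space. The sketch below shows that calculation on synthetic data constructed to follow a known power law; the numbers are illustrative only and are not results from the cited work.

```python
import math

def scaling_exponent(param_counts, errors):
    """Least-squares slope of log(error) versus log(parameter count)."""
    xs = [math.log(n) for n in param_counts]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

# Synthetic errors built to follow error = 5 * N**-2.5 (illustration only).
counts = [50, 100, 200, 400]
errors = [5.0 * n ** -2.5 for n in counts]
slope = scaling_exponent(counts, errors)
```

On real training runs the points scatter around the line, so the fitted slope should be reported together with the range of parameter counts it was measured over.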

Empirical results also indicate that parameter-lean variants such as LeanKAN maintain or improve accuracy with reduced model size and better training behavior compared to MultKANs and AddKANs, with lower peak memory and wall-clock time (Koenig et al., 25 Feb 2025).

Comparative Metrics Table

Domain	KAN-ODE MSE	MLP/MultKAN MSE	Parameter Count
Lotka–Volterra	[…]	[…]	240 (KAN), 252 (MLP)
Fisher–KPP (reaction)	[…]	--	10–300
Burgers’ equation	[…]	--	492

These results substantiate the claim that KAN-ODEs attain lower errors, better scaling, and enhanced interpretability at comparable or reduced parameter counts.

4. Interpretability and Symbolic Discovery

A key feature of KAN-ODEs lies in the interpretability of their learned activation functions. Because each activation in a KAN corresponds to a univariate nonlinearity, their learned shapes can be visualized and interrogated directly. This facilitates extraction of closed-form equations via symbolic regression, as seen in source-term identification tasks (Koenig et al., 2024).
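The extraction step can be illustrated with a deliberately simple stand-in for symbolic regression: sample a learned univariate activation on a grid, fit each entry of a small candidate library by one-coefficient least squares, and keep the candidate with the lowest residual. The candidate library, the function name, and the synthetic "activation" samples are assumptions of this sketch; dedicated symbolic regression tools search a much larger expression space.

```python
import math

# Hypothetical candidate library of single-term symbolic forms g(x).
CANDIDATES = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sin(x)": math.sin,
    "x*(1-x)": lambda x: x * (1.0 - x),
}

def best_symbolic_fit(xs, ys):
    """Return (name, coefficient, mse) of the best single-term fit a * g(x)."""
    best = None
    for name, g in CANDIDATES.items():
        gs = [g(x) for x in xs]
        # Closed-form least-squares coefficient for the model y = a * g(x).
        a = sum(gi * y for gi, y in zip(gs, ys)) / sum(gi * gi for gi in gs)
        mse = sum((a * gi - y) ** 2 for gi, y in zip(gs, ys)) / len(xs)
        if best is None or mse < best[2]:
            best = (name, a, mse)
    return best

# Stand-in for samples of a trained activation (synthetic, logistic-shaped).
xs = [i / 20.0 for i in range(1, 20)]
ys = [0.8 * x * (1.0 - x) for x in xs]
name, coeff, mse = best_symbolic_fit(xs, ys)
```

Because each KAN activation depends on one variable, this kind of curve-by-curve matching is feasible in a way it is not for the entangled weights of an MLP.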

Extensions such as KAN-PISF systematically employ KANs for equation-structure suggestion, then apply physics-informed spline fitting and sparse term pruning to yield concise, physically meaningful ODE or PDE models. This sequential paradigm affords interpretability on par with symbolic regression methods yet retains the numerical flexibility of black-box neural ODEs (Pal et al., 2024).

Structured frameworks like SKANODE further constrain KAN-ODEs within explicit state-space models, enabling virtual sensing (latent variable estimation) and symbolic law extraction that aligns closely with underlying physics, as evidenced in both canonical (Duffing, Van der Pol) and real-world (aeroelastic, aircraft) systems (Liu et al., 23 Jun 2025).

5. Variants and Domain-Specific Adaptations

Multiple domain- or application-specific KAN-ODE variants have been developed:

LeanKAN and MultKAN: These enhance the base KAN layer with multiplicative interactions, controlled parameter allocation, and memory-efficient computations, thus extending representation power and applicability to high-output or multi-task settings (Koenig et al., 25 Feb 2025).
ChemKANs: Physical information such as elemental conservation is embedded into the KAN-ODE surrogate for chemical kinetics, yielding sparse, highly generalizable models capable of capturing stiff, multi-scale combustion chemistry with state-of-the-art speed and robustness (Koenig et al., 17 Apr 2025).
EvoKANs: Combine KAN encodings with a parameter-evolution ODE/PDE that uses the same governing equations as the target system, allowing the network weights themselves to evolve via the PDE residual and maintain energy dissipation properties. This approach excels in long-horizon integrations of phase-field and turbulent systems due to its energy-stable SAV-based update mechanism (Lin et al., 3 Mar 2025).
KAN-PISF: Integrates denoising, KAN-based structure suggestion, overcomplete dictionary construction, and physics-informed spline fitting for interpretable equation discovery—yielding sparse, readable models for a variety of nonlinear dynamical systems (Pal et al., 2024).

6. Limitations and Computational Aspects

Despite improved parameter efficiency, individual epochs in KAN-ODE training require 2–3$\times$ more wall-clock time than standard MLP neural ODEs, due to the complexity of computing and differentiating the RBF or spline activations. However, rapid convergence typically compensates, resulting in lower overall training times for matched or superior accuracy (Koenig et al., 2024, Koenig et al., 25 Feb 2025).
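The trade-off can be made explicit with a short calculation. Let $c$ be the per-epoch cost ratio of the KAN-ODE to the MLP neural ODE, and let $E_{\mathrm{KAN}}$ and $E_{\mathrm{MLP}}$ be the epochs each needs to reach the same target accuracy (symbols introduced here for illustration). The ratio of total training times is then

$\frac{T_{\mathrm{KAN}}}{T_{\mathrm{MLP}}} = c \, \frac{E_{\mathrm{KAN}}}{E_{\mathrm{MLP}}}$

so the KAN-ODE trains faster overall whenever $E_{\mathrm{KAN}} < E_{\mathrm{MLP}} / c$. With $c = 3$, for example, it must converge in fewer than one third of the epochs the MLP requires.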

Memory usage and hyperparameter tuning can also pose challenges in large-scale or stiff systems. The LeanKAN design mitigates this by reducing intermediate activations and limiting hyperparameter proliferation, while physics-informed components such as PINN losses or structure-based sharing (e.g., ChemKAN) help prevent overfitting and improve generalization to extrapolated regimes (Koenig et al., 25 Feb 2025, Koenig et al., 17 Apr 2025).

7. Applications and Outlook

KAN-ODEs have demonstrated efficacy in a diverse array of scientific machine learning tasks, including:

Data-driven discovery and forecasting for biological, chemical, and physical dynamical systems.
Symbolic law extraction for interpretable scientific modeling and latent variable identification.
Surrogate modeling and acceleration of stiff, high-dimensional systems (e.g., combustion chemistry, turbulent DNS).
Robust encoding and time-stepping frameworks for PDE systems exhibiting sharp interfaces, shocks, and phase separation (Koenig et al., 2024, Koenig et al., 17 Apr 2025, Lin et al., 3 Mar 2025).

A plausible implication is that by combining universal approximation, explicit interpretability, and computational flexibility, KAN-ODEs provide a compelling modeling trade-off: they bridge black-box and symbolic paradigms and enable physical hypothesis generation directly from data-driven model fitting.

Ongoing developments suggest that further scaling, integration of physics priors, and adaptive structure discovery will continue to enhance the role of KAN-ODEs in interpretable, high-performance dynamical system learning (Koenig et al., 25 Feb 2025, Liu et al., 23 Jun 2025, Koenig et al., 17 Apr 2025).