DeepOKAN: Kolmogorov–Arnold Neural Operator

Updated 6 March 2026

DeepOKAN is a neural operator framework that uses Kolmogorov–Arnold networks with Gaussian RBF activations to approximate complex mappings in computational mechanics.
It replaces traditional MLP-based architectures with KANs, achieving lower losses, faster convergence, and tighter error distributions across diverse mechanics problems.
DeepOKAN’s design, based on Kolmogorov–Arnold theory, offers practical advantages in efficiency and accuracy for surrogate modeling of PDE operators.

DeepOKAN (Deep Operator Network based on Kolmogorov–Arnold Networks) is a neural operator framework for surrogate modeling in computational mechanics. It replaces the MLP-based branch and trunk networks of operator architectures such as DeepONet with Kolmogorov–Arnold networks (KANs) that use Gaussian radial basis function (RBF) activations. The goal is efficient, accurate learning of mappings from input functions (e.g., loads, boundary conditions, material fields) to PDE solution fields. On the mechanics problems tested, which include 1D oscillatory wavefields, 2D orthotropic elasticity, and 2D transient diffusion, DeepOKAN achieves lower losses, faster convergence, and tighter error distributions than its MLP-based counterparts at comparable parameter counts (Abueidda et al., 2024).

1. Operator Learning Framework: Mathematical Formulation

DeepOKAN targets the approximation of a solution operator

$F: Q \to S, \qquad F(q) = s(q)$

where $Q$ is the space of input functions (e.g., boundary conditions, material properties), and $S$ is the output field space (e.g., displacement, temperature). For practical implementation, input functions are discretized at branch points $\{x_i^b\}_{i=1}^m$ and outputs are evaluated at trunk points $\{x_j^t\}_{j=1}^n$ . The network approximates the operator via

$\widehat F(q)(x_j^t) = \sum_{k=1}^r b_k \big(q(x_1^b), ..., q(x_m^b)\big) \; t_k(x_j^t) + B$

with branch ( $b_k$ ) and trunk ( $t_k$ ) outputs parameterized by KANs. The parameter $r$ denotes the latent dimension and $B$ an additional bias.
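
The combination step in the formula above can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `operator_output` and the toy numbers are hypothetical, and the branch and trunk features are taken as already computed rather than produced by trained KANs.

```python
def operator_output(branch, trunk, bias=0.0):
    """Combine branch and trunk features into predicted field values.

    branch: list of r floats, the b_k for one discretized input function.
    trunk:  list of n rows, each holding the r values t_k(x_j^t).
    Returns n floats, one prediction per trunk point.
    """
    r = len(branch)
    if any(len(row) != r for row in trunk):
        raise ValueError("every trunk row must have the same length as branch")
    # Inner product over the latent index k, plus the scalar bias B.
    return [sum(b * t for b, t in zip(branch, row)) + bias for row in trunk]


# Toy numbers (hypothetical): r = 2 latent features, n = 3 trunk points.
branch = [1.0, 2.0]
trunk = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]
prediction = operator_output(branch, trunk, bias=0.1)
```

The branch features are computed once per input function and reused for every trunk point, which is what makes evaluation at many output locations cheap.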

This formulation enables operator regression in settings where classical surrogates struggle, leveraging the flexibility of deep architectures for non-local, high-dimensional mappings.

2. Kolmogorov–Arnold Network Architecture with Gaussian RBFs

DeepOKAN employs KANs in both branch and trunk networks. Each KAN layer maps $n_{\rm in}$ inputs to $n_{\rm out}$ outputs via a collection of univariate functions $\phi_{i,j}: \mathbb{R} \to \mathbb{R}$ , resulting in layerwise computations:

$(x_{\ell+1})_j = \sum_{i=1}^{n_{\rm in}} \phi_{i,j}((x_\ell)_i), \qquad j = 1,...,n_{\rm out}$

In DeepOKAN, $\phi_{i,j}$ are chosen as Gaussian RBFs:

$\phi_{i,j}(u) = \exp\left( - \left(\frac{u-g_{i,j}}{\beta}\right)^2 \right)$

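where $g_{i,j}$ are RBF centers (either fixed or learned) and $\beta$ is a scale set by the grid spacing of the input domain, $\beta = (g_{\max}-g_{\min})/(m-1)$ . Here $m$ is the number of RBF centers placed on $[g_{\min}, g_{\max}]$ , not the number of branch points from Section 1.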
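
A minimal sketch of this activation, assuming $m$ counts evenly spaced, fixed RBF centers on $[g_{\min}, g_{\max}]$ . The helper names `rbf_grid` and `gaussian_rbf` and the domain $[-1, 1]$ are illustrative choices, not taken from the paper.

```python
import math


def rbf_grid(g_min, g_max, m):
    """Return m evenly spaced centers on [g_min, g_max] and the spacing beta."""
    if m < 2:
        raise ValueError("need at least two centers to define a spacing")
    beta = (g_max - g_min) / (m - 1)
    centers = [g_min + k * beta for k in range(m)]
    return centers, beta


def gaussian_rbf(u, center, beta):
    """Gaussian RBF response exp(-((u - center) / beta)^2)."""
    return math.exp(-((u - center) / beta) ** 2)


# Hypothetical domain [-1, 1] with m = 5 centers, so beta = 0.5.
centers, beta = rbf_grid(-1.0, 1.0, 5)
responses = [gaussian_rbf(0.0, g, beta) for g in centers]
```

With this choice of $\beta$, a response falls to $e^{-1}$ one grid spacing away from its center, so neighboring basis functions overlap noticeably while distant ones contribute little.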

Following RBF evaluation, activation responses $R^\ell(x^\ell, G^\ell)$ are linearly mixed:

$x^{\ell+1} = W^\ell R^\ell(x^\ell, G^\ell)$

with learnable weight matrix $W^\ell \in \mathbb{R}^{n^{\ell+1} \times (n^\ell m)}$ .
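
The two-stage layer (RBF evaluation followed by linear mixing) might look as follows. This sketch assumes one set of $m$ centers shared by all inputs of the layer and a row-major flattening of the responses; both are illustrative assumptions, and `rbf_kan_layer` is a hypothetical name.

```python
import math


def rbf_kan_layer(x, centers, beta, weights):
    """Forward pass of one RBF-KAN layer.

    x:       n_in input values.
    centers: m RBF centers, shared by all inputs (an assumption here).
    weights: n_out rows, each with n_in * m entries (the matrix W).
    """
    # R: RBF responses flattened input by input, length n_in * m.
    responses = [math.exp(-((xi - g) / beta) ** 2) for xi in x for g in centers]
    if any(len(row) != len(responses) for row in weights):
        raise ValueError("each weight row needs n_in * m entries")
    # Linear mixing: x_next = W @ R.
    return [sum(w * r for w, r in zip(row, responses)) for row in weights]


# Toy layer: n_in = 2, m = 2, n_out = 2.
x = [0.0, 1.0]
centers = [0.0, 1.0]
weights = [
    [1.0, 0.0, 0.0, 1.0],  # picks the on-center response of each input
    [0.0, 1.0, 1.0, 0.0],  # picks the off-center response of each input
]
out = rbf_kan_layer(x, centers, beta=1.0, weights=weights)
```

In the toy layer each input sits exactly on one center, so the first output sums two on-center responses and the second sums two responses one unit away.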

The use of RBFs provides sharp local approximation, computational efficiency, and flexibility through adaptive center placement and scale, leading to improved function representation over B-splines or standard MLPs.

3. Training Protocols and Loss Functions

Training utilizes the root mean square deviation (RMSD) loss:

$\mathcal{L} = \sqrt{ \frac{1}{N} \sum_{k=1}^N (s_k - \hat{s}_k)^2 }$

where $\{s_k\}$ are ground-truth outputs and $\{\hat{s}_k\}$ are network predictions. Optimization is conducted using L-BFGS for low-dimensional/1D tests or Adam with a learning-rate scheduler (learning rate decayed by a factor $\gamma$ every $T_\mathrm{step}$ epochs).
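
As an illustration, the loss and a step-decay schedule can be written in a few lines. The base learning rate, $\gamma$, and $T_\mathrm{step}$ values below are hypothetical placeholders, since the text does not specify them, and the schedule is assumed to multiply the rate by $\gamma$ once every $T_\mathrm{step}$ epochs.

```python
import math


def rmsd(targets, predictions):
    """Root mean square deviation between targets and predictions."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((s - p) ** 2 for s, p in zip(targets, predictions))
    return math.sqrt(squared / len(targets))


def step_decay_lr(base_lr, gamma, step_size, epoch):
    """Learning rate after `epoch` epochs, decayed by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


# One of three predictions is off by 2, so the loss is sqrt(4 / 3).
loss = rmsd([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])

# Hypothetical schedule: start at 1e-3, halve every 100 epochs.
lr_at_250 = step_decay_lr(1e-3, 0.5, 100, 250)  # two decays so far
```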

No explicit regularization (such as dropout or weight decay) is imposed; empirical stability is achieved via scheduler and moderate batch sizes. Data is partitioned using a standard 80/20 training-test split.
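
A sketch of an 80/20 partition is shown below. Shuffling before the cut and the fixed seed are assumptions made here for reproducibility; the text does not say how samples were assigned to each subset.

```python
import random


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into train and test subsets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded shuffle is an assumption
    cut = int(round(train_fraction * len(samples)))
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


# With 5000 samples this yields 4000 training and 1000 test samples.
train, test = train_test_split(list(range(5000)))
```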

4. Evaluation on Mechanics Problems

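Three mechanics scenarios are used for quantitative assessment. The 1D sinusoidal case appears twice in the table, once as a function-approximation test (RBF-KAN vs. MLP) and once as an operator-learning test (DeepOKAN vs. DeepONet):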

Problem	Dataset Size	Network Size	Key Results
1D Sinusoidal Waves	1000 pts	$\approx 640$–$670$ weights	RBF-KAN achieves 1–2 orders of magnitude lower RMSD than MLP; resolves high-frequency oscillations
1D Operator Learning	20,000 samples	2 layers, width $r=40$ ( $\approx 2.76 \times 10^5$ weights)	DeepOKAN converges faster and reaches $\mathcal{L} \sim 10^{-3}$ vs. DeepONet’s $\sim 10^{-2}$ ; error histograms are shifted lower and tighter
2D Orthotropic Elasticity	5000 samples	1 layer, $r=5$ , $1.3\text{k}$–$17.1\text{k}$ weights	DeepOKAN training/test errors lower by up to an order of magnitude; max absolute error $< 5\times10^{-4}$
2D Transient Poisson	4500 samples	$r=4$ , $1.3\text{k}$–$51\text{k}$ weights	Mean error: DeepOKAN 0.0047 (std 0.0052) vs. DeepONet 0.0298 (std 0.0302)

Empirical findings indicate DeepOKAN uniformly outperforms DeepONet in loss, convergence, and error variability, while being parameter-comparable (Abueidda et al., 2024).
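
To put the reported transient Poisson figures in proportion, the ratios can be computed directly. The snippet only rearranges the four numbers quoted for that problem and adds no new results.

```python
# Reported mean error and standard deviation for the 2D transient Poisson case.
deepokan_mean, deepokan_std = 0.0047, 0.0052
deeponet_mean, deeponet_std = 0.0298, 0.0302

mean_ratio = deeponet_mean / deepokan_mean  # roughly 6.3
std_ratio = deeponet_std / deepokan_std     # roughly 5.8

print(f"mean error ratio (DeepONet / DeepOKAN): {mean_ratio:.2f}")
print(f"std ratio (DeepONet / DeepOKAN): {std_ratio:.2f}")
```

On these figures, both the average error and its spread shrink by roughly a factor of six.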

5. Architectural Insights and Theoretical Implications

DeepOKAN leverages the Kolmogorov–Arnold superposition theorem, representing multivariate maps via sums of univariate nonlinearities. The architecture decomposes complex operator learning into learnable, edge-wise 1D RBF functions, providing greater representational capacity per parameter for smooth, low-dimensional PDE operators.
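
For reference, the classical Kolmogorov–Arnold representation theorem states that any continuous function $f: [0,1]^n \to \mathbb{R}$ can be written as

$f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)$

with continuous univariate functions $\Phi_q$ and $\phi_{q,p}$ . The inner sum has the same form as the layerwise computation in Section 2. KANs stack such layers with arbitrary widths and depths, which goes beyond the fixed two-level structure the theorem covers, so the theorem is best read as motivation for the architecture rather than as an approximation guarantee for it.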

RBFs confer several advantages over spline or MLP activations:

Superior local approximation properties for univariate functions.
Computational efficiency via closed-form exponential.
Flexibility for learnable center positioning and scale adaptation.
Mitigation of the curse of dimensionality, since every activation remains strictly univariate.

A plausible implication is that for sufficiently smooth and structured operator learning settings, DeepOKAN may offer quantifiable improvements in expressivity and sample efficiency over conventional MLP-based neural operators.

6. Limitations and Prospective Research Directions

Noted limitations include:

The need to tune RBF centers and scales ( $\beta$ ) for each problem setting.
Potential explosion of learnable edges in high-dimensional function approximation (e.g., 3D domains).
A purely data-driven formulation with no built-in physics-informed constraints.
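
The growth in learnable weights can be made concrete with a rough count. This sketch assumes the weight-matrix shape $n^{\ell+1} \times (n^\ell m)$ from Section 2 for the KAN layer and a standard dense layer with biases for the MLP; the layer sizes are hypothetical.

```python
def kan_layer_weights(n_in, n_out, m):
    """Entries of W for one RBF-KAN layer: n_out x (n_in * m)."""
    return n_out * n_in * m


def mlp_layer_weights(n_in, n_out):
    """Weights plus biases of one dense MLP layer."""
    return n_out * n_in + n_out


# Hypothetical sizes: width 40 on both sides, m = 10 RBF centers per input.
kan_count = kan_layer_weights(40, 40, 10)
mlp_count = mlp_layer_weights(40, 40)
```

At equal width the KAN layer carries about $m$ times as many weights, so matching an MLP's parameter budget means using narrower or shallower KAN layers.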

Future research may address:

Alternative kernels (multiquadric, inverse multiquadric) or adaptive/probabilistic center placements in KANs.
Integration of physics-informed loss functions (e.g., PDE residual penalties, enforced boundary conditions) for generalization to unseen settings.
Generalization to irregular input geometries (e.g., point-cloud DeepOKAN), and extension towards coupled multi-physics.
Theoretical analysis of RBF-KAN versus MLP approximation rates in the context of Kolmogorov–Arnold theory.
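
For orientation, the alternative kernels mentioned in the list can be compared with the Gaussian using their standard textbook definitions. Here `r` stands for the offset $u - g$ and `eps` for a shape parameter playing the role of $1/\beta$ ; this parameterization is an assumption of the sketch, not something specified for DeepOKAN.

```python
import math


def gaussian(r, eps=1.0):
    """Gaussian kernel exp(-(eps * r)^2): decays rapidly away from the center."""
    return math.exp(-(eps * r) ** 2)


def multiquadric(r, eps=1.0):
    """Multiquadric kernel sqrt(1 + (eps * r)^2): grows with distance."""
    return math.sqrt(1.0 + (eps * r) ** 2)


def inverse_multiquadric(r, eps=1.0):
    """Inverse multiquadric kernel 1 / sqrt(1 + (eps * r)^2): slow decay."""
    return 1.0 / math.sqrt(1.0 + (eps * r) ** 2)


# Evaluate each kernel at the center and two units away from it.
values = {
    kernel.__name__: (kernel(0.0), kernel(2.0))
    for kernel in (gaussian, multiquadric, inverse_multiquadric)
}
```

All three equal 1 at the center. Away from it the Gaussian is the most localized, the inverse multiquadric decays more slowly, and the multiquadric is not localized at all, which would change how many centers influence a given input.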

7. Summary and Outlook

DeepOKAN demonstrates the effectiveness of Kolmogorov–Arnold superposition principles combined with RBF parameterizations for operator learning in mechanics. The framework consistently yields improved surrogate accuracy, convergence rate, and statistical robustness across a spectrum of PDE-governed systems, without increased parameter count. These findings suggest that Kolmogorov–Arnold architectures with univariate RBF activations provide a principled and practical advancement for data-driven computational science and digital engineering workflows (Abueidda et al., 2024).

References (1)

Abueidda et al. (2024). DeepOKAN: Deep Operator Network Based on Kolmogorov Arnold Networks for Mechanics Problems.
