DeepOKAN: Kolmogorov–Arnold Neural Operator
- DeepOKAN is a neural operator framework that uses Kolmogorov–Arnold networks with Gaussian RBF activations to approximate complex mappings in computational mechanics.
- It replaces traditional MLP-based architectures with KANs, achieving lower losses, faster convergence, and tighter error distributions across diverse mechanics problems.
- DeepOKAN’s design, based on Kolmogorov–Arnold theory, offers practical advantages in efficiency and accuracy for surrogate modeling of PDE operators.
DeepOKAN (Deep Operator Network based on Kolmogorov–Arnold Networks) is a neural operator framework for surrogate modeling in computational mechanics. It replaces the MLPs used in traditional operator architectures (such as DeepONet) with Kolmogorov–Arnold networks (KANs) that use Gaussian radial basis function (RBF) activations, with the aim of learning complex mappings between input functions (e.g., loads, boundary conditions, material fields) and PDE solution fields efficiently and accurately. Evaluated on several mechanics problems, including 1D oscillatory wavefields, 2D orthotropic elasticity, and 2D transient diffusion, DeepOKAN consistently achieves lower losses, faster convergence, and tighter error distributions than its MLP-based counterparts at a matched parameter count (Abueidda et al., 2024).
1. Operator Learning Framework: Mathematical Formulation
DeepOKAN targets the approximation of a solution operator
$$\mathcal{G}: \mathcal{U} \to \mathcal{V}, \qquad u \mapsto \mathcal{G}(u),$$
where $\mathcal{U}$ is the space of input functions (e.g., boundary conditions, material properties), and $\mathcal{V}$ is the output field space (e.g., displacement, temperature). For practical implementation, input functions are discretized at branch points $\{x_i\}_{i=1}^{m}$ and outputs are evaluated at trunk points $y$. The network approximates the operator via
$$\mathcal{G}(u)(y) \approx \sum_{k=1}^{p} b_k\big(u(x_1), \dots, u(x_m)\big)\, t_k(y) + b_0,$$
with branch ($b_k$) and trunk ($t_k$) outputs parameterized by KANs. The parameter $p$ denotes the latent dimension and $b_0$ an additional bias.
This formulation enables operator regression in settings where classical surrogates struggle, leveraging the flexibility of deep architectures for non-local, high-dimensional mappings.
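The branch-trunk combination described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names `deeponet_output` and `evaluate_field` and the numeric latent features are hypothetical, and the branch and trunk vectors are assumed to have already been produced by their respective networks.

```python
def deeponet_output(branch, trunk, bias=0.0):
    """Combine latent features: sum_k branch[k] * trunk[k] + bias."""
    if len(branch) != len(trunk):
        raise ValueError("branch and trunk must share the latent dimension p")
    return sum(b * t for b, t in zip(branch, trunk)) + bias


def evaluate_field(branch, trunk_rows, bias=0.0):
    """Evaluate one input function at many query points (one trunk row per point)."""
    return [deeponet_output(branch, row, bias) for row in trunk_rows]


# Hypothetical latent features for p = 3 and two query points.
branch = [1.0, 2.0, 3.0]
trunk_rows = [[0.5, 0.0, 1.0], [1.0, 1.0, 1.0]]
field = evaluate_field(branch, trunk_rows, bias=0.5)
print(field)
```

The branch vector is computed once per input function and reused for every query point, which is what makes evaluating the surrogate on a dense output grid cheap.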
2. Kolmogorov–Arnold Network Architecture with Gaussian RBFs
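DeepOKAN employs KANs in both branch and trunk networks. Each KAN layer maps inputs $x_i^{(l)}$ to outputs $x_j^{(l+1)}$ via a collection of univariate functions $\phi_{j,i}^{(l)}$, resulting in layerwise computations:
$$x_j^{(l+1)} = \sum_{i=1}^{n_l} \phi_{j,i}^{(l)}\big(x_i^{(l)}\big).$$
In DeepOKAN, the $\phi_{j,i}^{(l)}$ are built from Gaussian RBFs, written here up to the normalization convention of the exponent:
$$\varphi_k(x) = \exp\!\left(-\left(\frac{x - c_k}{\sigma}\right)^{2}\right),$$
where $c_k$ are RBF centers (either fixed or learned) and $\sigma$ is a scale determined by the input domain $[x_{\min}, x_{\max}]$.
Following RBF evaluation, activation responses are linearly mixed:
$$x_j^{(l+1)} = \sum_{i=1}^{n_l} \sum_{k=1}^{K} W_{j,(i,k)}^{(l)}\, \varphi_k\big(x_i^{(l)}\big),$$
with learnable weight matrix $W^{(l)}$.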
The use of RBFs provides sharp local approximation, computational efficiency, and flexibility through adaptive center placement and scale, leading to improved function representation over B-splines or standard MLPs.
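A single RBF-based KAN layer of the kind described above might look like the following sketch. Several details are assumptions made for illustration rather than facts from the source: the exponent normalization `((x - c) / scale) ** 2`, the evenly spaced fixed centers, the choice of scale equal to the center spacing, and all names (`gaussian_rbf`, `rbf_kan_layer`, `linspace`).

```python
import math


def gaussian_rbf(x, center, scale):
    """Gaussian bump exp(-((x - c) / scale)^2); equals 1 at x == center."""
    return math.exp(-((x - center) / scale) ** 2)


def linspace(lo, hi, n):
    """n evenly spaced points on [lo, hi] (n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def rbf_kan_layer(x, centers, scale, weights):
    """One layer: RBF features for every (input, center) pair, then a linear mix.

    x       : list of d_in inputs
    weights : d_out rows, each with d_in * len(centers) entries
    """
    features = [gaussian_rbf(xi, c, scale) for xi in x for c in centers]
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


# Assumed setup: 3 fixed centers on the domain [-1, 1], scale = center spacing.
centers = linspace(-1.0, 1.0, 3)
scale = centers[1] - centers[0]
x = [0.0, 1.0]
weights = [[1.0] * 6, [0.0, 1.0, 0.0, 0.0, 0.0, 1.0]]
y = rbf_kan_layer(x, centers, scale, weights)
print(y)
```

Each input contributes only through univariate bumps, so the layer's nonlinearity is entirely one-dimensional; the weight matrix is the only place where inputs are combined.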
3. Training Protocols and Loss Functions
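Training utilizes the root mean square deviation (RMSD) loss:
$$\mathcal{L}_{\mathrm{RMSD}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2},$$
where $y_i$ are ground-truth outputs, $\hat{y}_i$ are network predictions, and $N$ is the number of compared values. Optimization is conducted using L-BFGS for low-dimensional/1D tests or Adam with a step learning-rate scheduler (learning rate decayed by a fixed factor at regular epoch intervals).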
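The loss and the step-wise learning-rate decay can be written out as a short sketch. The decay factor of 0.5 and the interval of 100 epochs used below are placeholder values chosen for illustration, not settings reported in the source, and the function names are hypothetical.

```python
import math


def rmsd(targets, predictions):
    """Root mean square deviation between two equal-length sequences."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    n = len(targets)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n)


def step_decay(base_lr, epoch, factor, every):
    """Learning rate after multiplying by `factor` once per `every` epochs."""
    return base_lr * factor ** (epoch // every)


loss = rmsd([0.0, 0.0, 0.0, 0.0], [1.0, -1.0, 1.0, -1.0])
# Placeholder schedule: halve the rate every 100 epochs.
lr = step_decay(base_lr=1e-3, epoch=250, factor=0.5, every=100)
print(loss, lr)
```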
No explicit regularization (such as dropout or weight decay) is imposed; empirical stability is achieved via scheduler and moderate batch sizes. Data is partitioned using a standard 80/20 training-test split.
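An 80/20 partition can be produced as in the sketch below. Whether the original experiments shuffled before splitting, and with what seed, is not stated in the source; the shuffle, the seed, and the helper name `train_test_split` are assumptions here.

```python
import random


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of `samples` and split it into train and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Sample indices for a dataset of 5000 items.
train, test = train_test_split(range(5000))
print(len(train), len(test))
```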
4. Evaluation on Mechanics Problems
Three mechanics scenarios are used for quantitative assessment:
| Problem | Dataset Size | Network Configuration | Key Results |
|---|---|---|---|
| 1D Sinusoidal Waves | 1000 points | Comparable weight counts | RBF-KAN achieves 1–2 orders of magnitude lower RMSD than the MLP and resolves high-frequency oscillations |
| 1D Operator Learning | 20,000 samples | 2 layers, comparable weight counts | DeepOKAN converges faster and reaches a lower final loss than DeepONet; error histograms are shifted toward lower error and tighter |
| 2D Orthotropic Elasticity | 5000 samples | 1 layer, comparable weight counts | DeepOKAN training and test errors are lower by up to an order of magnitude |
| 2D Transient Poisson | 4500 samples | Comparable weight counts | DeepOKAN mean error 0.0047 (std 0.0052); DeepONet mean error 0.0298 (std 0.0302) |
Empirical findings indicate DeepOKAN uniformly outperforms DeepONet in loss, convergence, and error variability, while being parameter-comparable (Abueidda et al., 2024).
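Mean and standard deviation figures like those in the table summarize a distribution of per-sample errors. The sketch below shows one way such a summary could be computed; the use of absolute error and the population standard deviation are assumptions, since the source does not spell out its error definition, and the data are made up.

```python
import math


def error_summary(targets, predictions):
    """Mean and (population) standard deviation of per-sample absolute errors."""
    errors = [abs(t - p) for t, p in zip(targets, predictions)]
    mean = sum(errors) / len(errors)
    variance = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean, math.sqrt(variance)


# Hypothetical predictions from two surrogates on the same four test targets.
targets = [1.0, 2.0, 3.0, 4.0]
model_a = [1.1, 2.1, 2.9, 3.9]  # small, uniform errors
model_b = [1.0, 2.0, 3.0, 5.0]  # exact except for one large outlier
mean_a, std_a = error_summary(targets, model_a)
mean_b, std_b = error_summary(targets, model_b)
print(mean_a, std_a, mean_b, std_b)
```

A tighter error distribution corresponds to a smaller standard deviation: in this toy case the second model is exact on three of four samples, yet it has both the larger mean error and the larger spread because of its single outlier.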
5. Architectural Insights and Theoretical Implications
DeepOKAN leverages the Kolmogorov–Arnold superposition theorem, representing multivariate maps via sums of univariate nonlinearities. The architecture decomposes complex operator learning into learnable, edge-wise 1D RBF functions, providing greater representational capacity per parameter for smooth, low-dimensional PDE operators.
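For reference, the Kolmogorov–Arnold representation theorem states that every continuous function $f: [0,1]^n \to \mathbb{R}$ can be written using only continuous univariate functions and addition. The symbols below follow common textbook notation and are not taken from the source.

```latex
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),
\qquad \Phi_q : \mathbb{R} \to \mathbb{R}, \quad \phi_{q,p} : [0,1] \to \mathbb{R}.
```

KAN layers relax this exact two-level form: they stack sums of learnable univariate functions to arbitrary width and depth, so the theorem motivates the architecture rather than guaranteeing its approximation quality.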
RBFs confer several advantages over spline or MLP activations:
- Superior local approximation properties for univariate functions.
- Computational efficiency via closed-form exponential.
- Flexibility for learnable center positioning and scale adaptation.
- Mitigation of the curse of dimensionality, since activations remain strictly univariate.
A plausible implication is that for sufficiently smooth and structured operator learning settings, DeepOKAN may offer quantifiable improvements in expressivity and sample efficiency over conventional MLP-based neural operators.
6. Limitations and Prospective Research Directions
Noted limitations include:
- The need to tune RBF centers and scales for each problem setting.
- Potential explosion of learnable edges in high-dimensional function approximation (e.g., 3D domains).
- A purely data-driven formulation with no built-in physics-informed constraints.
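The edge-growth concern in the list above can be made concrete with a rough weight count. The sketch assumes a layer in which each input is expanded into `n_centers` RBF features that are then mixed linearly into every output; that layout, the choice of five centers, and the function names are illustrative assumptions, and learned centers or scales would add further parameters.

```python
def rbf_kan_weight_count(d_in, d_out, n_centers):
    """Mixing weights when every (input, center) feature feeds every output."""
    return d_in * n_centers * d_out


def mlp_weight_count(d_in, d_out):
    """Weights plus biases of one dense layer."""
    return d_in * d_out + d_out


# Compare square layers of growing width, assuming 5 centers per input.
for width in (10, 100, 1000):
    kan = rbf_kan_weight_count(width, width, n_centers=5)
    mlp = mlp_weight_count(width, width)
    print(width, kan, mlp)
```

Under these assumptions a KAN layer carries roughly `n_centers` times the weights of a dense layer of the same width, which is why parameter-matched comparisons typically use narrower or shallower KANs.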
Future research may address:
- Alternative kernels (multiquadric, inverse multiquadric) or adaptive/probabilistic center placements in KANs.
- Integration of physics-informed loss functions (e.g., PDE residual penalties, enforced boundary conditions) for generalization to unseen settings.
- Generalization to irregular input geometries (e.g., point-cloud DeepOKAN), and extension towards coupled multi-physics.
- Theoretical analysis of RBF-KAN versus MLP approximation rates in the context of Kolmogorov–Arnold theory.
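The alternative kernels named in the list above have standard closed forms in terms of the distance `r` from a center and a shape parameter `eps`. The sketch below uses those textbook definitions; it is not something the source evaluated, and the function names and the value `eps = 1.0` are chosen here for illustration.

```python
import math


def gaussian(r, eps=1.0):
    """exp(-(eps * r)^2): decays rapidly away from the center."""
    return math.exp(-((eps * r) ** 2))


def multiquadric(r, eps=1.0):
    """sqrt(1 + (eps * r)^2): grows with distance, so it is not localized."""
    return math.sqrt(1.0 + (eps * r) ** 2)


def inverse_multiquadric(r, eps=1.0):
    """1 / sqrt(1 + (eps * r)^2): decays, but more slowly than the Gaussian."""
    return 1.0 / math.sqrt(1.0 + (eps * r) ** 2)


KERNELS = {
    "gaussian": gaussian,
    "multiquadric": multiquadric,
    "inverse_multiquadric": inverse_multiquadric,
}

# Evaluate every kernel at unit distance from its center.
values = {name: kernel(1.0) for name, kernel in KERNELS.items()}
print(values)
```

Because all three share the same signature, swapping kernels inside an RBF-based layer would only require changing the feature function, which makes this a natural axis for ablation.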
7. Summary and Outlook
DeepOKAN demonstrates the effectiveness of Kolmogorov–Arnold superposition principles combined with RBF parameterizations for operator learning in mechanics. The framework consistently yields improved surrogate accuracy, convergence rate, and statistical robustness across a spectrum of PDE-governed systems, without increased parameter count. These findings suggest that Kolmogorov–Arnold architectures with univariate RBF activations provide a principled and practical advancement for data-driven computational science and digital engineering workflows (Abueidda et al., 2024).