
DeepOKANs: Neural Operator Learning

Updated 12 April 2026
  • DeepOKANs are neural operator learning architectures that combine branch–trunk decomposition with Kolmogorov–Arnold Networks to approximate high-dimensional operators for PDEs.
  • They leverage physics-informed training and chunkwise parameter sharing with rational and Gaussian basis functions to enhance predictive accuracy and reduce parameter count.
  • Benchmark results show DeepOKANs achieve significant error reductions, outperforming standard DeepONets and MLP-based models in applications like mechanics and uncertainty quantification.

Deep Operator Kolmogorov–Arnold Networks (DeepOKANs) are a family of neural operator learning architectures that combine the universal function approximation capabilities of Kolmogorov–Arnold Networks (KANs) with the branch–trunk decomposition popularized by Deep Operator Networks (DeepONets). These architectures are designed to efficiently approximate solution operators for partial differential equations (PDEs) and other parametric mapping problems, emphasizing expressivity, compactness, and physics-consistent inductive bias. DeepOKANs are reported to achieve better predictive accuracy and generalization than standard DeepONets and multilayer perceptron (MLP)-based operator learners on a range of benchmark tasks, particularly in mechanics, spatio-temporal PDEs, and uncertainty quantification settings (Abueidda et al., 2024, Wu et al., 9 Oct 2025, Shukla et al., 2024, Pensoneault et al., 2024).

1. Branch–Trunk Architecture and Kolmogorov–Arnold Networks

The core of a DeepOKAN is the compositional branch–trunk operator network, mathematically structured as

$$\mathcal{G}(f)(x) \approx \sum_{n=1}^{p} b_n(f)\, t_n(x) = \langle \mathbf{b}(f), \mathbf{t}(x)\rangle,$$

where $\mathbf{b}$ encodes the input function $f$ (sampled at sensor points), $\mathbf{t}$ encodes the coordinates $x$ (spatial, temporal, or spatio-temporal), and $p$ is a latent width parameter (Wu et al., 9 Oct 2025).
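The inner-product structure above can be sketched in a few lines of NumPy. This is a minimal illustration only: the names `deeponet_forward`, `branch`, and `trunk` are hypothetical, and the two sub-networks are stand-in single-layer `tanh` maps rather than the KAN sub-networks that DeepOKANs use.

```python
import numpy as np

def deeponet_forward(branch, trunk, f_sensors, x_query):
    """Evaluate G(f)(x) ~ <b(f), t(x)> for batches of functions and points.

    f_sensors: (n_funcs, m) input functions sampled at m sensor points
    x_query:   (n_points, d) query coordinates
    returns:   (n_funcs, n_points) predicted values
    """
    b = branch(f_sensors)  # (n_funcs, p) branch coefficients b_n(f)
    t = trunk(x_query)     # (n_points, p) trunk basis values t_n(x)
    return b @ t.T         # sum over the latent index n = 1..p

# Stand-in sub-networks (assumption: any maps with output width p would do).
rng = np.random.default_rng(0)
m, d, p = 16, 1, 8
W_b = rng.normal(size=(m, p))
W_t = rng.normal(size=(d, p))
branch = lambda f: np.tanh(f @ W_b)
trunk = lambda x: np.tanh(x @ W_t)

f_sensors = rng.normal(size=(4, m))             # 4 input functions
x_query = np.linspace(0, 1, 10).reshape(-1, 1)  # 10 query points
G = deeponet_forward(branch, trunk, f_sensors, x_query)
```

Because the branch output does not depend on `x_query`, one branch evaluation per input function can be reused across any number of query points.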

In DeepOKANs, both the branch and trunk sub-networks are implemented via Kolmogorov–Arnold Networks. A KAN builds on the Kolmogorov–Arnold superposition theorem, which states that any continuous multivariate function $f:[0,1]^d \to \mathbb{R}$ has the representation

$$f(x_1,\dots,x_d) = \sum_{q=1}^{2d+1} \Psi_q\left(\sum_{p=1}^{d} \phi_{q,p}(x_p)\right)$$

for univariate functions $\{\Psi_q, \phi_{q,p}\}$. This decomposition is parameterized in KANs by directly learning these univariate functions on each edge, yielding layers that compute

$$[x_{l+1}]_j = \sum_{i=1}^{n_l} \phi_{l,i,j}([x_l]_i).$$

The edge function $\phi_{l,i,j}$ is commonly chosen as a Gaussian or rational radial basis function (RBF), with centers and scales as learnable parameters (Abueidda et al., 2024, Wu et al., 9 Oct 2025).
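As a rough sketch of such a layer, assume each edge function is a weighted sum of Gaussian bumps on a shared grid of centers. The class name `RBFKANLayer`, the fixed grid, and the single shared scale are simplifying assumptions made here for brevity; the cited works treat centers and scales as learnable.

```python
import numpy as np

class RBFKANLayer:
    """One KAN layer: out_j = sum_i phi_{i,j}(x_i), with Gaussian RBF edges."""

    def __init__(self, n_in, n_out, n_centers, rng, lo=-1.0, hi=1.0):
        # Shared grid of centers and one scale (kept fixed in this sketch).
        self.centers = np.linspace(lo, hi, n_centers)  # (K,), needs K >= 2
        self.scale = (hi - lo) / (n_centers - 1)
        # One coefficient vector per edge (i, j): this is what gets trained.
        self.coef = rng.normal(scale=0.1, size=(n_in, n_out, n_centers))

    def basis(self, x):
        # x: (batch, n_in) -> Gaussian basis values of shape (batch, n_in, K)
        z = (x[..., None] - self.centers) / self.scale
        return np.exp(-z**2)

    def __call__(self, x):
        # phi_{i,j}(x_i) = sum_k coef[i, j, k] * exp(-((x_i - c_k) / s)^2),
        # then sum over the input index i for each output j.
        return np.einsum("bik,ijk->bj", self.basis(x), self.coef)

rng = np.random.default_rng(0)
layer = RBFKANLayer(n_in=3, n_out=2, n_centers=5, rng=rng)
x = rng.uniform(-1, 1, size=(4, 3))
y = layer(x)  # shape (4, 2)
```

Stacking several such layers gives a KAN; in a DeepOKAN, one stack serves as the branch and another as the trunk.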

Recent advances include chunkwise sharing of univariate activation functions (CKAN), and the use of rational basis functions (e.g., Enhanced Rational Units, ERUs) to obtain expressivity with manageable parameter counts (Wu et al., 9 Oct 2025).

2. Mathematical Formulation and Operator Learning Objective

DeepOKANs are designed to model high-dimensional operators associated with physical systems or PDEs: $\mathcal{G}: f \mapsto u$, where $u$ solves a PDE with parametric inputs $f$. Starting with the DeepONet paradigm, DeepOKAN replaces the MLP sub-networks with KANs or CKANs, yielding a reconstructed operator of the same branch–trunk form

$$\mathcal{G}(f)(x) \approx \sum_{n=1}^{p} b_n(f)\, t_n(x),$$

with $b_n$ and $t_n$ now computed by KAN or CKAN sub-networks.

Each network layer computes either a sum of Gaussian RBF evaluations (in RBF-KANs) or a sum of rational function evaluations (in CKANs). In CKAN, chunkwise parameter sharing further reduces the number of unique univariate functions by partitioning input/output neurons into chunks and sharing a base function within each chunk (Wu et al., 9 Oct 2025, Abueidda et al., 2024).
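To make the idea of chunkwise sharing concrete, the sketch below partitions the input neurons into chunks, applies one shared rational function per chunk, and mixes the results with per-edge scalar weights. Everything here is an assumption for illustration: the class `ChunkedRationalLayer`, the "safe" rational form $P(x)/(1+|Q(x)|)$, and chunking over inputs only are not taken from the cited papers and may differ from the actual CKAN and ERU definitions.

```python
import numpy as np

def rational(x, a, b):
    """Rational activation P(x) / (1 + |Q(x)|); the denominator is never zero."""
    P = np.polyval(a, x)  # numerator polynomial, highest degree first
    Q = np.polyval(b, x)  # denominator polynomial
    return P / (1.0 + np.abs(Q))

class ChunkedRationalLayer:
    """Inputs are split into chunks; each chunk shares one rational function."""

    def __init__(self, n_in, n_out, chunk, rng, deg_p=3, deg_q=2):
        assert n_in % chunk == 0, "n_in must be divisible by the chunk size"
        self.chunk = chunk
        n_chunks = n_in // chunk
        # One set of rational coefficients per chunk (not per edge).
        self.a = rng.normal(scale=0.5, size=(n_chunks, deg_p + 1))
        self.b = rng.normal(scale=0.5, size=(n_chunks, deg_q + 1))
        # Per-edge scalar weights keep individual edges distinguishable.
        self.w = rng.normal(scale=0.1, size=(n_in, n_out))

    def __call__(self, x):
        act = np.empty_like(x)
        for c in range(self.a.shape[0]):
            sl = slice(c * self.chunk, (c + 1) * self.chunk)
            act[:, sl] = rational(x[:, sl], self.a[c], self.b[c])
        return act @ self.w

rng = np.random.default_rng(0)
layer = ChunkedRationalLayer(n_in=4, n_out=3, chunk=2, rng=rng)
x = rng.normal(size=(5, 4))
y = layer(x)  # shape (5, 3)
```

In this sketch the layer stores 2 rational functions instead of the 12 that a fully per-edge layer of the same size would need, which is the kind of saving chunkwise sharing aims for.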

Gradient computation is straightforward due to closed-form derivatives of both Gaussian and rational activations, facilitating effective use of standard optimizers.
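For reference, the closed forms in question follow from elementary calculus. The parameterization below (a Gaussian with center $c$ and scale $s$, and a rational function $R = P/Q$ with polynomial $P$, $Q$) is an assumed generic form, not necessarily the exact one used in the cited works.

```latex
% Gaussian RBF with center c and scale s
\varphi(x) = \exp\!\left(-\frac{(x-c)^2}{s^2}\right), \qquad
\frac{\partial \varphi}{\partial x} = -\frac{2(x-c)}{s^2}\,\varphi(x), \qquad
\frac{\partial \varphi}{\partial c} = \frac{2(x-c)}{s^2}\,\varphi(x), \qquad
\frac{\partial \varphi}{\partial s} = \frac{2(x-c)^2}{s^3}\,\varphi(x).

% Rational function R = P/Q (quotient rule)
R'(x) = \frac{P'(x)\,Q(x) - P(x)\,Q'(x)}{Q(x)^2}.
```

The same expressions supply the spatial derivatives needed by a PDE residual and the parameter gradients needed by the optimizer.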

3. Physics-Informed and Data-Driven Loss Functions

DeepOKAN frameworks support both data-driven and physics-informed training. For operator regression tasks with direct supervision, the mean squared error (MSE) or root mean squared deviation (RMSD) between predicted and reference solutions is minimized. For physics-informed learning of PDE operators, the loss aggregates multiple components, with terms that enforce agreement with solution data, initial conditions, boundary conditions, and the residual of the governing PDE (e.g., the Burgers' equation residual). This approach enables operator learning from heterogeneous sources of supervision and physically consistent extrapolation (Wu et al., 9 Oct 2025, Abueidda et al., 2024).
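A composite loss of this kind might look like the following sketch for the viscous Burgers' equation $u_t + u u_x = \nu u_{xx}$. Several things are assumptions made for illustration: the function names, the equal default weights, and the use of finite differences on a grid via `np.gradient` (a trained model would normally obtain derivatives by automatic differentiation). The check uses $u(x,t) = x/(1+t)$, which satisfies the equation exactly for any $\nu$ because $u_{xx} = 0$.

```python
import numpy as np

def burgers_residual(u, x, t, nu):
    """Residual u_t + u*u_x - nu*u_xx on a grid with u[i, j] = u(t_i, x_j)."""
    u_t = np.gradient(u, t, axis=0)
    u_x = np.gradient(u, x, axis=1)
    u_xx = np.gradient(u_x, x, axis=1)
    return u_t + u * u_x - nu * u_xx

def composite_loss(u_pred, u_data, u_ic, u_left, u_right, x, t, nu,
                   weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of data, initial-condition, boundary, and PDE terms."""
    w_data, w_ic, w_bc, w_pde = weights
    loss_data = np.mean((u_pred - u_data) ** 2)
    loss_ic = np.mean((u_pred[0] - u_ic) ** 2)
    loss_bc = (np.mean((u_pred[:, 0] - u_left) ** 2)
               + np.mean((u_pred[:, -1] - u_right) ** 2))
    # Use interior points only, where the central differences are most accurate.
    r = burgers_residual(u_pred, x, t, nu)[1:-1, 1:-1]
    loss_pde = np.mean(r ** 2)
    return (w_data * loss_data + w_ic * loss_ic
            + w_bc * loss_bc + w_pde * loss_pde)

x = np.linspace(0.0, 1.0, 51)
t = np.linspace(0.0, 1.0, 101)
T, X = np.meshgrid(t, x, indexing="ij")
u_exact = X / (1.0 + T)  # exact solution of Burgers' equation

args = (u_exact, u_exact[0], u_exact[:, 0], u_exact[:, -1], x, t, 0.01)
loss_exact = composite_loss(u_exact, *args)
loss_bad = composite_loss(u_exact + 0.1 * np.sin(np.pi * X), *args)
```

The exact solution gives a loss near zero, limited only by finite-difference error, while the perturbed field is penalized through both the data and residual terms.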

4. Training Procedures and Hyperparameter Selection

Training DeepOKANs follows established neural operator pipelines, with adaptations to the unique characteristics of KANs:

  • Optimizers: Adam with decaying learning rates (e.g., step or cosine schedulers), and optional L-BFGS for small baselines.
  • Batch size: Tuned per problem, typically […]–[…], with […]–[…] total epochs for challenging tasks.
  • Regularization: No explicit norm penalty or dropout is required; localized basis functions and learning-rate decay suffice for stability.
  • Model capacities: Practical recommendations are provided for RBF centers per coordinate ([…]–[…]), network depth ([…]–[…]), and total parameter budget (typically […]–[…] parameters).
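The optimizer setup in the list above can be illustrated with a hand-written Adam step and a cosine learning-rate schedule on a toy quadratic. All numerical values here (`lr_max`, `lr_min`, the step count, the target) are placeholders chosen for the toy problem, not settings reported in the cited works.

```python
import numpy as np

def cosine_lr(step, total_steps, lr_max=1e-3, lr_min=1e-5):
    """Cosine decay from lr_max at step 0 to lr_min at step total_steps."""
    c = 0.5 * (1.0 + np.cos(np.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * c

def adam_step(theta, grad, state, lr, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; state holds (first moment, second moment, step count)."""
    m, v, k = state
    k += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**k)  # bias-corrected first moment
    v_hat = v / (1 - b2**k)  # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, k)

# Toy problem: minimize ||theta - target||^2.
target = np.array([1.0, -2.0])
theta = np.zeros(2)
state = (np.zeros(2), np.zeros(2), 0)
total_steps = 2000
for step in range(total_steps):
    grad = 2.0 * (theta - target)
    lr = cosine_lr(step, total_steps, lr_max=1e-2)
    theta, state = adam_step(theta, grad, state, lr)
```

In a real DeepOKAN run, `theta` would hold the basis coefficients (and any learnable centers and scales), and `grad` would come from backpropagating the training loss.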

Ablation studies in CKAN-based DeepOKANs show that increased chunk granularity or rational function order can improve accuracy with minimal impact on inference cost (Wu et al., 9 Oct 2025).

5. Benchmark Results and Empirical Performance

Reported experiments show DeepOKANs, with both RBF-KAN (Abueidda et al., 2024) and CKAN (Wu et al., 9 Oct 2025) architectures, improving on standard DeepONet and MLP baselines across diverse tasks. Key benchmarks include:

| Problem | DeepOKAN relative error | DeepONet relative error | Error reduction |
|---|---|---|---|
| 1D Wave Operator | […] | […]–[…] | […]–[…] |
| 2D Orthotropic Elasticity | […] | […]–[…] | […]–[…] |
| Transient Poisson Problem | […] | […] | […] |
| Burgers' Equation ([…]) | […] | […] | […] reduction |
| Eikonal Equation | […] | […] | […] |

(Abueidda et al., 2024, Wu et al., 9 Oct 2025)
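For readers reproducing comparisons of this kind, the two quantities in the table can be computed as below. This assumes the error columns report a relative $L^2$ error, which is the usual convention in operator learning but is not confirmed here, and it assumes "error reduction" means the fractional decrease relative to the baseline. Both function names are hypothetical.

```python
import numpy as np

def relative_l2_error(u_pred, u_true):
    """||u_pred - u_true||_2 / ||u_true||_2 over all entries."""
    u_pred = np.asarray(u_pred, dtype=float)
    u_true = np.asarray(u_true, dtype=float)
    return np.linalg.norm(u_pred - u_true) / np.linalg.norm(u_true)

def error_reduction(err_new, err_baseline):
    """Fractional reduction of err_new relative to err_baseline."""
    return 1.0 - err_new / err_baseline

# Synthetic example values, unrelated to the benchmarks above.
u_true = np.array([3.0, 4.0])
err_a = relative_l2_error(np.array([3.0, 4.5]), u_true)  # 0.5 / 5 = 0.1
err_b = relative_l2_error(np.array([3.0, 5.0]), u_true)  # 1.0 / 5 = 0.2
reduction = error_reduction(err_a, err_b)                # 0.5
```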

Empirically, DeepOKANs converge more quickly, attain lower final losses, and generalize better, especially for highly oscillatory or sharp-featured solutions. Error distributions show heavier tails for MLP-based DeepONets, while DeepOKANs maintain tightly clustered, low error statistics. Additionally, DeepOKANs scale tractably in parameter count thanks to chunked rational basis sharing.

6. Uncertainty Quantification with Ensemble Approaches

DeepOKANs also provide paths toward ensemble- and Bayesian-style predictive uncertainty quantification. In Pensoneault et al. (2024), an ensemble Kalman inversion (EKI) method is applied to DeepONet-style operator learners, yielding a DeepOKAN variant that derives uncertainty bands from parameter ensembles without backpropagation. The EKI update iteratively refines an ensemble of parameter vectors using empirical covariances and observed residuals, with scalable mini-batch variants and adaptive covariance heuristics driven by a quantity that tracks prediction errors. This construction is reported to achieve well-calibrated credible intervals, improved computational scaling ([…]–[…] faster than MCMC), and strong uncertainty quantification for operator learning tasks.
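The basic EKI iteration can be sketched on a small linear inverse problem. This is the textbook perturbed-observation form of EKI, not the specific mini-batch or adaptive-covariance variant of the cited paper; the function `eki_update`, the linear forward map `A`, and the noise level are all illustrative assumptions.

```python
import numpy as np

def eki_update(thetas, forward, y, gamma, rng):
    """One EKI step. thetas: (J, n_params); forward: (n_params,) -> (n_obs,)."""
    J = thetas.shape[0]
    G = np.stack([forward(th) for th in thetas])  # (J, n_obs) predictions
    d_theta = thetas - thetas.mean(axis=0)
    d_G = G - G.mean(axis=0)
    C_tg = d_theta.T @ d_G / (J - 1)  # parameter/prediction cross-covariance
    C_gg = d_G.T @ d_G / (J - 1)      # prediction covariance
    # Perturb the observations independently for each ensemble member.
    noise = rng.multivariate_normal(np.zeros(len(y)), gamma, size=J)
    resid = y + noise - G             # (J, n_obs) observed residuals
    # Kalman-type gain applied to each residual; no gradients of `forward`.
    update = C_tg @ np.linalg.solve(C_gg + gamma, resid.T)  # (n_params, J)
    return thetas + update.T

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))          # toy linear forward model
theta_true = np.array([1.0, -0.5, 2.0])
y = A @ theta_true
gamma = 1e-2 * np.eye(20)             # assumed observation-noise covariance

thetas = rng.normal(size=(50, 3))     # initial ensemble of 50 members
spread_init = thetas.std(axis=0).mean()
for _ in range(10):
    thetas = eki_update(thetas, lambda th: A @ th, y, gamma, rng)

theta_mean = thetas.mean(axis=0)         # point estimate
spread_final = thetas.std(axis=0).mean() # ensemble spread as uncertainty proxy
```

The update only needs forward evaluations of the model for each ensemble member, which is why the approach avoids backpropagation; uncertainty bands on predictions can then be read off from the spread of the ensemble's outputs.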

7. Limitations, Implementation Considerations, and Outlook

While DeepOKANs are reported to outperform MLP-based DeepONets and PINNs in both predictive accuracy and efficiency, the choice of univariate basis function (e.g., B-splines, low-order orthogonal polynomials, RBFs, rational functions) affects robustness and parameter efficiency. Early B-spline KANs exhibited instability and divergence in some regimes; rational and RBF KANs, especially with chunkwise parameterization, achieve improved accuracy, smooth convergence, and manageable scaling (Shukla et al., 2024, Wu et al., 9 Oct 2025, Abueidda et al., 2024). Further, the explicit physics-informed loss formulation in PO-CKAN enables strong generalization with fewer training points and improved adherence to known PDE structure.

Hyperparameter tuning—especially for the number of basis functions, chunk size, and regularization—is problem dependent. Loss surface visualization and dynamics analysis via information bottleneck theory have provided insights into learning behaviors and potential areas for optimization (Shukla et al., 2024).

DeepOKANs combine the flexibility and mathematical universality of KANs with the scalable, compositional design of DeepONets and modern operator learning, offering a state-of-the-art toolset for high-dimensional surrogate modeling and operator regression in the computational sciences.
