DeepOKANs: Neural Operator Learning
- DeepOKANs are neural operator learning architectures that combine branch–trunk decomposition with Kolmogorov–Arnold Networks to approximate high-dimensional operators for PDEs.
- They leverage physics-informed training and chunkwise parameter sharing with rational and Gaussian basis functions to enhance predictive accuracy and reduce parameter count.
- Benchmark results show DeepOKANs achieve significant error reductions, outperforming standard DeepONets and MLP-based models in applications like mechanics and uncertainty quantification.
Deep Operator Kolmogorov–Arnold Networks (DeepOKANs) are a family of neural operator learning architectures that synthesize the universal function approximation capabilities of Kolmogorov–Arnold Networks (KANs) with the branch–trunk decomposition popularized by Deep Operator Networks (DeepONets). These architectures are designed to efficiently approximate solution operators for partial differential equations (PDEs) and other parametric mapping problems, emphasizing expressivity, compactness, and physics-consistent inductive bias. DeepOKANs achieve demonstrably improved predictive accuracy and generalization over standard DeepONets and multilayer perceptron (MLP)-based operator learners on a wide range of benchmark tasks, particularly in mechanics, spatio-temporal PDEs, and uncertainty quantification settings (Abueidda et al., 2024, Wu et al., 9 Oct 2025, Shukla et al., 2024, Pensoneault et al., 2024).
1. Branch–Trunk Architecture and Kolmogorov–Arnold Networks
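The core of a DeepOKAN is the compositional branch–trunk operator network, mathematically structured as

$$\mathcal{G}_\theta(u)(y) = \sum_{k=1}^{p} b_k\big(u(x_1), \dots, u(x_m)\big)\, t_k(y),$$

where the branch network $b = (b_1, \dots, b_p)$ encodes the input function $u$ (sampled at sensor points $x_1, \dots, x_m$), the trunk network $t = (t_1, \dots, t_p)$ encodes the coordinates $y$ (spatial, temporal, or spatio-temporal), and $p$ is a latent width parameter (Wu et al., 9 Oct 2025).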
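The branch–trunk contraction over the latent dimension can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the single `tanh` layers stand in for the real branch and trunk sub-networks, and the names `deeponet_combine`, `W_branch`, `W_trunk` and all sizes are assumptions chosen for the example.

```python
import numpy as np

def deeponet_combine(branch_out, trunk_out):
    """Contract branch and trunk features over the latent dimension p.

    branch_out: (n_functions, p), one latent vector per input function
    trunk_out:  (n_points, p), one latent vector per query coordinate
    returns:    (n_functions, n_points) predicted operator values
    """
    return branch_out @ trunk_out.T

rng = np.random.default_rng(0)
m, d, p = 16, 2, 8                      # sensors, coordinate dim, latent width (illustrative)
W_branch = rng.normal(size=(m, p))      # placeholder for the branch network
W_trunk = rng.normal(size=(d, p))       # placeholder for the trunk network

u_sensors = rng.normal(size=(5, m))     # 5 input functions sampled at m sensors
y_query = rng.uniform(size=(100, d))    # 100 query coordinates

prediction = deeponet_combine(np.tanh(u_sensors @ W_branch),
                              np.tanh(y_query @ W_trunk))
print(prediction.shape)  # (5, 100)
```

Because the two sub-networks only meet in this final product, the trunk can be evaluated at arbitrary query coordinates without re-running the branch.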
In DeepOKANs, both the branch and trunk sub-networks are implemented via Kolmogorov–Arnold Networks. A KAN builds on the Kolmogorov–Arnold superposition theorem, which states that any continuous multivariate function $f$ on a bounded domain has the representation

$$f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\left(\sum_{i=1}^{n} \phi_{q,i}(x_i)\right)$$

for continuous univariate functions $\Phi_q$ and $\phi_{q,i}$. KANs parameterize this decomposition by directly learning the univariate functions on each edge, yielding layers that compute

$$x_j^{(\ell+1)} = \sum_{i=1}^{n_\ell} \phi_{j,i}^{(\ell)}\big(x_i^{(\ell)}\big), \qquad j = 1, \dots, n_{\ell+1}.$$

The edge function $\phi_{j,i}^{(\ell)}$ is commonly built from Gaussian or rational radial basis functions (RBFs), with centers and scales as learnable parameters (Abueidda et al., 2024, Wu et al., 9 Oct 2025).
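A KAN layer with Gaussian RBF edge functions can be sketched as follows. This is a minimal sketch under stated assumptions: each edge function is taken to be a weighted sum of Gaussians on a shared grid of centers with one shared scale, and the function name `rbf_kan_layer` and the array shapes are hypothetical, not taken from the cited papers.

```python
import numpy as np

def rbf_kan_layer(x, centers, scale, weights):
    """One KAN layer whose edge functions are sums of Gaussian RBFs.

    x:       (batch, n_in) layer inputs
    centers: (n_basis,) RBF centers (learnable in practice)
    scale:   float, shared RBF width (learnable in practice)
    weights: (n_out, n_in, n_basis) coefficients defining each edge function
    returns: (batch, n_out), where output j is the sum over i of phi_{j,i}(x_i)
    """
    # basis[b, i, k] = exp(-((x[b, i] - centers[k]) / scale)^2)
    basis = np.exp(-(((x[:, :, None] - centers[None, None, :]) / scale) ** 2))
    # Sum over inputs i and basis functions k for every output j.
    return np.einsum("bik,jik->bj", basis, weights)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(4, 3))
centers = np.linspace(-1.0, 1.0, 5)
weights = rng.normal(size=(2, 3, 5))
out = rbf_kan_layer(x, centers, 0.5, weights)
print(out.shape)  # (4, 2)
```

Unlike an MLP layer, there is no separate activation step: the nonlinearity lives on the edges, and the node only sums.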
Recent advances include chunkwise sharing of univariate activation functions (CKAN), and the use of rational basis functions (e.g., Enhanced Rational Units, ERUs) to obtain expressivity with manageable parameter counts (Wu et al., 9 Oct 2025).
2. Mathematical Formulation and Operator Learning Objective
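DeepOKANs are designed to model high-dimensional operators associated with physical systems or PDEs: $\mathcal{G}: u \mapsto s$, where $s = \mathcal{G}(u)$ solves a PDE with parametric inputs $u$. Starting with the DeepONet paradigm, DeepOKAN replaces MLP sub-networks with KANs or CKANs, yielding a reconstructed operator of the form

$$\mathcal{G}_\theta(u)(y) = \sum_{k=1}^{p} b_k^{\mathrm{KAN}}(u)\, t_k^{\mathrm{KAN}}(y),$$

where $b^{\mathrm{KAN}}$ and $t^{\mathrm{KAN}}$ denote the KAN-based branch and trunk networks and $p$ is the latent width.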
Each network layer computes either a sum of Gaussian RBF evaluations (in RBF-KANs) or a sum of rational function evaluations (in CKANs). In CKAN, chunkwise parameter sharing further reduces the number of unique univariate functions by partitioning input/output neurons into chunks and sharing one base function among all edges within each chunk (Wu et al., 9 Oct 2025, Abueidda et al., 2024).
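The idea of chunkwise sharing can be illustrated with a short NumPy sketch. The exact CKAN parameterization is not reproduced here, so everything below is an assumption made for illustration: chunks are formed over input neurons only, each chunk owns one rational base function of the "safe" form $P(x) / (1 + |Q(x)|)$, and each edge carries only a scalar weight. The names `safe_rational` and `chunked_rational_layer` are hypothetical.

```python
import numpy as np

def safe_rational(x, num_coeffs, den_coeffs):
    """Rational function P(x) / (1 + |Q(x)|); the denominator is never zero."""
    p = np.polyval(num_coeffs, x)
    q = np.polyval(den_coeffs, x)
    return p / (1.0 + np.abs(q))

def chunked_rational_layer(x, chunk_size, num_coeffs, den_coeffs, edge_weights):
    """Layer in which all edges leaving one chunk of inputs share a base function.

    x:            (batch, n_in)
    num_coeffs:   (n_chunks, deg_p + 1) numerator coefficients per chunk
    den_coeffs:   (n_chunks, deg_q + 1) denominator coefficients per chunk
    edge_weights: (n_out, n_in) scalar weight per edge
    returns:      (batch, n_out)
    """
    shared = np.empty_like(x)
    for i in range(x.shape[1]):
        c = i // chunk_size  # index of the chunk that input neuron i belongs to
        shared[:, i] = safe_rational(x[:, i], num_coeffs[c], den_coeffs[c])
    return shared @ edge_weights.T

rng = np.random.default_rng(0)
n_in, n_out, chunk_size = 6, 5, 3
n_chunks = n_in // chunk_size
num_coeffs = rng.normal(size=(n_chunks, 4))   # cubic numerators
den_coeffs = rng.normal(size=(n_chunks, 3))   # quadratic denominators
edge_weights = rng.normal(size=(n_out, n_in))
x = rng.uniform(-1.0, 1.0, size=(4, n_in))
out = chunked_rational_layer(x, chunk_size, num_coeffs, den_coeffs, edge_weights)

shared_params = num_coeffs.size + den_coeffs.size   # 14 rational coefficients
unshared_params = n_in * n_out * (4 + 3)            # 210 if every edge had its own
print(out.shape, shared_params, unshared_params)    # (4, 5) 14 210
```

In this toy configuration the shared layer stores 14 rational coefficients plus 30 edge weights, against 210 coefficients when every edge learns its own rational function.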
Gradient computation is straightforward due to closed-form derivatives of both Gaussian and rational activations, facilitating effective use of standard optimizers.
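For the Gaussian case, the closed-form derivative is easy to verify numerically. The sketch below assumes the basis function $\exp(-((x - c)/s)^2)$ and compares its analytic derivative with a central finite difference; the function names and test values are illustrative only.

```python
import numpy as np

def gaussian_rbf(x, c, s):
    """Gaussian basis function with center c and scale s."""
    return np.exp(-(((x - c) / s) ** 2))

def gaussian_rbf_grad(x, c, s):
    """Analytic derivative of gaussian_rbf with respect to x."""
    return -2.0 * (x - c) / s**2 * gaussian_rbf(x, c, s)

x = np.linspace(-2.0, 2.0, 41)
c, s, h = 0.3, 0.7, 1e-5

# Central finite difference as an independent check of the analytic formula.
finite_diff = (gaussian_rbf(x + h, c, s) - gaussian_rbf(x - h, c, s)) / (2 * h)
max_gap = np.max(np.abs(finite_diff - gaussian_rbf_grad(x, c, s)))
print(max_gap < 1e-6)  # True
```

The derivative with respect to the center $c$ is the negative of the expression above, so the gradients for learnable centers come at no extra cost.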
3. Physics-Informed and Data-Driven Loss Functions
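DeepOKAN frameworks flexibly support both data-driven and physics-informed training. For operator regression tasks with direct supervision, the mean squared error (MSE) or root mean squared deviation (RMSD) loss is minimized, for example

$$\mathcal{L}_{\mathrm{data}} = \frac{1}{N} \sum_{i=1}^{N} \big| \mathcal{G}_\theta(u_i)(y_i) - s_i \big|^2 .$$

For physics-informed learning of PDE operators, the loss aggregates multiple components:

$$\mathcal{L} = \lambda_{\mathrm{data}} \mathcal{L}_{\mathrm{data}} + \lambda_{\mathrm{IC}} \mathcal{L}_{\mathrm{IC}} + \lambda_{\mathrm{BC}} \mathcal{L}_{\mathrm{BC}} + \lambda_{\mathrm{PDE}} \mathcal{L}_{\mathrm{PDE}},$$

where the terms enforce agreement with solution data, initial conditions, boundary conditions, and the residual of the governing PDE (e.g., for Burgers’ equation: $s_t + s\, s_x - \nu\, s_{xx} = 0$), and the coefficients $\lambda$ weight the individual components. This approach enables operator learning from heterogeneous sources of supervision and physically consistent extrapolation (Wu et al., 9 Oct 2025, Abueidda et al., 2024).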
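A composite loss of this kind can be sketched for Burgers’ equation on a regular grid. This is an illustration under stated assumptions: derivatives are taken by finite differences with `np.gradient` as a stand-in for the automatic differentiation a physics-informed model would normally use, the unit weights are arbitrary, and the function names are hypothetical. The check uses the steady solution $s(x) = -2\nu k \tanh(kx)$, which satisfies $s\, s_x = \nu\, s_{xx}$.

```python
import numpy as np

def burgers_residual(s, x, t, nu):
    """Finite-difference residual s_t + s*s_x - nu*s_xx on an (n_t, n_x) grid."""
    s_t = np.gradient(s, t, axis=0)
    s_x = np.gradient(s, x, axis=1)
    s_xx = np.gradient(s_x, x, axis=1)
    return s_t + s * s_x - nu * s_xx

def physics_informed_loss(pred, data, residual, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of data, initial-condition, boundary and PDE-residual terms."""
    mse = lambda a, b: float(np.mean((a - b) ** 2))
    terms = {
        "data": mse(pred, data),
        "ic": mse(pred[0], data[0]),                    # first time slice
        "bc": mse(pred[:, [0, -1]], data[:, [0, -1]]),  # two spatial end points
        "pde": float(np.mean(residual ** 2)),
    }
    terms["total"] = sum(w * terms[k]
                         for w, k in zip(weights, ("data", "ic", "bc", "pde")))
    return terms

nu, k = 0.1, 1.0
x = np.linspace(-3.0, 3.0, 601)
t = np.linspace(0.0, 1.0, 5)
exact = np.tile(-2.0 * nu * k * np.tanh(k * x), (t.size, 1))  # steady solution

res = burgers_residual(exact, x, t, nu)
# Drop a few boundary columns, where one-sided differences are less accurate.
loss = physics_informed_loss(exact, exact, res[:, 5:-5])
print(loss["data"], loss["pde"] < 1e-8)  # 0.0 True
```

Keeping the four terms separate in the returned dictionary makes it easy to monitor which constraint dominates during training and to adjust the weights accordingly.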
4. Training Procedures and Hyperparameter Selection
Training DeepOKANs follows established neural operator pipelines, with adaptations to the unique characteristics of KANs:
- Optimizers: Adam with decaying learning rates (e.g., step or cosine schedulers), and optional L-BFGS for small baselines.
- Batch size: Tuned per problem, together with the total number of training epochs, which is largest for challenging tasks.
- Regularization: No explicit $L^2$ penalty or dropout is required; localized basis functions and learning-rate decay suffice for stability.
- Model capacities: Practical recommendations are provided for the number of RBF centers per coordinate, the network depth, and the total parameter budget.
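The two learning-rate schedules named in the list can be written as plain functions. This is a minimal sketch: the default rates, decay factor and step size are hypothetical placeholders rather than values recommended by the cited papers.

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-3, lr_min=1e-5):
    """Cosine decay from lr_max to lr_min; default values are illustrative."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

def step_lr(step, lr0=1e-3, gamma=0.5, step_size=1000):
    """Multiply the rate by gamma every step_size steps; values are illustrative."""
    return lr0 * gamma ** (step // step_size)

total = 10_000
for s in (0, 2_500, 5_000, 10_000):
    print(s, cosine_lr(s, total), step_lr(s))
```

Either function can be called once per optimizer step to set the Adam learning rate.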
Ablation studies in CKAN-based DeepOKANs show that increased chunk granularity or rational function order can improve accuracy with minimal impact on inference cost (Wu et al., 9 Oct 2025).
5. Benchmark Results and Empirical Performance
Comprehensive experimental results demonstrate clear improvements of DeepOKANs—using both RBF-KAN (Abueidda et al., 2024) and CKAN (Wu et al., 9 Oct 2025) architectures—over standard DeepONet and MLP baselines across diverse tasks. Key benchmarks include:
| Problem | DeepOKAN rel. $L^2$ error | DeepONet rel. $L^2$ error | Error Reduction |
|---|---|---|---|
| 1D Wave Operator | … | … | … |
| 2D Orthotropic Elasticity | … | … | … |
| Transient Poisson Problem | … | … | … |
| Burgers’ Equation | … | … | … |
| Eikonal Equation | … | … | … |
(Abueidda et al., 2024, Wu et al., 9 Oct 2025)
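Assuming the "rel" columns report the relative $L^2$ error, the metric and a derived error reduction can be computed as below. Both function names are hypothetical, the definition of "error reduction" as a fractional decrease is an assumption, and the numbers are toy values rather than results from the benchmarks.

```python
import numpy as np

def relative_l2_error(pred, true):
    """||pred - true||_2 / ||true||_2 over all entries."""
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    return np.linalg.norm(pred - true) / np.linalg.norm(true)

def error_reduction(err_new, err_baseline):
    """Fractional reduction of err_new relative to err_baseline."""
    return (err_baseline - err_new) / err_baseline

true = np.array([3.0, 4.0])
err = relative_l2_error(np.array([3.0, 4.5]), true)  # 0.5 / 5.0
print(err)  # 0.1
```

Normalizing by the norm of the reference solution makes errors comparable across problems whose solutions differ in magnitude.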
Empirically, DeepOKANs converge more quickly, attain lower final losses, and generalize better, especially for highly oscillatory or sharp-featured solutions. Error distributions show heavier tails for MLP-based DeepONets, while DeepOKANs maintain tightly clustered, low-error statistics. Additionally, the parameter count of DeepOKANs scales tractably thanks to chunked rational basis sharing.
6. Uncertainty Quantification with Ensemble Approaches
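DeepOKANs also provide paths toward ensemble- and Bayesian-style predictive uncertainty quantification. In (Pensoneault et al., 2024), an ensemble Kalman inversion (EKI) method is applied to DeepONet-style operator learners, yielding a DeepOKAN variant that derives uncertainty bands from parameter ensembles without backpropagation. The EKI update iteratively refines an ensemble $\{\theta_j\}_{j=1}^{J}$ of network parameters via empirical covariances and observed residuals, with scalable mini-batch variants and adaptive covariance heuristics. In its basic form, the update reads

$$\theta_j^{(n+1)} = \theta_j^{(n)} + C_n^{\theta g} \left( C_n^{gg} + \Gamma \right)^{-1} \left( y - g(\theta_j^{(n)}) \right),$$

where $g(\theta_j)$ denotes the network predictions of ensemble member $j$, $C_n^{\theta g}$ and $C_n^{gg}$ are the empirical cross-covariance and prediction covariance of the ensemble, $\Gamma$ is the observation-noise covariance, and the residual $y - g(\theta_j^{(n)})$ tracks prediction errors. This construction achieves well-calibrated credible intervals, improved computational scaling (faster than MCMC), and strong uncertainty quantification for operator learning tasks.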
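A single derivative-free EKI update can be sketched in NumPy. This is a generic textbook-style step with perturbed observations, not the specific scheme of the cited work: the linear map `A` stands in for the operator network, and `eki_step`, the ensemble size and the noise level are assumptions made for the example.

```python
import numpy as np

def eki_step(thetas, forward, y, gamma, rng):
    """One ensemble Kalman inversion update with perturbed observations.

    thetas:  (J, n_params) parameter ensemble
    forward: maps a (n_params,) vector to (n_obs,) predictions
    y:       (n_obs,) observations
    gamma:   (n_obs, n_obs) observation-noise covariance
    """
    J = thetas.shape[0]
    preds = np.stack([forward(th) for th in thetas])   # (J, n_obs)
    d_theta = thetas - thetas.mean(axis=0)
    d_pred = preds - preds.mean(axis=0)
    c_tp = d_theta.T @ d_pred / (J - 1)                # cross-covariance
    c_pp = d_pred.T @ d_pred / (J - 1)                 # prediction covariance
    noise = rng.multivariate_normal(np.zeros(y.size), gamma, size=J)
    residual = y + noise - preds                       # (J, n_obs)
    # Transposed Kalman gain, obtained with a linear solve instead of an inverse.
    gain_t = np.linalg.solve(c_pp + gamma, c_tp.T)     # (n_obs, n_params)
    return thetas + residual @ gain_t

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))                 # toy linear forward model
theta_true = np.array([1.0, -2.0, 0.5])
gamma = 1e-4 * np.eye(20)
y = A @ theta_true + 0.01 * rng.normal(size=20)

thetas = rng.normal(size=(50, 3))            # initial ensemble of 50 members
for _ in range(5):
    thetas = eki_step(thetas, lambda th: A @ th, y, gamma, rng)

print(thetas.mean(axis=0))   # ensemble mean, close to theta_true
print(thetas.std(axis=0))    # ensemble spread, usable as an uncertainty estimate
```

Only forward evaluations of the model are needed, which is why the approach requires no backpropagation.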
7. Limitations, Implementation Considerations, and Outlook
While DeepOKANs consistently outperform MLP-based DeepONets and PINNs in both predictive accuracy and efficiency, the choice of univariate basis function (e.g., B-splines, low-order orthogonal polynomials, RBFs, rational functions) impacts robustness and parameter efficiency. Early B-spline KANs exhibited instability and divergence in some regimes; rational and RBF KANs, especially with chunkwise parameterization, achieve improved accuracy, smooth convergence, and manageable scaling (Shukla et al., 2024, Wu et al., 9 Oct 2025, Abueidda et al., 2024). Further, the explicit physics-informed loss formulation in PO-CKAN enables strong generalization with fewer training points and improved adherence to known PDE structure.
Hyperparameter tuning—especially for the number of basis functions, chunk size, and regularization—is problem dependent. Loss surface visualization and dynamics analysis via information bottleneck theory have provided insights into learning behaviors and potential areas for optimization (Shukla et al., 2024).
DeepOKANs combine the flexibility and mathematical universality of KANs with the scalable, compositional design of DeepONets and modern operator learning, offering a state-of-the-art toolset for high-dimensional surrogate modeling and operator regression in the computational sciences.