Deep Operator Networks
- Deep Operator Networks are neural architectures that approximate nonlinear operators between infinite-dimensional function spaces using a branch network for input encoding and a trunk network for output queries.
- They leverage the universal approximation theorem for operators to create surrogates for PDEs, control systems, and complex multiscale phenomena with fast inference and impressive generalization.
- Advanced strategies like Fourier-feature embeddings, ensemble methods, and Bayesian training enhance DeepONets' accuracy, numerical stability, and adaptability in practical applications.
Deep Operator Networks (DeepONets) are neural architectures designed to approximate nonlinear operators, primarily for mapping between infinite-dimensional function spaces. The original theoretical foundation connects DeepONet to the universal approximation theorem for operators, allowing for the construction of surrogates for differential equations, control models, and multiscale systems. DeepONets employ a composite architecture—typically comprising a branch network that encodes input functions (often as finite sensor readings) and a trunk network that encodes output coordinates (spatial, temporal, or geometric queries)—with their fusion yielding the network’s prediction via an inner product or bilinear form. Since their introduction in 2019, extensive theoretical, methodological, and empirical research has established DeepONets as a rigorous and flexible operator-learning tool with strong generalization properties, fast inference, and adaptability to physics-informed, multi-fidelity, geometric, and Bayesian settings.
1. Mathematical Foundation and Architecture
The canonical DeepONet formulation approximates a target nonlinear operator $G: \mathcal{U} \to \mathcal{V}$ (e.g., mapping boundary or initial conditions to solution fields), where $\mathcal{U}$, $\mathcal{V}$ are Banach spaces of functions. In practice, the input function $u \in \mathcal{U}$ is discretized at sensor points $\{x_1, \dots, x_m\}$, and predictions are sought at output locations $y$.
Two neural networks are employed:
- Branch network: $b: \big(u(x_1), \dots, u(x_m)\big) \mapsto \big(b_1, \dots, b_p\big) \in \mathbb{R}^p$, encodes the input function into coefficient features.
- Trunk network: $t: y \mapsto \big(t_1(y), \dots, t_p(y)\big) \in \mathbb{R}^p$, encodes positional or parametric information into basis functions.
The output is given by
$$G_\theta(u)(y) = \sum_{k=1}^{p} b_k\big(u(x_1), \dots, u(x_m)\big)\, t_k(y) + b_0,$$
where $b_0$ is a trainable bias.
This structure is rigorously justified by the universal approximation theorem for operators and can recover any continuous operator to arbitrary accuracy given sufficient network width and sensor density (Lu et al., 2019, Goswami et al., 2022).
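To make the branch-trunk decomposition concrete, a minimal PyTorch sketch is shown below. It follows the generic formulation above (MLP branch and trunk fused by a dot product plus bias); all layer widths, activations, and sizes are illustrative choices, not values taken from any cited paper.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    def __init__(self, n_sensors: int, coord_dim: int, p: int = 64, width: int = 128):
        super().__init__()
        # Branch: input function sampled at n_sensors points -> p coefficients b_k(u).
        self.branch = nn.Sequential(
            nn.Linear(n_sensors, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, p),
        )
        # Trunk: output coordinate y -> p basis values t_k(y).
        self.trunk = nn.Sequential(
            nn.Linear(coord_dim, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, p),
        )
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, u_sensors: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # u_sensors: (batch, n_sensors); y: (batch, n_query, coord_dim)
        b = self.branch(u_sensors)             # (batch, p)
        t = self.trunk(y)                      # (batch, n_query, p)
        # G(u)(y) ~= sum_k b_k(u) * t_k(y) + bias
        return torch.einsum("bp,bqp->bq", b, t) + self.bias

# Example usage with illustrative sizes: 100 sensors, 1-D coordinates, 16 query points.
model = DeepONet(n_sensors=100, coord_dim=1)
u = torch.randn(8, 100)
y = torch.rand(8, 16, 1)
print(model(u, y).shape)  # torch.Size([8, 16])
```

Because the trunk is evaluated pointwise at query coordinates, the same trained model can be queried at any output locations, independent of the grid used for training.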
Generalizations include stacking branch/trunk networks for multi-output, ensemble, or mixture-of-experts architectures (Sharma et al., 2024), and replacing fully connected layers with convolutional (ResUNet, CNN), SIREN, or Fourier-feature-based subnets to encode complex geometry or high-frequency modes (He et al., 2023, He et al., 2024, Sojitra et al., 15 Sep 2025).
2. Universal Approximation and Theoretical Properties
The operator UAT (Chen & Chen 1995; Lu et al. 2021) guarantees that, for a compact set of input functions $V \subset C(K_1)$ and query points $y$ in a compact set $K_2 \subset \mathbb{R}^d$, any continuous operator $G$ can be approximated as
$$\sup_{u \in V,\ y \in K_2} \left| G(u)(y) - \sum_{k=1}^{p} b_k\big(u(x_1), \dots, u(x_m)\big)\, t_k(y) \right| < \epsilon,$$
with $b_k$, $t_k$ realized as standard neural nets (ReLU, Tanh, SIREN, etc.) (Lu et al., 2019, Goswami et al., 2022, Jong et al., 23 May 2025). This covers both dynamic and PDE operators, provided the input discretization error is controlled (sensor density matched to function smoothness).
Extensions to multi-step (time-sequence) predictions, variable-length input functions, and multi-input mappings preserve universal approximation (MS-DeepONet, B-LSTM-MIONet, MIONet) (Jong et al., 23 May 2025, Kong et al., 2023).
Recent theoretical work establishes explicit error bounds, showing generalization error scales with network width, training data size, and sensor counts—orthonormalizing trunk outputs further improves numerical stability and generalization (Lee et al., 2023).
3. Training Strategies and Regularization
Standard DeepONet training involves minimizing the mean-squared error over function-location pairs:
$$\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left| G_\theta\big(u^{(i)}\big)\big(y^{(i)}\big) - G\big(u^{(i)}\big)\big(y^{(i)}\big) \right|^2 .$$
The optimizer of choice is Adam, occasionally combined with L-BFGS for physics-informed variants.
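A minimal supervised training step under these choices might look as follows. This is a sketch assuming batched tensors of sensor readings, query locations, and target values for a model like the one sketched earlier; it also subsamples random query points per iteration, anticipating the first item in the list below.

```python
import torch

def train_step(model, optimizer, u_sensors, y_query, targets, n_sub: int = 32):
    """One Adam step on the mean-squared error over function-location pairs."""
    # Optional regularization: use a random subset of the query points each iteration.
    idx = torch.randperm(y_query.shape[1])[:n_sub]
    y_sub, target_sub = y_query[:, idx], targets[:, idx]

    optimizer.zero_grad()
    pred = model(u_sensors, y_sub)               # (batch, n_sub)
    loss = torch.mean((pred - target_sub) ** 2)  # mean-squared error
    loss.backward()
    optimizer.step()
    return loss.item()

# Typical setup: optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```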
Key innovations in training include:
- Random sampling of trunk-net inputs: Drawing random spatial/temporal query points per iteration offers 2–4× computational speedup, 5–10× memory savings, and acts as regularization, with test errors matching fixed-grid approaches (Karumuri et al., 2024).
- Two-step training: Sequential trunk (basis) then branch (coefficient) fitting, with Gram-Schmidt orthonormalization of trunk outputs, achieves lower generalization error and improved stability (Lee et al., 2023); a sketch of the orthonormalization step follows this list.
- Bayesian training and uncertainty quantification: Replica-exchange Langevin diffusion employs dual-temperature chains and posterior sampling, resulting in accelerated convergence, better calibration, and reliable uncertainty bands when data is noisy (Lin et al., 2021, Kong et al., 2023).
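The orthonormalization step in two-step training can be sketched as below, assuming a (pre-trained) trunk network and a matrix of training solutions evaluated on a fixed set of collocation points. This is a conceptual illustration of the idea, not the exact procedure of Lee et al. (2023).

```python
import torch

@torch.no_grad()
def orthonormal_trunk_targets(trunk, y_colloc, solutions):
    # y_colloc: (n_points, coord_dim) fixed collocation points
    # solutions: (n_functions, n_points) training solutions evaluated at y_colloc
    T = trunk(y_colloc)               # (n_points, p) raw trunk basis
    Q, R = torch.linalg.qr(T)         # Q: (n_points, p) with orthonormal columns
    coeffs = solutions @ Q            # (n_functions, p) projection coefficients
    # Step two: train (or solve for) the branch so it maps each input function to its
    # row of coeffs; Q then serves as the orthonormalized trunk basis at inference.
    return Q, R, coeffs
```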
Physics-informed DeepONets add PDE residuals and boundary/initial conditions to the loss (Goswami et al., 2022), leveraging automatic differentiation for sensitivity analysis (Qiu et al., 2024).
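As an illustration of how such residual terms are formed with automatic differentiation, the sketch below computes a 1-D viscous Burgers residual for a DeepONet whose trunk takes (x, t) coordinates; the choice of PDE, coordinate layout, and viscosity value are assumptions made for the example.

```python
import torch

def burgers_residual(model, u_sensors, xt, nu: float = 0.01):
    # xt: (batch, n_query, 2) collocation coordinates (x, t); gradients flow through them.
    xt = xt.clone().requires_grad_(True)
    u = model(u_sensors, xt)                                        # (batch, n_query)
    du = torch.autograd.grad(u, xt, torch.ones_like(u), create_graph=True)[0]
    u_x, u_t = du[..., 0], du[..., 1]
    u_xx = torch.autograd.grad(u_x, xt, torch.ones_like(u_x), create_graph=True)[0][..., 0]
    # Residual of u_t + u * u_x - nu * u_xx = 0; its mean square is added to the loss.
    return u_t + u * u_x - nu * u_xx
```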
4. Architectural Extensions and Specialized Variants
Recent research advances target the limitations of vanilla DeepONet and adapt the architecture to new scientific domains:
- Fourier and SIREN trunk embedding: Stochastic or deterministic Fourier-feature mappings and SIREN activations enable spectrally accurate learning of high-frequency solution features and complex geometric dependencies, as in FEDONet and Geom-DeepONet (He et al., 2024, Sojitra et al., 15 Sep 2025); a minimal Fourier-feature sketch follows this list. Empirically, spectral approaches yield 2–3× error reductions on PDE benchmarks (Sojitra et al., 15 Sep 2025).
- Physics-inspired trunk input: Feeding physical coefficients (e.g., the consolidation coefficient) into the trunk allows the network to directly encode basis-function modulation; Fourier-feature embedding further improves representation of steep gradients and early-time transients (Choi et al., 14 Jul 2025).
- ResUNet/CNN trunk networks: For spatially complex geometries (e.g., elastoplastic structures), convolutional encoder-decoder trunks efficiently encode mesh topology and enable element-wise fusion between branch and trunk latent spaces (He et al., 2023).
- Ensemble/trunk mixture-of-experts: Stacking multiple trunk networks (vanilla, PoU-MoE, POD) or employing gating networks incorporates diverse bases, improves spatial locality, and robustly captures sharp spatial features, reducing errors by factors of 2–4 on test sets (Sharma et al., 2024).
- Randomized neural networks (RaNN-DeepONet): Fixing non-output layer parameters and solving for output weights via least-squares makes operator learning convex, deterministic, and orders-of-magnitude faster with minimal accuracy loss (Jiang et al., 1 Mar 2025).
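A minimal random Fourier-feature embedding for trunk inputs, of the kind referenced in the first item of this list, can be sketched as follows; the projection scale and feature count are illustrative choices, not values from the cited papers.

```python
import torch
import torch.nn as nn

class FourierFeatures(nn.Module):
    def __init__(self, coord_dim: int, n_features: int = 64, scale: float = 10.0):
        super().__init__()
        # Fixed (untrained) random projection; a larger scale targets higher frequencies.
        self.register_buffer("B", torch.randn(coord_dim, n_features) * scale)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        proj = 2.0 * torch.pi * (y @ self.B)                  # (..., n_features)
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

# Usage: prepend to the trunk so high-frequency content in y is easier to represent.
embed = FourierFeatures(coord_dim=1, n_features=64)           # output dim = 128
trunk = nn.Sequential(embed, nn.Linear(128, 128), nn.Tanh(), nn.Linear(128, 64))
```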
5. Robustness, Generalization, and Resolution Independence
DeepONets demonstrate strong robustness to input noise, coarse discretization, and sampling variability. Quantitative studies show:
- Mean test errors for standard ODE/PDE operators decay exponentially in the number of sensors or polynomially (up to fourth order) in dataset size (Lu et al., 2019).
- Randomized trunk input selection and SIREN-based dictionary learning yield resolution independence: operators can be learned from arbitrarily sampled point clouds, without architectural change or retraining (Bahmani et al., 2024). The RINO framework formalizes this process, enabling compact, resolution-independent embeddings and robust cross-mesh generalization.
- Multi-fidelity DeepONets incorporate low-fidelity and high-fidelity data, learning both linear and nonlinear corrections for reduced data requirements and improved test error, especially in stiff or data-scarce regimes (Howard et al., 2022).
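The linear-plus-nonlinear correction idea can be sketched, in heavily simplified pointwise form, as follows; the composition and names are assumptions made for illustration and do not reproduce the exact multi-fidelity DeepONet formulation of Howard et al. (2022).

```python
import torch
import torch.nn as nn

class MultiFidelityCorrection(nn.Module):
    """Pointwise correction applied to a low-fidelity prediction at each query point."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))      # learned linear correction factor
        self.nonlinear = nn.Sequential(               # small nonlinear correction network
            nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1),
        )

    def forward(self, lofi_pred: torch.Tensor) -> torch.Tensor:
        # lofi_pred: (..., 1) low-fidelity prediction; output approximates the high-fidelity value.
        return self.alpha * lofi_pred + self.nonlinear(lofi_pred)
```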
6. Applications and Empirical Performance
DeepONet frameworks have achieved state-of-the-art results for a wide range of scientific and engineering problems:
- PDE Surrogates: Darcy flow, reaction-diffusion, Burgers, Allen-Cahn, Kuramoto-Sivashinsky, and Navier-Stokes equations (Goswami et al., 2022, Sojitra et al., 15 Sep 2025, Jiang et al., 1 Mar 2025, Sharma et al., 2024).
- Real-time design and optimization: Surrogate modeling for 3D elastoplastic stress, shape optimization, and digital engineering workflows (He et al., 2023, He et al., 2024, Sojitra et al., 15 Sep 2025).
- Control and dynamical systems: Model predictive control with MS-DeepONet, neural simulation of nonlinear systems, swing-up and stabilization policies (Jong et al., 23 May 2025, Lin et al., 2022).
- Data assimilation, inverse problems, and UQ: Forward/inverse mapping for instability waves, data-assimilation cycles, and robust uncertainty quantification via Bayesian DeepONets (Leoni et al., 2021, Lin et al., 2021, Kong et al., 2023).
- Super-resolution reconstruction: Significant accuracy improvements over interpolation for PDE solutions, particularly for high-frequency features (Yang, 2024).
Across these studies, reported metrics (mean relative error, coverage of 95% uncertainty bands, wall-clock speedup) consistently show DeepONets outperforming both classical function approximators and alternative neural operator designs (FNO/GKN/NKN) in accuracy, robustness, and computational efficiency.
7. Limitations, Future Directions, and Design Principles
DeepONet performance is sensitive to the representativeness of the training data (especially for out-of-distribution inputs), the choice and organization of sensor points, and network width/depth. Extrapolation beyond trained coefficient ranges or function families remains challenging, with error increasing outside the training domain (Choi et al., 14 Jul 2025).
Design principles emerging from recent research include:
- Branch for functional inputs, trunk for spatial/parametric/geometric queries; Fourier/SIREN embedding for high-frequency or complex domains.
- Physical parameters modulating basis functions should be fed into the trunk; functional/distributional variations into the branch.
- Cross-validation for sensor/trunk width selection; early fusion, adaptive sampling, and orthonormalization for stability and generalization.
Future research areas include multi-modal operator learning, continual learning at the operator level, graph-based architectures for unstructured data, energy-efficient deployments, and integration with physical constraints for next-generation scientific machine learning (Goswami et al., 2022, Bahmani et al., 2024).
Key References:
- (Lu et al., 2019) – DeepONet: foundational operator theory and architecture
- (Goswami et al., 2022) – Physics-informed DeepONet and benchmarking
- (Lee et al., 2023) – Generalization error analysis and two-step training
- (Sojitra et al., 15 Sep 2025) – Fourier-feature embedding and spectral accuracy
- (He et al., 2024) – Geom-DeepONet for field prediction on 3D parameterized geometries
- (Jong et al., 23 May 2025) – MS-DeepONet extension for predictive control
- (Howard et al., 2022) – Multi-fidelity operator learning
- (Bahmani et al., 2024) – Resolution-independent operator learning
- (Lin et al., 2021) – Bayesian DeepONet and uncertainty quantification
- (Jiang et al., 1 Mar 2025) – Randomized neural networks for efficient operator learning
- (He et al., 2023) – ResUNet-based DeepONet for variable complex geometries
- (Sharma et al., 2024) – Ensemble and mixture-of-experts trunk networks
- (Yang, 2024) – DeepONet for super-resolution in PDE reconstruction