Deep Operator Networks (DeepONets)

Updated 17 November 2025
  • Deep Operator Networks (DeepONets) are neural architectures that approximate mappings between function spaces using distinct branch and trunk networks.
  • They leverage sensor-based sampling and dual-network designs to achieve high accuracy, often under 1% relative error, with significant speedup over traditional solvers.
  • DeepONets enable practical applications in surrogate modeling, control, uncertainty quantification, and digital twin frameworks across multi-physics and engineering systems.

Deep Operator Networks (DeepONets) are neural operator learning architectures designed to approximate mappings between spaces of functions, thereby generalizing classical neural networks from approximating functions on finite-dimensional domains to approximating operators between infinite-dimensional function spaces. The mathematical principles underlying DeepONets rest on the universal approximation theorem for nonlinear operators, which ensures representational fidelity for a broad class of governing equations, including those arising in ordinary and partial differential equations, nonlinear stochastic systems, and multi-physics applications. DeepONets provide a flexible, mesh-independent machine learning framework for parametric surrogate modeling, uncertainty quantification, control, scientific design optimization, and resilience analysis, with empirical validation across domains from computational mechanics to critical infrastructure modeling.

1. Fundamental Principles and Universal Approximation

DeepONets are grounded in the operator-theoretic generalization of the classical universal approximation theorem: for any continuous operator $\mathcal{G}: V \to U$ between Banach spaces of functions on compact sets, there exist sufficiently expressive neural architectures able to approximate $\mathcal{G}$ to arbitrary accuracy (Lu et al., 2019). Specifically, the network decomposes operator approximation into two subnetworks:

  • Branch network: Encodes the input function $u$ by sampling its values at $m$ pre-defined "sensor" locations $\{\eta_j\}_{j=1}^{m}$, yielding $b_i(u) = b_i(u(\eta_1),\ldots,u(\eta_m))$;
  • Trunk network: Encodes the target output location $y \in K_2$ via $t_i(y)$.

The operator prediction is given by $\mathcal{G}(u)(y) \approx \sum_{i=1}^p b_i(u)\, t_i(y)$.

For any $\epsilon > 0$, there exist $m$, $p$ and networks such that

$$\sup_{u \in V,\, y \in K_2} \left| \mathcal{G}(u)(y) - \sum_{i=1}^p b_i(u)\, t_i(y) \right| < \epsilon$$

(Lu et al., 2019, Goswami et al., 2022). This architecture supports both "stacked" (multiple branch nets) and "unstacked" (single branch and trunk) forms.

2. Architectural Design and Training Workflows

The canonical DeepONet architecture consists of fully connected feed-forward neural networks in both branch and trunk, with typical depths of 3–5 layers and widths of 50–200 neurons, using tanh, ReLU, or Swish activations (Goswami et al., 2022). Input representation to the branch net requires careful selection of sensor locations to capture function space variability. The trunk net can handle multi-dimensional inputs (spatial or spatio-temporal coordinates).
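A minimal PyTorch sketch of an unstacked DeepONet in this spirit is shown below; the two hidden layers, width of 100, latent dimension $p$, and tanh activations are illustrative choices within the ranges quoted above, not the exact configurations used in the cited studies.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Unstacked DeepONet: G(u)(y) ~ sum_i b_i(u) * t_i(y) + bias.

    Illustrative sketch; layer counts, widths, and activations are assumptions
    chosen within the typical ranges quoted above.
    """
    def __init__(self, m_sensors: int, y_dim: int, p: int = 64, width: int = 100):
        super().__init__()
        # Branch net: encodes the input function u sampled at m fixed sensor locations.
        self.branch = nn.Sequential(
            nn.Linear(m_sensors, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, p),
        )
        # Trunk net: encodes the output query coordinate y.
        self.trunk = nn.Sequential(
            nn.Linear(y_dim, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, p), nn.Tanh(),
        )
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, u_sensors: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # u_sensors: (batch, m_sensors); y: (batch, n_query, y_dim)
        b = self.branch(u_sensors)              # (batch, p)
        t = self.trunk(y)                       # (batch, n_query, p)
        # Dot product over the latent dimension p realizes sum_i b_i(u) t_i(y).
        return torch.einsum("bp,bqp->bq", b, t) + self.bias
```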

Loss functions in data-driven training are usually mean squared error, aggregated over training samples and output query locations:

$$\mathcal{L}(\theta) = \frac{1}{N_{\rm train}} \sum_{i=1}^{N_{\rm train}} \frac{1}{N_y} \sum_{j=1}^{N_y} \left( \hat{\mathcal{G}}_{\theta}^{(i)}(y_j) - \mathcal{G}^{(i)}(y_j) \right)^2$$

(Dhulipala et al., 2022). Optimization is performed using Adam or related algorithms. For generalization and performance, branch/trunk decomposition is sharply superior to concatenated fully-connected networks (Lu et al., 2019), yielding substantially lower test MSE and smaller generalization gaps.
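The following sketch trains the DeepONet class from the previous snippet with this MSE objective on a synthetic antiderivative task; the dataset, sensor count, and optimizer settings are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
n_train, m, n_y = 512, 100, 50
sensors = torch.linspace(0.0, 1.0, m)

# Synthetic operator-learning data: u(x) = a*sin(2*pi*f*x) sampled at the sensors,
# with targets G(u)(y) = integral_0^y u(x) dx = a/(2*pi*f) * (1 - cos(2*pi*f*y)).
a = 2.0 * torch.rand(n_train, 1) - 1.0
f = torch.randint(1, 4, (n_train, 1)).float()
u_train = a * torch.sin(2 * torch.pi * f * sensors)                     # (n_train, m)
y_train = torch.rand(n_train, n_y, 1)                                   # (n_train, n_y, 1)
g_train = a / (2 * torch.pi * f) * (1 - torch.cos(2 * torch.pi * f * y_train.squeeze(-1)))

model = DeepONet(m_sensors=m, y_dim=1, p=64, width=100)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(2000):
    optimizer.zero_grad()
    pred = model(u_train, y_train)                       # (n_train, n_y)
    loss = torch.mean((pred - g_train) ** 2)             # MSE over samples and query points
    loss.backward()
    optimizer.step()
```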

3. Extensions: Feature Expansion, Ensembles, and Physics-Informed Training

DeepONets are extended by several practices:

  • Feature expansion: Temporal features (e.g., $\sin(\omega t), \cos(\omega t)$), custom trunk bases, and proper orthogonal decomposition (POD) enrich expressivity (Goswami et al., 2022).
  • Ensembles and Mixture-of-Experts: Ensemble DeepONets concatenate multiple trunks (POD, classical, PoU-MoE) under a single branch, while PoU-MoE blends local trunks with spatial partition-of-unity weights. This increases accuracy, especially for solutions with strong spatial locality or sharp gradients, e.g., 2–4× lower mean $\ell_2$ errors in multi-dimensional PDE benchmarks (Sharma et al., 20 May 2024).
  • Physics-informed learning: Losses can incorporate PDE residuals, boundary/initial conditions, and variational penalties. Physics-informed DeepONets regularize training by adding terms such as

$$L(\theta) = L_{\rm data}(\theta) + \lambda_{\rm PDE}\, L_{\rm PDE}(\theta),$$

where $L_{\rm PDE}$ involves collocation of the residual at sampled points (Goswami et al., 2022, Ramezankhani et al., 20 Jun 2024). Advanced models use curriculum learning, domain decomposition (multi-head decoders), and sequential co-training for multi-physics or highly nonlinear systems (Ramezankhani et al., 20 Jun 2024). A minimal sketch of such a composite loss is given below.
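As a concrete illustration, the snippet below augments the data loss from the training loop above with a residual of the antiderivative relation $\mathrm{d}\mathcal{G}(u)(y)/\mathrm{d}y = u(y)$, enforced at random collocation points via automatic differentiation; the residual choice and weighting $\lambda_{\rm PDE}$ are assumptions of this toy setup, not the formulations of the cited works.

```python
# Inside the training loop above: add a physics residual term to the data loss.
lambda_pde = 1.0

y_col = torch.rand(n_train, n_y, 1, requires_grad=True)          # collocation points
pred_col = model(u_train, y_col)                                  # (n_train, n_y)
# dG/dy at the collocation points via autograd.
dG_dy = torch.autograd.grad(pred_col.sum(), y_col, create_graph=True)[0].squeeze(-1)
# Analytic input function evaluated at the (detached) collocation points.
u_at_col = a * torch.sin(2 * torch.pi * f * y_col.detach().squeeze(-1))
loss_pde = torch.mean((dG_dy - u_at_col) ** 2)

loss_data = torch.mean((model(u_train, y_train) - g_train) ** 2)
loss = loss_data + lambda_pde * loss_pde                          # composite objective
loss.backward()
```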

4. Empirical Results, Generalization, and Computational Efficiency

Empirical studies demonstrate DeepONet and its variants can learn operators robustly from modest-sized data sets and generalize well to unseen scenarios—zero-shot learning—when the training set covers the expected variability (Dhulipala et al., 2022, Kobayashi et al., 2023). Quantitative metrics include:

  • Relative errors $< 1\%$ and $R^2 > 0.99$ in test predictions for systems-of-systems recovery, nonlinear control, and PDE surrogate modeling.
  • Significant acceleration over traditional solvers; e.g., DeepONet inference of system recovery takes $\lesssim 1$ ms/curve, versus 1–10 s for reference Monte Carlo trajectories (a $10^3$–$10^4\times$ speedup) (Dhulipala et al., 2022).
  • Ensemble and PoU-MoE DeepONets reduce error by 2–4× compared to single-trunk or single-POD architectures, notably for problems with spatially sharp solution features (Sharma et al., 20 May 2024).
  • Advanced physics-informed models trained with nonlinear decoders and curriculum strategies show predictive errors two orders of magnitude lower than vanilla PIDON at comparable or increased inference speed (Ramezankhani et al., 20 Jun 2024).

Performance is bounded by architecture width/depth, sensor coverage, and training set diversity; careful selection of these hyperparameters is vital for generalization (Lee et al., 2023).

5. Theoretical Analysis and Error Convergence

The theoretical error of DeepONet decomposes into terms reflecting operator truncation, input sensor discretization, regression fit, and output domain sampling. For sufficiently smooth operators, polynomial or higher-order error decay with training set size is observed, and, under appropriate design, exponential convergence is possible in the small-data regime (Lu et al., 2019, Lee et al., 2023). For singularly perturbed PDEs, uniform approximation and generalization error bounds are proved by encoding function samples and training/test mesh points on Shishkin grids, which accurately capture boundary layers (Du et al., 2023).
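As a concrete example of the grid construction referenced above, the sketch below builds a standard piecewise-uniform Shishkin mesh on $[0, 1]$ for a problem with a single boundary layer at $x = 1$; the layer location, transition constant $\sigma = 2$, and convection bound $\beta$ are assumptions of this illustration rather than details taken from (Du et al., 2023).

```python
import numpy as np

def shishkin_grid(n: int, eps: float, beta: float = 1.0, sigma: float = 2.0) -> np.ndarray:
    """Piecewise-uniform Shishkin mesh on [0, 1] with a boundary layer at x = 1.

    Half of the n intervals uniformly resolve the layer region [1 - tau, 1],
    where tau = min(1/2, sigma * eps / beta * ln n).
    """
    assert n % 2 == 0, "n must be even"
    tau = min(0.5, sigma * eps / beta * np.log(n))
    coarse = np.linspace(0.0, 1.0 - tau, n // 2 + 1)   # outer region
    fine = np.linspace(1.0 - tau, 1.0, n // 2 + 1)     # boundary-layer region
    return np.concatenate([coarse, fine[1:]])           # drop duplicated transition node

# Example: sensor/query locations clustered inside a layer of width ~1e-3.
grid = shishkin_grid(n=64, eps=1e-3)
```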

Ensemble trunk architectures and domain decomposition align with the generalized universal approximation theorems, further enlarging the hypothesis space without sacrificing error control (Sharma et al., 20 May 2024). The Gram-Schmidt orthonormalization of trunk outputs after initial trunk training stabilizes the overall least-squares regression and improves conditioning and generalization (Lee et al., 2023).
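A minimal sketch of this two-step idea, continuing the toy setup from the snippets in Section 2, is shown below: the trained trunk is evaluated on a shared query grid, its outputs are orthonormalized by a QR factorization (equivalent to Gram-Schmidt), and per-sample coefficients are then obtained by least squares; the grid, shapes, and reuse of the synthetic targets are assumptions of this illustration.

```python
# Post-hoc orthonormalization of trunk outputs and least-squares coefficient fit.
with torch.no_grad():
    y_grid = torch.linspace(0.0, 1.0, 200).unsqueeze(-1)     # shared query grid (200, 1)
    T = model.trunk(y_grid)                                   # trunk outputs (200, p)
    Q, R = torch.linalg.qr(T)                                 # Q has orthonormal columns

    # Targets of the synthetic antiderivative task evaluated on the same grid.
    g_grid = a / (2 * torch.pi * f) * (1 - torch.cos(2 * torch.pi * f * y_grid.T))  # (n_train, 200)

    # Least-squares coefficients in the orthonormal trunk basis (Q^T Q = I).
    coeffs = g_grid @ Q                                       # (n_train, p)
    recon = coeffs @ Q.T                                      # reconstruction on the grid
    # A branch network can subsequently be regressed to map u_train -> coeffs.
```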

6. Representative Applications

DeepONets are validated over a spectrum of physical, engineering, and scientific operator tasks:

  • Large-scale infrastructure recovery: Markov-renewal modeling of system-of-systems; DeepONets predict recovery trajectories and enable real-time resilience quantification with millisecond-scale inference (Dhulipala et al., 2022).
  • Computational mechanics and porous media: Darcy flow, lid-driven cavity (Navier–Stokes), quasi-brittle fracture, parametric elastic/plastic systems, and biological tissue mechanics show accuracies in the 1–5% relative error range with physics-regularized variants (Goswami et al., 2022).
  • Control and dynamical systems: Nonlinear ODE systems (Lotka–Volterra, pendulum, cart–pole), adaptive surrogate modeling of PDE-constrained design, and model predictive control (MPC) loops (Lin et al., 2022, Jong et al., 23 May 2025).
  • Surrogate modeling and digital twins: Real-time prediction, zero-shot generalization, and embedding in digital twin architectures for state estimation and rapid simulation in engineering environments (Kobayashi et al., 2023).

7. Limitations, Future Directions, and Recommendations

Current challenges include effective extrapolation outside the training envelope, learning in high-dimensional or irregular domains, handling noise and uncertainty, and integrating physical constraints into the learning loop. Recommendations include:

  • Ensuring branch sensor coverage and trunk query domain match the targeted operator's variability.
  • Employing ensemble/POD/PoU trunk augmentations for spatially localized or multi-scale outputs.
  • Using physics-informed loss penalties whenever ground-truth data are limited, or sharp physical constraints exist.
  • Applying domain-specific feature expansions and advanced training protocols (e.g., curriculum learning, multi-head decoders, sequential co-training) for strongly nonlinear or multi-physics operator learning.
  • Monitoring generalization using held-out validation sets and adaptively tuning hyperparameters based on error analysis.

Extensions toward scalable uncertainty quantification—e.g., randomized prior ensembles, replica-exchange SGLD—and structured randomized training architectures (RaNN-DeepONets) further improve robustness, efficiency, and computational tractability for large-scale scientific machine learning deployments (Yang et al., 2022, Lin et al., 2021, Jiang et al., 1 Mar 2025).
