
DeepONet: Neural Operator Architecture

Updated 9 November 2025
  • DeepONet is a neural operator architecture that approximates mappings between infinite-dimensional function spaces using a universal approximation theorem.
  • It employs a branch-trunk design where one network encodes input functions sampled at sensor locations and the other encodes evaluation coordinates for output reconstruction.
  • Enhanced training strategies such as QR-based two-step optimization facilitate zero-shot generalization and high accuracy in surrogate modeling and digital twin applications.

A Deep Operator Network (DeepONet) is a neural operator architecture designed to approximate nonlinear mappings between infinite-dimensional function spaces, with a core application in learning the solution operators associated with parameterized ordinary and partial differential equations. Unlike classical neural networks that directly approximate functions (i.e., mappings between finite-dimensional vector spaces), DeepONet leverages a universal approximation theorem for operators, targeting the learning of the entire input–output relationship (i.e., operator) from limited data, and supporting rapid, mesh-free evaluations for new, unseen input functions. This makes DeepONet particularly effective for surrogate modeling, digital twin development, and fast inversion in science and engineering contexts.

1. Mathematical Foundations and Universal Approximation

The key theoretical foundation of DeepONet is the universal approximation theorem for nonlinear operators, originally established by Chen & Chen (1995) and later refined by Lu et al. (2021). Explicitly, DeepONet seeks to learn an operator

$$\mathcal{G}: u(\cdot) \mapsto (\mathcal{G}(u))(y),$$

where $u$ is a function from a Banach space $\mathcal{U}$, and $(\mathcal{G}(u))(y)$ denotes the value of the output function at location $y$ (in space, time, or spacetime).

For any $\varepsilon > 0$, there exists a finite expansion

$$(\mathcal{G}(u))(y) \approx \sum_{i=1}^{p} b_i\bigl(u(x_1), \ldots, u(x_m)\bigr)\, t_i(y),$$

where $u$ is sampled at "sensor" locations $x_1, \ldots, x_m$, the $b_i$ encode information from the input function (the branch network outputs), and the $t_i$ encode the dependence on the output location (the trunk network outputs). In vector notation, this takes the form

$$(\mathcal{G}(u))(y) \approx \langle \mathbf{b}(u), \mathbf{t}(y) \rangle = \mathbf{b}(u)^{\top} \mathbf{t}(y).$$

This branch–trunk structure is dense in the space of continuous operators $\mathcal{U} \to \mathcal{S}$, guaranteeing approximation power for sufficiently large $p$ and sufficiently deep/wide subnetworks (Lu et al., 2019, Kobayashi et al., 2023).

2. Network Architecture: Branch–Trunk Design

The canonical DeepONet architecture is defined by two primary feed-forward multilayer perceptrons (MLPs):

  • Branch Network: Receives the input function $u$ sampled at $m$ fixed sensor locations, producing a $p$-dimensional embedding $\mathbf{b}(u) \in \mathbb{R}^p$. For multidimensional or complex inputs, the branch may be augmented via CNNs, RNNs (e.g., GRU/LSTM for temporal sequence inputs (He et al., 2023)), or other specialized encoders.
  • Trunk Network: Receives spatial, temporal, or parameter coordinates $y \in \mathbb{R}^d$ (the evaluation site of the output function), outputting a corresponding $p$-dimensional vector $\mathbf{t}(y) \in \mathbb{R}^p$.
  • Combination/Readout: The final prediction is $\mathbf{b}(u)^{\top} \mathbf{t}(y)$, typically a scalar, or a vector if the output is multidimensional.

A typical configuration for moderate-size PDE problems is the following (a minimal code sketch appears after the list):

  • Branch: $[m, 40, 40]$ MLP,
  • Trunk: $[d, 40, 40]$ MLP,
  • Activations: ReLU,
  • Weight Initialization: Glorot (Kobayashi et al., 2023).
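
For concreteness, here is a minimal sketch of this canonical configuration in PyTorch (the framework choice and the optional output bias are assumptions made here, not specified by the cited works); `m`, `d`, and `p` follow the notation above, and the dot-product readout implements $\mathbf{b}(u)^{\top}\mathbf{t}(y)$.

```python
import torch
import torch.nn as nn

def mlp(sizes):
    """Feed-forward MLP with ReLU activations and Glorot (Xavier) initialization."""
    layers = []
    for i in range(len(sizes) - 1):
        linear = nn.Linear(sizes[i], sizes[i + 1])
        nn.init.xavier_uniform_(linear.weight)
        nn.init.zeros_(linear.bias)
        layers.append(linear)
        if i < len(sizes) - 2:          # no activation after the last layer
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

class DeepONet(nn.Module):
    """Canonical branch-trunk DeepONet with a dot-product readout."""
    def __init__(self, m, d, p=40):
        super().__init__()
        self.branch = mlp([m, 40, p])   # encodes u sampled at m sensor locations
        self.trunk = mlp([d, 40, p])    # encodes the evaluation coordinate y
        self.bias = nn.Parameter(torch.zeros(1))  # optional output bias (assumption)

    def forward(self, u_sensors, y):
        # u_sensors: (batch, m), y: (batch, d) -> prediction: (batch, 1)
        b = self.branch(u_sensors)      # (batch, p)
        t = self.trunk(y)               # (batch, p)
        return (b * t).sum(dim=-1, keepdim=True) + self.bias
```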

For input functions discretized on arbitrary, non-aligned point clouds or with variable sensor resolution, resolution-independent DeepONet and related dictionary-learning extensions generalize the branch input representation, achieving resolution-agnostic embedding (Bahmani et al., 17 Jul 2024).

Architectural Variants

Recent work extends the core architecture to:

  • Multiple Branches: Enhanced DeepONet (EDeepONet) for multi-input function spaces (e.g., for multiphysics or multiple forcing terms), where several parallel branch networks encode independent function arguments (Tan et al., 2022); a rough two-branch sketch follows this list.
  • Trunk Enrichment: Ensemble DeepONet and Mixture-of-Experts (MoE) trunk networks combine multiple trunk bases—including spatially-local and global (e.g., POD, partition-of-unity) experts—for increased expressiveness and to better represent functions with both smooth and localized features (Sharma et al., 20 May 2024).
  • Domain Decomposition: DD-DeepONet couples several local DeepONets via Schwarz-type interface iteration or stretching transforms, enabling scalable approximation on complex or variable geometries (Yang et al., 31 Jul 2025).
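
As a rough illustration of the multi-branch idea, the sketch below uses two parallel branch networks whose embeddings are fused by an element-wise product before the trunk readout; the fusion rule and layer sizes are illustrative assumptions here, and the cited EDeepONet paper should be consulted for the exact design.

```python
import torch
import torch.nn as nn

class TwoBranchDeepONet(nn.Module):
    # Two parallel branches for two independent input functions (e.g., an initial
    # condition and a forcing term). Fusing the embeddings by element-wise product
    # is an assumption made here for illustration.
    def __init__(self, m1, m2, d, p=40):
        super().__init__()
        make = lambda n_in: nn.Sequential(nn.Linear(n_in, 40), nn.ReLU(), nn.Linear(40, p))
        self.branch1 = make(m1)
        self.branch2 = make(m2)
        self.trunk = nn.Sequential(nn.Linear(d, 40), nn.ReLU(), nn.Linear(40, p))

    def forward(self, u1, u2, y):
        b = self.branch1(u1) * self.branch2(u2)   # fuse the two function embeddings
        t = self.trunk(y)
        return (b * t).sum(dim=-1, keepdim=True)
```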

3. Training Strategies and Generalization Properties

Loss Functions

Standard DeepONet training minimizes an empirical regression loss over a dataset of triplets $(u^{(k)}, y^{(k)}, s^{(k)})$, using mean squared error (MSE), mean relative $L^2$ error, or mean absolute error (MAE), as problem-appropriate; a minimal training-loop sketch follows the list below.

  • Optimizer: Typically Adam,
  • Learning Rate: $\sim 1 \times 10^{-3}$,
  • Schedule: Fixed iteration count per benchmark (e.g., 10,000 for ODE/diffusion, 50,000 for Burgers) (Kobayashi et al., 2023).
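
A minimal training-loop sketch under these settings (MSE loss, Adam at $10^{-3}$), assuming the `DeepONet` class from the architecture sketch above and synthetic placeholder tensors in place of a real dataset:

```python
import torch

# Assumes the DeepONet class from the architecture sketch above.
m, d, n_train = 100, 1, 10_000
model = DeepONet(m=m, d=d)

# Placeholder data: u sampled at m sensors, evaluation points y, targets s.
u = torch.randn(n_train, m)
y = torch.rand(n_train, d)
s = torch.randn(n_train, 1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for step in range(10_000):                    # fixed iteration count, as in the benchmarks
    idx = torch.randint(0, n_train, (128,))   # simple mini-batching (illustrative choice)
    pred = model(u[idx], y[idx])
    loss = loss_fn(pred, s[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```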

Training Procedures

Joint training of the branch and trunk networks is nonconvex and can be ill-conditioned. Divide-and-conquer schemes—in particular, two-step training methods that orthonormalize trunk outputs via QR (Gram–Schmidt) decomposition before calibrating the branch net (Lee et al., 2023)—yield improved conditioning, stability, and generalization, achieving provably algebraic rates in all sampling and architecture hyperparameters.
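
A simplified sketch of the two-step idea follows (an illustration of the mechanics, not the exact algorithm of Lee et al., 2023): with the trunk fixed, its outputs on the evaluation grid are orthonormalized via a QR decomposition, target coefficients are obtained by projection, and the branch network is then regressed onto those coefficients.

```python
import torch

# Simplified sketch of QR-based two-step training. Assumes: `trunk` is an
# already-trained trunk module as in the architecture sketch, Y is the (n_y, d)
# grid of evaluation coordinates, and S is the (n_samples, n_y) matrix of target
# output values on that grid.

def branch_targets_via_qr(trunk, Y, S):
    with torch.no_grad():
        Phi = trunk(Y)                  # (n_y, p): trunk basis evaluated on the grid
        Q, R = torch.linalg.qr(Phi)     # orthonormalize the basis columns
        C = S @ Q                       # (n_samples, p): coefficients in the Q-basis
    return C, R

# Step 2: train the branch so that branch(u_k) ≈ C[k] (e.g., with an MSE loss).
# At inference, the orthonormal basis at a new coordinate y is trunk(y) @ R^{-1}
# (R^{-1} can be folded into the trunk's last linear layer), and the prediction
# is the dot product of that vector with branch(u).
```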

Generalization and Zero-Shot Learning

A striking property is zero-shot generalization: DeepONet, once trained, can immediately predict operator outputs for new, unseen input functions without retraining. Reported performance on ODE and diffusion–reaction operators includes:

  • ODE: mean $R^2 = 0.9974 \pm 0.0165$,
  • Diffusion: mean $R^2 = 0.9990 \pm 0.0011$,
  • Over 90% of test cases with $R^2 > 0.95$ for both, in truly held-out function spaces (Kobayashi et al., 2023).

For tasks such as uncertainty propagation—e.g., reliability analysis of stochastic dynamical systems—this zero-shot capability enables real-time prediction of ensembles of responses under stochastic forcing (Garg et al., 2022).

Challenges for Nonlinear/Hyperbolic Problems

For problems with sharp gradients and stiff features, such as the viscous Burgers’ equation, classic DeepONet may underfit steep gradients (mean $R^2 \approx 0.44$, minimum $-3.6$) despite deeper/wider architectures or longer training. Suggested remedies include the following (a physics-informed loss sketch follows this list):

  • Data augmentation,
  • Incorporation of physics-informed loss terms,
  • Using more expressive branch networks (e.g., with attention or CNN layers) (Kobayashi et al., 2023).
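
As one illustration of the physics-informed remedy, the sketch below forms a residual penalty for the viscous Burgers’ equation $s_t + s\,s_x = \nu\, s_{xx}$ by differentiating the DeepONet output with respect to the trunk coordinates via automatic differentiation; the collocation and weighting strategy are illustrative assumptions.

```python
import torch

# Sketch of a physics-informed residual term for viscous Burgers' equation,
#     s_t + s * s_x = nu * s_xx,
# to be added to the data loss. Assumes `model` is a DeepONet whose trunk input
# is y = (x, t), `u` holds sensor samples of the input function repeated per
# collocation point (shape (n, m)), and `xt` are collocation coordinates (n, 2).

def burgers_residual_loss(model, u, xt, nu=0.01):
    xt = xt.clone().requires_grad_(True)
    s = model(u, xt)                               # (n, 1) predicted solution
    grads = torch.autograd.grad(s.sum(), xt, create_graph=True)[0]
    s_x, s_t = grads[:, 0:1], grads[:, 1:2]
    s_xx = torch.autograd.grad(s_x.sum(), xt, create_graph=True)[0][:, 0:1]
    residual = s_t + s * s_x - nu * s_xx
    return (residual ** 2).mean()

# total_loss = data_mse + lambda_pde * burgers_residual_loss(model, u, xt)
# (lambda_pde is an illustrative weighting hyperparameter.)
```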

4. Extensions and Applications

Digital Twins and Surrogate Modeling

DeepONet's function-to-function modeling capacity makes it highly suitable as a surrogate model in digital twin frameworks. Notable applications include:

  • Rapid (on-the-fly) surrogate prediction integrated with real-time data assimilation (e.g., with Kalman or Bayesian filters),
  • Multi-fidelity fusion: combining observations/sensor data with low-fidelity solver outputs,
  • Explainable operator surrogates via conformal prediction and sensitivity metrics for reliability assessment (Kobayashi et al., 2023).

Ensemble/Expert Models and Spatial Generalization

The ensemble DeepONet and spatial mixture-of-experts (MoE) augmentations are shown to deliver 2–4x lower relative $\ell_2$ errors on 2D/3D PDEs compared to classical DeepONet or POD-DeepONet (Sharma et al., 20 May 2024). These models exploit trunk enrichment for greater generalization, particularly for complex and multi-scale physical systems.

Multi-Fidelity and Physics-Informed Learning

Composite DeepONet architectures use parallel subnetworks for low- and high-fidelity data and correction nets (linear and nonlinear), enabling accurate operator learning with limited high-fidelity samples. This structure facilitates robust data-driven and residual-based (physics-informed) learning, requiring orders-of-magnitude fewer high-fidelity samples for a given target accuracy (Howard et al., 2022).

Super-Resolution and Sparse Input Regimes

DeepONet-based super-resolution models recover high-frequency oscillations and fine-scale structure from extremely low-resolution inputs, demonstrably outperforming standard interpolation by 2–3 orders of magnitude in $L_2$ error, even with training datasets as small as $N = 45$ (Yang, 11 Oct 2024).

Time-Series and Sequential Learning

GRU- and LSTM-augmented branch networks for DeepONet maintain temporal causality and improve accuracy for full-field predictions under path-dependent or history-dependent loads, outperforming feed-forward benchmarks by up to 2.5x in relevant applications (He et al., 2023).
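
A rough sketch of a GRU-augmented branch (a simplified stand-in for the architecture in He et al., 2023, with illustrative layer sizes): the input function is treated as a time series, the GRU’s final hidden state becomes the branch embedding, and the trunk and dot-product readout are unchanged.

```python
import torch
import torch.nn as nn

class GRUBranchDeepONet(nn.Module):
    # Branch: GRU over a time-series input function; embedding = last hidden state.
    # Layer sizes here are illustrative, not taken from the cited work.
    def __init__(self, n_features, d, p=40, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.branch_head = nn.Linear(hidden, p)
        self.trunk = nn.Sequential(nn.Linear(d, 40), nn.ReLU(), nn.Linear(40, p))

    def forward(self, u_series, y):
        # u_series: (batch, n_steps, n_features); y: (batch, d)
        _, h_last = self.gru(u_series)              # h_last: (1, batch, hidden)
        b = self.branch_head(h_last.squeeze(0))     # (batch, p)
        t = self.trunk(y)                           # (batch, p)
        return (b * t).sum(dim=-1, keepdim=True)
```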

5. Algorithmic, Computational, and Practical Considerations

Computational Scaling

  • Classical DeepONet: Evaluation cost per query is $O(mp + dp)$, scaling linearly with the sensor and trunk dimensions.
  • Quantum DeepONet: By leveraging unary encoding and orthogonal quantum layers, forward evaluation cost is reduced from $O(d^2)$ to $O(d)$ (Xiao et al., 24 Sep 2024).

Resolution Independence

RI-DeepONet and related dictionary-based approaches enable sensor-location-invariant operator surrogates by learning continuous input/output bases (e.g., SIRENs), preserving accuracy across arbitrary discretizations and point clouds (Bahmani et al., 17 Jul 2024).

Robustness and Limitations

  • Training cost can be significant, but is amortized over rapid inference for many scenarios (e.g., uncertainty quantification, outer-loop optimization).
  • For strongly nonlinear/hyperbolic PDEs, or rapidly varying geometries, explicit data coverage or model adaptation is critical; basic MLP-based trunk/branch combinations may fail without further enhancement (Kobayashi et al., 2023, Yang et al., 31 Jul 2025).
  • For parametric and multiphysics systems, the use of branch/trunk architecture supporting multiple branches (multi-field coupling) is advised (Tan et al., 2022).

6. Future Directions and Open Problems

Key open research questions and directions for DeepONet and related operator-learning frameworks include:

7. Summary Table: DeepONet Performance on Key Benchmarks

| Problem | Mean $R^2$ | RMSE | Special Notes |
|---|---|---|---|
| ODE operator | 0.9974 ± 0.016 | 7.16×10⁻³ | >90% of test cases with $R^2 > 0.95$ |
| Diffusion–reaction | 0.9990 ± 0.001 | 8.9×10⁻³ | min $R^2 = 0.9361$ |
| Burgers’ (conv–diff, $\nu = 0.01$) | 0.437 ± 0.646 | 9.7×10⁻² | Many cases with $R^2 < 0$ |

This table collates the core findings on three canonical test cases, highlighting that DeepONets deliver robust operator surrogates with high generalization on linear and parabolic operators, but require architectural or data enhancements in the presence of sharp nonlinearities (Kobayashi et al., 2023).


DeepONet exemplifies a theoretical and practical convergence of functional analysis, neural approximation, and applied scientific computing, supporting a wide range of applications where operator surrogacy, generalization to unseen functions, and scalability in high-dimensional, variable scenarios are required. Continued advances in architecture, optimization, uncertainty quantification, and domain-adaptive learning will likely extend the scope and reliability of DeepONet-type operator surrogates in computational science.
