- The paper introduces DeepONet, a novel architecture leveraging the universal approximation theorem to learn nonlinear operators from limited data.
- It employs a two-subnetwork design, with a branch net for input functions and a trunk net for output coordinates, to reduce optimization and generalization errors.
- Extensive numerical experiments validate DeepONet's high accuracy and fast convergence in learning operators associated with a variety of ordinary and partial differential equations.
DeepONet: Learning Nonlinear Operators for Identifying Differential Equations
The paper, "DeepONet: Learning Nonlinear Operators for Identifying Differential Equations based on the Universal Approximation Theorem of Operators," introduces a novel neural network architecture designed to learn nonlinear operators from a limited set of data. The methodology leverages the universal approximation theorem for operators, which bridges fundamental concepts in continuous operator approximation and advanced neural network techniques.
The authors focus on two principal challenges: the practical realization of the universal approximation theorem for operators, which theoretically guarantees small approximation error for sufficiently large networks, and the mitigation of optimization and generalization errors. To address these challenges, they propose the Deep Operator Network (DeepONet), an architecture composed of two interconnected sub-networks: a "branch" net that encodes the input function sampled at a finite number of fixed locations (sensors), and a "trunk" net that encodes the locations at which the output is evaluated.
Theoretical Foundations
The paper begins by elaborating on the universal approximation theorem for operators, extending the work of Chen and Chen (1995). This theorem states that a neural network with a single hidden layer can approximate, to arbitrary accuracy, any nonlinear continuous operator. However, the theorem says nothing about optimization and generalization errors, so in practice careful design of the network architecture becomes paramount.
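For reference, the operator version of the theorem, as restated in the paper from Chen and Chen (1995), takes approximately the following form (notation lightly paraphrased here, so the original should be consulted for the precise statement): for a continuous, non-polynomial activation $\sigma$, a compact set of input functions $V \subset C(K_1)$, and a nonlinear continuous operator $G$ mapping $V$ into $C(K_2)$, for any $\varepsilon > 0$ there exist integers $n$, $p$, $m$, constants $c_i^k, \xi_{ij}^k, \theta_i^k, \zeta_k \in \mathbb{R}$, vectors $w_k \in \mathbb{R}^d$, and sensor points $x_j \in K_1$ such that

$$
\left| \, G(u)(y) \;-\; \sum_{k=1}^{p} \underbrace{\sum_{i=1}^{n} c_i^k\, \sigma\!\Big(\sum_{j=1}^{m} \xi_{ij}^k\, u(x_j) + \theta_i^k\Big)}_{\text{branch}} \; \underbrace{\sigma\big(w_k \cdot y + \zeta_k\big)}_{\text{trunk}} \, \right| \;<\; \varepsilon
$$

holds for all $u \in V$ and $y \in K_2$. The factorization of each term into a function-dependent part and a location-dependent part is exactly what motivates the branch and trunk nets described next.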
Methodology
DeepONet's architecture is central to its ability to minimize the aforementioned errors. The branch net processes the input function sampled at a fixed set of sensor points, while the trunk net processes the coordinates at which the operator's output is evaluated. This split allows the network to handle high-dimensional inputs and outputs effectively and provides a structured way to learn operators.
Significantly, the authors highlight two architectural variations: the stacked DeepONet, which employs multiple branch networks, and the unstacked DeepONet, which employs a single branch network to output a vector. Comparative results show that while both architectures achieve superior performance over fully connected networks (FNNs), the unstacked DeepONet offers practical advantages in terms of computational efficiency and generalization capability.
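To make the branch/trunk split concrete, here is a minimal sketch of an unstacked DeepONet in PyTorch. This is not the authors' reference implementation; the layer widths, activation, and the hypothetical class name `UnstackedDeepONet` are illustrative choices. The branch net maps an input function sampled at `m` fixed sensors to a `p`-dimensional feature vector, the trunk net maps an output coordinate `y` to another `p`-dimensional vector, and the prediction is their inner product plus a trainable bias.

```python
import torch
import torch.nn as nn

class UnstackedDeepONet(nn.Module):
    """Minimal unstacked DeepONet: G(u)(y) ~ sum_k b_k(u) * t_k(y) + b0."""

    def __init__(self, m_sensors: int, y_dim: int = 1, p: int = 40, width: int = 40):
        super().__init__()
        # Branch net: input-function values at m fixed sensors -> p features.
        self.branch = nn.Sequential(
            nn.Linear(m_sensors, width), nn.ReLU(),
            nn.Linear(width, p),
        )
        # Trunk net: output coordinate y -> p features.
        self.trunk = nn.Sequential(
            nn.Linear(y_dim, width), nn.ReLU(),
            nn.Linear(width, p), nn.ReLU(),
        )
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, u_sensors: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # u_sensors: (batch, m_sensors), y: (batch, y_dim)
        b = self.branch(u_sensors)   # (batch, p)
        t = self.trunk(y)            # (batch, p)
        return (b * t).sum(dim=-1, keepdim=True) + self.bias  # (batch, 1)

# Example: a batch of 8 input functions sampled at 100 sensors, one query point each.
model = UnstackedDeepONet(m_sensors=100)
u = torch.randn(8, 100)
y = torch.rand(8, 1)
print(model(u, y).shape)  # torch.Size([8, 1])
```

A stacked DeepONet would replace the single branch net with p independent branch networks, each contributing one scalar coefficient; in either case, training typically minimizes the mean squared error between predictions and operator values at sampled (u, y) pairs.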
Numerical Experiments and Results
The paper documents extensive numerical experiments across various dynamical systems and partial differential equations (PDEs). The primary findings are:
- Performance Comparison: DeepONet demonstrates markedly lower generalization errors compared to FNNs, even in simple linear problems.
- Convergence Rates: High-order convergence rates (both polynomial and exponential) are observed with respect to the training dataset size. Specifically, the paper notes exponential convergence for small datasets, transitioning to polynomial rates for larger datasets.
- Multi-dimensional PDEs: For PDE cases, such as the diffusion-reaction system with a source term, DeepONet showcases its versatility by effectively learning from grids of different densities and generating highly accurate predictions.
- Sensor Density and Smoothness: The number of sensors required for accurate operator learning depends on the smoothness of the input functions and on the temporal/spatial extent of the problem, with smoother inputs requiring fewer sensors. The results provide empirical support for the theoretical bounds derived for the required number of sensors.
- Function Spaces: The analysis covers diverse input function spaces, including Gaussian random fields and Chebyshev polynomials, indicating robustness across varied input distributions (a sampling sketch follows this list).
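As an illustration of how such training inputs can be generated, the snippet below samples functions from a mean-zero Gaussian random field with a squared-exponential (RBF) kernel at a fixed set of sensor locations. This is a sketch under assumptions (the kernel choice, domain, and length scale are placeholders), not the paper's exact data-generation code; the length scale controls smoothness, which connects back to the sensor-count observations above.

```python
import numpy as np

def sample_grf(n_samples: int, m_sensors: int = 100, length_scale: float = 0.2,
               domain: tuple = (0.0, 1.0), seed: int = 0) -> np.ndarray:
    """Sample zero-mean GRF realizations with an RBF kernel at fixed sensor points."""
    rng = np.random.default_rng(seed)
    x = np.linspace(*domain, m_sensors)          # fixed sensor locations
    # Squared-exponential covariance: k(x, x') = exp(-|x - x'|^2 / (2 l^2))
    diff = x[:, None] - x[None, :]
    cov = np.exp(-0.5 * (diff / length_scale) ** 2)
    cov += 1e-10 * np.eye(m_sensors)             # small jitter for numerical stability
    L = np.linalg.cholesky(cov)
    return (L @ rng.standard_normal((m_sensors, n_samples))).T  # (n_samples, m_sensors)

# 1000 input functions, each recorded at 100 sensors; a smaller length_scale gives rougher functions.
u_train = sample_grf(1000)
print(u_train.shape)  # (1000, 100)
```

Each sampled row would then be paired with reference solutions of the corresponding ODE or PDE (such as the diffusion-reaction example above) to form the (u, y, G(u)(y)) training triples.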
Implications and Future Directions
The introduction of DeepONet has multifaceted implications. Theoretically, it augments the understanding of neural network capabilities in approximating continuous operators and points towards potential upper bounds on network sizes for operator approximation. Practically, DeepONet's architecture demonstrates clear utility in scientific domains that require operator learning from sparse or high-dimensional data, such as physics-informed machine learning.
Future research could explore enhancing DeepONet further by incorporating advanced architectures like convolutional layers or attention mechanisms, which might yield even better performance. Additionally, developing a deeper theoretical foundation for the error bounds and generalization capabilities of DeepONets remains a compelling avenue for exploration.
Conclusion
DeepONet represents a significant advancement in leveraging neural networks for approximating nonlinear operators. By meticulously addressing optimization and generalization errors, the proposed architecture achieves exceptional accuracy and efficiency, making it a valuable tool for identifying and solving differential equations. The empirical evidence of high-order convergence rates and robust performance across diverse scenarios underscores the potential and versatility of DeepONet in applied scientific research.