
Universal Approximation Theorem for Operators

  • Universal Approximation Theorem for Operators defines conditions for approximating any continuous operator between infinite-dimensional spaces using deep neural architectures.
  • The arbitrary-depth variants of the theorem are proved via truncated-input encoding with register and decoder mechanisms, achieving uniform approximation at minimal network width.
  • Depth in these networks generates exponential expressive power, underpinning architectures like DeepONets and Fourier Neural Operators in scientific machine learning.

A Universal Approximation Theorem (UAT) for operators provides rigorous conditions under which neural-network-like architectures can approximate any continuous (possibly nonlinear) operator between infinite-dimensional function spaces, uniformly on compact subsets. Such results generalize the classical UAT for neural networks—originally concerning functions $f : \mathbb{R}^d \to \mathbb{R}$—to the operator-valued context $G : V \to C(Y)$, where $V$ is a compact set of input functions and $Y$ is a compact subset of $\mathbb{R}^n$. The study of universal operator approximation is foundational for operator learning in scientific machine learning and underpins architectures such as DeepONets, Fourier Neural Operators (FNO), and their variants.

1. Formal Operator Neural Network and Problem Setting

Let $X$ be a compact subset of a Banach space, $V \subset C(X)$ a compact set of real-valued continuous functions (“inputs”), and $Y \subset \mathbb{R}^n$ a compact output (“coordinate”) domain. The operator of interest is a mapping $G: V \to C(Y)$, $u \mapsto (y \mapsto G(u)(y))$. The goal is to construct, for any $\epsilon > 0$, a neural architecture $F$ such that

$$\sup_{u \in V,\, y \in Y} \left|G(u)(y) - F(u(x_1), \ldots, u(x_m), y)\right| < \epsilon$$

for suitable sensor points $x_1, \ldots, x_m \in X$. An Operator Neural Network (ONN) is thus a fully-connected feedforward network $F: \mathbb{R}^{m+n} \to \mathbb{R}$, which may be required to have prescribed width and arbitrary depth, or vice versa, depending on the theorem variant (Yu et al., 2021).
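To make the data flow concrete, here is a minimal NumPy sketch of such an ONN; the helper `onn_forward` and all parameter choices are illustrative assumptions, not taken from the paper. The $m$ sensor samples $u(x_j)$ and the $n$ coordinates of $y$ are concatenated into one vector in $\mathbb{R}^{m+n}$ and pushed through a plain feedforward network with scalar output.

```python
import numpy as np

def onn_forward(sensor_values, y, weights, biases, sigma=np.tanh):
    """Evaluate a fully connected ONN F: R^{m+n} -> R on one (u, y) pair.

    sensor_values -- shape (m,): the samples u(x_1), ..., u(x_m)
    y             -- shape (n,): a query coordinate in Y
    weights, biases -- per-layer parameters; the final layer is linear
    """
    z = np.concatenate([np.asarray(sensor_values, dtype=float), np.atleast_1d(y)])
    for W, b in zip(weights[:-1], biases[:-1]):
        z = sigma(W @ z + b)                      # hidden layers
    return (weights[-1] @ z + biases[-1]).item()  # approximates G(u)(y)

# Illustration with random parameters: m = 10 sensors on [0, 1], n = 1.
m, n = 10, 1
xs = np.linspace(0.0, 1.0, m)            # sensor points x_1, ..., x_m
u_vals = np.sin(2 * np.pi * xs)          # samples of one input function u
rng = np.random.default_rng(0)
dims = [m + n, 5, 5, 5, 1]               # narrow (width-5) hidden layers; depth is free
weights = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
biases = [rng.standard_normal(dims[i + 1]) for i in range(len(dims) - 1)]
print(onn_forward(u_vals, np.array([0.3]), weights, biases))
```

In a trained model the parameters would be fit so that the supremum error above falls below $\epsilon$; here they are random and serve only to fix the shapes and data flow.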

A truncated-input ONN includes an additional input-processing layer that encodes the $m$ real inputs $u(x_j)$ as a single number with a finite decimal representation, a key technical device for the most stringent universal approximation results.

2. Main Universal Approximation Theorems for ONNs

2.1 Non-Polynomial Activations: Width Five

Let $\sigma: \mathbb{R} \to \mathbb{R}$ be continuous, non-polynomial, and $C^1$ at some $\alpha$ with $\sigma'(\alpha) \neq 0$. Then, for every $\epsilon > 0$, there exist $m \in \mathbb{N}$, sampling points $x_1, \ldots, x_m \in X$, and an ONN $F: \mathbb{R}^{m+n} \to \mathbb{R}$ of arbitrary depth and width 5 (with truncated inputs) such that

$$\sup_{u \in V,\, y \in Y} \left|G(u)(y) - F(u(x_1), \ldots, u(x_m), y)\right| < \epsilon.$$

This theorem asserts that the width can be held at 5 while the depth is allowed to be arbitrary, under mild smoothness conditions on $\sigma$ (Yu et al., 2021).

2.2 Non-Affine Polynomial Activations

For non-affine polynomial activation functions $\sigma$ (degree at least 2), the same result holds with width 6. If $\sigma'(\alpha) = 0$ but $\sigma''(\alpha) \neq 0$ at some $\alpha$, width 5 again suffices. Thus, a wide class of smooth activations admits universal operator approximation with strictly bounded width (Yu et al., 2021).

3. Construction Principles and Architectural Mechanisms

The realization of operator UAT in the “arbitrary-depth, bounded-width” setting exploits several key mechanisms:

  • Wide-shallow UAT (Chen & Chen, 1995): Any continuous operator on compacta can be uniformly approximated by a network expressible as a finite sum over terms factorized into sensor-activated (branch) and location-activated (trunk) subnetworks—formally the structure underlying DeepONet (Lu et al., 2019).
  • Truncated-input encoding: Each real input is truncated to $\kappa$ decimal digits. All sensor values are encoded into a single “register” number $r$ via

$$r = \sum_{j=1}^{m} 10^{-j\kappa} \left\lfloor 10^{\kappa} u(x_j) \right\rfloor$$

  • Register-compute and decoder maps: Decoders $\phi_j$ reconstruct the $j$th truncated block from $r$, so each $u(x_j)$ can be approximately extracted at each layer as needed. Implementations rely on $C^1$ activation regularity for “carry-through” identity approximation and multipliers. A worked encode/decode round trip is sketched after this list.
  • Minimal width: A width-5 architecture is organized as (1) a register neuron; (2) two decoder neurons; (3) one neuron for local affine computation; and (4) one “augmenter” neuron that performs the final summation. This minimal configuration achieves universality for all continuous nonlinear operators on compact sets for the prescribed class of activations (Yu et al., 2021).
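The register mechanism can be illustrated directly. The sketch below is an assumption-laden toy: the helper names are hypothetical, exact rational arithmetic via `fractions` is used to sidestep floating-point rounding, and sensor values are assumed for simplicity to lie in $[0, 1)$. It packs truncated sensor samples into one register number and shows how a decoder $\phi_j$ recovers the $j$th block.

```python
from fractions import Fraction
import math

def encode_register(sensor_values, kappa):
    """Pack truncated sensor values u(x_j) in [0, 1) into one register number r."""
    r = Fraction(0)
    for j, u_xj in enumerate(sensor_values, start=1):
        block = math.floor(10**kappa * u_xj)       # the kappa leading decimal digits of u(x_j)
        r += Fraction(block, 10**(j * kappa))      # place block j after the previous blocks
    return r

def decode_block(r, j, kappa):
    """phi_j: recover the truncated value of u(x_j) from the register r."""
    shifted = math.floor(r * 10**(j * kappa))      # shift block j into the integer part
    block = shifted % 10**kappa                    # strip blocks 1, ..., j-1
    return block / 10**kappa                       # truncated u(x_j), error < 10**(-kappa)

# Example: three sensor values, kappa = 4 digits each.
u_vals = [0.123456, 0.987654, 0.5]
r = encode_register(u_vals, kappa=4)
print(float(r))                                    # one register number carrying all sensors
print([decode_block(r, j, 4) for j in (1, 2, 3)])  # [0.1234, 0.9876, 0.5]
```

With exact arithmetic the decoded values agree with the truncated inputs to within $10^{-\kappa}$; in the network itself these encode/decode steps are approximated by narrow layers built from the activation $\sigma$, using the carry-through identity and multiplier constructions described above.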

4. Depth-Separation Results: ReLU Operator Networks

A rigorous depth-separation theorem demonstrates an exponential gap between deep constant-width and shallow wide ReLU ONNs:

  • For any $k \ge 1$, there exists a continuous operator $G_k: V \to C(Y)$ such that:
    1. It can be computed exactly by a ReLU ONN of depth $2k^3 + 8$ and width $O(1)$.
    2. Any ReLU ONN of depth $\le k$ with at most $2^k$ neurons incurs error at least $1/64$ for some input $u \in V$.

This adapts Telgarsky’s sawtooth-function construction to the operator-valued setting, showing that certain operators are not well-approximated by shallow ReLU networks of subexponential width (Yu et al., 2021).
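A minimal NumPy sketch of the underlying scalar phenomenon (Telgarsky's sawtooth, not the full operator-valued construction of the paper): composing a width-2 ReLU "hat" layer $k$ times yields a piecewise-linear function with $2^k$ linear pieces, whereas a single-hidden-layer ReLU network needs on the order of $2^k$ neurons to produce that many pieces.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    """One 'tent' map on [0, 1], written as a width-2 ReLU layer."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def sawtooth(x, k):
    """Compose the hat map k times: a depth-k, width-2 ReLU network on [0, 1]."""
    for _ in range(k):
        x = hat(x)
    return x

xs = np.linspace(0.0, 1.0, 2049)
for k in (1, 3, 6):
    teeth = sawtooth(xs, k)
    # Count linear pieces by counting sign changes of the slope: 2**k pieces.
    slopes = np.sign(np.diff(teeth))
    print(k, int(np.count_nonzero(np.diff(slopes))) + 1)
```

Counting slope sign changes confirms the $2^k$ pieces; the operator-valued theorem embeds this kind of oscillation into $G_k$, so that shallow ReLU ONNs cannot track it without exponentially many neurons.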

5. Architectural and Practical Implications

  • Constant-width (e.g., width 5 or 6) ONNs with deep architectures are universal for operator learning; “more depth, less width” architectures suffice.
  • Depth generates exponential expressive power: families of operators exist that deep, narrow ONNs can compute exactly but shallow, wide ones cannot approximate except at exponential cost in width.
  • In practical operator learning for scientific machine learning, deep, narrow ONNs are preferred for complex nonlinear operator classes.
  • The results provide explicit theoretical backing for DeepONets, FNOs, and related architectures, clarifying their universality and the role of depth-vs-width trade-offs in practice (Yu et al., 2021).

6. Relation to Other Operator UATs and Future Directions

The approach in (Yu et al., 2021) builds on, and extends, the classical operator UAT established by Chen & Chen (1995) and the first DeepONet architectures, which allow for arbitrary width and bounded depth (Lu et al., 2019). By combining truncation-based encoding with register-compute mechanisms and controlled decoder extraction, the width constraint is reduced to its theoretical minimum, and the impact of depth is sharply quantified.
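For contrast with the narrow-deep constructions above, the branch-trunk factorization behind the Chen & Chen theorem and DeepONet can be written in a few lines. The sketch below uses random, untrained parameters and hypothetical helper names, and evaluates the factorized template $\sum_k b_k(u(x_1), \ldots, u(x_m))\, t_k(y)$ approximating $G(u)(y)$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p, width = 10, 1, 8, 32   # sensors, coordinate dim, branch/trunk output dim, hidden width

def mlp(dims):
    """Random parameters for a small tanh MLP with the given layer sizes."""
    return [(rng.standard_normal((dims[i + 1], dims[i])), rng.standard_normal(dims[i + 1]))
            for i in range(len(dims) - 1)]

def forward(params, z):
    for W, b in params[:-1]:
        z = np.tanh(W @ z + b)
    W, b = params[-1]
    return W @ z + b            # final layer is linear

branch = mlp([m, width, p])     # acts on the sensor values u(x_1), ..., u(x_m)
trunk = mlp([n, width, p])      # acts on the query coordinate y

def deeponet(u_vals, y):
    """G(u)(y) ~ sum_k branch_k(u) * trunk_k(y): the wide-shallow factorized form."""
    return float(np.dot(forward(branch, u_vals), forward(trunk, np.atleast_1d(y))))

xs = np.linspace(0.0, 1.0, m)
u_vals = np.sin(2 * np.pi * xs)
print(deeponet(u_vals, 0.3))
```

Training would fit both subnetworks jointly; the Chen & Chen result guarantees that, with enough branch/trunk terms, this factorized form is already universal on compact sets, while the results of (Yu et al., 2021) show what additional structure is needed once the width is pinned to its minimum.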

Contemporary directions include analysis for other neural operator architectures (e.g., Enc-Dec schemes, FNOs, transformer-based operator approximators) and quantitative scaling laws for width, depth, and parameter-efficiency under varying operator regularity and input domain complexity. Open problems include optimal encoding schemes beyond decimal truncation, depth-width trade-offs for other activation classes, and extensions to random-input and measure-theoretic operator settings.


References:

(Yu et al., 2021) Arbitrary-Depth Universal Approximation Theorems for Operator Neural Networks.
(Lu et al., 2019) DeepONet: Learning Nonlinear Operators for Identifying Differential Equations Based on the Universal Approximation Theorem of Operators.
