
Conditional Neural Operator Architecture

Updated 21 December 2025
  • Conditional neural operator architectures are neural network models that approximate mappings between infinite-dimensional function spaces by conditioning on additional inputs like functions or vectors.
  • They employ modular designs—encoding input functions, conditioning descriptors, and output variables—to fuse representations and ensure universal approximation across diverse applications.
  • Empirical results demonstrate significant reductions in computational complexity and error when applied to parametric PDEs, optimal control, and inverse problems.

A conditional neural operator architecture is a broad class of neural operator designs that approximate operators between infinite-dimensional function spaces, where the operator itself is parametrized or conditioned on an additional input such as a function, vector, or more general descriptor. This paradigm extends classical operator learning to heterogeneous problem families—such as parametric PDEs or control problems—where the solution operator must adapt to each instance's parameters or context. Conditional neural operators are implemented via various neural architectures that encode both the primary input function and the conditioning descriptor, fuse these representations, and generate the corresponding output function or field. The resulting architectures unify and generalize models including DeepONet, Fourier Neural Operator (FNO), and adaptive spectral approaches. Rigorous approximation theory guarantees universal approximation properties for families of conditional operators, and strong empirical performance has been demonstrated for a range of scientific and engineering tasks.

1. Mathematical Foundations of Conditional Neural Operators

Conditional neural operator learning formalizes the task: given a family of operators $\{G[\phi]: U \to V\}_{\phi \in W}$ (with $U$, $V$ function spaces and $W$ a parameter or function space), approximate the mapping $(u, \phi) \mapsto G[\phi][u]$ jointly in all arguments. This allows a single neural architecture to flexibly realize a continuum of operators, rather than a fixed point-to-point mapping.

Mathematically, for input function $u \in U$ and condition $\phi \in W$, the operator is

$$\mathcal{G}(u; \phi) = G[\phi](u), \qquad \mathcal{G}: U \times W \to V, \qquad \mathcal{G}(u; \phi)(x) = G[\phi](u)(x), \; x \in \Omega_V.$$

This framework subsumes parametric PDE solvers, optimal control policies parametrized by problem data, and stochastic closure models conditioned on measurements or reduced states (Weihs et al., 29 Oct 2025, Feng et al., 17 Dec 2024, Dong et al., 6 Aug 2024, Cao et al., 22 Jul 2025).
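
As a concrete illustration of such a family (an illustrative instance, not a benchmark drawn from the cited works), consider the solution operator of a parametric diffusion equation conditioned on the diffusivity field $a$: here $U$ is a space of initial states, $W$ a space of coefficient functions, and

$$\partial_t v = \nabla \cdot \big( a \, \nabla v \big), \quad v(\cdot, 0) = u_0, \qquad G[a](u_0) := v(\cdot, T),$$

so that $\mathcal{G}(u_0; a) = G[a](u_0)$ varies jointly with the initial state and the coefficient.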

2. Core Architectural Patterns

Most conditional neural operator architectures employ modular designs that separately encode the input function, the conditioning descriptor, and the output location/variable. These representations are then fused, often via inner-product or low-rank expansion, to produce the final output.

Example Architectures

  • Multiple Operator Network (MONet):

Three subnetworks: branch (input $u$), parameter (condition $\phi$), and trunk (output location $x$); their outputs are combined as

$$\sum_{k=1}^N \sum_{i=1}^M \tau_k(x)\, b_{ki}(u)\, L_{ki}(\phi)$$

where each $\tau_k$, $b_{ki}$, $L_{ki}$ is an MLP acting on its argument (Weihs et al., 29 Oct 2025); a minimal code sketch of this fusion appears after this list.

  • Multiple Nonlinear Operator (MNO):

Decomposes the expansion into parameter modes and low-rank trunk-branch fusions,

$$\sum_{p=1}^P l_p(\phi) \sum_{k=1}^{H_p} b_{pk}(u)\, \tau_{pk}(x).$$

  • Neural Adaptive Spectral Method (NASM):

Encodes the problem instance $p$ as a vector $e$, then computes B-spline/Fourier/Chebyshev basis coefficients and adaptive basis parameters as

$$\mathcal{N}_{\mathrm{NASM}}(p)(t) = \sum_{j=1}^K c_j(t, e)\, \phi_j(t; \theta(t, e))$$

with the dependence on $p$ entering entirely through $e$ (Feng et al., 17 Dec 2024); a sketch of this encoder-plus-adaptive-basis pattern follows the table below.

  • Conditional Score-based FNO:

For stochastic PDE closure, FNO variants encode the noisy field, conditioned state, measurement vector, and time embedding through separate pipelines and fuse them by concatenation and a final convolution, which is used as the "score" for reverse diffusion in a conditional generative model (Dong et al., 6 Aug 2024).

  • Diff-ANO Conditional Consistency Model:

U-Net backbone with ControlNet-style side branches encodes a time-indexed latent, measurement conditioning, and spatial features, enabling learned conditional diffusion priors for inverse problems (Cao et al., 22 Jul 2025).
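
To make the branch/parameter/trunk pattern concrete, the following is a minimal PyTorch sketch of a MONet-style fusion, read directly off the formula $\sum_{k,i} \tau_k(x)\, b_{ki}(u)\, L_{ki}(\phi)$. It is not the reference implementation of (Weihs et al., 29 Oct 2025); the sensor-based discretization of $u$ and $\phi$, the layer widths, and the class name `MONetStyleOperator` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MONetStyleOperator(nn.Module):
    """Hypothetical MONet-style conditional operator:
    output(x) = sum_{k,i} tau_k(x) * b_{ki}(u) * L_{ki}(phi)."""

    def __init__(self, u_sensors, phi_sensors, x_dim, N=32, M=8, width=128):
        super().__init__()
        self.N, self.M = N, M
        # Branch net: input function u, sampled at fixed sensor points -> coefficients b_{ki}(u).
        self.branch = nn.Sequential(nn.Linear(u_sensors, width), nn.GELU(), nn.Linear(width, N * M))
        # Parameter net: conditioning descriptor phi -> coefficients L_{ki}(phi).
        self.param = nn.Sequential(nn.Linear(phi_sensors, width), nn.GELU(), nn.Linear(width, N * M))
        # Trunk net: query location x -> basis values tau_k(x).
        self.trunk = nn.Sequential(nn.Linear(x_dim, width), nn.GELU(), nn.Linear(width, N))

    def forward(self, u, phi, x):
        # u: (batch, u_sensors), phi: (batch, phi_sensors), x: (batch, n_queries, x_dim)
        b = self.branch(u).view(-1, self.N, self.M)       # (batch, N, M)
        L = self.param(phi).view(-1, self.N, self.M)      # (batch, N, M)
        tau = self.trunk(x)                               # (batch, n_queries, N)
        coeff = (b * L).sum(dim=-1)                       # contract over i: (batch, N)
        return torch.einsum("bqk,bk->bq", tau, coeff)     # contract over k against the trunk basis
```

Under the same assumptions, the MNO variant would group the expansion by parameter modes $l_p(\phi)$, with a separate low-rank trunk-branch block for each mode.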

A schematic table of core architectural modules:

| Module Type | Description | Typical Models |
| --- | --- | --- |
| Branch/Trunk/Param | Separate MLPs/CNNs for $u$, $\phi$, $x$ | MONet, DeepONet |
| Encoder + CoefNet + Basis | Encode $p$; map $(e, t)$ to coefficients/basis; spectral aggregation | NASM |
| Multi-modal FNO | Parallel pipelines for field, condition, measurements, noise | Conditional FNO/Diffusion |
| Consistency U-Net | U-Net with measurement-conditional side branch, time embedding | Diff-ANO |
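
The NASM row of this table can be sketched in the same spirit. Below is a minimal, hypothetical encoder + coefficient-net + adaptive-basis head; the adaptive Fourier basis is chosen purely for illustration (NASM also supports B-spline and Chebyshev bases), and module names and widths are assumptions rather than the published architecture.

```python
import math
import torch
import torch.nn as nn

class NASMStyleHead(nn.Module):
    """Hypothetical NASM-style head: N(p)(t) = sum_j c_j(t, e) * phi_j(t; theta(t, e)),
    with e = encoder(p) and an adaptive Fourier basis as the example phi_j."""

    def __init__(self, p_dim, K=16, e_dim=64, width=128):
        super().__init__()
        self.K = K
        # Instance encoder: problem descriptor p -> embedding e.
        self.encoder = nn.Sequential(nn.Linear(p_dim, width), nn.GELU(), nn.Linear(width, e_dim))
        # Coefficient net: (t, e) -> K spectral coefficients c_j(t, e).
        self.coef_net = nn.Sequential(nn.Linear(1 + e_dim, width), nn.GELU(), nn.Linear(width, K))
        # Basis-parameter net: (t, e) -> K frequencies and K phases theta(t, e).
        self.theta_net = nn.Sequential(nn.Linear(1 + e_dim, width), nn.GELU(), nn.Linear(width, 2 * K))

    def forward(self, p, t):
        # p: (batch, p_dim), t: (batch, n_times, 1)
        e = self.encoder(p)                                     # (batch, e_dim)
        e_t = e.unsqueeze(1).expand(-1, t.shape[1], -1)         # broadcast e to every query time
        te = torch.cat([t, e_t], dim=-1)                        # (batch, n_times, 1 + e_dim)
        c = self.coef_net(te)                                   # coefficients, (batch, n_times, K)
        freq, phase = self.theta_net(te).chunk(2, dim=-1)       # adaptive basis parameters
        basis = torch.sin(2 * math.pi * freq * t + phase)       # phi_j(t; theta), (batch, n_times, K)
        return (c * basis).sum(dim=-1)                          # (batch, n_times)
```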

3. Universal Approximation and Scaling Laws

Conditional neural operator architectures achieve universal approximation properties across continuous, measurable, and Lipschitz operator families. The existence of architectures—such as MONet and MNO—that can approximate any continuous conditional operator to arbitrary accuracy is rigorously established, with explicit scaling laws relating model size to error that depend on the functional and spatial problem dimensions (Weihs et al., 29 Oct 2025).

  • Universal Approximation (UAP):

For any $\varepsilon > 0$, there exists a neural architecture such that

$$\sup_{(\phi, u, x)} \left| G[\phi][u](x) - \mathrm{NN}[\phi][u](x) \right| < \varepsilon.$$

  • Quantitative Scaling:

The error $\varepsilon$ scales as an inverse power of iterated logarithms in $N_\#$ (the number of network parameters), depending on the approximation order and the dimensions of the function/parameter spaces.

  • Architectural Trade-offs:

The complexity (network width/depth) can be distributed between subnetworks (branch/trunk/param) to optimize for target applications. "Function-then-functional" versus "functional-then-function" strategies yield different rates (Weihs et al., 29 Oct 2025).

For NASM, explicit spectral truncation and MLP approximation bounds combine to

$$\| \mathcal{G} - \mathcal{N}_{\mathrm{NASM}} \| \leq C_1 K^{-s} + C_2 \epsilon$$

with $K$ the basis size, $s$ the Sobolev regularity, and $\epsilon$ the MLP approximation error (Feng et al., 17 Dec 2024).
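
As a rough, illustrative reading of this bound (with placeholder constants), for regularity $s = 2$ the truncation term decays quadratically in the basis size,

$$K = 10 \;\Rightarrow\; K^{-s} = 10^{-2}, \qquad K = 100 \;\Rightarrow\; K^{-s} = 10^{-4},$$

so enlarging the spectral basis pays off only until $C_1 K^{-s}$ falls below the network error floor $C_2 \epsilon$; beyond that point accuracy is limited by the MLP component.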

4. Representative Training and Inference Workflows

The training protocols for conditional neural operators adapt standard operator-learning procedures to multi-operator or parametric tasks:

  • Data Generation:
    • Sample input functions and/or problem parameters from an appropriate distribution.
    • Generate ground-truth operator outputs from high-fidelity solvers (e.g., finite difference, direct optimal control, classical closure).
  • Network Training:
    • Subnetworks encode the relevant function spaces and conditioning variables.
    • Train with mean squared error, denoising score matching, or reconstruction+consistency losses.
    • Typical optimizers: Adam, learning rate decay, mini-batch updates.
  • Inference:
    • Encode condition; then, for each new input function and query location/time, compute the output via a single forward pass.
    • For generative models, perform few-step or SDE-driven sampling using the trained conditional score or consistency model.

For NASM, this reduces the optimal control solution from iterative numerical optimization to a single network evaluation per time query, at a cost on the order of $10^4$ FLOPs per query. For conditional score-based models built on FNOs, both training and efficient conditional sampling exploit mesh independence through Fourier layers and fast convolution (Feng et al., 17 Dec 2024, Dong et al., 6 Aug 2024).
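
A minimal sketch of the supervised variant of this workflow is shown below, assuming precomputed (input, condition, query, target) tensors from a high-fidelity solver and reusing the hypothetical `MONetStyleOperator` sketch from Section 2; all shapes, dataset sizes, and hyperparameters are placeholders.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Placeholder data: u (inputs on sensors), phi (conditions), x (query points), y (solver outputs).
u   = torch.randn(1024, 100)      # 1024 samples, input function at 100 sensor points
phi = torch.randn(1024, 20)       # conditioning descriptors
x   = torch.rand(1024, 64, 1)     # 64 query locations per sample
y   = torch.randn(1024, 64)       # ground-truth operator outputs at the queries

model = MONetStyleOperator(u_sensors=100, phi_sensors=20, x_dim=1)     # sketch from Section 2
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=50, gamma=0.5)  # learning-rate decay
loader = DataLoader(TensorDataset(u, phi, x, y), batch_size=64, shuffle=True)

for epoch in range(200):
    for ub, pb, xb, yb in loader:            # mini-batch updates
        pred = model(ub, pb, xb)             # single forward pass per (input, condition, queries)
        loss = torch.mean((pred - yb) ** 2)  # mean squared error over query points
        opt.zero_grad()
        loss.backward()
        opt.step()
    sched.step()
```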

5. Empirical Performance and Benchmarks

Conditional neural operator architectures have been empirically validated across a broad set of PDE families, optimal control systems, and inverse problems:

  • Parametric PDEs:

MNO and MONet exhibit lower average relative $L^2$ error than DeepONet, FNO, and MIONet on conservation laws, diffusion-reaction, nonlinear wave, and parametric reaction-diffusion problems, with errors reduced by 20–40% as model size is scaled up (Weihs et al., 29 Oct 2025).

  • Optimal Control:

NASM yields a $6000\times$ speedup over direct optimization, and outperforms DeepONet, FNO, SNO, and plain MLPs on synthetic environments (e.g., quadrotor, cartpole) and real-world datasets (robotic pushing with image/friction conditioning), especially for out-of-distribution generalization (Feng et al., 17 Dec 2024).

  • Stochastic Closure Modeling:

Conditional FNO-based diffusion models capture non-local and stochastic corrections, generalizing across spatial resolutions and producing ensemble statistics consistent with high-fidelity data (Dong et al., 6 Aug 2024).

  • Inverse Imaging:

Diff-ANO achieves high-throughput, high-quality USCT reconstructions, replacing costly iterative solvers and gradient computations with learned operator surrogates and conditional consistency models (Cao et al., 22 Jul 2025).

6. Conditioning Mechanisms and Architectural Innovations

Conditioning in these architectures is handled via:

  • Parameter nets or encoders that map the conditioning function or descriptor ($\phi$, $p$, $y$) into feature vectors.
  • Adaptive basis or trunk expansions, whose coefficients or basis parameters are functions of the encoded condition.
  • Parallel pipelines or side branches (as in ControlNet or FNO) that allow injection of measurement or parametric information at all network depths; a generic sketch appears after this list.
  • Explicit design for mesh-independence (as in FNO), time-dependence in control or PDE outputs, and basis adaptation for regularity.
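
As a generic sketch of the "inject the condition at every depth" idea (a simplified FiLM/ControlNet-flavored illustration, not the architecture of any specific cited model), a condition encoder can produce per-layer scale and shift features that modulate a main pipeline; all names and widths below are hypothetical.

```python
import torch
import torch.nn as nn

class ConditionedBlock(nn.Module):
    """One main-pipeline block whose features are modulated by the encoded condition."""
    def __init__(self, width, cond_dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(width, width), nn.GELU())
        self.to_scale_shift = nn.Linear(cond_dim, 2 * width)   # side branch: condition -> (scale, shift)

    def forward(self, h, cond):
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        return self.body(h) * (1 + scale) + shift              # inject the condition at this depth

class ConditionedNetwork(nn.Module):
    """Main pipeline with a shared condition encoder feeding every block."""
    def __init__(self, in_dim, cond_raw_dim, width=128, depth=4, cond_dim=64):
        super().__init__()
        self.cond_encoder = nn.Sequential(                     # measurement/parameter encoder
            nn.Linear(cond_raw_dim, width), nn.GELU(), nn.Linear(width, cond_dim))
        self.lift = nn.Linear(in_dim, width)
        self.blocks = nn.ModuleList(ConditionedBlock(width, cond_dim) for _ in range(depth))
        self.proj = nn.Linear(width, 1)

    def forward(self, u, cond_raw):
        cond = self.cond_encoder(cond_raw)                     # shared condition embedding
        h = self.lift(u)
        for block in self.blocks:                              # condition injected at every depth
            h = block(h, cond)
        return self.proj(h)
```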

Ablation studies confirm the necessity of explicit conditioning mechanisms. For example, removing time-dependence in coefficient networks or basis adaptation in NASM degrades performance; similarly, flattened parameter-agnostic networks in multi-operator MNO/DeepONet yield less efficient or less accurate approximations (Feng et al., 17 Dec 2024, Weihs et al., 29 Oct 2025).

7. Applications, Limitations, and Outlook

Applications span PDE surrogate modeling, parametric optimal control, uncertainty quantification, inverse problems in imaging, and stochastic subgrid closure in climate/turbulence.

Key strengths include:

  • Reusable operator surrogates for entire families of problems (parameter-dependent PDEs, control functionals).
  • Empirical and theoretical guarantees of strong accuracy across in-distribution and moderate out-of-distribution regimes.
  • Order-of-magnitude speedup in inference relative to classical numerical approaches.

Limitations are typically associated with the scaling of training data requirements with the dimension of the function or conditioning space, and with the need to hand-craft basis or encoder architectures for particular problem families. Recent theory provides guidance for balancing network complexity among submodules to optimize for target problem regularity and dataset/parameter domain complexity (Weihs et al., 29 Oct 2025).

Future directions include further unifying conditional operator architectures with generative and stochastic modeling tools, integrating differentiable programming with learned neural surrogates, and extending the mesh-independence and scaling properties to more general non-Euclidean domains and stochastic operator families (Feng et al., 17 Dec 2024, Dong et al., 6 Aug 2024, Weihs et al., 29 Oct 2025, Cao et al., 22 Jul 2025).
