Thompson Sampling in Function Spaces via Neural Operators
(2506.21894v1)
Published 27 Jun 2025 in stat.ML and cs.LG
Abstract: We propose an extension of Thompson sampling to optimization problems over function spaces where the objective is a known functional of an unknown operator's output. We assume that functional evaluations are inexpensive, while queries to the operator (such as running a high-fidelity simulator) are costly. Our algorithm employs a sample-then-optimize approach using neural operator surrogates. This strategy avoids explicit uncertainty quantification by treating trained neural operators as approximate samples from a Gaussian process. We provide novel theoretical convergence guarantees, based on Gaussian processes in the infinite-dimensional setting, under minimal assumptions. We benchmark our method against existing baselines on functional optimization tasks involving partial differential equations and other nonlinear operator-driven phenomena, demonstrating improved sample efficiency and competitive performance.
The paper presents NOTS, which integrates neural operators with Thompson sampling to optimize over infinite-dimensional function spaces.
It leverages the infinite-width limit of neural operators to produce samples from a GP posterior, yielding sublinear regret bounds.
Empirical results on PDE benchmarks such as Darcy flow and the shallow water equations demonstrate superior performance compared to GP-based methods.
This paper introduces a principled framework for Thompson sampling in infinite-dimensional function spaces, leveraging neural operator surrogates to optimize functionals of unknown operators. The approach is motivated by scientific and engineering problems where the objective is to optimize a functional of the output of a black-box operator, such as a PDE solver or a physical simulator, with expensive oracle access. The proposed method, Neural Operator Thompson Sampling (NOTS), integrates neural operator models with Thompson sampling to efficiently explore and optimize over function spaces, providing both theoretical guarantees and empirical validation on challenging PDE benchmarks.
Problem Setting and Motivation
The central problem addressed is the optimization of objectives of the form
$$a^* \in \operatorname*{arg\,max}_{a \in \mathcal{A}} f\big(G^*(a)\big),$$
where G∗ is an unknown operator mapping between function spaces, and f is a known, cheap-to-evaluate functional. This setting arises in applications such as optimal design in porous media, climate modeling, and inverse problems in physics, where both the input and output are functions, and the search space is infinite-dimensional.
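To make the setting concrete, the toy sketch below illustrates the problem structure; the operator `G_star`, the functional `f`, and the candidate pool are hypothetical placeholders for illustration, not objects from the paper.

```python
import numpy as np

def G_star(a: np.ndarray) -> np.ndarray:
    """Stand-in for the expensive black-box operator (e.g. a Darcy-flow
    solver mapping a permeability field to a pressure field). Here it is a
    cheap nonlinear placeholder so the example runs."""
    return np.tanh(a) + 0.1 * np.roll(a, 1, axis=0)

def f(u: np.ndarray) -> float:
    """Known, cheap-to-evaluate functional of the operator output
    (e.g. negative total flow); here simply a spatial mean."""
    return float(u.mean())

# Finite pool discretizing the function-space search set A,
# e.g. permeability fields on a 16x16 grid.
rng = np.random.default_rng(0)
candidate_pool = [rng.random((16, 16)) for _ in range(128)]

# The optimization target a* = argmax_{a in A} f(G_star(a)).
# Exhaustively querying G_star, as below, is exactly what the method
# must avoid when each call is a costly simulation.
best = max(candidate_pool, key=lambda a: f(G_star(a)))
```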
Traditional Bayesian optimization (BO) and bandit algorithms are ill-suited for this regime due to the curse of dimensionality and the lack of scalable uncertainty quantification for neural operator models. Existing approaches either restrict to finite-dimensional parameterizations or rely on expensive ensemble-based uncertainty estimation.
Methodology
The core contribution is the development of NOTS, which employs neural operators as surrogates for the unknown operator G∗. The method proceeds iteratively:
Posterior Sampling via Neural Operators: At each iteration, a neural operator is randomly initialized and trained on the current dataset using regularized least-squares loss. The initialization is chosen such that, in the infinite-width limit, the trained model corresponds to a sample from the posterior of a vector-valued Gaussian process (GP) over operators. This leverages the connection between infinitely wide neural networks and GPs, specifically via the neural network Gaussian process (NNGP) kernel.
Acquisition via Thompson Sampling: The trained neural operator surrogate is used to select the next input function by maximizing the objective functional over the search space, i.e., Thompson sampling in function space.
Oracle Query and Update: The selected input is evaluated via the expensive oracle, and the dataset is updated.
This process is formalized in Algorithm 1 of the paper, and the approach avoids the need for explicit uncertainty quantification (e.g., deep ensembles or mixture density networks) by relying on the theoretical properties of infinite-width neural operators.
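A minimal sketch of this loop is given below, assuming PyTorch, a finite candidate pool of input functions represented as tensors of shape (channels, H, W), and a generic surrogate factory `make_model` (for instance an FNO, as discussed under implementation considerations). Function names are illustrative, not the paper's code.

```python
import torch

def sample_surrogate(dataset, make_model, epochs=200, lr=1e-2, weight_decay=1e-4):
    """Sample-then-optimize posterior draw: a freshly, randomly initialized
    surrogate trained with L2-regularized least squares (via weight decay)
    approximates one sample from the GP posterior in the infinite-width limit."""
    model = make_model()  # new random initialization every round
    opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    inputs = torch.stack([a for a, _ in dataset])
    targets = torch.stack([u for _, u in dataset])
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((model(inputs) - targets) ** 2).mean()
        loss.backward()
        opt.step()
    return model

def nots(oracle, functional, candidate_pool, make_model, n_rounds=20):
    """Neural Operator Thompson Sampling over a finite candidate pool.
    `oracle` is the expensive operator G*, `functional` the cheap objective f."""
    # Warm start with one random query so the first surrogate has data.
    idx = torch.randint(len(candidate_pool), (1,)).item()
    dataset = [(candidate_pool[idx], oracle(candidate_pool[idx]))]
    for _ in range(n_rounds):
        surrogate = sample_surrogate(dataset, make_model)
        with torch.no_grad():
            # Thompson step: act greedily with respect to the sampled surrogate.
            scores = [float(functional(surrogate(a.unsqueeze(0))[0]))
                      for a in candidate_pool]
        a_next = candidate_pool[max(range(len(candidate_pool)), key=lambda i: scores[i])]
        dataset.append((a_next, oracle(a_next)))  # single expensive oracle query
    # Return the best input found, judged on the true oracle outputs collected so far.
    return max(dataset, key=lambda pair: float(functional(pair[1])))[0]
```

Re-initializing the surrogate from scratch at every round is what makes the trained network an approximate posterior draw rather than a point estimate, so only a single model per round is needed instead of a deep ensemble.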
Theoretical Analysis
The paper provides a rigorous analysis of the infinite-width limit of neural operators, showing that, under appropriate initialization and training of only the last linear layer, the trained model converges to a sample from the GP posterior. This enables the extension of regret bounds from GP-based Thompson sampling to the function space setting. Specifically, for linear functionals and finite search spaces, the Bayesian cumulative regret of NOTS is shown to be sublinear, scaling as $\mathcal{O}(\sqrt{T\,\gamma_T})$, where $\gamma_T$ is the maximum information gain for the induced kernel.
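Written out, with $a^*$ an optimal input and $a_t$ the input queried at round $t$, a schematic form of the bounded quantity and rate (constants and logarithmic factors suppressed, under the paper's assumptions of a linear functional and a finite search space) is

$$\mathrm{BCR}_T \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T}\Big(f\big(G^*(a^*)\big) - f\big(G^*(a_t)\big)\Big)\right] \;\le\; \mathcal{O}\!\big(\sqrt{T\,\gamma_T}\big).$$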
The analysis is grounded in the operator-valued kernel framework for vector-valued GPs, and the results are contingent on the regularity of the functional and the noise model. The paper also discusses the limitations of the theory, particularly for nonlinear functionals and multi-layer neural operators, and outlines directions for extending the analysis.
Empirical Evaluation
The empirical section evaluates NOTS on two canonical PDE benchmarks:
Darcy Flow: Optimization of functionals such as negative total flow rate, total pressure, and potential power, with input functions representing permeability fields discretized on a 16×16 grid.
Shallow Water Equations: An inverse problem where the goal is to find initial conditions that reproduce a target solution, with inputs discretized on a 32×64 grid.
NOTS is compared against several baselines, including GP-based BO (with infinite-width ReLU BNNs), Bayesian Functional Optimization (BFO), and sample-then-optimize neural Thompson sampling (STO-NTS). The results demonstrate that NOTS consistently achieves lower cumulative regret and superior optimization performance, particularly in high-dimensional settings where GP-based methods degrade. For example, in the Darcy flow rate optimization, NOTS outperforms all baselines, efficiently identifying input functions that minimize leakage or maximize pressure, as visualized in the provided figures.
Implementation Considerations
Neural Operator Architecture: The implementation uses Fourier Neural Operators (FNOs) as the primary surrogate, with recommended settings from the neural operator library (see the configuration sketch after this list). For theoretical alignment, experiments with single-hidden-layer FNOs and last-layer-only training are also reported.
Initialization and Training: Kaiming or LeCun initialization is used to ensure the correct variance scaling for the infinite-width limit. Training is performed via mini-batch SGD with L2 regularization.
Computational Requirements: The approach is scalable to high-dimensional discretizations due to the efficiency of FNOs and the avoidance of explicit uncertainty quantification.
Search Space Discretization: In practice, the infinite-dimensional search space is discretized to a finite pool of candidate functions, enabling tractable optimization and evaluation.
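As a rough configuration sketch in the spirit of these points, the snippet below shows what a surrogate factory and optimizer might look like. It assumes the `neuralop` (neuraloperator) package's `FNO` class, whose argument names may differ across versions, and all hyperparameters are placeholders rather than the paper's settings.

```python
import torch
from neuralop.models import FNO  # assumes the `neuraloperator` package is installed

def make_model() -> torch.nn.Module:
    """Illustrative FNO surrogate for a 16x16 Darcy-style discretization.
    A fresh call each round gives a new random (Kaiming-style) initialization,
    providing the variance scaling needed for the infinite-width argument."""
    return FNO(
        n_modes=(8, 8),    # Fourier modes retained per spatial dimension
        hidden_channels=32,
        in_channels=1,     # scalar input field (e.g. permeability)
        out_channels=1,    # scalar output field (e.g. pressure)
    )

def make_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    """Mini-batch SGD with L2 regularization via weight decay, matching the
    regularized least-squares training described above."""
    return torch.optim.SGD(model.parameters(), lr=1e-3,
                           momentum=0.9, weight_decay=1e-4)
```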
Implications and Future Directions
The proposed NOTS framework provides a scalable and theoretically grounded approach for active optimization in function spaces, with direct applicability to scientific and engineering domains involving expensive black-box operators. The method circumvents the limitations of traditional BO in high dimensions and offers a practical alternative to ensemble-based uncertainty quantification for neural operators.
Theoretically, the work bridges the gap between operator learning, Bayesian optimization, and Thompson sampling, extending regret guarantees to the function space setting. Practically, the empirical results indicate that neural operator surrogates can enable efficient exploration and optimization in regimes where GP-based methods are infeasible.
Future research directions include:
Extending the theoretical analysis to continuous search spaces and nonlinear functionals.
Generalizing the regret bounds to multi-layer neural operators and more general noise models.
Developing batch and parallel variants of NOTS for further scalability.
Applying the framework to real-world scientific discovery and engineering design problems, where function-space optimization is critical.
Summary Table: Key Features and Results
| Aspect | NOTS (Proposed) | GP-based BO | Ensemble/Active Learning |
| --- | --- | --- | --- |
| Surrogate Model | Neural Operator (FNO) | GP/BNN | DeepONet/FNO Ensemble |
| Uncertainty Quant. | Infinite-width GP limit | Explicit GP posterior | Ensemble variance |
| Regret Guarantee | Sublinear (Bayesian) | Sublinear (finite-dim) | Not available |
| Scalability | High (function space) | Poor (high-dim input) | Moderate |
| Empirical Performance | Superior in high-dim | Degrades in high-dim | Task-dependent |
Concluding Remarks
This work establishes a new paradigm for active optimization in function spaces by unifying neural operator learning with Thompson sampling. The approach is theoretically justified, computationally efficient, and empirically validated on challenging PDE optimization tasks. The framework opens avenues for principled, data-efficient optimization in domains where the search space is inherently infinite-dimensional and the cost of oracle queries is prohibitive.