
Neural Thompson Sampling

Updated 23 January 2026
  • Neural Thompson Sampling is a method that extends classical Thompson Sampling by using neural networks to handle high-dimensional, non-linear reward functions.
  • It employs neural operator architectures to approximate posterior distributions efficiently, replacing traditional Gaussian Process surrogates.
  • Empirical studies in PDE-constrained and inverse problems show rapid regret reduction and scalable performance in complex function spaces.

Neural Thompson Sampling is a class of algorithms that extend Thompson Sampling—a posterior sampling strategy for sequential decision-making and stochastic optimization—into settings where the underlying reward or response function is modeled with neural networks or neural operator architectures. These approaches enable exploration in high-dimensional, non-linear, and infinite-dimensional domains where classical Bayesian methods (e.g., Gaussian Process surrogates) are computationally infeasible or statistically deficient. Neural Thompson Sampling (NTS) covers a broad taxonomy of methodologies, spanning finite-dimensional contextual bandits, functional optimization over operator-valued domains, combinatorial settings, and recent developments in neural operator surrogates for function-space bandits (Oliveira et al., 27 Jun 2025).

1. Mathematical Problem Setting and Motivation

The classical Thompson Sampling framework considers an agent who, at each round $t$, observes context data, selects an action from a discrete (or continuous) space, and receives a stochastic reward whose distribution is parameterized by an unknown function or operator (e.g., $G_*$). The Bayesian strategy is to condition on observed data, sample a hypothesis function from the posterior, and choose the action that is optimal under that sample. This optimally balances exploration and exploitation in theory.

However, in high-dimensional or function-space domains such as active design for PDE-governed systems or black-box optimization of functional outputs, direct Bayesian inference is often intractable. In these regimes, neural networks and neural operators provide powerful surrogate models for the unknown mapping $G_*$. This motivates Neural Thompson Sampling: replacing the classical GP (Gaussian Process) surrogate and posterior sampling with neural-based posterior proxies that scale to expressive modern function classes.

A prototypical high-dimensional example is (Oliveira et al., 27 Jun 2025): the objective is

$$f^* \in \underset{f \in \mathcal{S}}{\arg\max}\; \Phi(G_*(f))$$

where $G_*: \mathcal{U} \to \mathcal{Y}$ is a black-box operator (often expensive to evaluate), $\mathcal{S} \subset \mathcal{U}$ is a discretized search set of input functions, and $\Phi: \mathcal{Y} \to \mathbb{R}$ is a cheap-to-evaluate, known functional.

2. Neural Operator Thompson Sampling: Algorithmic Framework

Neural Operator Thompson Sampling (NOTS) provides a general algorithmic template for function-space bandits and operator-driven optimization (Oliveira et al., 27 Jun 2025):

  • Initialization: The search space $\mathcal{S}$ and (optionally) an initial dataset of evaluated pairs $(f_i, y_i)$ are specified.
  • At each round $t = 1, \ldots, T$:
  1. Sample neural operator parameters $\theta_{t,0}$ from a Gaussian prior $\mathcal{N}(0, \Sigma_0)$.
  2. Fit a surrogate operator $G_\theta$ via regularized empirical risk minimization:

    $$\theta_t = \arg\min_\theta \sum_{j=1}^{t-1} \| y_j - G_\theta(f_j) \|_{\mathcal{Y}}^2 + \lambda \|\theta\|_2^2$$

  3. Define the surrogate $G_t = G_{\theta_t}$.
  4. Sample-then-optimize: Select the next query as

    $$f_t \in \arg\max_{f \in \mathcal{S}} \Phi(G_t(f))$$

  5. Evaluate the oracle: observe $y_t = G_*(f_t) + \varepsilon_t$.
  6. Update the dataset with $(f_t, y_t)$.

Crucially, in the infinite-width limit with last-layer-only training, each $G_t$ acts as an exact sample from a Gaussian process posterior; thus, the expensive step of explicit Bayesian uncertainty quantification is replaced by randomized initialization and network re-training.
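The loop above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the neural operator is stood in for by a random-features network whose first layer is frozen and whose last layer is refit each round, anchored at a fresh prior draw (randomized ridge regression), which mimics the last-layer-only regime described above. `G_star`, `Phi`, and all dimensions are hypothetical toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discretised search set S: n_candidates input functions on a grid of size d.
d, n_candidates, T = 32, 200, 25
S = rng.standard_normal((n_candidates, d))

def G_star(f):
    # Stand-in for the expensive black-box operator (unknown in practice).
    return np.tanh(f) + 0.1 * np.roll(f, 1)

def Phi(y):
    # Cheap, known linear functional (here: mean of the output function).
    return y.mean()

# Random-features surrogate: frozen first layer, trainable last layer.
width, lam, noise_sd = 256, 0.01, 0.1
W = rng.standard_normal((width, d)) / np.sqrt(d)

def features(F):
    return np.tanh(F @ W.T)  # (n, width) hidden activations

data_f, data_y = [], []
for t in range(T):
    theta0 = rng.standard_normal((width, d))       # sample from the prior
    if data_f:
        Z = features(np.stack(data_f))             # (t, width)
        Y = np.stack(data_y)                       # (t, d)
        # Regularised ERM anchored at the random initialisation:
        A = Z.T @ Z + lam * np.eye(width)
        theta = np.linalg.solve(A, Z.T @ Y + lam * theta0)
    else:
        theta = theta0
    scores = np.array([Phi(y) for y in features(S) @ theta])
    f_t = S[int(np.argmax(scores))]                # sample-then-optimise
    y_t = G_star(f_t) + noise_sd * rng.standard_normal(d)
    data_f.append(f_t)
    data_y.append(y_t)

best = max(Phi(G_star(f)) for f in data_f)         # best objective found
```

Each round's exploration comes entirely from the fresh prior draw `theta0`; no explicit posterior covariance is ever computed.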

3. Theoretical Guarantees in Infinite-Dimensional Settings

NOTS admits precise non-asymptotic regret guarantees. Under the following assumptions (Oliveira et al., 27 Jun 2025):

  • The search set $\mathcal{S}$ is finite and compact in the function space.
  • The true operator $G_*$ is distributed as a Gaussian process with operator-valued kernel $\mathcal{K}$.
  • Observation noise is modeled as a GP.
  • Only the last linear layer of a single-hidden-layer, infinitely wide neural operator is trained.

In this regime, by the infinite-width correspondence, the randomized neural operator $G_t$ is an exact posterior sample for the mapping $f \mapsto G_*(f)$. For any bounded linear functional $\Phi$, the process $f \mapsto \Phi(G(f))$ inherits a scalar GP structure with kernel

$$k_\Phi(f, f') = \Phi^\top \mathcal{K}(f, f') \Phi$$

and the Bayesian cumulative regret satisfies

$$R_T = \mathcal{O}\!\left(\sqrt{T \gamma_T}\right)$$

where $\gamma_T$ is the maximal information gain of the scalar GP.

The information gain $\gamma_T$ captures the complexity of the search domain under the chosen kernel and functional: if $\gamma_T = o(T)$, the average regret tends to $0$ as $T \to \infty$.
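The two quantities in the bound are easy to compute numerically for a toy model. The sketch below (all choices hypothetical) builds the scalar kernel $k_\Phi$ from an operator-valued kernel of the form "scalar RBF times a fixed PSD output covariance", then evaluates the standard GP information gain $\gamma_T = \tfrac{1}{2}\log\det(I + \sigma^{-2} K_T)$ and the resulting $\sqrt{T\gamma_T}$ scaling.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: inputs are functions on a grid of size d; the operator-
# valued kernel K(f, f') is a d x d matrix (RBF similarity times a fixed
# PSD output covariance B); Phi is a linear functional with weights phi,
# so k_Phi(f, f') = phi^T K(f, f') phi.
d, T, sigma2 = 16, 40, 0.01
F = rng.standard_normal((T, d))               # T evaluated input functions
B = np.eye(d) + 0.5 * np.ones((d, d)) / d     # PSD output covariance
phi = np.full(d, 1.0 / d)                     # Phi(y) = mean(y)

def k_phi(f, g):
    rbf = np.exp(-0.5 * np.sum((f - g) ** 2))
    return float(phi @ (rbf * B) @ phi)

K = np.array([[k_phi(f, g) for g in F] for f in F])  # scalar GP Gram matrix

# Information gain after T noisy observations:
#   gamma_T = 1/2 * log det(I + sigma^-2 * K_T)
sign, logdet = np.linalg.slogdet(np.eye(T) + K / sigma2)
gamma_T = 0.5 * logdet

bound_scale = np.sqrt(T * gamma_T)  # the sqrt(T * gamma_T) regret scaling
```

Sublinear growth of `gamma_T` in `T` (as for smooth kernels) is exactly the condition under which the average regret vanishes.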

4. Empirical Performance and Benchmarking

NOTS has been empirically validated on challenging operator-driven optimization problems (Oliveira et al., 27 Jun 2025):

  • Darcy flow (PDE-constrained): Optimizing over 1,000 random masks (16×16 grid), with functionals such as total flow, power, and pressure.
  • Shallow-water on sphere: High-dimensional (6,144-dimensional) inverse problem optimization over initial conditions (32×64 grid).

Key empirical findings:

  • GP-based Bayesian Optimization baselines degrade rapidly at input dimensions above a few hundred.
  • Sample-then-optimize neural (MLP) surrogates yield better, yet suboptimal, regret decay.
  • NOTS achieves rapid regret reduction ($\mathcal{O}(T^{1/2})$) and is the only method to make progress on extremely high-dimensional tasks.

NOTS scales gracefully with dimension owing to the architectural inductive bias of neural operators such as the Fourier Neural Operator (FNO), which is the practical backbone in these studies.
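The inductive bias mentioned above comes from the FNO's spectral mixing: each layer filters the input function in Fourier space, keeping only low-frequency modes. A minimal 1-D sketch of that idea (random placeholder weights, not the paper's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(2)

# One 1-D Fourier layer in the spirit of the Fourier Neural Operator:
# FFT the input function, keep the lowest `modes` frequencies, apply a
# learned per-mode complex weight, inverse FFT, add a pointwise linear
# skip term, then a nonlinearity.
n, modes = 64, 12
R = rng.standard_normal(modes) + 1j * rng.standard_normal(modes)  # spectral weights
w = 0.5                                                           # pointwise weight

def fourier_layer(x):
    xf = np.fft.rfft(x)               # frequency representation (n//2 + 1 modes)
    yf = np.zeros_like(xf)
    yf[:modes] = R * xf[:modes]       # act only on the retained low modes
    y_spec = np.fft.irfft(yf, n=n)    # back to physical space
    return np.tanh(w * x + y_spec)    # skip connection + nonlinearity

x = np.sin(2 * np.pi * np.arange(n) / n)  # example input function on the grid
y = fourier_layer(x)
```

Because the layer acts on frequencies rather than grid points, the same weights apply at any discretization resolution, which is what lets the surrogate scale to the high-dimensional grids in the benchmarks above.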

5. Methodological Insights, Strengths, and Limitations

Strengths

  • Infinite-Dimensional Scalability: Neural operator architectures and NOTS are tailored for optimization over spaces of functions, facilitating application to PDEs, control, and scientific computing.
  • Uncertainty Implicit via Randomization: No explicit Bayesian layers are required; randomness in initialization and last-layer training suffices for approximate posterior sampling.
  • Flexible Objective Handling: Composite objectives of the form $\Phi \circ G$ are naturally supported for linear $\Phi$.

Limitations

  • Theoretical Scope: Rigorous guarantees presently hold for single-hidden-layer, infinite-width architectures with last-layer training, linear functionals, and finite search sets.
  • Finite-Width and Nonlinear Extensions: For deeper neural operators or nonlinear functionals (e.g., $L^2$-norms), the exact posterior sampling property breaks down. Deviations are quantified as $\mathcal{O}(1/\sqrt{\text{width}})$.
  • Prior Misspecification: The framework assumes the operator prior matches the true generative mechanism; mismatches can degrade the quality of posterior samples.

Practical considerations include setting the regularization $\lambda$ to the observation noise variance, using typical architectures (e.g., FNO with 64 lifted modes and 4 kernel layers), and initializing weights appropriately.
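The first of these recommendations, tying $\lambda$ to the noise variance $\sigma^2$, amounts to ridge regression on the frozen penultimate-layer features. A hedged sketch under that assumption (dimensions and data are toy placeholders):

```python
import numpy as np

rng = np.random.default_rng(3)

# Last-layer ridge fit with regularisation tied to the noise variance
# (lambda = sigma^2). Z holds frozen penultimate-layer activations; only
# the last-layer weights `theta` are solved for.
n, width, sigma2 = 50, 128, 0.05
Z = rng.standard_normal((n, width))                 # frozen features of n inputs
theta_true = rng.standard_normal(width)
y = Z @ theta_true + np.sqrt(sigma2) * rng.standard_normal(n)

lam = sigma2                                        # lambda = noise variance
theta = np.linalg.solve(Z.T @ Z + lam * np.eye(width), Z.T @ y)
resid = y - Z @ theta                               # in-sample residuals
```

With this choice, the ridge posterior-mean interpretation of the fit matches the GP noise model, which is why the pairing is natural in the last-layer regime.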

6. Extensions and Open Research Directions

Open problems and promising directions include:

  • Continuous $\mathcal{S}$ and Infinite-Domain Optimization: Extending NOTS to infinite search spaces, potentially requiring discretization methods or sampling strategies in function space.
  • Batch and Parallel Thompson Sampling: Leveraging NTK-based GP surrogates to allow multiple simultaneous queries in parallel hardware settings.
  • Deeper Operator Theory: Developing GP correspondences and regret bounds for multi-layer (deep) neural operators.
  • Nonlinear or Shape-Constrained Functionals: Generalizing convergence guarantees to settings where the objective $\Phi$ is nonlinear or subject to shape constraints.

7. Position within the Landscape of Neural Posterior Sampling

NOTS is a recent step in a continuum of Neural Thompson Sampling approaches:

  • In classical NTS for contextual bandits, the posterior over rewards is constructed using the neural tangent kernel (NTK), yielding regret rates of $\widetilde{\mathcal{O}}(\sqrt{T})$ when the reward function is in the NTK-RKHS (Zhang et al., 2020).
  • Other variants replace explicit GP posteriors with variational or bootstrap-based surrogates (e.g., deep ensembles (Lu et al., 2017), Bayesian bootstrap (Osband et al., 2015)).
  • Recent sample-then-optimize neural TS methods (Dai et al., 2022) leverage the GP-NTK correspondence for efficient, batch selection in settings where functional evaluations are possible in parallel.
  • NOTS uniquely treats entire operator-valued mappings as the latent variable, thus expanding TS to optimization over function spaces beyond finite-dimensional inputs.

In summary, Neural Operator Thompson Sampling synthesizes neural operator frameworks and probabilistic sampling over function spaces to enable tractable, theoretically sound Thompson sampling on infinite-dimensional active learning tasks—a regime that poses fundamental challenges for traditional Bayesian optimization and kernel learning methods (Oliveira et al., 27 Jun 2025).
