Differentiable & Bayesian Optimization

Updated 3 May 2026

Differentiable and Bayesian Optimization is an emerging framework that fuses gradient-based methods with probabilistic surrogates to enhance global search in complex objective landscapes.
It leverages automatic differentiation and gradient-informed surrogates to speed up acquisition optimization and improve predictive uncertainty quantification.
This approach enables scalable and robust optimization in high-dimensional, constrained, or simulation-based problems by unifying global Bayesian search with local refinement techniques.

Differentiable and Bayesian Optimization encompasses a growing intersection of methodologies in which the classic sample-efficient global optimization framework of Bayesian optimization (BO) is enhanced by gradient-based computation, automatic differentiation, and probabilistic surrogates. These advances enable faster, more scalable, and more robust optimization of complex objective functions, often under constraints, model misspecification, discrete simulation logic, or uncertainty quantification requirements. The area covers the development of new differentiable acquisition functions, the use of gradient-enhanced surrogates, and the unification of global-search BO with differentiable simulation and variational inference machinery.

1. Fundamentals of Differentiable Bayesian Optimization

Bayesian optimization models a black-box objective function $f:\mathbb{R}^d\rightarrow\mathbb{R}$ using a probabilistic surrogate, typically a Gaussian process (GP) or, increasingly, a Bayesian neural network (BNN), to guide data-efficient selection of new queries. Classical BO operates through the following closed loop:

Place a prior $f \sim \mathcal{GP}(m, k)$ or a BNN prior on $f$.
Given $n$ observations $\{(x_i, y_i)\}_{i=1}^{n}$, condition the surrogate to obtain a posterior mean $\mu_n(x)$ and variance $\sigma_n^2(x)$.
Define an acquisition function $\alpha(x)$ (e.g., Expected Improvement, Lower Confidence Bound).
Identify $x_{n+1} = \arg\max_{x} \alpha(x)$ (for maximization), evaluate $f(x_{n+1})$, and update the surrogate.
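A minimal NumPy sketch of this loop follows. It assumes a 1-D domain $[0, 1]$, a zero-mean GP with a squared-exponential kernel whose hyperparameters (`ELL`, `SF2`, `NOISE`) are fixed rather than fitted, Expected Improvement for maximization, and a dense grid in place of a real acquisition optimizer; all names are illustrative.

```python
import math
import numpy as np

ELL, SF2, NOISE = 0.3, 1.0, 1e-6  # assumed fixed kernel hyperparameters and jitter


def rbf(a, b):
    # squared-exponential kernel k(a, b) = SF2 * exp(-(a - b)^2 / (2 ELL^2)) on 1-D inputs
    d = a[:, None] - b[None, :]
    return SF2 * np.exp(-0.5 * d**2 / ELL**2)


def gp_posterior(X, y, Xs):
    # condition a zero-mean GP on (X, y); return mu_n(Xs) and sigma_n^2(Xs)
    L = np.linalg.cholesky(rbf(X, X) + NOISE * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(X, Xs)
    v = np.linalg.solve(L, Ks)
    mu = Ks.T @ alpha
    var = np.clip(SF2 - np.sum(v**2, axis=0), 1e-12, None)
    return mu, var


def expected_improvement(mu, var, best):
    # EI(x) = E[max(f(x) - best, 0)] under the Gaussian posterior (maximization)
    s = np.sqrt(var)
    z = (mu - best) / s
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + s * pdf


def bo_loop(f, X, n_iter=10):
    grid = np.linspace(0.0, 1.0, 501)  # candidate set standing in for argmax over x
    y = f(X)
    for _ in range(n_iter):
        mu, var = gp_posterior(X, y, grid)                     # posterior mean / variance
        x_next = grid[np.argmax(expected_improvement(mu, var, y.max()))]
        X = np.append(X, x_next)                               # evaluate f, update data
        y = np.append(y, f(np.array([x_next])))
    return X, y


X, y = bo_loop(lambda x: -(x - 0.7) ** 2, np.array([0.1, 0.5, 0.9]))
```

The grid search is the step that differentiable BO replaces with gradient-based acquisition optimization.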

Differentiability enters BO at several levels:

Acquisition functions can be made differentiable with respect to the query location $x$ and/or acquisition hyperparameters.
If the surrogate incorporates gradient (and even Hessian) data, the posterior updates and acquisition landscapes exploit this for sharper uncertainty quantification and faster convergence (Wu et al., 2017, Makrygiorgos et al., 14 Apr 2025, Ament et al., 2022).
Global optimization of acquisition functions can use first- or higher-order methods rather than random search or derivative-free optimization.
For simulation-based or likelihood-intractable objectives, differentiable programming enables pathwise gradients through the entire modeling/optimization stack (Quera-Bofarull et al., 2023, Antonova et al., 2022).
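To make the first and third points concrete, the sketch below computes Expected Improvement and its exact derivative $d\alpha/dx$ for a 1-D zero-mean GP (fixed, assumed hyperparameters), using $\partial \mathrm{EI}/\partial \mu = \Phi(z)$ and $\partial \mathrm{EI}/\partial \sigma = \varphi(z)$, and then runs projected gradient ascent. Gradients are hand-derived here for transparency; an AD framework would produce them automatically. The helper names are hypothetical.

```python
import math
import numpy as np

ELL, SF2, NOISE = 0.3, 1.0, 1e-6  # assumed fixed GP hyperparameters


def fit(X, y):
    # zero-mean GP with a squared-exponential kernel; explicit inverse is fine for tiny n
    K = SF2 * np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / ELL**2)
    Kinv = np.linalg.inv(K + NOISE * np.eye(len(X)))
    return X, Kinv, Kinv @ y


def ei_and_grad(x, model, best):
    # Expected Improvement (maximization) at scalar x and its exact derivative dEI/dx
    X, Kinv, alpha = model
    k = SF2 * np.exp(-0.5 * (x - X) ** 2 / ELL**2)   # k(x, X)
    dk = -(x - X) / ELL**2 * k                        # d k(x, X) / dx
    mu, dmu = k @ alpha, dk @ alpha
    var = max(SF2 - k @ Kinv @ k, 1e-12)
    s = math.sqrt(var)
    ds = -(k @ Kinv @ dk) / s                         # d sigma/dx = (d var/dx) / (2 sigma)
    z = (mu - best) / s
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    ei = (mu - best) * cdf + s * pdf
    # chain rule using dEI/dmu = Phi(z) and dEI/dsigma = phi(z)
    return ei, cdf * dmu + pdf * ds


def maximize_ei(model, best, x0, lr=0.05, steps=200):
    # projected gradient ascent on [0, 1], keeping the best iterate seen
    x, x_best, ei_best = x0, x0, ei_and_grad(x0, model, best)[0]
    for _ in range(steps):
        ei, g = ei_and_grad(x, model, best)
        if ei > ei_best:
            x_best, ei_best = x, ei
        x = min(max(x + lr * g, 0.0), 1.0)
    return x_best, ei_best
```

In practice the same role is usually played by L-BFGS or Adam driven by AD gradients; the central finite-difference check in the test is a cheap way to validate any hand-written gradient.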

2. Incorporation of Gradient Information

Incorporation of derivative information, when available, yields substantial practical improvements in BO:

GP Surrogates with Gradients: Conditioning GPs not only on function values but also on (possibly noisy or directional) gradient observations produces a joint GP over $(f, \nabla f)$, with a block-partitioned kernel incorporating all cross-covariances (Wu et al., 2017). For $n$ observations in $d$ dimensions the gradient blocks alone form an $nd \times nd$ matrix; structured AD techniques exploit kernel structure to perform exact matrix-vector multiplications in $\mathcal{O}(n^2 d)$ for gradient observations and $\mathcal{O}(n^2 d^2)$ for Hessian observations (Ament et al., 2022).
Gradient-informed BNNs: Bayesian neural networks can be extended to incorporate local gradient labels in the likelihood term, training on both function values $y_i = f(x_i)$ and gradients $\nabla f(x_i)$. The loss is

$$\mathcal{L}(\theta) = \mathcal{L}_{\text{data}}(\theta) + \lambda\,\mathcal{L}_{\text{grad}}(\theta),$$

where $\mathcal{L}_{\text{grad}}$ penalizes the mismatch between observed and predicted gradients and $\lambda \ge 0$ weights the gradient term. Reverse-mode AD is used for both the surrogate and acquisition optimization (Makrygiorgos et al., 14 Apr 2025).

Acquisition Function Optimization: Fully differentiable surrogate models (GPs or BNNs) enable the application of L-BFGS, Adam, or other first- or second-order optimizers to $\alpha(x)$. Hybrid approaches exploit the analytic gradient for rapid local exploitation and fall back to non-gradient strategies when the landscape is rugged (Antonova et al., 2022).
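As a deliberately simplified stand-in for a gradient-informed BNN, the sketch below fits a linear-in-parameters cubic surrogate $\hat f_w(x) = w^\top \phi(x)$ by minimizing $\sum_i (\hat f_w(x_i) - y_i)^2 + \lambda \sum_i (\hat f_w'(x_i) - f'(x_i))^2$, i.e., a data misfit plus a weighted gradient-mismatch term, with a small ridge term standing in for a Gaussian weight prior. The cubic features, the closed-form solve, and the hand-coded derivative features are assumptions for illustration; a BNN would instead use approximate Bayesian inference and obtain $\nabla_x \hat f$ by reverse-mode AD.

```python
import numpy as np


def features(x):
    # cubic features phi(x) = [1, x, x^2, x^3] and their derivatives d phi / dx
    x = np.asarray(x, dtype=float)
    phi = np.stack([np.ones_like(x), x, x**2, x**3], axis=1)
    dphi = np.stack([np.zeros_like(x), np.ones_like(x), 2 * x, 3 * x**2], axis=1)
    return phi, dphi


def loss(w, x, y, g, lam):
    # data misfit + lam * gradient misfit, both squared errors
    phi, dphi = features(x)
    return np.sum((phi @ w - y) ** 2) + lam * np.sum((dphi @ w - g) ** 2)


def fit_gradient_informed(x, y, g, lam=1.0, rho=1e-8):
    # closed-form minimizer of loss(w) + rho * ||w||^2 (rho mimics a Gaussian weight prior)
    phi, dphi = features(x)
    A = phi.T @ phi + lam * dphi.T @ dphi + rho * np.eye(phi.shape[1])
    b = phi.T @ y + lam * dphi.T @ g
    return np.linalg.solve(A, b)
```

With two points of $f(x) = x^3 - x$ and their derivatives, the cubic is pinned down exactly, while the value-only fit ($\lambda = 0$) cannot recover it; this is a one-dimensional analogue of each query contributing $d + 1$ observations instead of one.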

Experiments and theoretical results demonstrate that conditioning on $d$-dimensional gradients per query (in addition to function values) sharpens the local surrogate posterior, reducing predictive uncertainty and accelerating regret decay, with gains that grow as $d$ increases (Wu et al., 2017, Makrygiorgos et al., 14 Apr 2025, Ament et al., 2022).
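The variance-reduction effect can be checked on a small example. The sketch below conditions a zero-mean 1-D GP with a squared-exponential kernel either on values only or on values plus derivatives, using the standard derivative cross-covariances $\partial k/\partial b$, $\partial k/\partial a$, and $\partial^2 k/\partial a\,\partial b$; the hyperparameters and the toy objective $\sin(4x)$ are assumptions, and the function names are hypothetical. Exact conditioning on extra observations can never increase posterior variance, so `var_g` is pointwise no larger than `var_v`.

```python
import numpy as np

ELL, SF2, NOISE = 0.3, 1.0, 1e-8  # assumed fixed hyperparameters and jitter


def k_blocks(a, b):
    # squared-exponential kernel and its derivative cross-covariances on 1-D inputs
    d = a[:, None] - b[None, :]
    k = SF2 * np.exp(-0.5 * d**2 / ELL**2)       # cov(f(a), f(b))
    k_fg = k * d / ELL**2                         # cov(f(a), f'(b)) = dk/db
    k_gf = -k * d / ELL**2                        # cov(f'(a), f(b)) = dk/da
    k_gg = k * (1.0 / ELL**2 - d**2 / ELL**4)     # cov(f'(a), f'(b)) = d2k/(da db)
    return k, k_fg, k_gf, k_gg


def posterior_f(X, y, g, Xs, use_grad=True):
    # posterior mean and variance of f(Xs) given values y (and optionally derivatives g) at X
    k, k_fg, k_gf, k_gg = k_blocks(X, X)
    ks, ks_fg, _, _ = k_blocks(Xs, X)
    if use_grad:
        K = np.block([[k, k_fg], [k_gf, k_gg]])   # block-partitioned joint kernel
        Ks = np.hstack([ks, ks_fg])
        obs = np.concatenate([y, g])
    else:
        K, Ks, obs = k, ks, y
    K = K + NOISE * np.eye(K.shape[0])
    mu = Ks @ np.linalg.solve(K, obs)
    var = SF2 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, var


X = np.array([0.2, 0.8])
Xs = np.linspace(0.0, 1.0, 21)
y, g = np.sin(4 * X), 4 * np.cos(4 * X)   # toy objective and its derivative
mu_g, var_g = posterior_f(X, y, g, Xs, use_grad=True)
mu_v, var_v = posterior_f(X, y, g, Xs, use_grad=False)
```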

3. Differentiable Acquisition Functions and Global Optimization

Several acquisition functions central to BO have been made fully differentiable:

q-Expected Hypervolume Improvement (qEHVI):

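$$\alpha_{\text{qEHVI}}(\mathcal{X}_{\text{cand}}) \approx \hat{\alpha}_{\text{qEHVI}}(\mathcal{X}_{\text{cand}}) = \frac{1}{N}\sum_{t=1}^{N} \mathrm{HVI}\big(\tilde{f}_t(\mathcal{X}_{\text{cand}}) \mid \mathcal{P}\big)$$

Here, $\mathcal{X}_{\text{cand}}$ is a batch of $q$ candidate points, $\tilde{f}_t$ ($t = 1, \dots, N$) are reparameterized posterior samples, and $\mathcal{P}$ is the current Pareto front, with hypervolume measured from a fixed reference point. Inclusion–exclusion combinatorics and MC sample reparameterization define HVI as a differentiable function of $\mathcal{X}_{\text{cand}}$, enabling gradient-based maximization for batched, multi-objective BO (Daulton et al., 2020).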
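The MC form of qEHVI (averaging the hypervolume improvement over reparameterized posterior samples) can be sketched for two objectives. Assumptions for brevity: both objectives are maximized, the posterior at the $q$ candidates is summarized by independent Gaussian marginals (a real qEHVI implementation uses the joint posterior covariance), and each sample's HVI is computed as a direct 2-D hypervolume difference rather than by the box-decomposition and inclusion–exclusion scheme of Daulton et al. Fixed base samples `zeta` give the reparameterization $\tilde f_t = \mu + \sigma \odot \zeta_t$, so each sample's HVI is a piecewise-polynomial, and hence almost-everywhere differentiable, function of $\mu$ and $\sigma$.

```python
import numpy as np


def hypervolume_2d(Y, ref):
    # area dominated by the points Y (maximization) and bounded below by the reference point
    Y = Y[np.all(Y > ref, axis=1)]
    if len(Y) == 0:
        return 0.0
    Y = Y[np.argsort(-Y[:, 0])]          # sweep in decreasing first objective
    hv, best_y2 = 0.0, ref[1]
    for y1, y2 in Y:
        if y2 > best_y2:                 # only non-dominated points add a new slab
            hv += (y1 - ref[0]) * (y2 - best_y2)
            best_y2 = y2
    return hv


def qehvi_mc(mu, sd, pareto, ref, zeta):
    # mu, sd: (q, 2) marginal posterior means / std devs at the q candidates
    # zeta: (N, q, 2) fixed standard-normal base samples (reparameterization trick)
    base = hypervolume_2d(pareto, ref)
    samples = mu[None] + sd[None] * zeta          # (N, q, 2) posterior samples
    hvi = [hypervolume_2d(np.vstack([pareto, s]), ref) - base for s in samples]
    return float(np.mean(hvi))
```

Keeping `zeta` fixed across optimizer steps (common random numbers) makes the estimate a deterministic function of the candidates, which is what lets a gradient-based optimizer work on it.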

Rollout/Non-Myopic Acquisition via MDP Formulation: BO can be cast as a finite-horizon Markov decision process, with parametric policies $\pi_\theta(x_{t+1} \mid \mathcal{D}_t)$ over sampling locations, where $\mathcal{D}_t$ is the data collected after $t$ steps. Policy gradients (score-function and reparameterization estimators) enable learning of non-myopic sampling policies that account for several steps of lookahead (Nwankwo et al., 2024).
Bayesian Experimental Design EIG via Differentiable Surrogates: For design or simulator calibration, the expected information gain (EIG, e.g., measured as a Kullback–Leibler divergence) can be differentiated with respect to design variables by backpropagating through ensemble Kalman inversion or variational inference, allowing efficient experimental design even with high-dimensional discrepancy models (Yang et al., 29 Apr 2025, Quera-Bofarull et al., 2023).
Differentiable Quantile-Based Acquisition Functions: For hybrid (white-box/black-box) constraints, upper quantile bounds of composite random variables are estimated via MC sample-averaging with soft-sort, producing a differentiable surrogate for acquisition optimization (Lu et al., 2023).
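To illustrate the soft-sort idea, the sketch below smooths the empirical $\beta$-quantile of MC samples with a simple softmax-based soft-sort: row $i$ of a row-stochastic matrix concentrates on the $i$-th smallest sample as the temperature $\tau \to 0$, so the estimate is a convex combination of the samples that depends continuously, and almost everywhere differentiably, on them for $\tau > 0$. This particular relaxation, the index convention, and the function names are assumptions; the relaxation used by Lu et al. (2023) may differ.

```python
import math
import numpy as np


def soft_sort(s, tau):
    # softmax-based soft-sort (ascending): row i weights each s_j by exp(-|s_(i) - s_j| / tau)
    s_sorted = np.sort(s)
    logits = -np.abs(s_sorted[:, None] - s[None, :]) / tau
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)             # each row is a convex combination
    return P @ s


def soft_quantile(samples, beta, tau):
    # smoothed empirical beta-quantile of MC samples (order statistic ceil(beta * N))
    k = math.ceil(beta * len(samples)) - 1
    return soft_sort(samples, tau)[k]
```

Small $\tau$ reproduces the hard order statistic; larger $\tau$ trades bias for smoother gradients during acquisition optimization.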

AD-enabled acquisition optimization has been reported to yield orders-of-magnitude wall-time speedups and improved solution quality relative to derivative-free optimizers (e.g., CMA-ES, DIRECT), particularly in high dimensions or parallelized batch settings (Daulton et al., 2020, Jingzhe et al., 9 Mar 2026).

4. Extensions: Constrained, Multi-objective, and Structured Search

Recent work leverages differentiable optimization for more complex Bayesian optimization variants:

Constrained BO with Differentiable Surrogates: Penalization or predictive-mean-based surrogate reformulation allows gradient-based steps towards constraint satisfaction. LCBO alternates between projected gradient descent on a smooth penalty-augmented surrogate and information-driven exploration that reduces the posterior variance of the gradient estimates, with convergence rates that scale polynomially with problem dimension under mild assumptions. This makes high-dimensional constrained BO tractable (Jingzhe et al., 9 Mar 2026).
Sparse and Structured Solutions: Differentiable homotopy relaxations of exact $\ell_0$ penalties (via temperature-annealed or smooth surrogates) provide a mechanism to induce sparse solutions within the BO loop, allowing simultaneous exploration of accuracy-sparsity Pareto fronts via hypervolume improvement in multi-objective BO (Liu et al., 2022).
Global Optimization with Differentiable Simulation: Rugged, high-dimensional, nonsmooth objective landscapes (e.g., robot or physics simulation) can be tackled with hybrid methods that combine BO (for global exploration) and local gradient-based optimization. Differentiable simulation kernels yield pathwise gradient information, while Bayesian surrogates help the optimizer recover from vanishing or noisy gradients (Antonova et al., 2022).
Bayesian Experimental Design under Discrepancy: When predictive physical models are structurally misspecified, differentiable programming through discrepancy models (e.g., neural-network surrogates) and BED utility metrics (ensemble-based KL divergence) enable tractable, gradient-based search over high-dimensional design spaces (Yang et al., 29 Apr 2025).
Multi-dimensional Binning by Differentiable or Bayesian Search: In high-energy physics and other classification settings, flexible bin boundaries (e.g., GMM parametrizations) can be learned to maximize significance or power by gradient-based optimization (GATO) or Bayesian search (BOBR), with differentiable loss proxies and constraints (Erdmann et al., 12 Jan 2026).
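For the sparsity item, one common smooth stand-in for $\|x\|_0$ is $\sum_i \big(1 - \exp(-x_i^2 / (2a^2))\big)$, which is differentiable for every $a > 0$ and approaches the exact count of nonzeros as the temperature $a$ is annealed toward 0. The sketch assumes this Gaussian-type form for illustration; the relaxation and schedule used in SEBO may differ, and the function names are hypothetical.

```python
import numpy as np


def smooth_l0(x, a):
    # each term 1 - exp(-x_i^2 / (2 a^2)) is 0 at x_i = 0 and tends to 1 for x_i != 0 as a -> 0
    return float(np.sum(1.0 - np.exp(-x**2 / (2.0 * a**2))))


def smooth_l0_grad(x, a):
    # gradient of smooth_l0 with respect to x, usable by any first-order optimizer
    return (x / a**2) * np.exp(-x**2 / (2.0 * a**2))


def homotopy_values(x, schedule):
    # evaluate the relaxation along a decreasing temperature schedule a_1 > a_2 > ...
    return [smooth_l0(x, a) for a in schedule]
```

Large $a$ gives a nearly quadratic, easy-to-optimize penalty; shrinking $a$ gradually sharpens it toward the exact count.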

5. Unified Frameworks and Algorithmic Procedures

The general pattern in differentiable and Bayesian optimization is the interplay of the following algorithmic building blocks:

Stage	Differentiability Enabler	Example Methods
Surrogate	Gradient-enhanced GP, BNN	(Wu et al., 2017, Makrygiorgos et al., 14 Apr 2025, Ament et al., 2022)
Acquisition	MC pathwise/reparam., AD	qEHVI (Daulton et al., 2020), differentiable quantile UCB (Lu et al., 2023)
Constraints	Penalty surrogates, AD	LCBO (Jingzhe et al., 9 Mar 2026), CUQB (Lu et al., 2023)
Design	AD through BED/VI/EKI	AD-EKI for EIG (Yang et al., 29 Apr 2025), GVI for calibration (Quera-Bofarull et al., 2023)
Sampling	Rollout/MDP policy gradient	Non-myopic BO (Nwankwo et al., 2024)
Structure	Group/feature sparsity, AD	SEBO (Liu et al., 2022)
Simulation	Reverse-mode AD	Differentiable Sim+BO (Antonova et al., 2022)

Algorithmic optimization loops frequently employ stochastic optimization for parameter or design updates, combining MC sampling, reparameterization, and high-throughput parallelization (e.g., on GPUs).

Pseudocode for differentiable BO loops typically follows:

Fit surrogate on all available function (and gradient) data.
Optimize differentiable acquisition function using gradient-based methods.
Query the true function at proposed points, augment data.
Repeat until convergence or budget exhaustion.
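The four steps above can be sketched as a runnable loop. Assumptions: a 1-D domain $[0, 1]$, a zero-mean GP fit to function values only, fixed kernel hyperparameters, and an Upper Confidence Bound acquisition $\alpha(x) = \mu_n(x) + \beta\,\sigma_n(x)$ (chosen over EI to keep the gradient short), maximized by multi-start projected gradient ascent with a hand-derived gradient. All names and constants are illustrative.

```python
import math
import numpy as np

ELL, SF2, NOISE, BETA = 0.2, 1.0, 1e-6, 2.0  # assumed kernel hyperparameters and UCB weight


def fit_surrogate(X, y):
    # step 1: zero-mean GP on function values (explicit inverse is fine for tiny n)
    K = SF2 * np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / ELL**2)
    Kinv = np.linalg.inv(K + NOISE * np.eye(len(X)))
    return X, Kinv, Kinv @ y


def ucb_and_grad(x, model):
    # UCB alpha(x) = mu(x) + BETA * sigma(x) and its exact derivative at scalar x
    X, Kinv, alpha = model
    k = SF2 * np.exp(-0.5 * (x - X) ** 2 / ELL**2)
    dk = -(x - X) / ELL**2 * k
    s = math.sqrt(max(SF2 - k @ Kinv @ k, 1e-12))
    ds = -(k @ Kinv @ dk) / s
    return k @ alpha + BETA * s, dk @ alpha + BETA * ds


def optimize_acq(model, starts, lr=0.01, steps=100):
    # step 2: multi-start projected gradient ascent on [0, 1]
    best_x, best_val = None, -math.inf
    for x in starts:
        x = float(x)
        for _ in range(steps):
            x = min(max(x + lr * ucb_and_grad(x, model)[1], 0.0), 1.0)
        val = ucb_and_grad(x, model)[0]
        if val > best_val:
            best_x, best_val = x, val
    return best_x


def differentiable_bo(f, X, n_iter=8):
    y = f(X)
    for _ in range(n_iter):                              # step 4: repeat until budget is spent
        model = fit_surrogate(X, y)
        x_next = optimize_acq(model, np.linspace(0.05, 0.95, 5))
        X = np.append(X, x_next)                         # step 3: query f, augment data
        y = np.append(y, f(np.array([x_next])))
    return X, y


X, y = differentiable_bo(lambda x: -(x - 0.3) ** 2, np.array([0.0, 0.5, 1.0]))
```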

Calibration, design, or simulation pipelines may alternately backpropagate through outer- and inner-loop objectives, with, e.g., variational flows, normalizing flows, or differentiable solvers for stochastic programs (Quera-Bofarull et al., 2023, Yang et al., 29 Apr 2025, Lahoud et al., 2024).

6. Benchmarking, Empirical Results, and Limitations

Across synthetic benchmarks (Branin, Hartmann, DTLZ2, Ackley, Griewank, Rastrigin), high-dimensional problems, and real-world settings (epidemiological ABMs, high-energy physics binning, truss and policy optimization), differentiable BO approaches are reported to outperform both vanilla BO and derivative-free optimizers in sample efficiency, wall time, and final regret or convergence metrics (Wu et al., 2017, Daulton et al., 2020, Yang et al., 29 Apr 2025, Jingzhe et al., 9 Mar 2026, Erdmann et al., 12 Jan 2026).

Key findings include:

Gradient-informed surrogates accelerate convergence and outperform zeroth-order counterparts as the dimension $d$ increases (Makrygiorgos et al., 14 Apr 2025, Ament et al., 2022).
qEHVI and differentiable MC-acquisition schemes enable batched, multi-objective, and constrained BO to remain tractable, scalable, and fast (Daulton et al., 2020, Lu et al., 2023).
Differentiable simulation + BO pipelines can overcome the limitations of pure gradient or pure surrogate-only optimizers in nonconvex, rugged objective landscapes (Antonova et al., 2022).
Theoretical regret and constraint-violation bounds extend to new classes of acquisition functions when differentiability is preserved throughout (Lu et al., 2023, Liu et al., 2022, Jingzhe et al., 9 Mar 2026).
Fully differentiable frameworks scale to high parameter and design dimensions (10³–10⁷) by leveraging AD and matrix structure (Ament et al., 2022, Yang et al., 29 Apr 2025).

Limitations are also documented:

Differentiability may be hampered by model or kernel choice, especially with highly nonsmooth or discrete simulation logic. Remedies involve smooth surrogates, penalty relaxations, or hybrid local/global search (Antonova et al., 2022, Quera-Bofarull et al., 2023).
BNN surrogates rely on sufficient data and appropriate regularization (e.g., the weight $\lambda$ on the gradient-mismatch term) for stable learning (Makrygiorgos et al., 14 Apr 2025).
Memory and compute overhead rises with deep AD stacks or large ensembles; checkpointing and exploitation of low-rank structure are crucial for practicality (Yang et al., 29 Apr 2025, Ament et al., 2022).
For global, multi-modal landscapes, local minima in acquisition optimization can limit true global convergence; multi-start or hybrid evolutionary approaches are often used as a remedy (Erdmann et al., 12 Jan 2026).
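The multi-start remedy can be seen on a toy acquisition surface. The sketch below uses a hypothetical bimodal $\alpha(x)$ with a local maximum of 0.5 at $x = 0.2$ and a global maximum of 1.0 at $x = 0.8$: a single projected-gradient run started at 0.15 stalls at the local peak, whereas keeping the best of several starts recovers the global one. The surface, step size, and start points are assumptions chosen for illustration.

```python
import math


def acq(x):
    # hypothetical bimodal acquisition surface: local max near x=0.2, global max near x=0.8
    return 0.5 * math.exp(-(x - 0.2) ** 2 / 0.005) + 1.0 * math.exp(-(x - 0.8) ** 2 / 0.005)


def acq_grad(x):
    # analytic derivative of acq
    return (0.5 * math.exp(-(x - 0.2) ** 2 / 0.005) * (-2.0 * (x - 0.2) / 0.005)
            + 1.0 * math.exp(-(x - 0.8) ** 2 / 0.005) * (-2.0 * (x - 0.8) / 0.005))


def ascend(x, lr=1e-3, steps=2000):
    # single-start projected gradient ascent on [0, 1]
    for _ in range(steps):
        x = min(max(x + lr * acq_grad(x), 0.0), 1.0)
    return x


def multi_start(starts):
    # run ascent from every start and keep the end point with the highest acquisition value
    candidates = [ascend(x0) for x0 in starts]
    return max(candidates, key=acq)
```

Starts that land on flat regions (here, near $x = 0.5$) barely move, which is why practical schemes seed starts from a cheap global screen or an evolutionary step.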

7. Outlook and Future Opportunities

Differentiable and Bayesian optimization unifies global uncertainty-aware search with efficient gradient-based local refinement and enables the rigorous integration of physics, domain-specific constraints, and robust statistical inference into optimization loops.

Active lines of progress include:

Extension to meta-learning and adaptive experiment design in high-dimensional and structure-rich environments (Yang et al., 29 Apr 2025).
Handling model misspecification and scenario uncertainty through robust divergence, composite likelihoods, and variational flows (Quera-Bofarull et al., 2023, Lahoud et al., 2024).
Fully differentiable bilevel and nested architecture optimization (e.g., for neural architecture search, modular simulators, design of designs).
Methods for automatic acquisition parameterization and gradient-based tuning of acquisition hyperparameters (Nwankwo et al., 2024).
Hybrid sampling and continuous variable strategies for complex design/parameter search spaces and acquisition landscapes (Liu et al., 2022, Erdmann et al., 12 Jan 2026).
Hardware-accelerated, parallel-scale, and structure-exploiting implementations for application in engineering, scientific computing, and control (Daulton et al., 2020, Ament et al., 2022).

Current research indicates the tight integration of differentiable programming, Bayesian modeling, and global sequential optimization will continue to expand the class of tractable, reliable, and interpretable optimization problems. The emerging toolkit leverages advances in automatic differentiation, scalable Bayesian inference, and surrogate-based global search—paving the way for new capabilities in complex system calibration, experiment design, policy optimization, statistical learning, and scientific discovery.