Sequential Bayesian Design for Locally Accurate Surrogates
- The paper introduces an adaptive sequential strategy that focuses surrogate model accuracy in high-probability posterior regions to reduce computational costs.
- It employs coarse initialization and iterative retraining using informative samples from updated posterior distributions.
- Empirical results on PDE-constrained inverse problems demonstrate significant reductions in expensive model evaluations and improved estimation accuracy.
Sequential Bayesian Design for Locally Accurate Surrogate (SBD-LAS) refers to a class of methodologies for constructing surrogate models that accurately approximate the response of complex, expensive-to-evaluate simulators, specifically in regions of the input space where the true likelihood or posterior is concentrated. SBD-LAS aims to dramatically reduce the computational cost of Bayesian inference or optimization by focusing surrogate model accuracy where it is needed most—typically around the high-probability region for the parameter posterior in an inverse problem—rather than investing significant modeling effort to achieve global accuracy. This strategy leverages adaptive, sequential experimental design, updating the surrogate and the posterior iteratively as new, informative samples are selected. The approach is particularly suited for high-dimensional, computationally intensive problems such as PDE-constrained inverse problems, where globally accurate surrogates are generally infeasible with limited data and computational resources (Wang et al., 23 Jul 2025).
1. Problem Setting and Motivation
In many scientific and engineering contexts, inverse problems governed by partial differential equations (PDEs), complex physical simulators, or stochastic models require solving for unknown parameters given observed data. Bayesian methods provide a principled framework for such inference, yielding a posterior distribution over the parameters $\theta$ given data $y$ via Bayes' rule:

$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta),$$

where $p(y \mid \theta)$ is the likelihood induced by the forward model $G(\theta)$ and $p(\theta)$ is the prior.
The computation of the likelihood typically involves running an expensive forward model (e.g., a high-fidelity PDE solver). The computational bottleneck arises because both global surrogate modeling (accurate across the entire input space) and direct posterior sampling (e.g., via MCMC, often requiring millions of model runs) are intractable in high dimensions or with limited computational budgets.
The motivation behind SBD-LAS is to construct a surrogate that is only required to be accurate in regions where the posterior is non-negligible—i.e., for values of $\theta$ with high posterior density $p(\theta \mid y)$. This local focus drastically reduces the amount of required data and model complexity, thus enabling efficient, accurate inference and decision-making (Wang et al., 23 Jul 2025).
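To make the cost structure concrete, the sketch below evaluates a Gaussian log-likelihood through a forward model; each likelihood evaluation triggers one forward solve, which is exactly what becomes prohibitive when the solver is a high-fidelity PDE code. The `forward` map here is a hypothetical toy stand-in, not a simulator from the paper:

```python
import numpy as np

def log_likelihood(theta, y_obs, forward_model, noise_cov):
    """Gaussian log-likelihood: each call runs the forward model once."""
    residual = y_obs - forward_model(theta)
    chol = np.linalg.cholesky(noise_cov)
    z = np.linalg.solve(chol, residual)  # whiten the residual
    return -0.5 * float(z @ z)

# Toy stand-in for an expensive simulator (illustrative only).
def forward(theta):
    return np.array([theta[0] ** 2, theta[0] + theta[1]])

theta_true = np.array([1.0, 2.0])
y = forward(theta_true)
ll = log_likelihood(theta_true, y, forward, np.eye(2))  # maximal (zero) at the truth
```

An MCMC chain would call `log_likelihood` millions of times, which is the bottleneck SBD-LAS targets by replacing `forward_model` with a locally accurate surrogate.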
2. Locally Accurate Surrogate Modeling
The surrogate in SBD-LAS is a function $s(\theta) \approx G(\theta)$, trained only on samples drawn from the high-probability posterior region. The surrogate likelihood is then modeled as

$$\hat{L}(\theta) \propto \exp\!\Big(-\tfrac{1}{2}\,\big(y - s(\theta)\big)^{\top} \Sigma^{-1} \big(y - s(\theta)\big)\Big),$$

where $\Sigma$ is the (possibly noise-inflated) data covariance. The initially unknown high-probability region is discovered adaptively by leveraging the posterior samples from previous design stages.
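A minimal sketch of this surrogate likelihood, using a scalar inflation term added to the data covariance to absorb surrogate error (the scalar form is an illustrative simplification, not the paper's exact construction):

```python
import numpy as np

def surrogate_log_likelihood(theta, y_obs, surrogate, data_cov, inflation=0.0):
    """Log-likelihood with the forward model replaced by a cheap surrogate.

    `inflation` crudely widens the covariance to account for surrogate
    error; a scalar multiple of the identity is an assumption made here
    for illustration.
    """
    cov = data_cov + inflation * np.eye(len(y_obs))
    r = y_obs - surrogate(theta)
    return -0.5 * float(r @ np.linalg.solve(cov, r))
```

Inflating the covariance flattens the likelihood, which guards against over-trusting the surrogate in regions where it has seen little training data.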
Surrogate construction typically proceeds as follows:
- Start with a coarse (cheap) solver for to cover the prior broadly.
- Augment or correct the coarse predictions using a data-driven model (neural network, operator net, etc.) trained on data from the posterior region.
- Retrain the surrogate at each design stage using informative points from the updated posterior, ensuring surrogate accuracy in regions where it impacts the inference.
This process allows lower model complexity and smaller training data sets than global accuracy would require, leading to computational efficiency and model adaptivity.
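The coarse-plus-correction construction can be sketched with a linear corrector fit by least squares; the paper allows neural networks or operator nets for this role, so everything below is an illustrative stand-in:

```python
import numpy as np

def build_surrogate(thetas, fine_outputs, coarse_solver):
    """Correct a cheap coarse solver with a linear map fit to residuals.

    A least-squares linear corrector stands in for the data-driven
    model (neural network, operator net) used in practice; `thetas`
    are training points from the current posterior region.
    """
    coarse_outputs = np.array([coarse_solver(t) for t in thetas])
    residuals = fine_outputs - coarse_outputs
    # Fit residual(theta) ~ theta @ W over the training samples.
    W, *_ = np.linalg.lstsq(thetas, residuals, rcond=None)
    return lambda theta: coarse_solver(theta) + theta @ W
```

Because the corrector only needs to be accurate where the training points lie, a far simpler model suffices than would be needed for a globally accurate surrogate.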
3. Sequential Bayesian Design Strategy
Since the high-probability region of the likelihood is not known at the outset, SBD-LAS employs a sequential experimental design (adaptive sampling) strategy:
- Initialization: Use the prior or a coarse approximation to propose initial training points and construct the first surrogate.
- Posterior Update: Given the surrogate likelihood, compute the approximate posterior
$$p_k(\theta \mid y) \propto \hat{L}_k(\theta)\, \pi_k(\theta),$$
where $k$ denotes the iteration and $\pi_k$ is the stage-$k$ prior.
- Prior Transfer and Predictive Acceleration: For the next iteration, set the prior to be a Gaussian approximation of the current posterior, possibly using a “one-step ahead” linear prediction:
$$\pi_{k+1}(\theta) = \mathcal{N}\big(\theta;\ \mu_k + \alpha\,(\mu_k - \mu_{k-1}),\ \Sigma_k\big),$$
where $\mu_k$ and $\Sigma_k$ are the mean and covariance of the stage-$k$ posterior samples, and $\alpha \ge 0$ is a step-size parameter ($\alpha = 0$ recovers plain prior transfer).
- Resampling and Retraining: Draw new training points from the updated prior and retrain the surrogate.
- Termination: Repeat this procedure until convergence (e.g., posterior mean/covariance stabilize, or a desired surrogate accuracy is met).
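The prior-prediction step can be sketched as follows; the exact functional form is an assumption here, extrapolating the posterior mean along its recent direction of motion while carrying the covariance over:

```python
import numpy as np

def predict_next_prior(mu_k, mu_prev, cov_k, alpha=0.5):
    """One-step-ahead Gaussian prior for the next design stage.

    alpha = 0 recovers plain prior transfer (reuse the current
    posterior moments); alpha > 0 steps ahead of the current
    posterior mean. This form is a sketch of the update described
    in the text, not the authors' exact formula.
    """
    mu_next = mu_k + alpha * (mu_k - mu_prev)
    return mu_next, cov_k
```

When the posterior contracts steadily toward the truth, a nonzero `alpha` positions the next prior ahead of the current posterior mean, which is the acceleration effect reported in the experiments.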
This strategy ensures that surrogate refinement and posterior exploration concentrate computational effort on regions where the surrogate’s accuracy has the greatest impact on the Bayesian inference.
4. Algorithmic Framework
The SBD-LAS algorithm can be summarized in the following steps (Wang et al., 23 Jul 2025):
- Coarse Initialization: Initialize with samples from a coarse solver, forming the first prior and surrogate.
- Iterative Loop:
- Use the current surrogate to compute the posterior.
- Approximate the posterior with a Gaussian and predict the one-step ahead prior.
- Resample points from the new prior, retrain the surrogate using the new data.
- Continue until stopping criteria are met (e.g., small improvement between iterations).
- Final Inference: Use MCMC or another sampling method with the final locally accurate surrogate to estimate the posterior.
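The steps above can be assembled into a schematic loop. This is an illustrative sketch under strong simplifying assumptions (Gaussian priors, a constant-shift correction model, weighted samples in place of MCMC), not the authors' implementation:

```python
import numpy as np

def sbd_las_loop(y_obs, coarse, fine, dim, n_design=20, n_cheap=2000,
                 n_iters=6, alpha=0.5, seed=0):
    """Schematic SBD-LAS loop: design, correct, reweight, predict."""
    rng = np.random.default_rng(seed)
    mu, cov, mu_prev = np.zeros(dim), np.eye(dim), np.zeros(dim)
    for _ in range(n_iters):
        # A few expensive fine-solver runs at current-prior design points.
        design = rng.multivariate_normal(mu, cov, size=n_design)
        shift = np.mean([fine(t) - coarse(t) for t in design], axis=0)
        surrogate = lambda t, s=shift: coarse(t) + s  # locally corrected surrogate
        # Cheap surrogate-posterior moments via self-normalized weights.
        samples = rng.multivariate_normal(mu, cov, size=n_cheap)
        resid = np.array([y_obs - surrogate(t) for t in samples])
        logw = -0.5 * np.sum(resid ** 2, axis=1)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        mu_post = w @ samples
        cov_post = np.cov(samples.T, aweights=w)
        # One-step-ahead prediction of the next-stage prior.
        mu, mu_prev = mu_post + alpha * (mu_post - mu_prev), mu_post
        cov = cov_post + 1e-6 * np.eye(dim)
    return mu_prev  # posterior mean estimate from the final stage
```

Note that the expensive `fine` solver is called only `n_design` times per stage, while the cheap surrogate absorbs the `n_cheap` posterior evaluations, which is the cost asymmetry the method exploits.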
This iterative update, utilizing predictive prior acceleration when suitable ($\alpha > 0$), was found to speed up convergence in experiments, especially as the posterior contracts.
5. Empirical Performance and Demonstrations
SBD-LAS was demonstrated on inverse problems involving the Darcy flow equation, a prototypical PDE-constrained problem in fluid mechanics:
- Complicated coefficient field: For a permeability field parameterized over a grid, SBD-LAS achieved lower inversion error (mean squared error) compared to coarse-solver or fine-solver-only approaches, with two orders of magnitude fewer calls to the fine solver.
- Multi-peak fields and interface problems: The method accurately recovers high-frequency features and sharp interfaces, with the predictive acceleration strategy (e.g., $\alpha = 0.5$) leading to faster convergence than simpler updates.
- High dimensionality and noise robustness: With dimensions up to 400 and realistic noise, SBD-LAS still achieved competitive inversion results at a fraction of the computational cost.
These results demonstrate that SBD-LAS is able to judiciously allocate computational resources, focusing high-fidelity simulations where they are most beneficial, and thus enabling accurate solution of inverse problems previously deemed computationally intractable.
6. Applications, Extensions, and Implications
SBD-LAS is applicable to a wide range of Bayesian inversion problems governed by expensive forward models:
- Scientific computing: Groundwater modeling, reservoir engineering, geoscience, and contaminant transport where Darcy-type PDEs are fundamental.
- Medical imaging: Applications where data acquisition is expensive and model evaluations are slow (e.g., MRI, tomography).
- Other PDE-constrained inverse problems: Anywhere Bayesian inference is used with computationally intensive simulation.
The method integrates naturally with MCMC and variational inference algorithms, and can leverage diverse types of surrogates (e.g., neural operators, DeepONets). Its use of iterative, local surrogate refinement and sequential Bayesian design provides a template for efficient, scalable Bayesian inference in high-dimensional and tightly constrained domains.
A key implication of SBD-LAS is the practical feasibility of Bayesian inversion with modest computational budgets via intelligent allocation of simulation and model complexity. Because updates use information-targeted sampling in the input space, the method is robust to the curse of dimensionality, and the need for global model expressivity is largely bypassed. This has significant potential for extending fully Bayesian approaches to settings that were previously only tractable under restrictive, simplified models or with limited uncertainty quantification.
Summary Table: SBD-LAS Algorithmic Steps
| Step | Description |
|---|---|
| Initialization | Train coarse surrogate; set prior from coarse solver |
| Posterior Update | Compute posterior with current surrogate |
| Prior Update | Transfer posterior or apply one-step-ahead Gaussian prediction |
| Sampling | Draw new training points from updated prior |
| Surrogate Update | Retrain surrogate model locally |
| Iteration | Repeat until convergence |
| Final Inference | Use surrogate for full Bayesian inversion (e.g., MCMC) |