Bayesian Approximation Error Methods

Updated 25 May 2026

Bayesian Approximation Error methods are statistical approaches that model and correct surrogate model discrepancies by adding a probabilistic error term.
They employ techniques like Monte Carlo sampling and Taylor-based variance reduction to efficiently estimate error statistics and improve inference accuracy.
BAE methods enhance scalability and robustness in high-dimensional and PDE-constrained inverse problems, preventing overconfident predictions from simplified models.

A Bayesian Approximation Error (BAE) method is a statistical approach to modeling, quantifying, and propagating the uncertainty introduced by using approximate (surrogate or coarse) forward models in Bayesian inverse problems, optimal experimental design, and related inference tasks. The BAE paradigm augments the standard Bayesian framework by explicitly incorporating an additive model-discrepancy term, which reflects the difference between the accurate, computationally demanding model and a fast, inexact surrogate. Typically, the induced approximation error is treated probabilistically—most often by a Gaussian, whose mean and covariance are estimated with respect to the joint prior on the problem parameters (or, in certain strategies, with respect to a posterior-informed distribution). The resultant corrected likelihood, coupled with modified uncertainty quantification and design objectives, enables rigorous bias and variance correction for surrogate-accelerated inference, as well as scalable and robust solutions to high-dimensional or PDE-constrained inverse problems.

1. Formal Structure of BAE Methods

The core of the BAE framework is the decomposition of the data-generation process as

$y = G(v) + e = F(v) + \varepsilon(v) + e,$

where $G$ is the accurate forward map, $F$ is a computationally efficient surrogate, $v$ parameterizes the system, $e$ is measurement noise, and $\varepsilon(v) = G(v) - F(v)$ encapsulates approximation or model error (Koval et al., 2024, Calvetti et al., 29 Apr 2026, Alexanderian et al., 2022). The corresponding likelihood becomes

$\pi(y|v) \propto \exp\left(-\tfrac{1}{2} \| y - F(v) - \mu_\varepsilon \|^2_{\Gamma_\nu^{-1}} \right),$

where $\mu_\varepsilon$ and $\Gamma_\varepsilon$ are the mean and covariance of the approximation error, $\Gamma_e$ is the covariance of the measurement noise, and $\Gamma_\nu = \Gamma_\varepsilon + \Gamma_e$ combines model and measurement uncertainty.

The joint prior over $v$ (and any auxiliary/nuisance parameters) is used to marginalize the error statistics, which can be efficiently estimated via Monte Carlo or, for problems governed by PDEs, via linearized (or higher-order Taylor) expansions and corresponding control variates (Nicholson et al., 5 Dec 2025, Alexanderian et al., 2022).
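The step from the decomposition to the corrected likelihood can be written out explicitly. The following is a sketch under two assumptions that the formulas above imply but do not state: the measurement noise is zero-mean Gaussian, $e \sim \mathcal{N}(0, \Gamma_e)$, and the approximation error is modeled as Gaussian and independent of both $e$ and $v$. The symbol $\nu$ for the total error is introduced here for illustration.

$\varepsilon \sim \mathcal{N}(\mu_\varepsilon, \Gamma_\varepsilon), \qquad \nu := \varepsilon + e \sim \mathcal{N}(\mu_\varepsilon, \Gamma_\varepsilon + \Gamma_e) = \mathcal{N}(\mu_\varepsilon, \Gamma_\nu),$

$y = F(v) + \nu \quad \Longrightarrow \quad y \,|\, v \sim \mathcal{N}\left(F(v) + \mu_\varepsilon, \; \Gamma_\nu\right).$

The density of this Gaussian is the likelihood given above. Setting $\mu_\varepsilon = 0$ and $\Gamma_\varepsilon = 0$ recovers the conventional likelihood that ignores model error.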

2. Estimating Error Statistics: Sampling and Variance Reduction

The estimation of the first two moments of the error, $\mu_\varepsilon$ and $\Gamma_\varepsilon$, is a critical offline stage. Several computational strategies exist:

Monte Carlo sampling: Draw i.i.d. prior or posterior-informed samples. Compute $\varepsilon(v) = G(v) - F(v)$ for each sample, and form the empirical mean and covariance (Koval et al., 2024, Calvetti et al., 29 Apr 2026, Maclaren et al., 2018, Zhang et al., 2018).
Taylor-based variance reduction: For high-dimensional parameter fields, linear (or quadratic) Taylor expansions of both forward and surrogate models with respect to the parameter at the prior mean yield analytical approximations for the mean and covariance. These can be used as control variates, dramatically reducing the variance and sample requirements for accurate error estimation (Nicholson et al., 5 Dec 2025).
Composite and posterior-informed variants: Error statistics may be estimated from prior samples (prior-based), from joint posterior-predictive distributions (composite), or using samples drawn from a naive, surrogate-only MCMC posterior (posterior-informed), each variant trading off efficiency and double-use of the data (Maclaren et al., 2018).
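The prior-based Monte Carlo variant above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from any of the cited works: the function name `estimate_error_moments`, the toy linear models, and all numerical values are hypothetical, and the toy is chosen so that the exact moments are known in closed form.

```python
import numpy as np

def estimate_error_moments(accurate, surrogate, sample_prior, n_samples, rng):
    """Sample-based mean and covariance of eps(v) = G(v) - F(v)."""
    draws = [sample_prior(rng) for _ in range(n_samples)]
    # One accurate and one surrogate evaluation per sample; rows are samples.
    eps = np.array([accurate(v) - surrogate(v) for v in draws])
    mu_eps = eps.mean(axis=0)
    # np.cov with rowvar=False treats columns as variables and divides by n - 1.
    gamma_eps = np.atleast_2d(np.cov(eps, rowvar=False))
    return mu_eps, gamma_eps

# Toy problem (all values hypothetical): a linear "accurate" model with an
# offset and a perturbed linear surrogate, so that eps(v) = D v + offset.
rng = np.random.default_rng(0)
A_fine = rng.standard_normal((5, 3))
D = 0.1 * rng.standard_normal((5, 3))        # discrepancy between the two maps
offset = np.linspace(0.1, 0.5, 5)

accurate = lambda v: A_fine @ v + offset     # stand-in for G
surrogate = lambda v: (A_fine - D) @ v       # stand-in for F
sample_prior = lambda rng: rng.standard_normal(3)   # prior v ~ N(0, I)

mu_eps, gamma_eps = estimate_error_moments(
    accurate, surrogate, sample_prior, 2000, rng
)
# For this toy, the exact moments are mu_eps = offset and gamma_eps = D D^T.
```

In a realistic setting each call to `accurate` is an expensive solve, so the loop over samples is the part that would be distributed across workers.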

3. Corrected Likelihoods, Posteriors, and Marginals

Having characterized the model discrepancy stochastically, the resulting posterior reflects both measurement and modeling uncertainty. For additive, Gaussian error models (the most common practical setting), the corrected likelihood is of the form

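$\pi(y|v) \propto \exp\left(-\tfrac{1}{2} \| y - F(v) - \mu_\varepsilon \|^2_{\Gamma_\nu^{-1}} \right), \qquad \Gamma_\nu = \Gamma_\varepsilon + \Gamma_e,$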

and the Bayesian posterior over the parameters is, for Gaussian priors, again tractable: $\pi(v|y) \propto \pi(y|v)\,\pi(v)$, where $\pi(v)$ denotes the prior density (Calvetti et al., 29 Apr 2026, Koval et al., 2024, Alexanderian et al., 2022). For nonlinear $F$, a local Laplace (Gaussian) approximation around the MAP point can be used (Alexanderian et al., 2022).

In hierarchical and marginalized settings, the correction propagates through both the model parameters and nuisance variables, yielding marginal posteriors whose covariance structure incorporates the effects of both sources of uncertainty (Koval et al., 2024, Alexanderian et al., 2022).
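For a linear surrogate and a Gaussian prior, the corrected posterior is available in closed form. The sketch below assumes $F(v) = A v$, a prior $\mathcal{N}(v_0, \Gamma_{\mathrm{pr}})$, and an approximation error treated as independent of $v$; the names `bae_linear_posterior`, `A`, `v0`, and `gamma_pr` are illustrative and do not come from the cited works.

```python
import numpy as np

def bae_linear_posterior(A, y, mu_eps, gamma_eps, gamma_e, v0, gamma_pr):
    """Gaussian posterior for y = A v + eps + e with a N(v0, gamma_pr) prior.

    Assumes eps ~ N(mu_eps, gamma_eps) and e ~ N(0, gamma_e), both
    independent of v, so the total noise covariance is their sum.
    """
    gamma_nu = gamma_eps + gamma_e
    W = np.linalg.solve(gamma_nu, A)          # Gamma_nu^{-1} A
    prec_pr = np.linalg.inv(gamma_pr)
    precision = A.T @ W + prec_pr             # posterior precision
    cov_post = np.linalg.inv(precision)
    # Data are shifted by the mean approximation error before the update.
    rhs = W.T @ (y - mu_eps) + prec_pr @ v0
    mean_post = cov_post @ rhs
    return mean_post, cov_post

# Scalar check (hypothetical numbers): unit prior, unit noise, no model error.
mean, cov = bae_linear_posterior(
    A=np.array([[1.0]]), y=np.array([2.0]),
    mu_eps=np.zeros(1), gamma_eps=np.zeros((1, 1)), gamma_e=np.eye(1),
    v0=np.zeros(1), gamma_pr=np.eye(1),
)
# mean -> [1.0], cov -> [[0.5]]
```

Passing a nonzero `gamma_eps` inflates the noise covariance and therefore widens the posterior, while a nonzero `mu_eps` shifts the data before the update; these are the two corrections the text describes.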

4. Implications for Optimal Experimental Design and Inverse Problems

BAE corrections are crucial in experimental design and high-dimensional inverse problems:

A-optimal sensor and experiment design: The trace of the posterior covariance under the BAE-corrected likelihood defines uncertainty-aware design objectives (e.g., A-optimality). In linearized or affine-surrogate settings, it can be shown that the posterior—and thus the objective—is invariant to the particular surrogate as long as the error statistics are treated rigorously, permitting non-intrusive, black-box design algorithms (Koval et al., 2024).
Scalability: All leading approaches decouple the forward and surrogate modeling from the subsequent inference given precomputed statistics, leading to computational costs that depend primarily on the number of design variables, not the underlying discretization or parameter space dimension (Alexanderian et al., 2022, Nicholson et al., 5 Dec 2025).
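The A-optimal objective described above can be illustrated for a linear surrogate whose rows correspond to candidate sensors. This is a hedged sketch: the greedy selection loop is a simple heuristic chosen here for brevity, not the design algorithm of the cited papers, and the names `a_optimal_objective` and `greedy_a_optimal` as well as the toy numbers are hypothetical.

```python
import numpy as np

def a_optimal_objective(A, gamma_nu, gamma_pr, sensors):
    """Trace of the posterior covariance when only `sensors` are observed.

    gamma_nu is the BAE-corrected noise covariance (model + measurement).
    """
    idx = np.asarray(sensors, dtype=int)
    A_s = A[idx, :]
    gamma_s = gamma_nu[np.ix_(idx, idx)]      # noise block of active sensors
    precision = A_s.T @ np.linalg.solve(gamma_s, A_s) + np.linalg.inv(gamma_pr)
    return np.trace(np.linalg.inv(precision))

def greedy_a_optimal(A, gamma_nu, gamma_pr, budget):
    """Add, one at a time, the sensor that most reduces the objective."""
    chosen, remaining = [], list(range(A.shape[0]))
    for _ in range(budget):
        scores = [
            a_optimal_objective(A, gamma_nu, gamma_pr, chosen + [j])
            for j in remaining
        ]
        best = remaining[int(np.argmin(scores))]
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy example (hypothetical): three sensors, each observing one parameter,
# with very different total error variances.
A = np.eye(3)
gamma_nu = np.diag([1.0, 0.01, 100.0])
gamma_pr = np.eye(3)
selected = greedy_a_optimal(A, gamma_nu, gamma_pr, budget=2)
# selected -> [1, 0]: the sensors with the smallest total error variance
```

Because `gamma_nu` already contains the approximation-error covariance, a sensor where the surrogate is poor is penalized in the same way as a sensor with noisy measurements.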

A table summarizing key algorithmic steps follows:

Step	Action	Main Reference
1. Generate samples	Draw from prior or naive posterior	(Koval et al., 2024, Maclaren et al., 2018)
2. Forward evaluations	Compute $G(v)$, $F(v)$	(Koval et al., 2024, Nicholson et al., 5 Dec 2025)
3. Estimate error moments	Compute means/covariances of $\varepsilon(v)$	(Nicholson et al., 5 Dec 2025, Calvetti et al., 29 Apr 2026)
4. Setup corrected model	Form likelihood/posterior using $\mu_\varepsilon$ and $\Gamma_\nu = \Gamma_\varepsilon + \Gamma_e$	(Koval et al., 2024, Alexanderian et al., 2022)
5. Design/inference phase	Solve for MAP, posterior, or design using surrogate + error model	(Koval et al., 2024, Alexanderian et al., 2022)

Offline sampling dominates cost, but is amenable to parallelization, and the subsequent design or inference tasks are carried out with surrogates plus precomputed corrections.

5. Accuracy Bounds, Limitations, and Pathologies

The BAE-corrected posterior is Lipschitz continuous (in the Hellinger metric) with respect to the $L^p$-norm of the misfit or forward-model error, under mild integrability assumptions. Quantitatively, if the model error vanishes in norm, the corrected posterior converges to the true one at a rate determined by the error norm (Lie et al., 2019). A practical implication is that coarse surrogates may be safely used provided the error statistics are well controlled and the total error in the likelihood remains within required bounds.

Known limitations and caveats include:

The Gaussian error assumption is a Laplace approximation; strong nonlinearity or multimodal, skewed discrepancy may degrade effectiveness (Calvetti et al., 29 Apr 2026, Alexanderian et al., 2022).
The so-called “enhanced” error model, assuming independence of error and parameters, can fail if there is significant parameter-error correlation (Calvetti et al., 29 Apr 2026).
Posterior-informed error estimation introduces double-use of data, potentially underestimating uncertainty (Maclaren et al., 2018).
The cost of offline sampling can be considerable when the accurate forward model is expensive or high-dimensional; Taylor-based control variates and adaptive or hierarchical updating can alleviate this (Nicholson et al., 5 Dec 2025, Calvetti et al., 29 Apr 2026).
When both design and inference stages ignore model error, inference can be severely overconfident and point estimates can be far from ground truth (Alexanderian et al., 2022).
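The overconfidence described in the last item can be reproduced in a scalar toy problem. The setup is an assumption made for illustration only: the accurate model is $G(v, z) = v + z$ with an unknown nuisance offset $z \sim \mathcal{N}(m, s^2)$ shared by all $n$ observations, the surrogate is $F(v) = v$, and the prior is $v \sim \mathcal{N}(0, 1)$. The approximation error is then $\varepsilon = z$ in every observation, with mean $m$ and covariance $s^2 \mathbf{1}\mathbf{1}^\top$.

```python
import numpy as np

def scalar_posterior(y, mu_eps, gamma_nu):
    """Posterior N(mean, var) of v for y = v*1 + mu_eps + nu, v ~ N(0, 1).

    nu ~ N(0, gamma_nu) is the total (model + measurement) error.
    """
    ones = np.ones(len(y))
    w = np.linalg.solve(gamma_nu, ones)       # Gamma_nu^{-1} 1
    var = 1.0 / (ones @ w + 1.0)              # data precision + prior precision
    mean = var * (w @ (y - mu_eps))
    return mean, var

# Hypothetical values for the toy problem.
rng = np.random.default_rng(1)
n, sigma, m, s = 50, 0.1, 0.3, 0.2
v_true = rng.standard_normal()
z_true = m + s * rng.standard_normal()        # nuisance offset, unknown to F
y = (v_true + z_true) * np.ones(n) + sigma * rng.standard_normal(n)

# Naive inference: surrogate only, measurement noise only.
naive_mean, naive_var = scalar_posterior(y, np.zeros(n), sigma**2 * np.eye(n))

# BAE inference: shift by the error mean, add the error covariance.
gamma_nu = sigma**2 * np.eye(n) + s**2 * np.ones((n, n))
bae_mean, bae_var = scalar_posterior(y, m * np.ones(n), gamma_nu)
```

The naive posterior variance is $1/(n/\sigma^2 + 1)$ and shrinks toward zero as $n$ grows, even though the shared offset can never be averaged out. The corrected variance is $1/(n/(\sigma^2 + n s^2) + 1)$, which levels off near $1/(1/s^2 + 1)$ and so continues to reflect the unresolved model error.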

6. Applications and Methodological Variants

BAE methods have been applied extensively in Bayesian inference for PDE-constrained inverse problems, optimal sensor placement, and model reduction:

Surrogate modeling with explicit error correction: For hydrological, geophysical, and engineering systems, surrogate models (e.g., polynomial chaos expansions, Gaussian processes) are paired with BAE-corrected likelihoods to permit computationally tractable but unbiased inversion (Zhang et al., 2018).
Variance reduction and control variates: In high-dimensional or PDE-constrained problems, Taylor or Hessian-based control variate techniques reduce the number of fine-model runs by orders of magnitude, making BAE approaches practical at scale (Nicholson et al., 5 Dec 2025).
Connection to spotlight inversion and randomized sketching: The BAE paradigm subsumes certain deterministic approaches, such as spotlight inversion, under a statistical framing, and is closely related to low-rank or randomized linear algebra “priorsketching” methods for uncertainty quantification (Calvetti et al., 29 Apr 2026).
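The control-variate idea mentioned in the variance-reduction item can be sketched for the error mean alone. This is a simplified illustration under stated assumptions, not the estimator of the cited work: the prior is Gaussian and centered at `v_bar`, the control variate is the first-order Taylor expansion of $\varepsilon$ at `v_bar` with its coefficient fixed to one, and the Jacobian is formed by finite differences, whereas a PDE-constrained code would more plausibly use adjoint or other derivative information. The function `eps` is a hypothetical stand-in for $G(v) - F(v)$.

```python
import numpy as np

def fd_jacobian(fun, v_bar, h=1e-6):
    """Forward-difference Jacobian of fun at v_bar; also returns fun(v_bar)."""
    f0 = fun(v_bar)
    J = np.empty((f0.size, v_bar.size))
    for k in range(v_bar.size):
        step = np.zeros_like(v_bar)
        step[k] = h
        J[:, k] = (fun(v_bar + step) - f0) / h
    return f0, J

def cv_mean_estimate(eps, v_bar, samples):
    """Return (control-variate estimate, plain Monte Carlo estimate) of E[eps]."""
    eps0, J = fd_jacobian(eps, v_bar)
    vals = np.array([eps(v) for v in samples])
    # Linearization eps0 + J (v - v_bar); its prior mean is exactly eps0
    # because the prior is centered at v_bar.
    lin = eps0 + (samples - v_bar) @ J.T
    cv = eps0 + (vals - lin).mean(axis=0)
    return cv, vals.mean(axis=0)

# Toy error map (hypothetical): linear part plus a weak quadratic term.
D = np.array([[1.0, 0.5, -0.3], [0.2, -1.0, 0.8]])
def eps(v):
    return D @ v + 0.05 * (v @ v) * np.ones(D.shape[0])

v_bar = np.array([1.0, -0.5, 0.2])
# Exact mean under v ~ N(v_bar, I): E[v.v] = v_bar.v_bar + dimension.
exact_mean = D @ v_bar + 0.05 * (v_bar @ v_bar + v_bar.size)

rng = np.random.default_rng(0)
samples = v_bar + rng.standard_normal((50, 3))
cv_est, plain_est = cv_mean_estimate(eps, v_bar, samples)
```

Only the residual between the error and its linearization is averaged, so when the error is close to linear over the prior the sample average has far smaller variance than the plain estimate from the same 50 evaluations.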

7. Summary and Perspective

The Bayesian Approximation Error framework offers a unified and principled solution to the challenge of surrogate-accelerated Bayesian inference and experimental design under model discrepancy. By explicitly propagating modeling errors, it restores unbiased estimates and valid uncertainty quantification in settings where computational constraints preclude the use of “full-fidelity” models. The BAE paradigm encompasses a spectrum of implementation strategies—including offline sampling, Taylor-based variance reduction, hierarchical updating, and marginalized error modeling—each adaptable to the demands of large-scale, nonlinear, high-dimensional, and PDE-constrained inversion or design problems. Contemporary numerical results consistently highlight that neglecting model error leads to overconfident and sometimes grossly inaccurate inferences, while BAE methods yield robust, uncertainty-aware, and scalable solutions (Koval et al., 2024, Alexanderian et al., 2022, Calvetti et al., 29 Apr 2026).