
Amortized Bayesian Inference (ABI)

Updated 17 December 2025
  • Amortized Bayesian Inference (ABI) is a simulation-based framework that uses neural surrogates to approximate Bayesian posterior densities and model evidences.
  • ABI enables rapid inference by training a single network to amortize simulation costs, providing instantaneous estimates for both posteriors and model comparisons.
  • Empirical evaluations show that ABI, especially when augmented with self-consistency regularization, achieves near-exact marginal likelihood estimates and robust performance under out-of-distribution shifts.

Amortized Bayesian Inference (ABI) is a simulation-based framework for approximating Bayesian posterior densities using neural surrogates trained on data simulated from a generative model. Its defining characteristic is that, once trained, a single inference network provides instantaneous posterior or model evidence estimates for new observations, amortizing the cost of training over subsequent inferences. ABI scales Bayesian workflows to otherwise intractable models but entails unique challenges regarding model misspecification, robustness, hierarchical modeling, and model comparison.

1. Conceptual Foundations and Formal Definitions

ABI targets the otherwise intractable posterior $p(z \mid y)$ (or, in model-comparison settings, $p(\theta_k \mid y, M_k)$) by training a neural density estimator $q_\phi(z \mid y)$ or $q_\phi(\theta_k \mid y, M_k)$ on large numbers of simulated pairs $\{(z^{(n)}, y^{(n)})\} \sim p(z, y)$ or triplets $\{(\theta_k^{(n)}, y^{(n)}, M_k)\}$. The canonical learning objective is to minimize a strictly proper scoring rule $S$:
$$\hat\phi = \arg\min_\phi \frac{1}{N}\sum_{n=1}^N S\big(q_\phi(\cdot \mid y^{(n)}),\, z^{(n)}\big),$$
where $S$ is typically the log-score $S(q, z) = -\log q(z \mid y)$. Under joint simulation, as $N \to \infty$ and $q_\phi$ becomes sufficiently expressive, $q_\phi(z \mid y) \to p(z \mid y)$ (Kucharský et al., 16 Dec 2025).
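The objective can be made concrete with a toy conjugate-Gaussian example (a minimal sketch, not the paper's implementation): the simulator draws $z \sim N(0,1)$ and $y \mid z \sim N(z, 1)$, so the exact posterior is $N(y/2, 1/2)$. A Gaussian surrogate $q_\phi(z \mid y) = N(a\,y + b, e^c)$ trained by gradient descent on the log-score recovers it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate joint pairs (z, y):  z ~ N(0,1),  y | z ~ N(z, 1).
N = 20_000
z = rng.normal(0.0, 1.0, N)
y = z + rng.normal(0.0, 1.0, N)

# Gaussian amortized surrogate q_phi(z | y) = N(a*y + b, exp(c)).
a, b, c = 0.0, 0.0, 0.0
lr = 0.05
for _ in range(2000):
    mu, v = a * y + b, np.exp(c)
    r = z - mu                           # residuals
    # Analytic gradients of the mean negative log-score.
    da = np.mean(-r / v * y)
    db = np.mean(-r / v)
    dc = np.mean(0.5 - r**2 / (2 * v))
    a, b, c = a - lr * da, b - lr * db, c - lr * dc

# The exact posterior is N(y/2, 1/2), so a -> 0.5, b -> 0, exp(c) -> 0.5.
print(a, b, np.exp(c))
```

After training, the surrogate maps any new $y$ to its posterior instantly, which is the amortization property: the simulation and optimization cost is paid once, up front.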

For model comparison across $K$ candidate models $M_1, \ldots, M_K$, the ABI surrogate specializes to $q_\phi(\theta_k \mid y, M_k)$, with training data generated via the ancestral-sampling sequence $\theta_k^{(n)} \sim p(\theta_k \mid M_k)$, $y^{(n)} \sim p(y \mid \theta_k^{(n)}, M_k)$ (Kucharský et al., 16 Dec 2025).
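The ancestral-sampling recipe is straightforward to sketch (the two toy models below are illustrative assumptions, not models from the paper): each training example records the model index $M_k$, a parameter drawn from that model's prior, and data simulated from its likelihood.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two illustrative candidate models for a scalar observation y:
#   M_1: theta ~ N(0, 1),  y | theta ~ N(theta, 1)
#   M_2: theta ~ N(0, 4),  y | theta ~ N(theta, 1)
prior_sd = {1: 1.0, 2: 2.0}

def simulate_triplets(n_per_model):
    """Generate (theta, y, M_k) training triplets by ancestral sampling."""
    triplets = []
    for k, sd in prior_sd.items():
        theta = rng.normal(0.0, sd, n_per_model)   # theta ~ p(theta | M_k)
        y = rng.normal(theta, 1.0)                 # y ~ p(y | theta, M_k)
        triplets += [(t, obs, k) for t, obs in zip(theta, y)]
    return triplets

data = simulate_triplets(5000)
print(len(data))  # 10000 triplets, balanced across the two models
```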

2. ABI Methods for Posterior and Model Comparison

Multiple amortized strategies exist for model comparison, differing in their surrogate target and inference approach:

| Method | Surrogate | Marginal Likelihood Estimate / Output |
|---|---|---|
| NPE | $q_\phi(\theta_k \mid y, M_k)$ | $\log p(y \mid M_k) \approx \log p(\tilde\theta \mid M_k) + \log p(y \mid \tilde\theta, M_k) - \log q_\phi(\tilde\theta \mid y, M_k)$ |
| NPLE | $q_\phi(\theta_k \mid y, M_k),\ q_\psi(y \mid \theta_k, M_k)$ | $\log p(y \mid M_k) \approx \log p(\tilde\theta \mid M_k) + \log q_\psi(y \mid \tilde\theta, M_k) - \log q_\phi(\tilde\theta \mid y, M_k)$ |
| NEE | $q_\kappa(y \mid M_k)$ | $q_\kappa(y \mid M_k)$ (direct evidence estimate) |
| NPMP | $q_\delta(M_k \mid y)$ | $q_\delta(M_k \mid y)$; $\widehat{BF}_{ij} = q_\delta(M_i \mid y)/q_\delta(M_j \mid y)$ |

Here, NPE uses parameter-posterior surrogates, NPLE augments NPE with a likelihood surrogate, NEE directly approximates the evidence, and NPMP trains a classifier for model probabilities. Each method is trained with the respective simulation-based loss, e.g., $-\log q_\phi(\theta_k^{(n)} \mid y^{(n)}, M_k)$ for NPE (Kucharský et al., 16 Dec 2025).
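For the NPE row of the table, the plug-in identity is exact whenever $q_\phi$ equals the true posterior. A conjugate-Gaussian check makes this concrete (an illustrative sketch with $\theta \sim N(0,1)$, $y \mid \theta \sim N(\theta,1)$, hence $p(y) = N(y; 0, 2)$): the estimate is identical for every posterior draw $\tilde\theta$ and matches the analytic evidence.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

# Conjugate toy model: theta ~ N(0,1), y | theta ~ N(theta, 1).
# Exact posterior: theta | y ~ N(y/2, 1/2); exact evidence: p(y) = N(y; 0, 2).
y = 1.3
rng = np.random.default_rng(2)
theta_tilde = rng.normal(y / 2, np.sqrt(0.5), 5)   # posterior draws

# Plug-in estimate: log p(y) = log p(theta) + log p(y|theta) - log q(theta|y),
# evaluated here with the exact posterior standing in for q_phi.
est = (log_normal_pdf(theta_tilde, 0.0, 1.0)
       + log_normal_pdf(y, theta_tilde, 1.0)
       - log_normal_pdf(theta_tilde, y / 2, 0.5))

exact = log_normal_pdf(y, 0.0, 2.0)
print(est - exact)   # ~0 for every draw: the identity is theta-independent
```

With an imperfect $q_\phi$, the estimate varies across draws of $\tilde\theta$, which is exactly the deviation the self-consistency loss of Section 3 penalizes.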

Empirical evaluation shows that NPE and NPLE, when coupled with analytic likelihoods, achieve near-exact log-marginal evidence even for out-of-distribution (OOD) data. By contrast, NEE and NPMP exhibit high error and instability under model misspecification (Kucharský et al., 16 Dec 2025).

3. Self-Consistency Principles and Losses

The self-consistency (SC) criterion arises by rearranging Bayes' rule:
$$p(y \mid M_k) = \frac{p(\theta_k \mid M_k)\, p(y \mid \theta_k, M_k)}{p(\theta_k \mid y, M_k)},$$
which holds for every value of $\theta_k$. Plugging in the ABI approximation yields the plug-in estimate
$$\ell(\theta_k, y) \equiv \log p(\theta_k \mid M_k) + \log p(y \mid \theta_k, M_k) - \log q_\phi(\theta_k \mid y, M_k),$$
whose variance over posterior draws should vanish under self-consistency. The SC regularizer (for $M$ unlabeled or empirical observations $y^{(m)}$) is
$$L_{SC}(\phi) = \frac{\lambda_{SC}}{M} \sum_{m=1}^M \mathrm{Var}_{\tilde\theta \sim q_\phi(\cdot \mid y^{(m)}, M_k)}\big[\ell(\tilde\theta, y^{(m)})\big].$$
This regularizer can be evaluated and minimized in practice using Monte Carlo posterior samples (Kucharský et al., 16 Dec 2025). When used with an analytic likelihood $p(y \mid \theta_k, M_k)$, the loss is strictly proper (i.e., minimized if and only if $q_\phi = p$); with a neural likelihood surrogate, properness is generally lost (Kucharský et al., 16 Dec 2025, Schmitt et al., 2023, Mishra et al., 23 Jan 2025).
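The variance penalty can be estimated directly from posterior samples. The conjugate-Gaussian sketch below (an illustrative assumption, not code from the paper) shows that $\ell$ has essentially zero variance when the surrogate equals the exact posterior, and a clearly positive variance under a deliberately misspecified posterior mean.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

def sc_variance(y, post_mean, post_var, n_draws=10_000, seed=3):
    """Monte Carlo estimate of Var_theta[ ell(theta, y) ] for the toy model
    theta ~ N(0,1), y | theta ~ N(theta,1), surrogate N(post_mean, post_var)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(post_mean, np.sqrt(post_var), n_draws)
    ell = (log_normal_pdf(theta, 0.0, 1.0)                   # log prior
           + log_normal_pdf(y, theta, 1.0)                   # log likelihood
           - log_normal_pdf(theta, post_mean, post_var))     # log surrogate
    return np.var(ell)

y = 2.0
exact = sc_variance(y, y / 2, 0.5)        # exact posterior N(y/2, 1/2)
wrong = sc_variance(y, 0.3 * y, 0.5)      # misspecified posterior mean
print(exact, wrong)   # ~0 versus clearly positive
```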

SC can be implemented in semi-supervised regimes, enabling the use of unlabeled or empirical data to stabilize the ABI surrogate under extrapolation. The benefit is especially pronounced when training and test observations fall outside the simulated region (Mishra et al., 23 Jan 2025).

4. Practical Implementation

SC-augmented ABI follows an iterative simulation-based pipeline:

  1. Simulate labeled triplets $(\theta_k, y, M_k)$ and compute the negative log-posterior loss $-\log q_\phi(\theta_k \mid y, M_k)$.
  2. Draw a set of unlabeled (real or held-out) data $\{y^{(m)}\}$.
  3. For each $y^{(m)}$, sample $S$ posterior draws $\theta^{(s)} \sim q_\phi(\cdot \mid y^{(m)}, M_k)$, compute the plug-in marginal-likelihood estimate, and calculate $\mathrm{Var}_s\big(\ell(\theta^{(s)}, y^{(m)})\big)$.
  4. Add the averaged SC penalty $L_{SC}$ to the base simulation loss, typically ramped up after a warm-up period.
  5. Update network parameters via gradient descent.
  6. Repeat until convergence (Kucharský et al., 16 Dec 2025).
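The six steps above can be condensed into a runnable sketch (the conjugate-Gaussian model and Gaussian surrogate are illustrative assumptions; reparameterized draws with fixed noise and a finite-difference gradient keep the example dependency-free, whereas a real implementation would use a normalizing flow and autodiff):

```python
import numpy as np

rng = np.random.default_rng(4)

def log_npdf(x, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

# Step 1: labeled simulations from theta ~ N(0,1), y | theta ~ N(theta, 1).
theta_sim = rng.normal(0.0, 1.0, 2000)
y_sim = rng.normal(theta_sim, 1.0)
# Step 2: unlabeled observations (fresh simulator draws standing in for data).
y_unlab = rng.normal(0.0, np.sqrt(2.0), 20)
eps = rng.normal(0.0, 1.0, (20, 50))   # fixed noise for reparameterized draws

def loss(p, lam):
    a, b, c = p
    mu, v = a * y_sim + b, np.exp(c)
    nll = -np.mean(log_npdf(theta_sim, mu, v))       # base NPE loss
    # Step 3: S = 50 posterior draws per unlabeled y, then the variance of ell.
    mu_u = a * y_unlab[:, None] + b
    th = mu_u + np.sqrt(v) * eps
    ell = (log_npdf(th, 0.0, 1.0)
           + log_npdf(y_unlab[:, None], th, 1.0)
           - log_npdf(th, mu_u, v))
    # Step 4: add the averaged SC penalty to the base simulation loss.
    return nll + lam * np.mean(np.var(ell, axis=1))

p, lr, h = np.array([0.0, 0.0, 0.0]), 0.05, 1e-5
for it in range(600):                  # Step 6: repeat until convergence
    lam = 1.0 if it >= 100 else 0.0    # ramp SC in after a warm-up period
    g = np.zeros(3)                    # Step 5: gradient step (central diffs)
    for j in range(3):
        e = np.zeros(3); e[j] = h
        g[j] = (loss(p + e, lam) - loss(p - e, lam)) / (2 * h)
    p = p - lr * g

a, b, c = p
print(a, b, np.exp(c))   # near the exact posterior map: 0.5, 0.0, 0.5
```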

This algorithm applies to both NPE and NPLE; in the latter case, the likelihood is replaced with the neural surrogate (Kucharský et al., 16 Dec 2025, Schmitt et al., 2023, Mishra et al., 23 Jan 2025).

5. Empirical Properties and Performance

Extensive benchmarking in (Kucharský et al., 16 Dec 2025) demonstrates that:

  • NPE achieves $<1\%$ relative error for in-distribution data and, with SC, reduces bias to nearly zero under OOD mean shifts.
  • For model comparison (Bayes factors) between two Gaussians, NPE/NPLE vastly outperform NEE and NPMP across all tested dimensions, with SC further reducing bias in OOD regimes.
  • In real-world cognitive and time-series datasets with gold-standard bridge-sampling references, NPE without SC systematically underestimates log-marginals by 5–10 nats; NPE+SC matches the gold standard within $<0.5$ nat.
  • When analytic likelihood is unavailable and a neural surrogate is used (NPLE), SC provides only limited and inconsistent improvement unless the likelihood network is exceedingly well trained (Kucharský et al., 16 Dec 2025).

6. Recommendations and Limitations

Recommended practice, as established in (Kucharský et al., 16 Dec 2025), is:

  • Prefer parameter-posterior-based methods (NPE) whenever analytic likelihoods are available.
  • Always augment simulation-based training with SC regularization on a carefully selected pool of empirical or held-out data, especially for model comparison under possible misspecification.
  • For intractable likelihoods, NPLE may be used as a fallback, but SC should be expected to yield inconsistent benefits unless the likelihood surrogate is iteratively refined.
  • Pure direct evidence (NEE) and classifier-based (NPMP) surrogates are not recommended for robust OOD model comparison.

SC regularization improves the accuracy and robustness of ABI for model comparison primarily by constraining plug-in marginal likelihood estimates to be invariant to $\theta$; this is especially important for reliable extrapolation outside the region spanned by the simulated data (Kucharský et al., 16 Dec 2025, Schmitt et al., 2023, Mishra et al., 23 Jan 2025).

Limitations noted include the necessity of likelihood or surrogate likelihood evaluation, increased computational cost from Monte Carlo variance estimation, and possible breakdown if both the posterior and likelihood surrogates are misspecified or severely undertrained (Mishra et al., 23 Jan 2025).

7. Connections to Broader ABI Methodology

Self-consistency principles play a central role in the theoretical foundations of ABI and simulation-based inference. The variance penalty on the Bayes-inverted marginal likelihood is strictly proper and remains so in joint supervised + self-consistent training. This penalization framework enables robust extrapolation to OOD and empirical data, supports semi-supervised training, and facilitates practical workflows for amortized model comparison (Kucharský et al., 16 Dec 2025, Schmitt et al., 2023, Mishra et al., 23 Jan 2025).

The modularity of ABI—where simulation-based surrogates for posteriors, likelihoods, evidences, or model probabilities can be flexibly deployed, and combined with regularization for self-consistency, robustness, and sensitivity—demonstrates the versatility of this paradigm for modern Bayesian workflows. These developments position ABI as a key enabling technology for scalable, robust, and efficient Bayesian model comparison and inference in complex and misspecified settings.
