Multifidelity Neural Simulation-Based Inference
- Multifidelity neural simulation-based inference is a Bayesian framework that integrates multi-level simulator outputs to achieve accurate uncertainty quantification at reduced computational cost.
- It employs methods like transfer learning, feature matching, and multilevel Monte Carlo to fine-tune neural surrogates using limited high-fidelity data.
- Empirical results in fields such as cosmology and neuroscience show that MF-SBI significantly lowers simulation budgets while maintaining high posterior accuracy.
Multifidelity neural simulation-based inference (MF-SBI) encompasses a suite of Bayesian inference methodologies that exploit data from simulators at multiple levels of accuracy and computational cost. This class of approaches addresses the prevalent challenge in scientific and engineering domains where high-fidelity (HF) forward simulations are computationally prohibitive, but low-fidelity (LF) surrogates or approximations are inexpensive, though potentially biased or less informative. State-of-the-art frameworks for MF-SBI employ neural posterior estimation, multilevel Monte Carlo techniques, transfer learning, hierarchical latent variable modeling, and error-aware neural surrogates; together these address the need for accurate, uncertainty-aware inference under strict simulation budgets (Thiele et al., 1 Jul 2025, Krouglova et al., 12 Feb 2025, Hikida et al., 6 Jun 2025, Saoulis et al., 27 May 2025, Choi et al., 13 Jun 2025, Wu et al., 2022, Dhulipala et al., 2022).
1. Problem Setting and Objectives
MF-SBI targets the estimation of an unknown parameter vector $\theta$ from observed data $x$ that are themselves generated (at least in silico) by a complex, often intractable, stochastic process. The conventional simulation-based inference (SBI) task, computing or approximating the posterior $p(\theta \mid x)$ given data $x$, is severely restricted when $x$ can only be sampled via a computationally demanding HF simulator, especially if each run requires extensive resources (e.g., full N-body cosmological simulations, three-dimensional CFD, or multi-compartment neuron models).
Multifidelity approaches address this bottleneck by combining rich, inexpensive samples from one or several LF simulators (or even hierarchies of fidelities) with a limited set of HF data to train neural surrogate models capable of accurate and calibrated inference at strongly reduced HF simulation budgets (Thiele et al., 1 Jul 2025, Krouglova et al., 12 Feb 2025, Saoulis et al., 27 May 2025, Hikida et al., 6 Jun 2025).
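To make the setting concrete, the sketch below pairs a cheap, biased LF simulator with a slow HF simulator sharing the same parameter vector, and draws the typical multifidelity budget of many LF and few HF runs. The toy model, function names, and budget sizes are hypothetical illustrations, not taken from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_lf(theta, n_obs=50):
    """Cheap, biased low-fidelity simulator: resolves only the linear trend."""
    t = np.linspace(0.0, 1.0, n_obs)
    return theta[0] * t + rng.normal(0.0, 0.1, n_obs)

def simulate_hf(theta, n_obs=50, n_substeps=10_000):
    """Expensive high-fidelity simulator: adds structure the LF model misses;
    the inner loop stands in for a costly numerical solver."""
    t = np.linspace(0.0, 1.0, n_obs)
    for _ in range(n_substeps):  # placeholder for the expensive computation
        pass
    return theta[0] * t + theta[1] * np.sin(2 * np.pi * t) + rng.normal(0.0, 0.1, n_obs)

# The multifidelity budget pattern: many cheap LF draws, few costly HF draws.
sample_prior = lambda n: rng.uniform(-1.0, 1.0, size=(n, 2))
theta_lf, theta_hf = sample_prior(10_000), sample_prior(100)
x_lf = np.stack([simulate_lf(th) for th in theta_lf])
x_hf = np.stack([simulate_hf(th) for th in theta_hf])
```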
2. Core Design Patterns and Methodologies
2.1 Transfer and Pretraining-Based MF-NPE
A dominant motif is transfer learning within neural posterior estimation (NPE) (Krouglova et al., 12 Feb 2025, Saoulis et al., 27 May 2025, Thiele et al., 1 Jul 2025). The standard workflow is as follows:
- Pretraining: A neural density estimator $q_\phi(\theta \mid x)$ (typically a conditional normalizing flow) is first trained on a large corpus of LF simulated pairs $\{(\theta_i, x^{\mathrm{LF}}_i)\}_{i=1}^{N_{\mathrm{LF}}}$ by minimizing the standard NPE loss $\mathcal{L}_{\mathrm{LF}}(\phi) = -\tfrac{1}{N_{\mathrm{LF}}}\sum_i \log q_\phi(\theta_i \mid x^{\mathrm{LF}}_i)$.
- Fine-tuning: The pretrained network, reused as the initialization of the HF posterior estimator, is then fine-tuned on the much smaller HF set $\{(\theta_j, x^{\mathrm{HF}}_j)\}_{j=1}^{N_{\mathrm{HF}}}$, now minimizing $\mathcal{L}_{\mathrm{HF}}(\phi) = -\tfrac{1}{N_{\mathrm{HF}}}\sum_j \log q_\phi(\theta_j \mid x^{\mathrm{HF}}_j)$.
No bespoke loss weighting or domain adaptation is generally required; regularized gradient-based updates preserve transferable LF knowledge.
In practice, this reduces the required number of HF simulations by roughly an order of magnitude (factors of 8–15) in cosmological inference, while maintaining posterior accuracy and coverage (Saoulis et al., 27 May 2025). Similar improvements are reported for statistical dynamics, multi-compartment neuron, and large-scale spiking network models (Krouglova et al., 12 Feb 2025).
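The two-stage workflow can be sketched in a few lines of PyTorch. The snippet below uses a diagonal-Gaussian conditional density estimator as a simplified stand-in for the conditional normalizing flow; class names, architecture, and hyperparameters are illustrative assumptions rather than the setup of the cited papers.

```python
import torch
import torch.nn as nn

class GaussianPosteriorNet(nn.Module):
    """Stand-in for a conditional flow: predicts a diagonal Gaussian over theta given x."""
    def __init__(self, x_dim, theta_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * theta_dim),
        )

    def log_prob(self, theta, x):
        mu, log_sigma = self.net(x).chunk(2, dim=-1)
        return torch.distributions.Normal(mu, log_sigma.exp()).log_prob(theta).sum(-1)

def train(model, theta, x, epochs, lr, weight_decay=1e-4):
    """One training stage: minimize the NPE loss, i.e. the negative log q(theta | x)."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    for _ in range(epochs):
        opt.zero_grad()
        loss = -model.log_prob(theta, x).mean()
        loss.backward()
        opt.step()
    return model

# Stage 1: pretrain on abundant LF pairs; Stage 2: fine-tune on scarce HF pairs
# with a smaller learning rate so transferable LF structure is preserved.
# (theta_lf, x_lf, theta_hf, x_hf are torch tensors of simulated pairs.)
# model = GaussianPosteriorNet(x_dim=x_lf.shape[1], theta_dim=theta_lf.shape[1])
# model = train(model, theta_lf, x_lf, epochs=200, lr=1e-3)
# model = train(model, theta_hf, x_hf, epochs=50, lr=1e-4)
```

The only differences between the two stages are the data and a reduced learning rate, mirroring the observation above that no bespoke loss weighting or domain adaptation is required.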
2.2 Feature Matching and Distillation
Recent approaches incorporate more expressive summary translation between LF and HF domains via embedding networks and stochastic transfer mappings (Thiele et al., 1 Jul 2025). A typical system consists of:
- Separate neural embedding networks for LF and HF features.
- A transfer network (often parameterized as a Gaussian MLP) mapping LF embeddings to approximate HF-summary distributions.
- A main conditional flow trained on both direct HF summaries and transferred approximations from the LF domain.
The combined training objective includes the canonical HF NPE loss, a feature-matching term, a transferred LF loss, and a knowledge-distillation term where a teacher flow trained on LF data is distilled into the target HF flow. Loss balancing is governed by scalar loss-weighting hyperparameters.
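Schematically, and with illustrative notation (embeddings $h_{\mathrm{LF}}$, $h_{\mathrm{HF}}$, transfer network $t$, teacher flow $q_{\mathrm{teach}}$; the exact definitions and weightings are those of Thiele et al., 1 Jul 2025), the combined objective can be written as

$$
\mathcal{L}(\phi) \;=\; \mathcal{L}^{\mathrm{HF}}_{\mathrm{NPE}} \;+\; \lambda_1\,\mathcal{L}_{\mathrm{match}} \;+\; \lambda_2\,\mathcal{L}^{\mathrm{LF\to HF}}_{\mathrm{NPE}} \;+\; \lambda_3\,\mathcal{L}_{\mathrm{distill}},
$$

where $\mathcal{L}^{\mathrm{HF}}_{\mathrm{NPE}} = -\mathbb{E}\big[\log q_\phi(\theta \mid h_{\mathrm{HF}}(x_{\mathrm{HF}}))\big]$ is the canonical HF NPE loss, $\mathcal{L}_{\mathrm{match}}$ penalizes the mismatch between transferred LF summaries $t(h_{\mathrm{LF}}(x_{\mathrm{LF}}))$ and the corresponding HF summaries, $\mathcal{L}^{\mathrm{LF\to HF}}_{\mathrm{NPE}} = -\mathbb{E}\big[\log q_\phi(\theta \mid t(h_{\mathrm{LF}}(x_{\mathrm{LF}})))\big]$ evaluates the target flow on transferred LF summaries, and $\mathcal{L}_{\mathrm{distill}}$ measures the divergence between the LF-trained teacher flow and the target HF flow.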
This leads to robust posterior approximations, especially in limited HF data regimes (reductions in negative log test posterior of 10–20% at small budgets, together with superior two-sample-test and MMD metrics) (Thiele et al., 1 Jul 2025).
2.3 Multilevel Monte Carlo Neural Training
Some frameworks leverage the telescoping structure of multilevel Monte Carlo (MLMC), decomposing the expected training loss over multiple simulator fidelities (Hikida et al., 6 Jun 2025):

$$
\mathbb{E}\big[\mathcal{L}_L\big] \;=\; \mathbb{E}\big[\mathcal{L}_0\big] \;+\; \sum_{\ell=1}^{L} \mathbb{E}\big[\mathcal{L}_\ell - \mathcal{L}_{\ell-1}\big],
$$

where $\mathcal{L}_\ell$ denotes the neural loss component at fidelity level $\ell$, with $\ell = L$ the highest-fidelity simulator. MLMC-based SBI optimally allocates samples across levels to minimize total variance per unit simulation cost. Gradient stabilization via rescaling and "gradient surgery" supports training convergence even when loss differences become small (Hikida et al., 6 Jun 2025).
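A standard consequence of the MLMC analysis (a textbook result rather than a claim specific to the cited work) is the variance-optimal sample allocation: if level $\ell$ has per-sample cost $C_\ell$ and the level-$\ell$ correction term has variance $V_\ell$, the allocation minimizing total variance for a fixed budget is

$$
N_\ell \;\propto\; \sqrt{V_\ell / C_\ell},
$$

so that cheap, high-variance levels receive many samples while expensive levels, whose corrections have small variance, receive few.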
Empirically, this substantially lowers both the KL divergence between the learned and reference posteriors and the negative log-posterior, outperforming simple transfer baselines, especially under severe HF constraints.
2.4 Hierarchical Latent Variable Modeling
Multifidelity hierarchical neural processes (MF-HNP) model inter-fidelity dependencies via latent variable hierarchies (Wu et al., 2022). Data at each fidelity level $\ell$ is assigned a latent state $z_\ell$. The generative model factorizes as

$$
p(y_1, \dots, y_L, z_1, \dots, z_L \mid x) \;=\; \prod_{\ell=1}^{L} p(y_\ell \mid z_\ell, x)\, p(z_\ell \mid z_{\ell-1}, x),
$$

so that outputs at each level are conditionally independent given their own latent state (with $p(z_1 \mid z_0, x) \equiv p(z_1 \mid x)$).
Variational inference is performed via ELBO maximization. Unlike GP-based approaches, MF-HNP is suitable for high-dimensional and non-nested data, and prevents error propagation between fidelity levels due to conditional independence structure.
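Using the notation of the factorization above, the training objective decomposes into per-level reconstruction and KL terms; a schematic sketch of the ELBO (see Wu et al., 2022 for the exact form) is

$$
\mathcal{L}_{\mathrm{ELBO}} \;=\; \sum_{\ell=1}^{L} \Big( \mathbb{E}_{q(z_\ell \mid y_\ell, x)}\big[\log p(y_\ell \mid z_\ell, x)\big] \;-\; \mathrm{KL}\big(q(z_\ell \mid y_\ell, x)\,\big\|\,p(z_\ell \mid z_{\ell-1}, x)\big) \Big),
$$

consistent with the "ELBO (KL + per-level logliks)" entry in the implementation table below.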
2.5 Discrepancy Modeling and Surrogate Corrections
Alternative workflows first construct a neural or physics-based surrogate as an inexpensive LF emulator and then model the residual between surrogate and HF output, either with a neural network or via dimensionality reduction (e.g., Neural Active Manifolds, NeurAM). Posterior inference then proceeds by integrating this effective surrogate into the likelihood, or by learning error distributions with normalizing flows to correct the noise model in the posterior (Choi et al., 13 Jun 2025).
When a good global surrogate exists, this yields favorable accuracy/cost trade-offs; otherwise, discrepancy-based corrections or error-aware likelihoods are recommended.
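A minimal sketch of the surrogate-plus-discrepancy pattern is shown below. Module and function names are hypothetical, and a fixed Gaussian noise level stands in for the flow-based error model described above.

```python
import torch
import torch.nn as nn

class Surrogate(nn.Module):
    """Cheap emulator of the forward model, trained primarily on LF runs."""
    def __init__(self, theta_dim, x_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(theta_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, x_dim))
    def forward(self, theta):
        return self.net(theta)

class Discrepancy(nn.Module):
    """Residual model fit on the few HF runs: approximates HF(theta) - surrogate(theta)."""
    def __init__(self, theta_dim, x_dim, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(theta_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, x_dim))
    def forward(self, theta):
        return self.net(theta)

def log_likelihood(theta, x_obs, surrogate, discrepancy, sigma=0.1):
    """Error-aware Gaussian likelihood: surrogate prediction corrected by the
    learned discrepancy, with a fixed observation-noise scale sigma."""
    mean = surrogate(theta) + discrepancy(theta)
    return torch.distributions.Normal(mean, sigma).log_prob(x_obs).sum(-1)
```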
3. Practical Implementation Schemes
The main practical instantiations of MF-SBI can be summarized as follows:
| Approach | Network Types | LF Use | Loss Components | Key Regularization/Acquisition |
|---|---|---|---|---|
| Transfer/pretrain-finetune | Conditional NF/Flow+CNN/MLP | weight init | HF NPE loss | Weight decay, low LR |
| Feature match/distillation | Dual MLP embeddings, Flow | summary transfer | HF loss, LF-transferred, feature match, knowledge distillation | Linear ramp of loss weights |
| MLMC-based | Flow+MLP | all levels | MLMC telescoping sum | Gradient surgery/rescale |
| Hierarchical neural process | Encoder/decoder MLPs | latent summary | ELBO (KL + per-level logliks) | KL-annealing, agg. methods |
| Discrepancy/flow correction | MLP, NeurAM, Real-NVP | error surrogates | NN loss, flow error likelihood | Likelihood noise inflation |
| Kriging + DNN fusion | GP surrogates + NN | prediction fusion | MSE, GP loss | Active learning (U-criterion) |
Training typically proceeds in at least two stages (LF pretraining and HF finetuning), possibly augmented with active selection (e.g., ensemble-based acquisition), MLMC sample allocation, or hierarchical variational inference. Optimization protocols leverage Adam/AdamW, learning rate staging, batch balancing across fidelities, and early stopping.
4. Performance Metrics and Empirical Results
Performance in MF-SBI is evaluated by multiple standard and domain-specific criteria:
- Posterior accuracy: Negative log test posterior (NLTP), mean test posterior probability (MTPP), mean squared error (MSE), classifier two-sample test (C2ST), maximum mean discrepancy (MMD).
- Uncertainty quantification: Calibration error via credibility–coverage curves, empirical coverage of credible intervals (see the sketch after this list), trace of posterior covariance, simulation-based calibration (SBC).
- Computational efficiency: Number of HF runs required to reach baseline performance (reduction factor), total CPU time, posterior evaluation time.
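As a concrete instance of the coverage diagnostics above, the following sketch computes empirical credible-interval coverage, assuming arrays of posterior samples and ground-truth parameters for a set of held-out test cases (array shapes and the function name are illustrative):

```python
import numpy as np

def empirical_coverage(posterior_samples, theta_true, level=0.9):
    """Fraction of test cases whose true parameter lies inside the central
    `level` credible interval, computed per dimension from posterior samples.

    posterior_samples: array of shape (n_cases, n_samples, theta_dim)
    theta_true:        array of shape (n_cases, theta_dim)
    """
    lo = np.quantile(posterior_samples, (1 - level) / 2, axis=1)
    hi = np.quantile(posterior_samples, 1 - (1 - level) / 2, axis=1)
    inside = (theta_true >= lo) & (theta_true <= hi)
    return inside.mean(axis=0)  # well-calibrated posteriors give values close to `level`
```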
Empirical benchmarks across scientific domains (cosmological parameter inference, physiological inverse problems, neuron and network models) consistently demonstrate that multifidelity approaches recover posterior distributions nearly indistinguishable from those obtained with orders-of-magnitude more HF simulations (Thiele et al., 1 Jul 2025, Saoulis et al., 27 May 2025, Krouglova et al., 12 Feb 2025, Hikida et al., 6 Jun 2025, Wu et al., 2022).
5. Limitations and Recommendations
MF-SBI benefits are maximized under the following conditions:
- The LF model exhibits non-trivial correlation with HF outputs and is sufficiently informative to guide initial posterior estimates.
- The architecture and training regime preserve transferable features while permitting sufficient expressivity to adapt to HF-specific corrections.
- Discrepancy or error modeling is tractable; when uncertainty estimates must be tightly calibrated, flow-based noise correction is particularly effective.
If the LF model is severely misspecified, the utility of transfer learning or discrepancy correction declines. High dimensionality of the simulator outputs remains a challenge for neural flows, though MF-HNP and summary-based reductions help mitigate this for some problems (Wu et al., 2022).
Recommended practical workflows are:
- For moderate dimension and good LF models, use pretrain+finetune schemes with direct transfer of flow weights (Krouglova et al., 12 Feb 2025).
- For high-dimensional or complex inter-fidelity relations, employ embedding-based summary translation, MLMC, or hierarchical latent variables (Thiele et al., 1 Jul 2025, Hikida et al., 6 Jun 2025, Wu et al., 2022).
- When accurate uncertainty bounds are critical, augment surrogates with error-aware flows (Choi et al., 13 Jun 2025).
Active learning, sequential acquisition, and adaptive sample allocation further increase efficiency where simulator costs are extreme.
6. Applications and Outlook
MF-SBI has been successfully deployed in cosmology (inference from matter power spectrum, dark matter density maps), neurobiophysics, cardiovascular modeling, epidemiology, and climate modeling (Thiele et al., 1 Jul 2025, Saoulis et al., 27 May 2025, Choi et al., 13 Jun 2025, Wu et al., 2022). The underlying principles generalize to any SBI context featuring multiple simulators of varying cost/fidelity and high-dimensional, structured data.
Key anticipated advances include extending MF-SBI to new neural backbone types (e.g., diffusion models), developing standardized multifidelity SBI benchmarks, and integrating domain-informed LF model construction (Krouglova et al., 12 Feb 2025). The consistent conclusion is that MF-SBI achieves calibrated, sample-efficient posterior estimation, bringing fully Bayesian inference within reach for models of computational complexity previously deemed intractable.