Pseudo-Marginal MCMC with Randomized Fidelity
- The paper introduces pseudo-marginal MCMC with randomized fidelity, enabling asymptotically exact sampling by replacing intractable likelihoods with nonnegative unbiased stochastic estimators built from multi-fidelity models.
- It employs Russian roulette and block-update techniques to balance estimator variance against computational cost, ensuring effective mixing in high-dimensional scenarios.
- The method demonstrates scalability and significant performance gains in Bayesian inference, particularly for computationally expensive models such as high-fidelity simulations.
Pseudo-marginal Markov chain Monte Carlo (MCMC) with randomized fidelity encompasses a family of asymptotically exact Monte Carlo algorithms for situations in which evaluating the target density requires computationally expensive models, such as high-fidelity physical simulations, intractable integrals, or large-data likelihoods. These methods employ a hierarchy of models of varying fidelities, where randomly truncated telescoping estimators and block updates yield unbiased estimates of the target density or its logarithm, allowing rigorous MCMC sampling at reduced computational cost. Central to these methodologies are Russian roulette estimators, block pseudo-marginal and block-Poisson estimators, and the tuning of randomized fidelity to balance estimator variance against computational expense (Cai et al., 2022, Tran et al., 2016, Monterrubio-Gómez et al., 2021, Quiroz et al., 2016, Quiroz et al., 2014).
1. Fundamentals of Pseudo-Marginal MCMC with Randomized Fidelity
Pseudo-marginal MCMC permits sampling from an intractable posterior $\pi(\theta) \propto p(y \mid \theta)\, p(\theta)$ by replacing the (possibly intractable) likelihood $p(y \mid \theta)$ with an unbiased estimator $\hat{p}(y \mid \theta, u)$, where $u$ collects the auxiliary randomness. When the estimator is nonnegative and unbiased, the standard Metropolis–Hastings (MH) algorithm on the extended space $(\theta, u)$ retains $\pi(\theta)$ as its marginal. In randomized fidelity settings, a telescoping sequence of increasing-fidelity models or unbiased log-likelihood estimators from data subsampling forms the basis for the stochastic estimator. Randomization enters via the random truncation index in Russian roulette estimators, block choices in data partitioning, or stochastic Poisson products in exponential estimators (Cai et al., 2022, Tran et al., 2016, Monterrubio-Gómez et al., 2021, Quiroz et al., 2016, Quiroz et al., 2014).
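To make the extended-space mechanics concrete, here is a minimal sketch in Python; the function names (`pseudo_marginal_mh`, `log_lik_hat`) and the random-walk proposal are illustrative assumptions, not taken from the cited papers. The defining pseudo-marginal detail is that the noisy likelihood estimate at the accepted state is stored and reused, never recomputed:

```python
import numpy as np

def pseudo_marginal_mh(log_lik_hat, log_prior, theta0, n_iters, step=0.5, seed=0):
    """Random-walk MH where log_lik_hat(theta, rng) returns the log of a
    nonnegative unbiased likelihood estimator (the auxiliary randomness u
    is drawn inside the estimator)."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    log_l = log_lik_hat(theta, rng)  # noisy estimate at the current state
    samples = np.empty((n_iters, theta.size))
    for t in range(n_iters):
        prop = theta + step * rng.standard_normal(theta.size)
        log_l_prop = log_lik_hat(prop, rng)  # fresh estimate at the proposal only
        log_alpha = (log_l_prop + log_prior(prop)) - (log_l + log_prior(theta))
        if np.log(rng.uniform()) < log_alpha:
            # Accept: the noisy estimate is carried along with theta ("recycled"),
            # which is what keeps the extended-space chain exact.
            theta, log_l = prop, log_l_prop
        samples[t] = theta
    return samples
```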
2. Multi-Fidelity Telescoping and Russian Roulette Estimators
Suppose a sequence of approximations $\ell_0, \ell_1, \ell_2, \ldots$ to the exact log-likelihood $\ell$ is available, with incremental differences $\Delta_k = \ell_k - \ell_{k-1}$ (where $\Delta_0 = \ell_0$). The sum telescopes: $\sum_{k=0}^{K} \Delta_k = \ell_K \to \ell$ as $K \to \infty$. A randomly truncated estimator is constructed by drawing $K \sim p(\cdot)$ on $\{0, 1, 2, \ldots\}$ and setting $\hat{\ell} = \sum_{k=0}^{K} \Delta_k / P(K \ge k)$, with $P(K \ge k) = \sum_{j \ge k} p(j)$. This yields an unbiased estimator: $\mathbb{E}[\hat{\ell}] = \sum_{k \ge 0} \Delta_k = \ell$. Common choices for $p$ include geometric and shifted-Poisson distributions, which allow control over computational cost versus estimator variance. The trade-off parameter ($q$ for the geometric law $p(k) = (1-q)q^k$) tunes the expected truncation level and thus cost and variance (Cai et al., 2022).
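A minimal sketch of this randomly truncated estimator under a geometric truncation law (the `increment` callback and helper name are hypothetical):

```python
import numpy as np

def russian_roulette_log_lik(increment, q=0.5, rng=None):
    """Randomly truncated telescoping estimator of ell = sum_k Delta_k.

    increment(k) returns Delta_k = ell_k - ell_{k-1} (with Delta_0 = ell_0);
    the truncation level follows the geometric law p(k) = (1 - q) * q**k on
    {0, 1, ...}, whose survival function is P(K >= k) = q**k.
    """
    rng = rng or np.random.default_rng()
    K = rng.geometric(1.0 - q) - 1  # shift: numpy's geometric lives on {1, 2, ...}
    # Weight each computed increment by 1 / P(K >= k) to restore unbiasedness.
    return sum(increment(k) / q**k for k in range(K + 1))
```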
This methodology generalizes the so-called “Russian roulette” estimators developed for expectations involving infinite series or coupled model hierarchies and is particularly useful where high-fidelity evaluations are disproportionately costly compared to coarser approximations.
3. Block Pseudo-Marginal and Block-Poisson Estimation
The block pseudo-marginal (BPM) framework partitions the sources of Monte Carlo randomness (e.g., subsample indices, particles, QMC seeds) into $G$ blocks $u = (u_1, \ldots, u_G)$. At each MH step, only one block is refreshed, inducing high correlation between the log-likelihood estimates at the current $\theta$ and proposed $\theta'$. The likelihood estimator factorizes as $\hat{p}(y \mid \theta, u) = \prod_{g=1}^{G} \hat{p}_g(y \mid \theta, u_g)$. For large $G$, the correlation $\rho \approx 1 - 1/G$, allowing efficient tuning of mixing properties (Tran et al., 2016, Quiroz et al., 2014).
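A sketch of the block-refresh step under these assumptions (the helper names and the Gaussian block randomness are illustrative; in practice the blocks might hold subsample indices or QMC seeds):

```python
import numpy as np

def refresh_one_block(u_blocks, rng):
    """Propose auxiliary randomness for BPM: copy the G blocks and redraw
    exactly one, so G - 1 blocks are shared between the current and
    proposed estimates (the source of the rho ~ 1 - 1/G correlation)."""
    g = rng.integers(len(u_blocks))          # block chosen uniformly at random
    u_prop = [u.copy() for u in u_blocks]
    u_prop[g] = rng.standard_normal(u_blocks[g].shape)
    return u_prop

def block_log_lik(theta, u_blocks, log_lik_block):
    """Sum the per-block log contributions: log p_hat = sum_g log p_hat_g."""
    return sum(log_lik_block(g, theta, u_g) for g, u_g in enumerate(u_blocks))
```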
The block-Poisson estimator further introduces Poisson randomization, representing the unbiased estimator of the exponential of a sum via a product of block-wise Poisson estimators. For a likelihood $L(\theta) = \exp(\ell(\theta))$ and an unbiased estimator $\hat{\ell}(\theta)$ of the log-likelihood, the block-Poisson construction enables unbiased and, in general, signed likelihood estimators via control variates and data batching. This estimator is not strictly positive; thus, its absolute value is used for the MCMC, and posterior expectations are corrected using signed importance sampling. The effectiveness of block updates on mixing stems from inducing positive correlation between successive log-likelihood estimates, formalized as $\rho \approx 1 - 1/\lambda$, where $\lambda$ is the number of blocks (Quiroz et al., 2016).
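The paper's exact parameterization is not reproduced here, but the following sketch shows one standard block-Poisson construction whose unbiasedness follows from the Poisson moment generating function ($\mathbb{E}[\xi_l] = e^{\ell/\lambda}$ per block, so the product has expectation $e^{\ell}$); the function signature is hypothetical:

```python
import numpy as np

def block_poisson_estimate(ell_hat, a, lam, rng=None):
    """Block-Poisson estimator of exp(ell) built from independent unbiased
    log-likelihood estimates ell_hat(rng); one standard parameterization
    with a Poisson(1) number of factors per block.

    a is a (soft) lower bound on ell_hat; lam is the number of blocks.
    The return value has expectation exp(ell) but may be negative, hence
    the signed-measure correction discussed above.
    """
    rng = rng or np.random.default_rng()
    est = 1.0
    for _ in range(lam):
        xi = np.exp((a + lam) / lam)       # deterministic per-block prefactor
        for _ in range(rng.poisson(1.0)):  # Poisson-many estimator draws
            xi *= (ell_hat(rng) - a) / lam
        est *= xi                          # E[xi] = exp(ell / lam) per block
    return est
```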
4. Control Variates, Data Subsampling, and Doubly Stochastic Estimators
Randomized fidelity approaches in large-scale or high-dimensional scenarios frequently combine control variate constructions with data subsampling for efficient unbiased estimation of log-likelihoods. For data $y_1, \ldots, y_n$, per-observation log-likelihoods are expanded to $\ell_i(\theta) = q_i(\theta) + d_i(\theta)$, with $q_i$ a Taylor or quantized surrogate and $d_i$ a residual. Subsets $\mathcal{S}$ of size $m$ allow difference estimators for the sum: $\hat{\ell}(\theta) = \sum_{i=1}^{n} q_i(\theta) + \frac{n}{m} \sum_{j \in \mathcal{S}} d_j(\theta)$, leading to low-variance stochastic estimation of the full log-likelihood. Unbiasedness and variance reduction follow from the properties of control variates and block updates. Bias correction for the likelihood, via subtraction of half the variance estimate from $\hat{\ell}$ in the exponent, i.e., $\exp(\hat{\ell} - \hat{\sigma}^2/2)$ (Ceperley–Dewing correction), ensures negligible perturbation of the posterior in large samples (Quiroz et al., 2014).
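A sketch of this difference estimator, assuming the surrogate sum is available in closed form (all function names are hypothetical):

```python
import numpy as np

def difference_estimator(theta, ell_i, q_i, q_sum, n, m, rng=None):
    """Control-variate difference estimator of the full-data log-likelihood.

    ell_i(i, theta): exact per-observation log-likelihood (expensive);
    q_i(i, theta):   cheap surrogate, e.g. a Taylor term around a reference;
    q_sum(theta):    closed-form sum of all n surrogates, costing O(1).
    Only the residuals d_i = ell_i - q_i are subsampled, so the estimator
    variance shrinks as the surrogates improve.
    """
    rng = rng or np.random.default_rng()
    idx = rng.integers(n, size=m)  # uniform subsample with replacement
    resid = np.fromiter((ell_i(i, theta) - q_i(i, theta) for i in idx), float, m)
    return q_sum(theta) + (n / m) * resid.sum()  # unbiased for sum_i ell_i(theta)
```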
Doubly stochastic estimators, as in variationally sparse GPs, further incorporate stochastic latent variable sampling using Taylor expansions and block-Poisson exponentiation to maintain unbiasedness (Monterrubio-Gómez et al., 2021).
5. Cost, Variance, and Tuning Strategies
A central methodological concern is trade-off balancing: lowering estimator variance improves MCMC mixing but increases per-iteration cost. Conversely, tolerating higher variance reduces computation per iteration but degrades MCMC efficiency. For estimators using block structures, the effective correlation between successive log-likelihood values is tuned via the number of blocks $G$ (or $\lambda$ for block-Poisson), with higher $G$ inducing higher correlation and thereby, per Deligiannidis et al. and Pitt et al., allowing the use of noisier estimators with acceptable inefficiency. Quantitatively, optimal per-block variance targets (which differ between plain Monte Carlo and RQMC estimators) and block sizes (e.g., up to around $200$) are derived via cost-inefficiency minimization (Tran et al., 2016).
For telescoping Russian roulette estimators, the truncation law $p$ can be adapted to minimize variance for a fixed expected cost, with tail probabilities $P(K \ge k)$ chosen to balance the decay of the increments $\Delta_k$ against per-level evaluation costs (Cai et al., 2022). For subsampling and control variate estimators, pilot runs estimate variance constants for batch-size selection, and scheduling strategies such as “fidelity annealing” deploy lower-cost, higher-variance estimators during burn-in, then increase fidelity post-convergence (Monterrubio-Gómez et al., 2021).
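As one concrete instance of such cost-variance tuning, the sketch below uses the classic square-root allocation $P(K \ge k) \propto \sqrt{\mathbb{E}[\Delta_k^2]/c_k}$ from the multilevel/randomized-truncation literature; the specific rule in Cai et al. (2022) may differ:

```python
import numpy as np

def truncation_survival(delta2, costs):
    """Heuristic truncation law for the telescoping estimator: take the
    survival probabilities P(K >= k) proportional to sqrt(E[Delta_k^2]/c_k),
    normalized and clipped so they start at 1 and are nonincreasing.

    delta2: pilot-run estimates of E[Delta_k^2] per fidelity level;
    costs:  per-level evaluation costs c_k.
    """
    w = np.sqrt(np.asarray(delta2, float) / np.asarray(costs, float))
    surv = np.minimum(1.0, w / w[0])     # P(K >= 0) = 1
    return np.minimum.accumulate(surv)   # enforce a valid (nonincreasing) tail
```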
6. Empirical Performance and Application Domains
Pseudo-marginal MCMC with randomized fidelity demonstrates empirical gains in both wall-clock and computational time relative to single-fidelity or full-data sampling. Applications include log-Gaussian Cox process modeling (where MF-MCMC yields a 3–5× reduction in wall-clock time per effective sample while matching posterior credible intervals (Cai et al., 2022)), Bayesian ODE/PDE system identification, Gaussian process inference, and large-scale logistic regression (Tran et al., 2016, Quiroz et al., 2016, Monterrubio-Gómez et al., 2021, Quiroz et al., 2014).
Benchmarking studies report:
- Block-Poisson pseudo-marginal methods producing order-of-magnitude (roughly 10- to 100-fold) gains in relative computational time over full-data MCMC and outperforming Firefly and continuous-time zig-zag samplers in moderate and high dimensions (Quiroz et al., 2016).
- Block pseudo-marginal and RQMC combinations reducing the required Monte Carlo sample size scaling from linear to sublinear in the data size for panel or subsampled likelihoods (Tran et al., 2016).
- Subsampling PM methods with well-constructed control variates and bias correction achieving 10–30× speed-ups with negligible relative posterior error (Quiroz et al., 2014).
7. Theoretical Guarantees and Limitations
Under mild regularity (finite estimator variance uniformly in the parameter, irreducible proposal kernels), pseudo-marginal chains retain ergodicity with stationary marginal coinciding with the true target posterior (Cai et al., 2022). In the signed estimator case, provided integrability of the absolute estimator $|\hat{L}(\theta)|$, consistent expectation estimation is achieved via signed importance sampling, with estimator variance inflation tied to the average sign of the estimates (Quiroz et al., 2016). Perturbations from approximate unbiasedness in the log-likelihood manifest as total variation errors, formally negligible for large datasets when using control variates and batching (Quiroz et al., 2014).
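The sign correction amounts to a one-line ratio estimator; a minimal sketch (hypothetical helper name):

```python
import numpy as np

def sign_corrected_mean(f_vals, signs):
    """Posterior expectation under a signed likelihood estimator.

    The chain targets the absolute estimator |L_hat|; storing the sign
    s_t = sign(L_hat(theta_t)) along the way, E_pi[f] is recovered as
    E[f * s] / E[s]. Variance inflates as the average sign approaches 0.
    """
    f_vals = np.asarray(f_vals, float)
    signs = np.asarray(signs, float)
    return float((f_vals * signs).mean() / signs.mean())
```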
Employing randomized fidelity requires careful tuning: too few blocks or too-small batch sizes degrade mixing and estimation, while over-conservative choices waste computational resources. Practical approaches rely on pilot runs and empirical or analytic cost-variance tuning. For block-Poisson estimators, negative values can arise, necessitating signed corrections but not compromising theoretical exactness (Quiroz et al., 2016).
Pseudo-marginal MCMC with randomized fidelity provides a rigorous, tunable, and proven pathway to scalable Bayesian computation in the presence of computationally heterogeneous models.