Particle Gibbs with Ancestor Sampling (PGAS)
- PGAS is an advanced MCMC algorithm that integrates conditional SMC with a backward ancestor sampling step, effectively mitigating path degeneracy in state-space models.
- The method improves mixing and reduces autocorrelation by reassigning ancestry in latent trajectories, facilitating efficient joint inference of states and parameters.
- PGAS has been successfully applied in diverse fields such as Bayesian filtering, ecological modeling, and probabilistic programming, demonstrating robust convergence and computational benefits.
Particle Gibbs with Ancestor Sampling (PGAS) is an advanced Markov chain Monte Carlo (MCMC) algorithm designed for efficient joint inference of latent trajectories and static parameters in high-dimensional state-space models (SSMs). By combining conditional Sequential Monte Carlo (cSMC) with a backward-looking ancestor resampling step, PGAS greatly alleviates the path degeneracy and poor mixing characteristic of standard Particle Gibbs (PG) samplers in long sequences, non-Gaussian and nonlinear systems, and models with strong dependencies (Lindsten et al., 2014, Chopin et al., 2013, Lindsten et al., 2012, Wigren et al., 2019). The development, properties, and implementation of PGAS are summarized below.
1. Conditional SMC, PMCMC, and Context of PGAS
PGAS is a specific instance of the broader class of Particle MCMC (PMCMC) algorithms introduced by Andrieu, Doucet, and Holenstein, which embed Sequential Monte Carlo samplers as Gibbs moves within an MCMC framework to target the exact joint posterior over latent states and static parameters in complex statistical models. In PMCMC, sampling from the conditional distribution of the latent state sequence given parameters is replaced by running a cSMC sampler conditioned on a reference trajectory (Lindsten et al., 2014).
Conditional SMC (cSMC), the core ingredient of PG, proceeds by simulating an SMC particle filter in which one particular state sequence (the reference path) is held fixed. The resulting Markov kernel leaves the desired smoothing distribution invariant, but path degeneracy often impedes mixing as the time horizon increases: the newly sampled trajectory frequently coincides with the old reference trajectory over a substantial prefix (Chopin et al., 2013, Gauraha, 2020).
2. Core PGAS Algorithm and Markov Kernel
PGAS augments cSMC with an ancestor sampling step at each time point, which randomly reassigns the ancestor of the retained trajectory according to a backward-weighted distribution. This procedure probabilistically breaks up the genealogy of the reference path and enables efficient exploration of latent trajectory space (Lindsten et al., 2012).
Let $x_{1:T}$ denote the latent state trajectory, $y_{1:T}$ the data, and $N$ the number of particles. Key steps of the PGAS kernel (in a Markovian HMM or SSM context) are:
- Initialization: At $t = 1$, draw $N - 1$ particles from the initial distribution and set the $N$th particle to the first state of the reference trajectory. Compute and normalize the importance weights.
- Forward Pass ($t = 2, \dots, T$):
- For $i = 1, \dots, N - 1$: sample the ancestor index $a_t^i$ according to the normalized weights $\{w_{t-1}^j\}_{j=1}^N$ and propagate the state $x_t^i \sim q(x_t \mid x_{t-1}^{a_t^i}, y_t)$.
- Ancestor Sampling for Reference: For $i = 1, \dots, N$, compute backward weights $\tilde{w}_{t-1}^i \propto w_{t-1}^i \, f(x_t' \mid x_{t-1}^i)$, where $x_t'$ is the reference state at time $t$. Sample $a_t^N$ from this distribution and set $x_t^N = x_t'$.
- Update and normalize the importance weights.
- Path Extraction: Sample a new trajectory index according to terminal weights; reconstruct the full trajectory by tracing ancestor indices (Lindsten et al., 2014, Chopin et al., 2013, Ning et al., 2017).
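The steps above can be sketched for a simple linear-Gaussian SSM with a bootstrap proposal. This is a minimal illustration under assumed model and variable names ($x_t = a x_{t-1} + v_t$, $y_t = x_t + e_t$), not code from any of the cited works:

```python
# Illustrative single PGAS sweep for an assumed linear-Gaussian SSM:
#   x_t = a*x_{t-1} + v_t,  v_t ~ N(0, q);   y_t = x_t + e_t,  e_t ~ N(0, r)
# with a bootstrap (prior) proposal. All names here are for illustration.
import numpy as np

def pgas_sweep(y, x_ref, N=5, a=0.9, q=1.0, r=1.0, rng=None):
    """One conditional-SMC pass with ancestor sampling.

    Returns a new trajectory drawn from the PGAS kernel; x_ref is the
    retained reference path occupying the N-th particle slot.
    """
    rng = rng or np.random.default_rng()
    T = len(y)
    X = np.zeros((T, N))                 # particle states
    A = np.zeros((T, N), dtype=int)      # ancestor indices
    # t = 1: sample N-1 particles from the prior; last slot is the reference
    X[0, :-1] = rng.normal(0.0, np.sqrt(q), N - 1)
    X[0, -1] = x_ref[0]
    logw = -0.5 * (y[0] - X[0]) ** 2 / r
    for t in range(1, T):
        w = np.exp(logw - logw.max()); w /= w.sum()
        # forward step for the N-1 free particles
        A[t, :-1] = rng.choice(N, size=N - 1, p=w)
        X[t, :-1] = a * X[t - 1, A[t, :-1]] + rng.normal(0, np.sqrt(q), N - 1)
        # ancestor sampling: backward weights ∝ w_{t-1}^i * f(x'_t | x_{t-1}^i)
        logb = logw - 0.5 * (x_ref[t] - a * X[t - 1]) ** 2 / q
        b = np.exp(logb - logb.max()); b /= b.sum()
        A[t, -1] = rng.choice(N, p=b)
        X[t, -1] = x_ref[t]
        logw = -0.5 * (y[t] - X[t]) ** 2 / r
    # path extraction: draw a terminal index, then trace ancestors backward
    w = np.exp(logw - logw.max()); w /= w.sum()
    k = rng.choice(N, p=w)
    traj = np.zeros(T)
    for t in range(T - 1, -1, -1):
        traj[t] = X[t, k]
        if t > 0:
            k = A[t, k]
    return traj
```

In a full Gibbs scheme this sweep alternates with parameter updates, feeding each returned trajectory back in as the next reference path.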
For non-Markovian models, the backward weights generalize to
$$\tilde{w}_{t-1}^i \propto w_{t-1}^i \, \frac{\gamma_T\big(x_{1:t-1}^i, x_{t:T}'\big)}{\gamma_{t-1}\big(x_{1:t-1}^i\big)},$$
where $\gamma_t$ denotes the unnormalized target density over partial trajectories; evaluating this ratio can require truncation or approximation in practice (Lindsten et al., 2012).
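One way such a truncation might look in code, keeping only the first $\ell$ future reference states in the density ratio. The `log_gamma` callable, the window length, and all names here are assumptions for illustration, not an implementation from the cited work:

```python
# Hedged sketch: truncated ancestor weights for an assumed non-Markovian
# model. `log_gamma(path)` is a user-supplied unnormalized log-density of
# a partial trajectory; `ell` is the truncation window length.
import numpy as np

def truncated_ancestor_weights(logw_prev, particles, ref_future, log_gamma, ell):
    """Backward weights using only the first `ell` future reference states.

    particles: N partial paths x_{1:t-1}^i (lists/arrays);
    ref_future: the reference suffix x'_{t:T}. Truncating the suffix
    approximates the full ratio gamma_T / gamma_{t-1}.
    """
    suffix = list(ref_future[:ell])
    logb = np.array([
        lw + log_gamma(list(p) + suffix) - log_gamma(list(p))
        for lw, p in zip(logw_prev, particles)
    ])
    b = np.exp(logb - logb.max())
    return b / b.sum()
```

Under geometrically decaying dependence, the error from discarding the remaining suffix shrinks exponentially in $\ell$, which is what makes adaptive truncation viable.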
3. Theoretical Properties and Mixing
Ancestor sampling fundamentally mitigates the path degeneracy inherent in standard PG. By re-choosing the parental lineage of the conditioned trajectory at each step, PGAS increases the chance of moving off the previous reference path, thereby producing a much more rapidly mixing Markov kernel over latent trajectories even with modest particle counts (often as few as $N = 5$ particles suffice) (Chopin et al., 2013, Lindsten et al., 2014).
Main theoretical properties:
- Uniform ergodicity: Under mild regularity conditions (e.g., bounded potentials), PGAS is uniformly ergodic for any fixed $N \geq 2$, with the contraction coefficient improving as $N$ increases (Chopin et al., 2013).
- Asymptotic variance reduction: PGAS achieves strictly lower lag-1 autocorrelation and therefore smaller asymptotic variance in the CLT than standard Particle Gibbs (Chopin et al., 2013).
- Exact invariance: The Markov kernel leaves the joint smoothing distribution and, if embedded in a full Gibbs scheme, the joint posterior over states and parameters, invariant (Lindsten et al., 2012, Lindholm et al., 2018).
4. Implementation, Computational Complexity, and Tuning
Each PGAS iteration requires one conditional SMC pass with $N$ particles over $T$ time steps. For Markovian models, the overall per-iteration computational cost is $\mathcal{O}(NT)$. For non-Markovian models, the backward weights require additional computation, but efficient truncation strategies maintain linear or near-linear complexity (Lindsten et al., 2012, Wigren et al., 2019).
Guidelines and variations:
- The proposal distribution $q(x_t \mid x_{t-1}, y_t)$ may be the prior (bootstrap proposal) or an optimal auxiliary form chosen to control the importance-weight variance. For nonlinear or non-Gaussian systems, Taylor-expansion or problem-specific proposals can further stabilize sampling (Ning et al., 2017).
- The number of particles $N$ is chosen to achieve a satisfactory effective sample size (ESS) at each step. PGAS generally allows a much smaller $N$ than PG for comparable mixing and accuracy (Llewellyn et al., 6 Jan 2025).
- Systematic or residual resampling can be employed within cSMC to further reduce variance when ancestor sampling is infeasible (Chopin et al., 2013).
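For reference, a standard systematic resampler, which uses a single uniform draw per sweep to reduce resampling variance relative to multinomial sampling. This is the textbook scheme, not code from the cited papers:

```python
# Systematic resampling: a single uniform offset is spread over a
# stratified grid, giving each particle close to N*w[i] offspring.
import numpy as np

def systematic_resample(weights, rng=None):
    """Return N ancestor indices; expected count of index i is N * weights[i]."""
    rng = rng or np.random.default_rng()
    N = len(weights)
    positions = (rng.random() + np.arange(N)) / N  # stratified grid in [0, 1)
    return np.searchsorted(np.cumsum(weights), positions)
```

A useful property: any particle with weight at least $1/N$ is guaranteed at least one offspring, which multinomial resampling cannot promise.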
5. Applications and Extensions
PGAS has been adopted in a diversity of domains, including:
- Bayesian inference in nonlinear and/or non-Gaussian state-space models (e.g. GEV models with Gaussian copula dependence) (Ning et al., 2017).
- Ecological population dynamics and epidemic models, where parameter elimination can be performed within PGAS when conjugacy permits, yielding marginalized PGAS (mPGAS) with substantially improved mixing and efficiency (Wigren et al., 2019).
- Inference for infinite-state or nonparametric models, as in infinite Hidden Markov Models (iHMMs), exploiting optimized proposals within the PGAS kernel (Tripuraneni et al., 2015).
- Efficient smoothing of Markov jump processes and continuous-time Bayesian networks, even for infinite or very large discrete state spaces (Miasojedow et al., 2015).
- Multiscale and multivariate hierarchical models, supporting joint learning of latent trajectories and hyperparameters (Vélez-Cruz et al., 2024).
- Probabilistic programming systems, where delayed-sampling and automatic conjugacy recognition enable automated deployment of PGAS and mPGAS (Wigren et al., 2019, Meent et al., 2015).
6. Empirical Performance and Diagnostics
Empirical studies consistently demonstrate that PGAS substantially outperforms standard PG in terms of trajectory mixing, autocorrelation, and robustness to the particle number, especially as the time horizon $T$ becomes large. For instance, in Bayesian GEV copula models, PGAS with a modest particle count maintained an "update frequency" (the proportion of MCMC sweeps in which the reference path is altered) close to 99.9%, compared to rapid degeneracy and near-fixed paths under PG (Ning et al., 2017).
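The update-frequency diagnostic is straightforward to compute from stored reference trajectories; a minimal sketch (function and variable names are illustrative):

```python
# Fraction of MCMC sweeps in which the retained reference trajectory
# actually changes -- a simple proxy for trajectory-space mixing.
import numpy as np

def update_frequency(trajectories):
    """trajectories: array of shape (num_sweeps, T) of retained paths."""
    trajs = np.asarray(trajectories)
    changed = np.any(trajs[1:] != trajs[:-1], axis=1)
    return changed.mean()
```

Values near 1 indicate the sampler rarely gets stuck on its reference path; for degenerate PG runs this quantity collapses toward 0 as $T$ grows.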
Key diagnostics reported in the literature include:
- Posterior credible bands on latent states closely tracking ground truth.
- Marginal inefficiency factors for global parameters that remain within the practicable range even in high dimensions and strongly dependent models.
- Consistently lower lag-1 autocorrelation and higher ESS per unit time for PGAS relative to PG and to alternative PMCMC variants (Chopin et al., 2013, Wigren et al., 2019, Llewellyn et al., 6 Jan 2025).
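These diagnostics are easy to compute from sampler output; a minimal sketch of the lag-1 autocorrelation and a simple AR(1)-based ESS estimate (illustrative helper names, not taken from the cited works):

```python
# Lag-1 autocorrelation and a crude effective-sample-size estimate for a
# scalar chain of posterior draws, assuming AR(1)-like dependence.
import numpy as np

def lag1_autocorr(chain):
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def ess_ar1(chain):
    """ESS under an AR(1) approximation: n * (1 - rho) / (1 + rho)."""
    rho = lag1_autocorr(chain)
    return len(chain) * (1 - rho) / (1 + rho)
```

For production use, multi-lag ESS estimators (e.g., as implemented in standard MCMC diagnostic libraries) are preferable; the AR(1) form above only captures lag-1 dependence.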
PGAS can be combined with stochastic approximation EM, Rao-Blackwellized smoothing, and grid-based HMM approximations for further computational gains and scalability (Lindholm et al., 2018, Llewellyn et al., 6 Jan 2025, Lindsten et al., 2012).
7. Model-Specific and Algorithmic Innovations
Recent extensions and innovations include grid-based PGAS for continuous-valued SSMs, marginalized PGAS exploiting conjugacy, and adaptive truncation strategies for non-Markovian ancestry weights. Detailed implementation recipes and pseudocode for all such variants are available in the cited works.
- Grid-PGAS leverages discrete HMM approximations to focus particle allocation effectively in high-posterior-density regions, yielding order-of-magnitude computational speedups in regime-switching and real-world forecasting models (Llewellyn et al., 6 Jan 2025).
- Marginalized PGAS (mPGAS) uses exponential family and conjugacy structure to integrate out parameters during trajectory updates, minimizing autocorrelation and enabling automatable inference backends in probabilistic programming languages (Wigren et al., 2019).
- Non-Markovian PGAS employs backward-weight truncation: under suitable decay of dependence, the theoretical impact of truncating ancestor weights decays exponentially and can be adaptively controlled (Lindsten et al., 2012).
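As an illustration of the conjugacy that mPGAS exploits, consider a hypothetical model where the state-noise mean $\theta$ has a Normal prior and the state increments are Normal given $\theta$; the analytic posterior update tracked via sufficient statistics might look like (assumed model, illustrative names):

```python
# Normal-Normal conjugate update for an assumed static parameter theta,
# the mean of Gaussian state increments x_t - x_{t-1} ~ N(theta, sigma^2).
# Marginalized PGAS only needs the sufficient statistics (n, sum of d).
import numpy as np

def posterior_params(increments, mu0=0.0, tau0=1.0, sigma=1.0):
    """Posterior N(mu_n, tau_n^2) of theta given observed increments."""
    d = np.asarray(increments, dtype=float)
    n = len(d)
    prec = 1.0 / tau0**2 + n / sigma**2          # posterior precision
    mu_n = (mu0 / tau0**2 + d.sum() / sigma**2) / prec
    return mu_n, 1.0 / np.sqrt(prec)
```

Because the posterior depends on the trajectory only through $(n, \sum_t d_t)$, $\theta$ can be integrated out of the trajectory update entirely, which is the mechanism behind the reduced autocorrelation reported for mPGAS.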
PGAS therefore forms a generic, robust, and theoretically-grounded foundation for high-dimensional Bayesian filtering and smoothing, enabling scalable and precise inference across a wide variety of modern state-space models (Lindsten et al., 2014, Ning et al., 2017, Wigren et al., 2019, Llewellyn et al., 6 Jan 2025).