Nested Sampling for Bayesian Inference
- Nested sampling is a Monte Carlo algorithm that reformulates a high-dimensional Bayesian evidence integral into a one-dimensional integral over the prior volume.
- It iteratively removes the lowest-likelihood live point and replaces it by sampling within a constrained prior, yielding both evidence estimates and posterior samples.
- The method features robust diagnostic tests, adaptable sampling strategies, and is widely applied in fields like astronomy, cosmology, statistical physics, and engineering.
Nested sampling is a Monte Carlo algorithm introduced by Skilling (2004) for the efficient computation of Bayesian evidence (marginal likelihoods) and posterior inference in high-dimensional, multimodal, or degenerate probability distributions. The method reformulates the integral for Bayesian evidence into a one-dimensional integral over the prior volume, facilitating rigorous model comparison while producing posterior samples as a byproduct. Its principled error quantification, adaptability to varied domain geometries, and robust diagnostic and parallelization procedures have made it a cornerstone methodology in astronomy, cosmology, statistical physics, and engineering applications.
1. Mathematical Foundations and Evidence Integral
In Bayesian inference, the evidence for a model with parameters $\theta$ is given by

$$Z = \int L(\theta)\,\pi(\theta)\,d\theta,$$

where $L(\theta)$ is the likelihood and $\pi(\theta)$ is the prior density on $\theta$. Nested sampling reformulates this high-dimensional integral as a one-dimensional integral over the “prior volume”

$$X(\lambda) = \int_{L(\theta) > \lambda} \pi(\theta)\,d\theta,$$

so that $X$ is a monotonically decreasing function of the likelihood threshold $\lambda$. The evidence integral becomes

$$Z = \int_0^1 L(X)\,dX,$$

where $L(X)$ is the inverse function of $X(\lambda)$, mapping a prior mass $X$ to the corresponding likelihood level. This transformation is exact and allows the computation of $Z$ via a numerically stable quadrature over $X$ (Buchner, 2021, Ashton et al., 2022, Latz et al., 2023, Feroz et al., 2013).
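The exactness of this transformation can be checked numerically on a toy problem. The sketch below (illustrative only; the likelihood, prior, and grid sizes are our choices) takes a 1-D standard-normal likelihood with a uniform prior on $[-5, 5]$, for which $X(\lambda)$ and its inverse are available in closed form, and confirms that the two forms of the integral agree:

```python
import numpy as np

def trapezoid(y, x):
    """Simple trapezoidal quadrature (avoids NumPy-version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Direct evidence: Z = ∫ L(θ) π(θ) dθ with L(θ) = exp(-θ²/2), π = 1/10 on [-5, 5].
theta = np.linspace(-5.0, 5.0, 200001)
Z_direct = trapezoid(np.exp(-theta**2 / 2) / 10.0, theta)

# Prior-volume form: the mass with L > λ is X = sqrt(-2 ln λ)/5, whose
# inverse is L(X) = exp(-(5X)²/2), so Z = ∫₀¹ L(X) dX.
X = np.linspace(0.0, 1.0, 200001)
Z_volume = trapezoid(np.exp(-(5.0 * X) ** 2 / 2), X)

print(Z_direct, Z_volume)   # both ≈ √(2π)/10 ≈ 0.2507
```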
2. Core Algorithm and Implementation
Nested sampling maintains a set of $N$ “live” points sampled from the prior. At each iteration $i$, it:
- Identifies and removes the live point with the smallest likelihood, $L_i$.
- Estimates the shrinkage of the prior volume via an order-statistics result: the shrinkage factor $t_i = X_i / X_{i-1}$ is distributed as $\mathrm{Beta}(N, 1)$, so $E[\ln t_i] = -1/N$, with $X_0 = 1$.
- Accumulates the evidence contribution $\Delta Z_i = L_i\,(X_{i-1} - X_i)$.
- Replaces the discarded point by sampling from the prior restricted to $L(\theta) > L_i$.
- Repeats until the remaining possible evidence contribution is negligible.
Posterior samples and weighted estimates for any function $f(\theta)$ are produced from the sequence of dead points and their associated weights $w_i = \Delta Z_i / Z$ (Buchner, 2021, Ashton et al., 2022, Feroz et al., 2013, Betancourt, 2010). Deterministic approximations such as $X_i \approx e^{-i/N}$ are commonly employed in practical settings.
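A minimal, self-contained sketch of this loop is given below, assuming a toy 2-D standard-normal likelihood, a uniform prior on $[-5, 5]^2$, the deterministic shrinkage approximation, and a short constrained random-walk replacement step (analytic evidence $Z = 2\pi/100 \approx 0.063$; every name here is illustrative, not from any NS library):

```python
import numpy as np

rng = np.random.default_rng(0)

def loglike(theta):
    """Toy 2-D standard-normal log-likelihood."""
    return -0.5 * np.sum(theta**2, axis=-1)

N = 300                                    # number of live points
live = rng.uniform(-5, 5, size=(N, 2))     # live points drawn from the prior
live_logL = loglike(live)

Z, X_prev = 0.0, 1.0
dead, weights = [], []
for i in range(1, 20 * N + 1):
    worst = int(np.argmin(live_logL))
    X_i = np.exp(-i / N)                   # deterministic shrinkage X_i ≈ e^{-i/N}
    w_i = np.exp(live_logL[worst]) * (X_prev - X_i)   # ΔZ_i = L_i (X_{i-1} - X_i)
    Z += w_i
    dead.append(live[worst].copy())
    weights.append(w_i)
    X_prev = X_i
    L_min = live_logL[worst]
    # Replace the dead point: short random-walk MCMC from a random survivor,
    # accepting only moves that respect the hard constraint L(θ) > L_min.
    # (A stuck walk simply duplicates a survivor — tolerable for a sketch.)
    j = int(rng.integers(N - 1))
    if j >= worst:
        j += 1
    cur = live[j].copy()
    step = 0.5 * np.std(live, axis=0)      # proposal scale from live-point spread
    for _ in range(20):
        prop = cur + step * rng.standard_normal(2)
        if np.all(np.abs(prop) <= 5) and loglike(prop) > L_min:
            cur = prop
    live[worst] = cur
    live_logL[worst] = loglike(cur)
    # Terminate once the live points can contribute only negligible evidence.
    if np.exp(np.max(live_logL)) * X_i < 1e-5 * Z:
        break

Z += np.mean(np.exp(live_logL)) * X_prev   # remaining mass of the final live set
weights = np.array(weights)
post_mean = weights @ np.array(dead) / weights.sum()   # posterior sample estimate
print(Z)                                   # ≈ 2π/100 ≈ 0.063
```

The dead points with normalized weights $w_i$ double as posterior samples: here `post_mean` estimates the posterior mean, which should be close to the origin.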
3. Sampling Strategies: Constrained Priors and Algorithmic Variants
The critical computational step in nested sampling is drawing new live points from the constrained prior $\pi(\theta)\,\mathbf{1}[L(\theta) > L^*]/X(L^*)$, where $\mathbf{1}[\cdot]$ is the indicator function. Principal strategies include:
- Region Samplers (e.g., MultiNest): Define simple geometric shapes (ellipsoids, clusters) containing the live points, sampling uniformly within or via rejection (Buchner, 2014, Ashton et al., 2022).
- Markov Chain Methods (e.g., PolyChord, slice sampling): Evolve the state via MCMC tailored to respect the hard likelihood constraint (Albert, 2020, Buchner, 2021, Buchner, 2023).
- Hamiltonian/Geometric Methods: Utilize reflected Hamiltonian trajectories (CHMC (Betancourt, 2010), Galilean MC (Feroz et al., 2013)), enabling efficient exploration within constrained likelihood contours.
- Geometric Samplers: For domains such as tori and spheres, employ wrapped proposals or projections to avoid boundary inefficiencies; e.g., geometric nested sampling (GNS) applies embedded move sets on non-Euclidean parameter spaces (2002.04123, Javid, 2019).
Advanced region samplers employ machine learning—normalizing flows or bijectors—to model arbitrary priors or to efficiently represent complex constraint boundaries (Alsing et al., 2021, Yallup et al., 2022).
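As an illustration of the region-sampler idea, the following sketch (a simplified, single-ellipsoid variant loosely in the MultiNest style; all function names are ours, not from any library) fits an enlarged bounding ellipsoid to the live points and rejection-samples uniformly within it until the hard likelihood constraint is met:

```python
import numpy as np

rng = np.random.default_rng(1)

def bounding_ellipsoid(live, enlarge=1.1):
    """Fit (center, linear map) of an ellipsoid enclosing all live points."""
    mean = live.mean(axis=0)
    cov = np.cov(live, rowvar=False)
    delta = live - mean
    d2 = np.einsum('ij,jk,ik->i', delta, np.linalg.inv(cov), delta)
    scale = enlarge * np.sqrt(d2.max())    # expand until every point is inside
    return mean, np.linalg.cholesky(cov) * scale

def constrained_draw(live, loglike, logL_min, max_tries=100000):
    """Rejection-sample the likelihood-constrained prior via the ellipsoid."""
    mean, transform = bounding_ellipsoid(live)
    d = live.shape[1]
    for _ in range(max_tries):
        z = rng.standard_normal(d)
        z *= rng.uniform() ** (1.0 / d) / np.linalg.norm(z)  # uniform in unit ball
        cand = mean + transform @ z
        if loglike(cand) > logL_min:
            return cand
    raise RuntimeError("acceptance too low; rebuild or enlarge the region")

# Demo: live points uniform in the unit disk; draw a new point satisfying
# loglike(θ) = -|θ|²/2 > -0.5, i.e. |θ| < 1 (all values illustrative).
v = rng.standard_normal((400, 2))
live = v / np.linalg.norm(v, axis=1, keepdims=True) * np.sqrt(rng.uniform(size=(400, 1)))
new_point = constrained_draw(live, lambda th: -0.5 * np.sum(th**2), -0.5)
```

The enlargement factor trades efficiency against the risk of clipping the constrained region; production codes tune it adaptively and split the live points into multiple ellipsoids for multimodal problems.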
Key Algorithmic Variants
| Variant | Sampling Kernel | Feature |
|---|---|---|
| Ellipsoidal/Cluster Samplers | Rejection in union of regions | Fast in low dimension $d$; degrades in high-dimensional or highly multimodal problems |
| Slice sampling/MCMC | Local moves under the constraint | Polynomial scaling in $d$, robust to degeneracies |
| Hamiltonian MC | Reflected HMC, Galilean | Fast, handles thin curving regions |
| Geometric NS | Domain-wrapped proposals | Efficient on circles, spheres, tori |
| Snowballing NS | Increasing $N$, fixed MCMC steps | Asymptotically unbiased, MCMC step stabilization |
| Bijector-based NS | Learned invertible transforms | Arbitrary prior/geometric support |
4. Error Analysis, Uncertainty Quantification, and Diagnostics
Nested sampling provides principled error estimates for the computed evidence. The dominant source of Monte Carlo error arises from the stochastic nature of the volume shrinkage factors $t_i$. Leading-order uncertainty in the log-evidence is

$$\sigma(\ln Z) \approx \sqrt{H/N},$$

where $H$ is the information gain (Kullback–Leibler divergence) from the prior to the posterior, measured in prior-mass ($\ln X$) space (Buchner, 2021, Fowlie et al., 2022, Keeton, 2011).
Skilling’s information-theoretic variance and Keeton’s moment-propagation variance agree to leading order, both reducing to $\mathrm{Var}(\ln Z) \approx H/N$ as long as $L(X)$ is locally smooth and $N$ is sufficiently large (Fowlie et al., 2022, Keeton, 2011). Statistical error estimated from a single NS run (via volume-shrinkage resampling) closely matches the run-to-run variance over independent runs.
Diagnostic tests for correctness and implementation-specific bias are essential:
- Insertion index uniformity: Checks that the rank at which each new live point enters the likelihood-sorted live set is uniformly distributed. Significant deviations (detected via KS tests) flag failures in constrained-prior sampling or the presence of likelihood plateaus (Fowlie et al., 2020, Buchner, 2021).
- Shrinkage tests: Compare actual volume shrinkages to the theoretical Beta distribution, identifying over- or under-compression, e.g., due to region sampling errors (Buchner, 2014).
- Thread-based and multi-run variance diagnostics: Partitioning and comparing “live-point threads” across runs to gauge implementation variance and mode-finding robustness (Higson et al., 2018).
Visual tools such as $\ln X$ trace diagrams of the samples and bootstrap-based uncertainty plots add further diagnostic granularity (Higson et al., 2018, Handley, 2019).
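A minimal version of the insertion-index check proposed by Fowlie et al. (2020) is sketched below (illustrative; the rank distributions are simulated rather than taken from a real run): a one-sample KS statistic compares the empirical distribution of insertion ranks against the uniform CDF, with $\approx 1.36/\sqrt{n}$ as the 5% critical value.

```python
import numpy as np

rng = np.random.default_rng(3)

def ks_uniform(ranks, N):
    """One-sample KS distance of insertion ranks from uniformity on {0,...,N-1}."""
    u = np.sort((np.asarray(ranks) + 0.5) / N)   # map ranks into (0, 1)
    n = len(u)
    return max(np.max(np.arange(1, n + 1) / n - u),
               np.max(u - np.arange(0, n) / n))

N, n_insert = 400, 2000
good = rng.integers(0, N, size=n_insert)                 # healthy: uniform ranks
bad = (N * rng.uniform(size=n_insert) ** 2).astype(int)  # broken: skewed low

crit = 1.36 / np.sqrt(n_insert)    # ~5% critical value for the KS statistic
print(ks_uniform(good, N), crit)   # typically below the critical value
print(ks_uniform(bad, N), crit)    # far above it: the bias is flagged
```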
5. Algorithmic Extensions: Geometries, Dynamic NS, Parallelization
Nested sampling is highly adaptable to different geometries:
- Wrapped proposal distributions for periodic domains (circles/tori) and projection moves for spheres prevent boundary-induced sampling bias and increase acceptance rates in non-trivial geometries (2002.04123, Javid, 2019).
- Learned bijective transforms allow arbitrary priors and enable nested sampling in complex constraint spaces without analytic prior transforms (Alsing et al., 2021).
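For the periodic case, the wrapping itself is a one-line operation. A sketch, assuming a parameter living on $[0, 2\pi)$ (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)

def wrapped_proposal(theta, scale=0.3):
    """Gaussian random-walk proposal wrapped onto the circle [0, 2π)."""
    return (theta + scale * rng.standard_normal(np.shape(theta))) % (2 * np.pi)

# Near the 2π seam, wrapped moves stay on-manifold rather than being rejected
# as out of bounds, so no proposals are wasted at the boundary.
theta = np.full(1000, 6.27)        # just below 2π
proposals = wrapped_proposal(theta)
print(proposals.min() >= 0.0, proposals.max() < 2 * np.pi)  # True True
```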
Dynamic Nested Sampling (dynesty, dyPolyChord) dynamically varies the number of live points or allocates sampling density to regions with the highest evidence or posterior mass, optimizing resource allocation during the run (Buchner, 2021, Yallup et al., 2022).
Parallelization can be exploited by batch removal/insertion of live points or by farmed constrained-prior samplers, with careful attention to shrinkage-statistic variance (Pfeifenberger et al., 2016, Buchner, 2021).
Snowballing Nested Sampling progressively increases the number of live points $N$, stabilizing MCMC-proposal parameters and converging asymptotically to unbiased evidence and posterior estimates without the ever-longer MCMC chains that unbiasedness would require at fixed $N$ (Buchner, 2023).
6. Applications, Performance, and Limitations
Nested sampling is applied across a range of scientific domains:
- Bayesian model selection: Calculation of Bayes factors for cosmology, particle physics, and model choice scenarios (Ashton et al., 2022, Latz et al., 2023).
- Multimodal and degenerate inference: Simultaneous localization and mapping (SLAM) factor graphs, high-dimensional Gaussian mixtures, object detection, particle event-generation phase space (Huang et al., 2021, Feroz et al., 2013, Yallup et al., 2022).
- Statistical physics: Computation of partition functions, free energies, and thermodynamic observables in the Potts and Ising models (Pfeifenberger et al., 2016).
- Engineering and rare event estimation: Quantification of extremely small probabilities in reliability engineering or finance (Latz et al., 2023).
Algorithmic performance is competitive with or superior to thermodynamic integration and traditional MCMC posterior samplers, especially in high-dimensional or severely multimodal problems. For evidence estimation, NS achieves uncertainties scaling as $\sqrt{H/N}$, provided all posterior modes are populated by live points.
Limitations: Failure to sample uniformly from the constrained prior (e.g., due to inadequate region construction or insufficient MCMC decorrelation) induces evidence bias or underexplored modes (Buchner, 2014, Fowlie et al., 2020). Plateaus and discontinuities in the likelihood lead to non-uniformity in the transformed $X$-space, necessitating special handling for both evidence and uncertainty estimation (Latz et al., 2023, Buchner, 2021). In very high-dimensional problems, region samplers become exponentially inefficient; step-based or flow-based adaptations are required (Ashton et al., 2022, Albert, 2020). Memory and computational requirements scale with the product of the number of live points and the parameter dimension.
7. Software Ecosystem, Best Practices, and Current Trends
Well-maintained, high-performance open-source NS implementations include MultiNest, PolyChord, dynesty, UltraNest, cpnest, JAXNS, and ecosystem tools for diagnostics (nestcheck), visualization (anesthetic), and postprocessing (Albert, 2020, Handley, 2019, Higson et al., 2018).
Best Practices:
- Select the number of live points $N$ commensurate with posterior complexity and evidence tolerance.
- Employ diagnostic checks (insertion rank, shrinkage, multi-run variance) routinely (Fowlie et al., 2020, Higson et al., 2018, Buchner, 2021).
- Use appropriate sampling kernels for the problem’s geometry and dimensionality.
- Publish run metadata (priors, code, evidence, diagnostics) and, where possible, the sequence of discarded (“dead”) points and weights for transparency and replication (Ashton et al., 2022).
- Address plateaus and rare events with algorithmic extensions as required (Latz et al., 2023).
Trends in the field include the adoption of flow-based proposals, dynamic allocation of sampling effort, ensemble and parallelized approaches, and robust error estimation for both evidence and posterior moments (Buchner, 2021, Alsing et al., 2021, Buchner, 2023). The approach continues to adapt to new challenges in high-dimensional, multimodal, non-Euclidean, or non-standard integration domains.
References
- Feroz et al., 2013
- Betancourt, 2010
- Buchner, 2021
- Ashton et al., 2022
- Albert, 2020
- Buchner, 2023
- 2002.04123
- Javid, 2019
- Higson et al., 2018
- Handley, 2019
- Alsing et al., 2021
- Fowlie et al., 2022
- Keeton, 2011
- Pfeifenberger et al., 2016
- Buchner, 2014
- Fowlie et al., 2020
- Huang et al., 2021
- Yallup et al., 2022
- Latz et al., 2023