
Nested Sampling for Bayesian Inference

Updated 25 January 2026
  • Nested sampling is a Monte Carlo algorithm that reformulates a high-dimensional Bayesian evidence integral into a one-dimensional integration over the prior volume.
  • It iteratively removes the lowest-likelihood live point and replaces it by sampling within a constrained prior, yielding both evidence estimates and posterior samples.
  • The method offers robust diagnostic tests and adaptable sampling strategies, and is widely applied in fields such as astronomy, cosmology, statistical physics, and engineering.

Nested sampling is a Monte Carlo algorithm introduced by Skilling (2004) for the efficient computation of Bayesian evidence (marginal likelihoods) and posterior inference in high-dimensional, multimodal, or degenerate probability distributions. The method reformulates the integral for Bayesian evidence into a one-dimensional integral over the prior volume, facilitating rigorous model comparison while producing posterior samples as a byproduct. Its principled error quantification, adaptability to varied domain geometries, and robust diagnostic and parallelization procedures have made it a cornerstone methodology in astronomy, cosmology, statistical physics, and engineering applications.

1. Mathematical Foundations and Evidence Integral

In Bayesian inference, the evidence for model parameters $\theta \in \Omega$ is given by

$$Z = \int_{\Omega} L(\theta)\,\pi(\theta)\,d\theta,$$

where $L(\theta)$ is the likelihood and $\pi(\theta)$ is the prior density on $\Omega$. Nested sampling reformulates this high-dimensional integral as a one-dimensional integral over the “prior volume”

$$X(\lambda) = \int_{L(\theta) > \lambda} \pi(\theta)\,d\theta,$$

which decreases monotonically as the threshold $\lambda$ increases. The evidence integral becomes

$$Z = \int_0^1 L(X)\,dX,$$

where $L(X)$ is the inverse function of $X(\lambda)$, mapping a prior mass $X$ to the corresponding likelihood level. This transformation is exact and allows the computation of $Z$ via a numerically stable quadrature over $X$ (Buchner, 2021, Ashton et al., 2022, Latz et al., 2023, Feroz et al., 2013).
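As a quick sanity check of this identity, the sketch below (a toy model of my choosing: uniform prior on $[0,1]$ with likelihood $L(\theta) = e^{-\theta}$, for which $X(\lambda) = -\ln\lambda$ and hence $L(X) = e^{-X}$) evaluates both forms of the evidence numerically and compares them to the exact value $1 - e^{-1}$:

```python
import math

# Toy model: prior pi(theta) uniform on [0, 1], likelihood L(theta) = exp(-theta).
# Direct evidence:          Z = int_0^1 L(theta) pi(theta) dtheta.
# Nested-sampling form:     Z = int_0^1 L(X) dX,
# where X(lambda) is the prior volume with L(theta) > lambda.  Here that
# volume is X = -ln(lambda), so the inverse is L(X) = exp(-X).

def trapezoid(f, a, b, n=100_000):
    """Simple composite trapezoid rule on [a, b]."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

Z_direct = trapezoid(lambda t: math.exp(-t) * 1.0, 0.0, 1.0)  # int L * pi dtheta
Z_nested = trapezoid(lambda X: math.exp(-X), 0.0, 1.0)        # int L(X) dX
Z_exact = 1.0 - math.exp(-1.0)

print(Z_direct, Z_nested, Z_exact)
```

The two quadratures agree with each other and with the closed-form answer, illustrating that the reformulation is exact rather than an approximation.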

2. Core Algorithm and Implementation

Nested sampling maintains a set of $N$ “live” points $\{\theta_i\}$ sampled from the prior. At each iteration, it:

  1. Identifies and removes the live point with the smallest likelihood, $L_i$.
  2. Estimates the shrinkage of the prior volume via an order-statistics result: $t_i \sim \operatorname{Beta}(N, 1)$, so $X_i = t_i X_{i-1}$, with $X_0 = 1$.
  3. Accumulates the evidence contribution $Z \leftarrow Z + L_i (X_{i-1} - X_i)$.
  4. Replaces the discarded point by sampling from the prior restricted to $L(\theta) > L_i$.
  5. Repeats until the remaining possible evidence contribution is negligible.

Posterior samples and weighted estimates for any function $f(\theta)$ are produced from the sequence of dead points and their associated weights $w_i = X_{i-1} - X_i$ (Buchner, 2021, Ashton et al., 2022, Feroz et al., 2013, Betancourt, 2010). Deterministic approximations $X_i \approx \exp(-i/N)$ are commonly employed in practical settings.
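The loop above can be sketched end to end. The following minimal Python implementation (a toy of my choosing: uniform prior on $[-5, 5]$, Gaussian likelihood, using the deterministic shrinkage $X_i \approx e^{-i/N}$; the constrained-prior draw is exact here only because the 1D constraint set is an interval) recovers the known evidence $Z = \sqrt{2\pi}/10$, i.e. $\ln Z \approx -1.384$:

```python
import math
import random

random.seed(1)

# Toy problem: uniform prior on [-5, 5], likelihood L(theta) = exp(-theta^2/2).
# True evidence: Z = sqrt(2*pi)/10, so ln Z ~= -1.3836.
LO, HI = -5.0, 5.0

def log_likelihood(theta):
    return -0.5 * theta * theta

def logaddexp(a, b):
    if a == -math.inf:
        return b
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

N = 200                                   # number of live points
live = [random.uniform(LO, HI) for _ in range(N)]
logL = [log_likelihood(t) for t in live]

logZ, logX = -math.inf, 0.0
for i in range(1, 100 * N):
    worst = min(range(N), key=lambda j: logL[j])   # step 1: lowest-likelihood point
    logL_i = logL[worst]
    logX_new = -i / N                              # step 2: X_i ~ exp(-i/N)
    logw = logX + math.log1p(-math.exp(logX_new - logX))  # w_i = X_{i-1} - X_i
    logZ = logaddexp(logZ, logL_i + logw)          # step 3: accumulate evidence
    # Step 4: draw from the constrained prior.  For this 1D Gaussian toy the
    # constraint L > L_i is exactly the interval |theta| < a, so we can
    # sample it directly (standing in for region/MCMC samplers).
    a = min(HI, math.sqrt(max(0.0, -2.0 * logL_i)))
    live[worst] = random.uniform(-a, a)
    logL[worst] = log_likelihood(live[worst])
    logX = logX_new
    # Step 5: stop once the live points cannot add more than ~0.1% of Z.
    if max(logL) + logX < logZ + math.log(1e-3):
        break

# Final contribution: spread the remaining prior volume over the live points.
for l in logL:
    logZ = logaddexp(logZ, l + logX - math.log(N))

print(round(logZ, 3), "vs exact", round(math.log(math.sqrt(2 * math.pi) / 10), 3))
```

With $N = 200$ the estimate lands within the predicted $\sigma_{\ln Z} \approx \sqrt{H/N} \approx 0.1$ of the true value; this is a sketch for intuition, not a substitute for the production implementations listed below.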

3. Sampling Strategies: Constrained Priors and Algorithmic Variants

The critical computational step in nested sampling is drawing new live points from the constrained prior $\pi(\theta)\,\mathbb{I}[L(\theta) > L_i]$, where $\mathbb{I}$ is the indicator function. Principal strategies include:

  • Region Samplers (e.g., MultiNest): Define simple geometric shapes (ellipsoids, clusters) containing the live points, sampling uniformly within or via rejection (Buchner, 2014, Ashton et al., 2022).
  • Markov Chain Methods (e.g., PolyChord, slice sampling): Evolve the state via MCMC tailored to respect the hard likelihood constraint (Albert, 2020, Buchner, 2021, Buchner, 2023).
  • Hamiltonian/Geometric Methods: Utilize reflected Hamiltonian trajectories (CHMC (Betancourt, 2010), Galilean MC (Feroz et al., 2013)), yielding high-probability exploration within constrained contours.
  • Geometric Samplers: For domains such as tori and spheres, employ wrapped proposals or projections to avoid boundary inefficiencies; e.g., geometric nested sampling (GNS) applies embedded move sets on non-Euclidean parameter spaces (2002.04123, Javid, 2019).

Advanced region samplers employ machine learning—normalizing flows or bijectors—to model arbitrary priors or to efficiently represent complex constraint boundaries (Alsing et al., 2021, Yallup et al., 2022).
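To make the region-sampler idea concrete, the sketch below uses an expanded axis-aligned bounding box as a crude stand-in for the fitted ellipsoids of MultiNest-style methods (the function names and the expansion factor are illustrative, not from any library): bound the live points, enlarge the bound, and rejection-sample until a draw satisfies the hard constraint.

```python
import math
import random

random.seed(0)

# Crude "region sampler": bound the current live points with an expanded
# axis-aligned box (a stand-in for fitted ellipsoids), sample uniformly
# inside the bound, and reject draws violating L(theta) > L_min.

def log_likelihood(theta):
    return -0.5 * sum(x * x for x in theta)  # 2D Gaussian toy likelihood

def sample_constrained(live, logL_min, expand=1.5, max_tries=10_000):
    dim = len(live[0])
    lo = [min(p[d] for p in live) for d in range(dim)]
    hi = [max(p[d] for p in live) for d in range(dim)]
    # Enlarge the box so the true iso-likelihood region is (probably) covered;
    # any mass left outside the bound biases the shrinkage statistics.
    for d in range(dim):
        pad = 0.5 * (expand - 1.0) * (hi[d] - lo[d])
        lo[d] -= pad
        hi[d] += pad
    for _ in range(max_tries):
        cand = tuple(random.uniform(lo[d], hi[d]) for d in range(dim))
        if log_likelihood(cand) > logL_min:
            return cand
    raise RuntimeError("region sampler failed: bound too tight or too large")

live = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
logL_min = min(log_likelihood(p) for p in live)
new_point = sample_constrained(live, logL_min)
print(new_point, log_likelihood(new_point) > logL_min)
```

The acceptance rate of this rejection step is what degrades exponentially with dimension, which is why high-$d$ problems favor the step-based or flow-based kernels discussed above.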

Key Algorithmic Variants

| Variant | Sampling Kernel | Feature |
| --- | --- | --- |
| Ellipsoidal/cluster samplers | Rejection within union of regions | Fast in low $d$; poor in high-$d$ multimodal problems |
| Slice sampling/MCMC | Local moves under the constraint | Polynomial scaling in $d$; robust to degeneracies |
| Hamiltonian MC | Reflected HMC, Galilean | Fast; handles thin, curving regions |
| Geometric NS | Domain-wrapped proposals | Efficient on circles, spheres, tori |
| Snowballing NS | Increasing $N$, fixed steps | Asymptotically unbiased; stabilizes MCMC steps |
| Bijector-based NS | Learned invertible transforms | Arbitrary prior/geometric support |

4. Error Analysis, Uncertainty Quantification, and Diagnostics

Nested sampling provides principled error estimates for the computed evidence. The dominant source of Monte Carlo error arises from the stochastic nature of the volume shrinkage $t_i$. The leading-order uncertainty in the log-evidence is

$$\sigma_{\ln Z} \approx \sqrt{H/N},$$

where $H$ is the information gain (Kullback–Leibler divergence) from prior to posterior mass in $X$ space (Buchner, 2021, Fowlie et al., 2022, Keeton, 2011).

Skilling’s information-theoretic variance and Keeton’s moment-propagation variance agree to leading order, both reducing to $M \sigma_Z^2 / Z^2 \simeq -1 + \langle -\ln X \rangle$ as long as $L(X)$ is locally smooth and $\Delta Z / Z \ll 1$ (Fowlie et al., 2022, Keeton, 2011). The statistical error from a single NS run (via volume-shrinkage resampling) closely matches the run-to-run variance over independent runs.
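These quantities are cheap to compute from a finished run's dead points. The sketch below (with the dead-point sequence synthesised analytically from a toy uniform-prior/Gaussian-likelihood model, so $L(X) = e^{-(5X)^2/2}$, rather than taken from an actual sampler) estimates $\ln Z$, then $H = \sum_i p_i (\ln L_i - \ln Z)$ from the posterior weights $p_i = L_i w_i / Z$, and finally $\sigma_{\ln Z} \approx \sqrt{H/N}$:

```python
import math

# Given a run's dead points (log-likelihoods logL_i and log-weights logw_i,
# with w_i = X_{i-1} - X_i), estimate ln Z, the information gain H, and the
# leading-order error sqrt(H/N).  Here the "run" is synthesised analytically:
# uniform prior on [-5, 5], Gaussian likelihood, so L(X) = exp(-(5X)^2 / 2).

N = 200
X = [math.exp(-i / N) for i in range(0, 12 * N)]          # X_i ~ exp(-i/N)
logL = [-0.5 * (5.0 * x) ** 2 for x in X[1:]]             # L at each shell
logw = [math.log(X[i] - X[i + 1]) for i in range(len(X) - 1)]

# ln Z via log-sum-exp over the dead-point contributions L_i * w_i.
m = max(l + w for l, w in zip(logL, logw))
logZ = m + math.log(sum(math.exp(l + w - m) for l, w in zip(logL, logw)))

# Posterior weight of each dead point, then H = sum_i p_i (ln L_i - ln Z).
p = [math.exp(l + w - logZ) for l, w in zip(logL, logw)]
H = sum(pi * (l - logZ) for pi, l in zip(p, logL))
sigma_logZ = math.sqrt(H / N)

print(round(logZ, 3), round(H, 3), round(sigma_logZ, 3))
```

For this toy, $\ln Z \approx -1.38$ and $H \approx 0.88$, giving $\sigma_{\ln Z} \approx 0.07$; this matches the intuition that the error budget is set by how far the posterior compresses the prior volume relative to $N$.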

Diagnostic tests for correctness and implementation-specific bias are essential:

  • Insertion index uniformity: Checks the uniformity in the rank at which new live points enter the likelihood-sorted list. Significant deviations (via KS tests) flag failures in constrained prior sampling or the presence of likelihood plateaus (Fowlie et al., 2020, Buchner, 2021).
  • Shrinkage tests: Compare actual volume shrinkages to the theoretical Beta distribution, identifying over- or under-compression, e.g., due to region sampling errors (Buchner, 2014).
  • Thread-based and multi-run variance diagnostics: Partitioning and comparing “live-point threads” across runs to gauge implementation variance and mode-finding robustness (Higson et al., 2018).

Visual tools such as $\log X$–sample trace diagrams and bootstrap-based uncertainty plots add further diagnostic granularity (Higson et al., 2018, Handley, 2019).
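The insertion-index check is straightforward to hand-roll. The sketch below (assuming the run recorded the rank at which each new live point inserted into the likelihood-sorted live set; the critical value $1.36/\sqrt{n}$ is the usual large-sample 5% Kolmogorov–Smirnov threshold) compares a healthy rank stream with an artificially over-concentrated one:

```python
import math
import random

random.seed(0)

# Insertion-index uniformity check: if the constrained-prior sampler is
# correct, the rank at which each new live point inserts into the
# likelihood-sorted live set of size N is uniform on {0, ..., N-1}.
# A one-sample KS statistic against Uniform(0, 1) flags deviations.

def ks_uniform(ranks, N):
    """KS distance between the empirical CDF of ranks/N and Uniform(0, 1)."""
    u = sorted((r + 0.5) / N for r in ranks)   # midpoint correction for ties
    n = len(u)
    d = 0.0
    for i, x in enumerate(u):
        d = max(d, abs((i + 1) / n - x), abs(x - i / n))
    return d

N, n_iter = 100, 2000
good = [random.randrange(N) for _ in range(n_iter)]       # healthy sampler
bad = [random.randrange(N // 2) for _ in range(n_iter)]   # over-concentrated

crit = 1.36 / math.sqrt(n_iter)   # ~5% critical value for large n
print(round(ks_uniform(good, N), 4), round(ks_uniform(bad, N), 4), round(crit, 4))
```

A healthy stream sits comfortably below the threshold while a sampler that only ever inserts into the lower half of the list fails decisively; production diagnostics in e.g. nestcheck and UltraNest implement more refined versions of this test.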

5. Algorithmic Extensions: Geometries, Dynamic NS, Parallelization

Nested sampling is highly adaptable to different geometries:

  • Wrapped proposal distributions for periodic domains (circles/tori) and projection moves for spheres prevent boundary-induced sampling bias and increase acceptance rates in non-trivial geometries (2002.04123, Javid, 2019).
  • Learned bijective transforms allow arbitrary priors and enable nested sampling in complex constraint spaces without analytic prior transforms (Alsing et al., 2021).

Dynamic Nested Sampling (dynesty, dyPolyChord) dynamically varies the number of live points or allocates sampling density to regions with the highest evidence or posterior mass, optimizing resource allocation during the run (Buchner, 2021, Yallup et al., 2022).

Parallelization can be exploited by batch removal/insertion of live points or by farmed constrained-prior samplers, with careful attention to shrinkage-statistic variance (Pfeifenberger et al., 2016, Buchner, 2021).

Snowballing Nested Sampling progressively increases the number of live points, stabilizing MCMC-proposal parameters and converging asymptotically to unbiased evidence and posterior estimates without requiring ever-longer MCMC chains at fixed NN (Buchner, 2023).

6. Applications, Performance, and Limitations

Nested sampling is applied across a range of scientific domains, including astronomy, cosmology, statistical physics, and engineering.

Algorithmic performance is competitive with or superior to thermodynamic integration and traditional MCMC posterior samplers, especially in high-dimensional or severely multimodal problems. For evidence estimation, NS achieves uncertainties scaling as $\sim 1/\sqrt{N}$, provided all posterior modes are populated by live points.

Limitations: Failure to sample uniformly from the constrained prior (e.g., due to inadequate region construction or insufficient MCMC decorrelation) induces evidence bias or underexplored modes (Buchner, 2014, Fowlie et al., 2020). Plateaus and discontinuities in the likelihood lead to non-uniformity in the transformed $X$-space, necessitating special handling for both evidence and uncertainty estimation (Latz et al., 2023, Buchner, 2021). In very high-dimensional problems, region samplers become exponentially inefficient; step-based or flow-based adaptations are required (Ashton et al., 2022, Albert, 2020). Memory and computational requirements scale with the product of the number of live points and the parameter dimension.

Well-maintained, high-performance open-source NS implementations include MultiNest, PolyChord, dynesty, UltraNest, cpnest, JAXNS, and ecosystem tools for diagnostics (nestcheck), visualization (anesthetic), and postprocessing (Albert, 2020, Handley, 2019, Higson et al., 2018).

Best Practices:

  • Select the number of live points commensurate with posterior complexity and evidence tolerance.
  • Employ diagnostic checks (insertion rank, shrinkage, multi-run variance) routinely (Fowlie et al., 2020, Higson et al., 2018, Buchner, 2021).
  • Use appropriate sampling kernels for the problem’s geometry and dimensionality.
  • Publish run metadata (priors, code, evidence, diagnostics) and, where possible, the sequence of discarded (“dead”) points and weights for transparency and replication (Ashton et al., 2022).
  • Address plateaus and rare events with algorithmic extensions as required (Latz et al., 2023).

Trends in the field include the adoption of flow-based proposals, dynamic allocation of sampling effort, ensemble and parallelized approaches, and robust error estimation for both evidence and posterior moments (Buchner, 2021, Alsing et al., 2021, Buchner, 2023). The approach continues to adapt to new challenges in high-dimensional, multimodal, non-Euclidean, or non-standard integration domains.

