Nested Sampling for Bayesian Inference
- Nested sampling is a Monte Carlo algorithm that reformulates a high-dimensional Bayesian evidence integral into a one-dimensional integral over the prior volume.
- It iteratively removes the lowest-likelihood live point and replaces it by sampling within a constrained prior, yielding both evidence estimates and posterior samples.
- The method features robust diagnostic tests, adaptable sampling strategies, and is widely applied in fields like astronomy, cosmology, statistical physics, and engineering.
Nested sampling is a Monte Carlo algorithm introduced by Skilling (2004) for the efficient computation of Bayesian evidence (marginal likelihoods) and posterior inference in high-dimensional, multimodal, or degenerate probability distributions. The method reformulates the integral for Bayesian evidence into a one-dimensional integral over the prior volume, facilitating rigorous model comparison while producing posterior samples as a byproduct. Its principled error quantification, adaptability to varied domain geometries, and robust diagnostic and parallelization procedures have made it a cornerstone methodology in astronomy, cosmology, statistical physics, and engineering applications.
1. Mathematical Foundations and Evidence Integral
In Bayesian inference, the evidence for a model with parameters $\theta$ is given by

$$Z = \int L(\theta)\,\pi(\theta)\,d\theta,$$

where $L(\theta)$ is the likelihood and $\pi(\theta)$ is the prior density on $\theta$. Nested sampling reformulates this high-dimensional integral as a one-dimensional integral over the “prior volume”

$$X(\lambda) = \int_{L(\theta) > \lambda} \pi(\theta)\,d\theta,$$

so that $X$ is a monotonically decreasing function of the likelihood threshold $\lambda$. The evidence integral becomes

$$Z = \int_0^1 L(X)\,dX,$$

where $L(X)$ is the inverse function of $X(\lambda)$, mapping a prior mass $X$ to the corresponding likelihood level. This transformation is exact and allows the computation of $Z$ via a numerically stable quadrature over $X$ (Buchner, 2021, Ashton et al., 2022, Latz et al., 2023, Feroz et al., 2013).
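The exactness of this transformation can be checked numerically on a toy problem. The sketch below (illustrative only; the likelihood, prior, and grid sizes are our choices) takes a 1-D standard-normal likelihood with a uniform prior on $[-5, 5]$, for which $X(\lambda)$ and its inverse are available in closed form, and confirms that the two forms of the integral agree:

```python
import numpy as np

def trapezoid(y, x):
    """Simple trapezoidal quadrature (avoids NumPy-version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Direct evidence: Z = ∫ L(θ) π(θ) dθ with L(θ) = exp(-θ²/2), π = 1/10 on [-5, 5].
theta = np.linspace(-5.0, 5.0, 200001)
Z_direct = trapezoid(np.exp(-theta**2 / 2) / 10.0, theta)

# Prior-volume form: the mass with L > λ is X = sqrt(-2 ln λ)/5, whose
# inverse is L(X) = exp(-(5X)²/2), so Z = ∫₀¹ L(X) dX.
X = np.linspace(0.0, 1.0, 200001)
Z_volume = trapezoid(np.exp(-(5.0 * X) ** 2 / 2), X)

print(Z_direct, Z_volume)   # both ≈ √(2π)/10 ≈ 0.2507
```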
2. Core Algorithm and Implementation
Nested sampling maintains a set of $N$ “live” points sampled from the prior. At each iteration $i$, it:
- Identifies and removes the live point with the smallest likelihood, $L_i$.
- Estimates the shrinkage of the prior volume via an order-statistics result: the shrinkage factor $t_i = X_i / X_{i-1}$ is distributed as $\mathrm{Beta}(N, 1)$, so $E[\ln t_i] = -1/N$, with $X_0 = 1$.
- Accumulates the evidence contribution $\Delta Z_i = L_i\,(X_{i-1} - X_i)$.
- Replaces the discarded point by sampling from the prior restricted to $L(\theta) > L_i$.
- Repeats until the remaining possible evidence contribution is negligible.
Posterior samples and weighted estimates for any function $f(\theta)$ are produced from the sequence of dead points and their associated weights $w_i = \Delta Z_i / Z$ (Buchner, 2021, Ashton et al., 2022, Feroz et al., 2013, Betancourt, 2010). Deterministic approximations such as $X_i \approx e^{-i/N}$ are commonly employed in practical settings.
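A minimal, self-contained sketch of this loop is given below, assuming a toy 2-D standard-normal likelihood, a uniform prior on $[-5, 5]^2$, the deterministic shrinkage approximation, and a short constrained random-walk replacement step (analytic evidence $Z = 2\pi/100 \approx 0.063$; every name here is illustrative, not from any NS library):

```python
import numpy as np

rng = np.random.default_rng(0)

def loglike(theta):
    """Toy 2-D standard-normal log-likelihood."""
    return -0.5 * np.sum(theta**2, axis=-1)

N = 300                                    # number of live points
live = rng.uniform(-5, 5, size=(N, 2))     # live points drawn from the prior
live_logL = loglike(live)

Z, X_prev = 0.0, 1.0
dead, weights = [], []
for i in range(1, 20 * N + 1):
    worst = int(np.argmin(live_logL))
    X_i = np.exp(-i / N)                   # deterministic shrinkage X_i ≈ e^{-i/N}
    w_i = np.exp(live_logL[worst]) * (X_prev - X_i)   # ΔZ_i = L_i (X_{i-1} - X_i)
    Z += w_i
    dead.append(live[worst].copy())
    weights.append(w_i)
    X_prev = X_i
    L_min = live_logL[worst]
    # Replace the dead point: short random-walk MCMC from a random survivor,
    # accepting only moves that respect the hard constraint L(θ) > L_min.
    # (A stuck walk simply duplicates a survivor — tolerable for a sketch.)
    j = int(rng.integers(N - 1))
    if j >= worst:
        j += 1
    cur = live[j].copy()
    step = 0.5 * np.std(live, axis=0)      # proposal scale from live-point spread
    for _ in range(20):
        prop = cur + step * rng.standard_normal(2)
        if np.all(np.abs(prop) <= 5) and loglike(prop) > L_min:
            cur = prop
    live[worst] = cur
    live_logL[worst] = loglike(cur)
    # Terminate once the live points can contribute only negligible evidence.
    if np.exp(np.max(live_logL)) * X_i < 1e-5 * Z:
        break

Z += np.mean(np.exp(live_logL)) * X_prev   # remaining mass of the final live set
weights = np.array(weights)
post_mean = weights @ np.array(dead) / weights.sum()   # posterior sample estimate
print(Z)                                   # ≈ 2π/100 ≈ 0.063
```

The dead points with normalized weights $w_i$ double as posterior samples: here `post_mean` estimates the posterior mean, which should be close to the origin.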
3. Sampling Strategies: Constrained Priors and Algorithmic Variants
The critical computational step in nested sampling is drawing new live points from the constrained prior $\pi(\theta)\,\mathbf{1}[L(\theta) > L^*]/X(L^*)$, where $\mathbf{1}[\cdot]$ is the indicator function. Principal strategies include:
- Region Samplers (e.g., MultiNest): Define simple geometric shapes (ellipsoids, clusters) containing the live points, sampling uniformly within or via rejection (Buchner, 2014, Ashton et al., 2022).
- Markov Chain Methods (e.g., PolyChord, slice sampling): Evolve the state via MCMC tailored to respect the hard likelihood constraint (Albert, 2020, Buchner, 2021, Buchner, 2023).
- Hamiltonian/Geometric Methods: Utilize reflected Hamiltonian trajectories (CHMC (Betancourt, 2010), Galilean MC (Feroz et al., 2013)), enabling efficient exploration within constrained likelihood contours.
- Geometric Samplers: For domains such as tori and spheres, employ wrapped proposals or projections to avoid boundary inefficiencies; e.g., geometric nested sampling (GNS) applies embedded move sets on non-Euclidean parameter spaces (2002.04123, Javid, 2019).
Advanced region samplers employ machine learning—normalizing flows or bijectors—to model arbitrary priors or to efficiently represent complex constraint boundaries (Alsing et al., 2021, Yallup et al., 2022).
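As an illustration of the region-sampler idea, the following sketch (a simplified, single-ellipsoid variant loosely in the MultiNest style; all function names are ours, not from any library) fits an enlarged bounding ellipsoid to the live points and rejection-samples uniformly within it until the hard likelihood constraint is met:

```python
import numpy as np

rng = np.random.default_rng(1)

def bounding_ellipsoid(live, enlarge=1.1):
    """Fit (center, linear map) of an ellipsoid enclosing all live points."""
    mean = live.mean(axis=0)
    cov = np.cov(live, rowvar=False)
    delta = live - mean
    d2 = np.einsum('ij,jk,ik->i', delta, np.linalg.inv(cov), delta)
    scale = enlarge * np.sqrt(d2.max())    # expand until every point is inside
    return mean, np.linalg.cholesky(cov) * scale

def constrained_draw(live, loglike, logL_min, max_tries=100000):
    """Rejection-sample the likelihood-constrained prior via the ellipsoid."""
    mean, transform = bounding_ellipsoid(live)
    d = live.shape[1]
    for _ in range(max_tries):
        z = rng.standard_normal(d)
        z *= rng.uniform() ** (1.0 / d) / np.linalg.norm(z)  # uniform in unit ball
        cand = mean + transform @ z
        if loglike(cand) > logL_min:
            return cand
    raise RuntimeError("acceptance too low; rebuild or enlarge the region")

# Demo: live points uniform in the unit disk; draw a new point satisfying
# loglike(θ) = -|θ|²/2 > -0.5, i.e. |θ| < 1 (all values illustrative).
v = rng.standard_normal((400, 2))
live = v / np.linalg.norm(v, axis=1, keepdims=True) * np.sqrt(rng.uniform(size=(400, 1)))
new_point = constrained_draw(live, lambda th: -0.5 * np.sum(th**2), -0.5)
```

The enlargement factor trades efficiency against the risk of clipping the constrained region; production codes tune it adaptively and split the live points into multiple ellipsoids for multimodal problems.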
Key Algorithmic Variants
| Variant | Sampling Kernel | Feature |
|---|---|---|
| Ellipsoidal/Cluster Samplers | Rejection in union of regions | Fast in low dimension $d$; degrades in high-dimensional or highly multimodal problems |
| Slice sampling/MCMC | Local moves under the constraint | Polynomial scaling in $d$, robust to degeneracies |
| Hamiltonian MC | Reflected HMC, Galilean | Fast, handles thin curving regions |
| Geometric NS | Domain-wrapped proposals | Efficient on circles, spheres, tori |
| Snowballing NS | Increasing $N$, fixed MCMC steps | Asymptotically unbiased, MCMC step stabilization |
| Bijector-based NS | Learned invertible transforms | Arbitrary prior/geometric support |
4. Error Analysis, Uncertainty Quantification, and Diagnostics
Nested sampling provides principled error estimates for the computed evidence. The dominant source of Monte Carlo error arises from the stochastic nature of the volume shrinkage factors $t_i$. Leading-order uncertainty in the log-evidence is

$$\sigma(\ln Z) \approx \sqrt{H/N},$$

where $H$ is the information gain (Kullback–Leibler divergence) from the prior to the posterior, measured in prior-mass ($\ln X$) space (Buchner, 2021, Fowlie et al., 2022, Keeton, 2011).
Skilling’s information-theoretic variance and Keeton’s moment-propagation variance agree to leading order, both reducing to $\mathrm{Var}(\ln Z) \approx H/N$ as long as $L(X)$ is locally smooth and $N$ is sufficiently large (Fowlie et al., 2022, Keeton, 2011). Statistical error estimated from a single NS run (via volume-shrinkage resampling) closely matches the run-to-run variance over independent runs.
Diagnostic tests for correctness and implementation-specific bias are essential:
- Insertion index uniformity: Checks that the rank at which each new live point enters the likelihood-sorted live set is uniformly distributed. Significant deviations (detected via KS tests) flag failures in constrained-prior sampling or the presence of likelihood plateaus (Fowlie et al., 2020, Buchner, 2021).
- Shrinkage tests: Compare actual volume shrinkages to the theoretical Beta distribution, identifying over- or under-compression, e.g., due to region sampling errors (Buchner, 2014).
- Thread-based and multi-run variance diagnostics: Partitioning and comparing “live-point threads” across runs to gauge implementation variance and mode-finding robustness (Higson et al., 2018).
Visual tools such as $\ln X$ trace diagrams of the samples and bootstrap-based uncertainty plots add further diagnostic granularity (Higson et al., 2018, Handley, 2019).
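A minimal version of the insertion-index check proposed by Fowlie et al. (2020) is sketched below (illustrative; the rank distributions are simulated rather than taken from a real run): a one-sample KS statistic compares the empirical distribution of insertion ranks against the uniform CDF, with $\approx 1.36/\sqrt{n}$ as the 5% critical value.

```python
import numpy as np

rng = np.random.default_rng(3)

def ks_uniform(ranks, N):
    """One-sample KS distance of insertion ranks from uniformity on {0,...,N-1}."""
    u = np.sort((np.asarray(ranks) + 0.5) / N)   # map ranks into (0, 1)
    n = len(u)
    return max(np.max(np.arange(1, n + 1) / n - u),
               np.max(u - np.arange(0, n) / n))

N, n_insert = 400, 2000
good = rng.integers(0, N, size=n_insert)                 # healthy: uniform ranks
bad = (N * rng.uniform(size=n_insert) ** 2).astype(int)  # broken: skewed low

crit = 1.36 / np.sqrt(n_insert)    # ~5% critical value for the KS statistic
print(ks_uniform(good, N), crit)   # typically below the critical value
print(ks_uniform(bad, N), crit)    # far above it: the bias is flagged
```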
5. Algorithmic Extensions: Geometries, Dynamic NS, Parallelization
Nested sampling is highly adaptable to different geometries:
- Wrapped proposal distributions for periodic domains (circles/tori) and projection moves for spheres prevent boundary-induced sampling bias and increase acceptance rates in non-trivial geometries (2002.04123, Javid, 2019).
- Learned bijective transforms allow arbitrary priors and enable nested sampling in complex constraint spaces without analytic prior transforms (Alsing et al., 2021).
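For the periodic case, the wrapping itself is a one-line operation. A sketch, assuming a parameter living on $[0, 2\pi)$ (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)

def wrapped_proposal(theta, scale=0.3):
    """Gaussian random-walk proposal wrapped onto the circle [0, 2π)."""
    return (theta + scale * rng.standard_normal(np.shape(theta))) % (2 * np.pi)

# Near the 2π seam, wrapped moves stay on-manifold rather than being rejected
# as out of bounds, so no proposals are wasted at the boundary.
theta = np.full(1000, 6.27)        # just below 2π
proposals = wrapped_proposal(theta)
print(proposals.min() >= 0.0, proposals.max() < 2 * np.pi)  # True True
```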
Dynamic Nested Sampling (dynesty, dyPolyChord) dynamically varies the number of live points or allocates sampling density to regions with the highest evidence or posterior mass, optimizing resource allocation during the run (Buchner, 2021, Yallup et al., 2022).
Parallelization can be exploited by batch removal/insertion of live points or by farmed constrained-prior samplers, with careful attention to shrinkage-statistic variance (Pfeifenberger et al., 2016, Buchner, 2021).
Snowballing Nested Sampling progressively increases the number of live points $N$, stabilizing MCMC-proposal parameters and converging asymptotically to unbiased evidence and posterior estimates without the ever-longer MCMC chains that unbiasedness would require at fixed $N$ (Buchner, 2023).
6. Applications, Performance, and Limitations
Nested sampling is applied across a range of scientific domains:
- Bayesian model selection: Calculation of Bayes factors for cosmology, particle physics, and model choice scenarios (Ashton et al., 2022, Latz et al., 2023).
- Multimodal and degenerate inference: Simultaneous localization and mapping (SLAM) factor graphs, high-dimensional Gaussian mixtures, object detection, particle event-generation phase space (Huang et al., 2021, Feroz et al., 2013, Yallup et al., 2022).
- Statistical physics: Computation of partition functions, free energies, and thermodynamic observables in the Potts and Ising models (Pfeifenberger et al., 2016).
- Engineering and rare event estimation: Quantification of extremely small probabilities in reliability engineering or finance (Latz et al., 2023).
Algorithmic performance is competitive with or superior to thermodynamic integration and traditional MCMC posterior samplers, especially in high-dimensional or severely multimodal problems. For evidence estimation, NS achieves uncertainties scaling as $\sqrt{H/N}$, provided all posterior modes are populated by live points.
Limitations: Failure to sample uniformly from the constrained prior (e.g., due to inadequate region construction or insufficient MCMC decorrelation) induces evidence bias or underexplored modes (Buchner, 2014, Fowlie et al., 2020). Plateaus and discontinuities in the likelihood lead to non-uniformity in the transformed $X$-space, necessitating special handling for both evidence and uncertainty estimation (Latz et al., 2023, Buchner, 2021). In very high-dimensional problems, region samplers become exponentially inefficient; step-based or flow-based adaptations are required (Ashton et al., 2022, Albert, 2020). Memory and computational requirements scale with the product of the number of live points and the parameter dimension.
7. Software Ecosystem, Best Practices, and Current Trends
Well-maintained, high-performance open-source NS implementations include MultiNest, PolyChord, dynesty, UltraNest, cpnest, JAXNS, and ecosystem tools for diagnostics (nestcheck), visualization (anesthetic), and postprocessing (Albert, 2020, Handley, 2019, Higson et al., 2018).
Best Practices:
- Select the number of live points $N$ commensurate with posterior complexity and evidence tolerance.
- Employ diagnostic checks (insertion rank, shrinkage, multi-run variance) routinely (Fowlie et al., 2020, Higson et al., 2018, Buchner, 2021).
- Use appropriate sampling kernels for the problem’s geometry and dimensionality.
- Publish run metadata (priors, code, evidence, diagnostics) and, where possible, the sequence of discarded (“dead”) points and weights for transparency and replication (Ashton et al., 2022).
- Address plateaus and rare events with algorithmic extensions as required (Latz et al., 2023).
Trends in the field include the adoption of flow-based proposals, dynamic allocation of sampling effort, ensemble and parallelized approaches, and robust error estimation for both evidence and posterior moments (Buchner, 2021, Alsing et al., 2021, Buchner, 2023). The approach continues to adapt to new challenges in high-dimensional, multimodal, non-Euclidean, or non-standard integration domains.
References
- Feroz et al., 2013
- Betancourt, 2010
- Buchner, 2021
- Ashton et al., 2022
- Albert, 2020
- Buchner, 2023
- 2002.04123
- Javid, 2019
- Higson et al., 2018
- Handley, 2019
- Alsing et al., 2021
- Fowlie et al., 2022
- Keeton, 2011
- Pfeifenberger et al., 2016
- Buchner, 2014
- Fowlie et al., 2020
- Huang et al., 2021
- Yallup et al., 2022
- Latz et al., 2023