Rejection Sampling: Methods & Applications
- Rejection sampling is a Monte Carlo technique that converts draws from a tractable proposal distribution into exact samples from a target pdf.
- It employs an envelope condition (f(x) ≤ Cg(x)) and has evolved with adaptive variants like ARS and CARS to improve efficiency in Bayesian and generative modeling.
- Extensions into quantum algorithms, diffusion models, and combinatorial methods showcase its versatility in addressing diverse computational challenges.
Rejection sampling is a foundational Monte Carlo method for generating independent, exactly distributed samples from a target probability density function (pdf) using a proposal (or envelope) distribution. Introduced formally by von Neumann in 1951, the procedure is widely used as a basic building block in Bayesian inference, rare event simulation, generative modeling, state-space estimation, and in quantum and classical algorithms across computational sciences.
1. Fundamental Principles and Classical Methodology
Rejection sampling aims to transform the ability to sample from a tractable proposal distribution g(x) into the ability to sample from a generally intractable target distribution f(x). Given f and g satisfying f(x) ≤ Cg(x) for some constant C ≥ 1 and all x in the support, the procedure is as follows:
- Draw x ~ g and u ~ Uniform(0, 1).
- Accept x if u ≤ f(x)/(Cg(x)); otherwise, reject and try again.
Each accepted x is distributed according to f (assuming C satisfies the envelope condition). Tightness of the envelope, that is, how closely Cg(x) bounds f(x), controls the acceptance rate and thereby the efficiency. For unnormalized f and g, the acceptance probability is Z_f/(C Z_g), where Z_f and Z_g are the normalizing constants of f and g, respectively.
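A minimal sketch of the accept–reject loop (the half-normal target under an Exp(1) envelope is a standard textbook pairing; the helper names are illustrative):

```python
import numpy as np

def rejection_sample(target_pdf, proposal_pdf, proposal_rvs, C, n, rng=None):
    """Draw n exact samples from target_pdf, given the envelope
    target_pdf(x) <= C * proposal_pdf(x) on the whole support."""
    rng = rng if rng is not None else np.random.default_rng()
    out = []
    while len(out) < n:
        x = proposal_rvs(rng)              # x ~ g
        u = rng.uniform()                  # u ~ Uniform(0, 1)
        if u <= target_pdf(x) / (C * proposal_pdf(x)):
            out.append(x)                  # accepted draws are i.i.d. from f
    return np.array(out)

# Example: half-normal target, Exp(1) proposal.  The ratio
# f(x)/g(x) = sqrt(2/pi) * exp(x - x^2/2) is maximized at x = 1,
# so C = sqrt(2e/pi) ~ 1.32 and the acceptance rate is 1/C ~ 0.76.
f = lambda x: np.sqrt(2 / np.pi) * np.exp(-0.5 * x**2)
g = lambda x: np.exp(-x)
C = np.sqrt(2 * np.e / np.pi)
samples = rejection_sample(f, g, lambda r: r.exponential(1.0), C, 10_000)
```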
Rejection sampling serves as a prototype for perfect simulation and underlies a vast array of accept–reject–correct algorithms in statistics, physics, and machine learning.
2. Advances in Adaptive and Parsimonious Rejection Sampling
Adaptive rejection sampling (ARS) and its variants dynamically refine the proposal based on previously rejected samples, allowing the proposal to approach the target ever more tightly. Classic ARS constructs a piecewise-exponential envelope via tangent lines to the log-density of a log-concave target pdf, achieving acceptance rates close to unity but at a computational cost that increases with each added support point (1509.07985). To address scalability, Cheap Adaptive Rejection Sampling (CARS) fixes the number of support points, employing a swap mechanism to maintain a valid envelope while ensuring constant per-sample cost, which is particularly useful for large-scale simulation (1509.07985). Parsimonious ARS (PARS) further refines this by introducing a selective update policy, adding points only when the current proposal significantly underestimates the target, thus holding down envelope complexity without sacrificing acceptance efficiency (1710.04948).
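SciPy's UNU.RAN bindings expose a sampler in this family; a minimal sketch, assuming SciPy ≥ 1.8 (setting c=0 selects the logarithm transform, recovering the classical piecewise-exponential ARS envelope):

```python
import numpy as np
from scipy.stats.sampling import TransformedDensityRejection

class StdNormal:
    # TDR only needs the (possibly unnormalized) pdf and its derivative.
    def pdf(self, x):
        return np.exp(-0.5 * x * x)
    def dpdf(self, x):
        return -x * np.exp(-0.5 * x * x)

rng = np.random.default_rng(0)
# c=0 selects the log transform, i.e. the classical tangent-line
# (piecewise-exponential) envelope of Gilks and Wild.
sampler = TransformedDensityRejection(StdNormal(), c=0.0, random_state=rng)
samples = sampler.rvs(100_000)
```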
Beyond log-concave targets, adaptive schemes using "reduced potentials" or the adaptive ratio-of-uniforms method can accommodate multimodal or log-convex-tailed distributions by partitioning the domain and adaptively improving local proposals (1111.4942). These methods are especially effective in Bayesian estimation and filtering settings where the tails or the modal structure of the posterior are challenging; a minimal non-adaptive ratio-of-uniforms sketch appears after the table below.
Method | Applicability | Computational Cost Control
---|---|---
ARS | Univariate, log-concave | Complexity grows with rejections
CARS | Univariate, log-concave | Constant (user-chosen nodes)
Parsimonious ARS | Log-concave, scalable | Threshold-based complexity
Adaptive RoU | Log-convex, multimodal | Iterative geometric refinement
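The geometric primitive that adaptive ratio-of-uniforms schemes refine can be sketched directly; a minimal, non-adaptive version (bounding-box constants for the standard normal are worked out in the comments):

```python
import numpy as np

def rou_sample(f, umax, vmin, vmax, n, rng=None):
    """Ratio of uniforms: if (u, v) is uniform on the region
    {(u, v): 0 < u <= sqrt(f(v/u))}, then x = v/u has density prop. to f.
    Here the region is enclosed in the box [0, umax] x [vmin, vmax]."""
    rng = rng if rng is not None else np.random.default_rng()
    out = []
    while len(out) < n:
        u = rng.uniform(0.0, umax)
        if u == 0.0:                      # guard the measure-zero edge case
            continue
        v = rng.uniform(vmin, vmax)
        x = v / u
        if u * u <= f(x):                 # equivalent to u <= sqrt(f(x))
            out.append(x)
    return np.array(out)

# Standard normal, unnormalized: f(x) = exp(-x^2/2).
# umax = sup sqrt(f) = 1;  |v| <= sup |x|*sqrt(f(x)) = sqrt(2/e).
f = lambda x: np.exp(-0.5 * x * x)
b = np.sqrt(2.0 / np.e)
samples = rou_sample(f, 1.0, -b, b, 10_000)
```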
3. Rejection Sampling in Modern Generative and Bayesian Inference
In generative models and Bayesian inference, rejection sampling provides mechanisms for improving sample quality, enabling resource-efficient post-processing, and achieving exact (or near-exact) marginal draws.
In modern generative adversarial networks (GANs), Discriminator Rejection Sampling (DRS) post-processes generator outputs using the trained discriminator to approximate the density ratio p_data(x)/p_g(x), accepting samples with probability proportional to this ratio normalized by its maximum. This approach has markedly improved sample fidelity and diversity, as evidenced by substantial improvements in Inception Score and Fréchet Inception Distance on large-scale image datasets (1810.06758). The Optimal Budgeted Rejection Sampling (OBRS) framework generalizes this concept by optimally choosing acceptance probabilities to minimize any f-divergence under a fixed sample budget, with theoretical universal optimality for all f-divergences, including Rényi divergences (2311.00460). Importantly, OBRS can be integrated into end-to-end model training, guiding generators to be mass-covering over the regions likely to be accepted post-rejection.
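A stripped-down sketch of the DRS acceptance rule (the paper's rule adds a tunable shift γ and numerical-stability terms omitted here; the batch interface is illustrative):

```python
import numpy as np

def drs_filter(logits, logit_max, rng=None):
    """Simplified DRS: for a near-optimal discriminator,
    p_data(x) / p_g(x) ~ exp(logit(x)), so each generated sample is
    accepted with probability exp(logit - logit_max) <= 1, where
    logit_max is estimated on a pilot batch."""
    rng = rng if rng is not None else np.random.default_rng()
    accept_prob = np.exp(logits - logit_max)   # ratio / max ratio
    return rng.uniform(size=logits.shape) < accept_prob

# Usage: score generator outputs with the trained discriminator, then
# keep samples[drs_filter(logits, logit_max)] as the filtered batch.
```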
For Bayesian inference, especially in scenarios with streaming data or embedded resource constraints, rejection filtering combines acceptance–rejection and moment tracking to reduce both memory and computational costs (1511.06458). By updating only summary statistics of the posterior, it enables near-real-time adaptation and is applicable in operational settings such as real-time object tracking or adaptive experiment design.
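A minimal sketch of one rejection-filtering update with a Gaussian posterior summary (scalar case; `likelihood` is assumed vectorized with a known bound k ≥ sup likelihood, and the interface is illustrative):

```python
import numpy as np

def rejection_filter_update(mu, sigma, likelihood, k, m, rng=None):
    """One streaming update: draw m candidates from the current Gaussian
    summary N(mu, sigma^2), accept each with probability likelihood(x)/k,
    and refresh (mu, sigma) from the accepted draws.  Only the two summary
    statistics are stored between data items."""
    rng = rng if rng is not None else np.random.default_rng()
    x = rng.normal(mu, sigma, size=m)
    keep = x[rng.uniform(size=m) < likelihood(x) / k]
    if keep.size >= 2:              # keep the old summary if too few survive
        mu, sigma = keep.mean(), keep.std()
    return mu, sigma
```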
Multilevel rejection sampling (MLMC-ABC) accelerates classic Approximate Bayesian Computation by employing a telescoping sum over successively finer acceptance thresholds, using optimal allocation of computational effort across levels to guarantee i.i.d. sampling properties while controlling variance (1702.03126).
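The level-zero building block that the telescoping sum corrects is plain ABC rejection (a minimal sketch; prior_rvs, simulate, and distance are illustrative placeholders):

```python
import numpy as np

def abc_rejection(prior_rvs, simulate, distance, y_obs, eps, n, rng=None):
    """Plain ABC rejection at a single threshold eps: MLMC-ABC telescopes
    this estimator over a decreasing sequence eps_0 > eps_1 > ... ."""
    rng = rng if rng is not None else np.random.default_rng()
    accepted = []
    while len(accepted) < n:
        theta = prior_rvs(rng)                 # draw from the prior
        y_sim = simulate(theta, rng)           # forward-simulate data
        if distance(y_sim, y_obs) <= eps:      # accept if close enough
            accepted.append(theta)
    return np.array(accepted)
```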
Rejection-based methods are also key in high-dimensional posterior state simulation. Ensemble Rejection Sampling (ERS) circumvents the exponential decay of acceptance rates in long state sequences by crafting ensemble-based proposals and forward-backward dynamic programming, with expected cost scaling only cubically in sequence length under regularity conditions (2001.09188).
4. Extensions and Specializations: Quantum, Diffusion, and Autodifferentiable Rejection Sampling
The principle of rejection sampling extends well beyond the classical setting.
Quantum Rejection Sampling
Quantum rejection sampling adapts the concept to transform an initial superposition into a target via amplitude amplification, with the cost (query complexity) governed by a semidefinite program involving a “water-filling” vector of amplitudes (1103.2774). This primitive appears in algorithms for linear systems (HHL), quantum Metropolis sampling, and hidden shift problems, underlining its centrality in constructing quantum algorithms.
Diffusion Models
Diffusion Rejection Sampling (DiffRS) improves sample quality in modern diffusion-based generative models by embedding rejection checks at each reverse transition step, using discriminator networks to estimate likelihood ratios and adaptively refine or resample at each timestep (2405.17880). This per-timestep correction aligns transition paths more tightly with the true data-generating process, yielding empirically state-of-the-art performance on large-scale benchmarks.
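Schematically, the per-timestep correction can be sketched as follows (illustrative interface only, not the paper's exact algorithm; accept_prob stands in for the discriminator-based likelihood-ratio estimate):

```python
def diffrs_generate(denoise_step, accept_prob, sample_prior, T, rng):
    """Schematic per-timestep rejection in a reverse diffusion chain:
    each proposed transition x_t -> x_{t-1} is kept only with a
    discriminator-estimated probability, otherwise it is re-proposed."""
    x_t = sample_prior(rng)
    for t in reversed(range(1, T + 1)):
        while True:
            x_prev = denoise_step(x_t, t, rng)        # propose a transition
            if rng.random() < accept_prob(x_prev, x_t, t):
                break                                 # transition accepted
        x_t = x_prev
    return x_t
```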
Autodifferentiable Rejection Sampling
Rejection Sampling with Autodifferentiation (RSA) enables differentiable parameter inference in simulation-based models, smoothing out binary accept–reject decisions with gradient-propagatable weights based on likelihood ratios between base and alternate parameterizations (2411.02194). This facilitates gradient-based model tuning without the need for repeated re-simulation, and enables the integration of complex, ML-derived observables as part of the loss function in parameter fitting workflows, exemplified in hadronization model optimization.
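The differentiable core is likelihood-ratio reweighting of a fixed simulated batch (a toy sketch on an exponential model; RSA applies the same ratio trick to the accept–reject branches inside the simulator rather than to the marginal density as here):

```python
import numpy as np

def rsa_weights(x, theta_base, theta_alt, log_pdf):
    """Weights w_i = p(x_i; theta_alt) / p(x_i; theta_base) turn a batch
    simulated once at theta_base into a surrogate for theta_alt; gradients
    with respect to theta_alt flow through w, not through the simulator."""
    return np.exp(log_pdf(x, theta_alt) - log_pdf(x, theta_base))

# Toy model: log p(x; theta) = log(theta) - theta * x (exponential rate).
log_pdf = lambda x, th: np.log(th) - th * x
rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 2.0, size=100_000)   # simulate once at theta = 2
w = rsa_weights(x, 2.0, 2.5, log_pdf)
est = np.average(x, weights=w)                      # ~ 1/2.5, no re-simulation
```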
5. Specialized Proposals, Efficient Construction, and Theoretical Results
Beyond envelope selection, efficient rejection sampling often requires careful proposal construction or domain partitioning:
- The Vertical Weighted Strips (VWS) framework handles targets of the form f(x) ∝ w(x)g(x), with w a weight function and g a tractable base density, by partitioning the support and majorizing w(x) in each region. This readily yields a finite mixture proposal amenable to inverse-CDF sampling, with explicit upper bounds on the rejection probability that guide region refinement (2401.09696); a simplified rendering is sketched after this list.
- The Greedy Poisson Rejection Sampler (GPRS) achieves optimal runtime for one-dimensional cases where the likelihood ratio is unimodal, translating the accept–reject criterion into a greedy search over a Poisson process (2305.15313).
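A simplified, piecewise-constant rendering of the VWS idea referenced above (the actual framework majorizes the weight function on each strip; names and the monotone-tail example are illustrative):

```python
import numpy as np

def strip_envelope_sample(f, edges, bounds, n, rng=None):
    """Rejection with a piecewise-constant majorizer: on each strip
    [edges[j], edges[j+1]] we require bounds[j] >= sup f, so the proposal
    is a finite mixture of uniforms weighted by bounds[j] * strip width."""
    rng = rng if rng is not None else np.random.default_rng()
    masses = bounds * np.diff(edges)
    probs = masses / masses.sum()
    out = []
    while len(out) < n:
        j = rng.choice(len(probs), p=probs)          # pick a strip
        x = rng.uniform(edges[j], edges[j + 1])      # uniform within it
        if rng.uniform() <= f(x) / bounds[j]:        # local accept test
            out.append(x)
    return np.array(out)

# Example: f(x) = exp(-x^2/2) on [0, 4] is decreasing, so each strip's
# left-edge value is a valid local bound.
f = lambda x: np.exp(-0.5 * x * x)
edges = np.linspace(0.0, 4.0, 9)
samples = strip_envelope_sample(f, edges, f(edges[:-1]), 5_000)
```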
Theoretical advances include provably near-optimal adaptive methods: for instance, Nearest Neighbor Adaptive Rejection Sampling (NNARS) achieves minimax-near-optimal rates (up to logarithmic factors) for s-Hölder densities (1810.09390).
6. Rejection Rate Minimization and MCMC Efficiency
In Markov chain Monte Carlo (MCMC), the rejection rate is a direct contributor to autocorrelation time and thus to sampling inefficiency. By introducing a one-parameter rejection-control transition kernel, one can continuously reduce the rejection probability (e.g., via "tower-shift" mechanisms), yielding exponential improvements in integrated autocorrelation time in sequential update regimes and power-law improvements in random update regimes, independent of the detailed kernel mechanics (2208.03935). This provides a robust guiding principle for discrete-variable MCMC: minimizing rejection is paramount for optimal sampler efficiency.
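For orientation, a plain Metropolis baseline that reports the rejection rate such kernels are designed to drive down (standard algorithm, illustrative names):

```python
import numpy as np

def metropolis(log_p, x0, step, n, rng=None):
    """Random-walk Metropolis; returns the chain and its empirical
    rejection rate, the quantity targeted by rejection-minimizing kernels."""
    rng = rng if rng is not None else np.random.default_rng()
    x, chain, rejects = x0, [], 0
    for _ in range(n):
        prop = x + step * rng.normal()
        if np.log(rng.uniform()) < log_p(prop) - log_p(x):
            x = prop                       # accept the move
        else:
            rejects += 1                   # rejection: chain repeats x
        chain.append(x)
    return np.array(chain), rejects / n

chain, rej_rate = metropolis(lambda x: -0.5 * x * x, 0.0, 2.0, 50_000)
```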
7. Partial and Conditional Rejection: Algorithmic and Combinatorial Sampling
Partial Rejection Sampling (PRS) adapts rejection to combinatorial problems by resampling only the variable subsets involved in unsatisfied constraints, rather than the entire configuration (2106.07744). This localized strategy, closely related to the Moser–Tardos resampling algorithm for the Lovász Local Lemma, yields efficiency gains by exploiting problem structure and dependency graphs. For extremal or quasi-extremal instances, PRS ensures perfect sampling from the conditional product distribution, with applications in uniform sampling of sink-free orientations, spanning trees (cycle popping), root-connected subgraphs, and independent sets. PRS is notable for converting approximate Markov chain strategies into perfect samplers for conditioned distributions in combinatorial spaces.
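A minimal Moser–Tardos-style resampler conveying the PRS mechanic (binary product measure and disjoint constraints for illustration; disjointness makes the instance extremal, so the output is exact):

```python
import numpy as np

def partial_rejection_sample(n_vars, constraints, rng=None):
    """Resample only the variables of violated constraints, never the whole
    configuration.  constraints is a list of (var_indices, ok) pairs with
    ok(assignment) -> bool.  In the extremal regime this returns an exact
    sample from the product measure conditioned on all constraints."""
    rng = rng if rng is not None else np.random.default_rng()
    a = rng.integers(0, 2, size=n_vars)             # uniform product start
    while True:
        bad = [c for c in constraints if not c[1](a)]
        if not bad:
            return a
        for v in set().union(*(set(c[0]) for c in bad)):
            a[v] = rng.integers(0, 2)               # resample only these

# Example: forbid the all-zeros pattern on each of four disjoint triples.
cons = [((i, i + 1, i + 2), lambda a, i=i: bool(a[i:i + 3].any()))
        for i in range(0, 12, 3)]
sample = partial_rejection_sample(12, cons)
```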
Rejection sampling, in its classical, adaptive, budgeted, partial, and domain-specialized forms, remains a central theme in computational statistics, generative modeling, Bayesian inference, MCMC, and quantum algorithms. Innovations continue to broaden its applicability, automate its parameterization, and deepen its theoretical understanding, cementing its role as a versatile and robust tool in both theoretical research and practical implementation.