Probabilistic Rejection Sampling

Updated 12 June 2026

Probabilistic rejection sampling is a Monte Carlo method that generates independent samples using a tractable proposal distribution and an accept/reject mechanism based on density ratios.
It accepts each proposal x with probability f(x)/(Mq(x)), which yields exact samples from continuous or discrete targets whenever the envelope condition f(x) ≤ Mq(x) holds for all x.
Recent advances incorporate adaptive and nonparametric proposals, such as NNARS and PRS, which improve efficiency and reduce computational cost in practical applications.

Probabilistic rejection sampling is a foundational class of Monte Carlo algorithms for generating exact independent samples from complex or intractable probability distributions by leveraging a tractable proposal distribution and a stochastic accept/reject criterion. The method applies broadly in discrete, continuous, and geometric probability settings, and underlies numerous theoretical results and practical algorithms in stochastic simulation, computational statistics, and machine learning.

1. The Rejection Sampling Principle

At its core, rejection sampling generates proposals $x$ from a tractable distribution $q(x)$ and accepts each proposal with probability equal to the ratio of the (unnormalized) target density $f(x)$ to the scaled proposal $Mq(x)$, where $M$ is a known constant satisfying $f(x) \leq Mq(x)$ for all $x$. The protocol ensures that the distribution of accepted samples matches the target $f(x)/Z$ exactly, with $Z = \int f(x)\,dx$ the normalizing constant. Explicitly, the acceptance probability is $\alpha(x) = f(x)/(M q(x))$, the overall acceptance rate is $\mathbb{E}_q[\alpha(x)] = Z/M$, and the expected number of proposals per accepted sample is therefore $M/Z$ (Naesseth et al., 2016, Raim et al., 2024).
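The following minimal Python sketch shows the accept/reject loop. It assumes the caller supplies an unnormalized target `f`, a proposal sampler and density, and a valid envelope constant `M`. The function name `rejection_sample` and its parameters are hypothetical. For the unnormalized Beta(2, 2) density $f(x) = x(1-x)$ with a uniform proposal on $[0, 1]$, $M = 1/4$ and $Z = 1/6$, so about $M/Z = 1.5$ proposals should be used per accepted sample.

```python
import random


def rejection_sample(f, sample_q, q_pdf, M, n, rng):
    """Draw n exact samples from the density proportional to f.

    Assumes f(x) <= M * q_pdf(x) wherever f(x) > 0.
    Returns (samples, proposals), where proposals counts every draw from q.
    """
    samples, proposals = [], 0
    while len(samples) < n:
        x = sample_q(rng)
        proposals += 1
        # Accept with probability alpha(x) = f(x) / (M q(x)).
        if rng.random() < f(x) / (M * q_pdf(x)):
            samples.append(x)
    return samples, proposals


# Unnormalized Beta(2, 2) target f(x) = x(1 - x) on [0, 1], uniform proposal,
# M = max f = 1/4. Since Z = 1/6, about M / Z = 1.5 proposals per sample.
rng = random.Random(42)
samples, proposals = rejection_sample(
    f=lambda x: x * (1.0 - x),
    sample_q=lambda r: r.random(),
    q_pdf=lambda x: 1.0,
    M=0.25,
    n=20_000,
    rng=rng,
)
print(f"proposals per sample: {proposals / len(samples):.3f}")
```

The loop uses only the ratio $f/(Mq)$, so $Z$ never has to be computed.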

This principle extends seamlessly to discrete distributions. Given unnormalized probability weights $f(i) \geq 0$ over outcomes $i$ and a proposal mass function $q(i)$, a proposed outcome $i$ is accepted with probability $f(i)/(M q(i))$. Here $M \geq \max_i f(i)/q(i)$, and $q(i) > 0$ wherever $f(i) > 0$ (Draper et al., 5 Apr 2025, Chewi et al., 2021). The method assumes only that $q$ is easy to sample from and to evaluate.
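The discrete case can be sketched the same way. In the hypothetical helper `discrete_rejection`, the proposal is given as an explicit probability vector, and $M$ is set to its smallest valid value, $\max_i f(i)/q(i)$. With weights $(1, 2, 3, 4)$ and a uniform proposal, $M = 16$ and $Z = 10$, so about $1.6$ proposals are expected per sample.

```python
import random


def discrete_rejection(weights, proposal, n, rng):
    """Sample indices i with probability proportional to weights[i].

    proposal is a probability mass function over the same indices that we can
    sample from directly; it must be positive wherever weights are positive.
    Returns (samples, proposals).
    """
    if any(w > 0 and p <= 0 for w, p in zip(weights, proposal)):
        raise ValueError("proposal must cover the support of the target")
    indices = range(len(weights))
    # Smallest valid envelope constant: M = max_i f(i) / q(i).
    M = max(w / p for w, p in zip(weights, proposal) if p > 0)
    samples, proposals = [], 0
    while len(samples) < n:
        i = rng.choices(indices, weights=proposal)[0]
        proposals += 1
        if rng.random() < weights[i] / (M * proposal[i]):
            samples.append(i)
    return samples, proposals


rng = random.Random(7)
weights = [1, 2, 3, 4]                # unnormalized target, Z = 10
proposal = [0.25, 0.25, 0.25, 0.25]   # uniform proposal, so M = 16
samples, proposals = discrete_rejection(weights, proposal, 20_000, rng)
freqs = [samples.count(i) / len(samples) for i in range(4)]
# Frequencies should be near [0.1, 0.2, 0.3, 0.4], with about M / Z = 1.6 proposals per sample.
print([round(p, 3) for p in freqs], proposals / len(samples))
```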

2. Information-Theoretic and Computational Aspects

Knuth and Yao's entropy theory of discrete nonuniform random variate generation characterizes the bit complexity of exact samplers. Any exact sampler for a distribution $P$ requires at least $H(P)$ random bits per draw on average, where $H(P)$ is the Shannon entropy of the target. Entropy-optimal constructions achieve average bit costs in $[H(P), H(P)+2)$, but their space requirements are often exponential in the number of bits needed to encode $P$ (Draper et al., 5 Apr 2025). Practical algorithms, notably the ALDR (Amplified Loaded Dice Roller) family, achieve bit costs in the entropy-optimal range $[H(P), H(P)+2)$ with only linearithmic space and preprocessing time for distributions with rational probabilities (Draper et al., 5 Apr 2025).
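To make the bit-cost accounting concrete, the sketch below implements a simple coin-flip rejection sampler for integer weights. It is not ALDR. It builds a uniform integer in $[0, 2^k)$ from $k$ fair flips, with $2^k \geq m$, and rejects values $\geq m$. The names `coin_flip_sampler` and `entropy_bits` are hypothetical. For weights $(1, 2, 3, 4)$ it uses $4 \cdot 16/10 = 6.4$ flips per draw on average, against an entropy of about $1.85$ bits.

```python
import math
import random


def entropy_bits(a):
    """Shannon entropy H(P) in bits of P(i) = a[i] / sum(a)."""
    m = sum(a)
    return -sum(x / m * math.log2(x / m) for x in a if x > 0)


def coin_flip_sampler(a, rng):
    """Sample i with probability a[i] / m (integer weights, m = sum(a)) from fair coins.

    Builds a uniform integer u in [0, 2**k) from k flips, rejects if u >= m, and
    otherwise maps u to an outcome by cumulative sums. Returns (i, flips_used).
    """
    m = sum(a)
    k = max(1, (m - 1).bit_length())  # smallest k >= 1 with 2**k >= m
    flips = 0
    while True:
        u = 0
        for _ in range(k):
            u = 2 * u + rng.getrandbits(1)  # one fair coin flip
            flips += 1
        if u < m:  # accept: u is uniform on {0, ..., m - 1}
            for i, ai in enumerate(a):
                if u < ai:
                    return i, flips
                u -= ai


rng = random.Random(3)
a = [1, 2, 3, 4]
draws = [coin_flip_sampler(a, rng) for _ in range(20_000)]
avg_flips = sum(b for _, b in draws) / len(draws)
print(f"H(P) = {entropy_bits(a):.3f} bits, average flips = {avg_flips:.2f}")
```

This sampler uses roughly 6.4 flips per draw, while $H(P) + 2 \approx 3.85$. Samplers in the entropy-optimal range, such as ALDR, are designed to remove that overhead.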

For real-valued problems, a key metric is the number of "randomness-revealing steps" (e.g., revealed bits) required for each accept/reject decision. For monotone increasing multivariate densities, Langevin et al. (29 Sep 2025) give explicit worst-case lower and upper bounds on the expected number of bit queries needed to decide whether a sample point lies under the density. These bounds set structural limits for adaptive algorithms that probe random variates bitwise.
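Here is a small univariate illustration of counting revealed bits. It is not the multivariate procedure analysed by Langevin et al. An accept/reject decision $U < t$, with $U$ uniform, can be made lazily by comparing the binary expansions of $U$ and $t$ digit by digit. Each revealed bit of $U$ differs from the corresponding bit of $t$ with probability $1/2$, so the decision needs only 2 random bits on average, whatever the precision of $t$. The function name `lazy_bernoulli` is hypothetical, and $t$ is assumed to be an exact rational.

```python
import random
from fractions import Fraction


def lazy_bernoulli(t, rng):
    """Decide whether U < t for U ~ Uniform[0, 1), revealing U one bit at a time.

    t is an exact Fraction in [0, 1]. The binary expansions of U and t are
    compared from the most significant bit; the first bit where they differ
    decides the comparison. Returns (decision, bits_revealed).
    """
    bits = 0
    while True:
        t *= 2
        t_bit = 1 if t >= 1 else 0  # next binary digit of the threshold
        t -= t_bit
        u_bit = rng.getrandbits(1)  # next binary digit of U
        bits += 1
        if u_bit != t_bit:
            return u_bit < t_bit, bits


rng = random.Random(11)
# Accept/reject decisions for a proposal whose acceptance probability is 1/3.
results = [lazy_bernoulli(Fraction(1, 3), rng) for _ in range(20_000)]
accept_rate = sum(ok for ok, _ in results) / len(results)
mean_bits = sum(b for _, b in results) / len(results)
print(f"accept rate {accept_rate:.3f}, mean bits {mean_bits:.3f}")
```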

3. Adaptive and Nonparametric Proposal Construction

Naive rejection sampling is often inefficient due to high rejection rates for poorly chosen proposals. Recent techniques focus on adaptive or learned proposals:

Nearest-Neighbor Adaptive Rejection Sampling (NNARS): Constructs piecewise-constant nonparametric envelopes from the accumulated history of samples and density evaluations. For Hölder-continuous $f$ on $[0,1]^d$ and a budget of $n$ total function evaluations, NNARS achieves global rejection rates that are minimax near-optimal (up to logarithmic factors). It assumes only Hölder regularity of $f$ and standard boundedness conditions (Achdou et al., 2018).
Pliable Rejection Sampling (PRS): Builds a data-dependent proposal from a kernel density estimate fitted to a pilot sample, plus a "floor" uniform component. Under smoothness assumptions, PRS ensures with high probability (over the proposal construction) that all accepted samples are exactly i.i.d. from the target $f/Z$. It also gives explicit guarantees on the fraction of accepted samples as a function of the computational budget and the regularity of $f$ (Erraqabi et al., 24 Apr 2026).
Gradient-Refined Rejection Sampling: For differentiable $f$, fits a parameterized proposal $q$ (e.g., a truncated Gaussian mixture) and optimizes it directly to minimize the empirical sup-norm of the ratio $f/q$. The proposal is updated adaptively to maximize acceptance, with no special analytic bound required (Raff et al., 2023).
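The envelope idea behind NNARS can be illustrated with a much simpler, non-adaptive sketch. Assume $f$ on $[0,1]$ is $L$-Lipschitz (Hölder with exponent 1) with known $L$. A fixed grid of $B$ bins of width $h = 1/B$, with height $f(c_j) + Lh/2$ at each bin centre $c_j$, then gives a valid piecewise-constant envelope, so sampling stays exact. This is not NNARS itself, which refines its envelope adaptively from nearest-neighbour estimates. The name `grid_envelope_sampler` and the test density are hypothetical.

```python
import math
import random
from itertools import accumulate


def grid_envelope_sampler(f, L, bins, n, rng):
    """Exact rejection sampling on [0, 1] with a fixed piecewise-constant envelope.

    Assumes f is L-Lipschitz: on a bin of width h with centre c,
    f(x) <= f(c) + L * h / 2, which is used as the bin's envelope height.
    Returns (samples, empirical_acceptance_rate).
    """
    h = 1.0 / bins
    heights = [f((j + 0.5) * h) + L * h / 2 for j in range(bins)]
    cum = list(accumulate(heights))  # equal bin widths, so bin mass is proportional to height
    samples, proposals = [], 0
    while len(samples) < n:
        j = rng.choices(range(bins), cum_weights=cum)[0]
        x = (j + rng.random()) * h  # uniform within the chosen bin
        proposals += 1
        if rng.random() * heights[j] < f(x):
            samples.append(x)
    return samples, n / proposals


def target(x):
    # 1 + sin(2*pi*x)/2 is nonnegative on [0, 1] and pi-Lipschitz.
    return 1.0 + 0.5 * math.sin(2 * math.pi * x)


rng = random.Random(5)
acc = {}
for bins in (4, 64):
    samples, acc[bins] = grid_envelope_sampler(target, math.pi, bins, 20_000, rng)
    print(bins, round(acc[bins], 3))
```

For this target the midpoint terms sum to exactly 1, so the expected acceptance rate is $1/(1 + \pi h/2)$. That is about 0.72 with 4 bins and 0.98 with 64. A finer envelope raises acceptance at the cost of more evaluations of $f$.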

For univariate log-concave targets, classical Adaptive Rejection Sampling (ARS) maintains a set of support points and builds a piecewise-linear upper bound on $\log f$, i.e., a piecewise-exponential envelope for $f$. Acceptance rates approach unity as more points are added (Martino et al., 2015), but the computational cost per draw increases. Fixed-node "Cheap ARS" (CARS) variants balance acceptance rate against per-draw cost by fixing the number of nodes and swapping them adaptively (Martino et al., 2015).
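Below is a compact sketch of derivative-based ARS with a tangent envelope and no squeeze step. It assumes $h = \log f$ is concave and differentiable on $\mathbb{R}$, with $h'$ available. The function `ars` is hypothetical and, unlike CARS, lets the node set grow with every rejection. The algorithm proceeds in four steps:

- Intersect the tangents at the support points to get the pieces of the upper hull.
- Pick a piece in proportion to the mass of $\exp(\text{tangent})$ over it.
- Draw a point on that piece by inverse CDF.
- Accept it with probability $\exp(h(x) - \text{envelope}(x))$.

```python
import bisect
import math
import random


def ars(h, dh, init_points, n, rng):
    """Adaptive rejection sampling for a density proportional to exp(h(x)) on R.

    Assumes h is concave and differentiable. Tangent-line envelope, no squeeze
    step. init_points must contain a point with dh > 0 and one with dh < 0 so
    the envelope is integrable, and dh must be nonzero at every support point.
    Each rejected proposal is added as a new support point.
    Returns (samples, support_points).
    """
    xs = sorted(init_points)
    samples = []
    while len(samples) < n:
        ds = [dh(x) for x in xs]
        cs = [h(x) - d * x for x, d in zip(xs, ds)]  # tangent j is cs[j] + ds[j] * x
        k = len(xs)
        # Tangent j is the upper hull on [z[j], z[j + 1]]; interior z are crossings.
        z = ([-math.inf]
             + [(cs[j + 1] - cs[j]) / (ds[j] - ds[j + 1]) for j in range(k - 1)]
             + [math.inf])
        masses = []
        for j in range(k):
            lo = 0.0 if z[j] == -math.inf else math.exp(cs[j] + ds[j] * z[j])
            hi = 0.0 if z[j + 1] == math.inf else math.exp(cs[j] + ds[j] * z[j + 1])
            masses.append((hi - lo) / ds[j])  # integral of exp(tangent j) on its piece
        j = rng.choices(range(k), weights=masses)[0]
        a, b, d = z[j], z[j + 1], ds[j]
        v = 1.0 - rng.random()  # uniform on (0, 1]
        # Inverse-CDF draw from the density proportional to exp(d * x) on [a, b].
        if a == -math.inf:
            x = b + math.log(v) / d
        elif b == math.inf:
            x = a + math.log(v) / d
        else:
            x = a + math.log1p(v * math.expm1(d * (b - a))) / d
        # Accept with probability exp(h(x) - envelope(x)), which is at most 1.
        if math.log(1.0 - rng.random()) <= h(x) - (cs[j] + ds[j] * x):
            samples.append(x)
        else:
            bisect.insort(xs, x)  # refine the envelope at the rejected point
    return samples, xs


rng = random.Random(1)
# Standard normal target: h(x) = -x^2 / 2 is concave with h'(x) = -x.
samples, support = ars(lambda x: -0.5 * x * x, lambda x: -x, [-1.0, 1.0], 10_000, rng)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean {mean:.3f}, variance {var:.3f}, support points {len(support)}")
```

A CARS-style variant would cap the number of support points and replace an existing node instead of inserting a new one. That bookkeeping is omitted here.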

4. Structured Proposal Schemes and Partition-Based Methods

For targets with partially structured or composite form, proposal construction via weighted mixtures or vertical partitioning yields efficient exact samplers:

Vertical Weighted Strips (VWS): Applies to targets $f(x) = w(x)\,g(x)$, where $g$ is a tractable base density and $w$ an arbitrary nonnegative weight function. VWS partitions the domain into strips $D_1, \dots, D_N$ and on each strip constructs a constant upper bound $\bar{w}_j \geq \sup_{x \in D_j} w(x)$. These bounds form the finite mixture envelope $h(x) = \sum_{j=1}^{N} \bar{w}_j\, g(x)\, \mathbb{1}\{x \in D_j\}$. An explicit upper bound involving these subregion majorants and minorants controls the rejection rate (Raim et al., 2024). Adaptive refinement concentrates effort on the regions that contribute most to the looseness of the envelope.
Self-Tuned VWS in Gibbs Samplers: When VWS is applied within large-scale Gibbs samplers (e.g., in small area estimation), the proposal is gradually refined and strips with negligible contribution are removed. This keeps the computational cost tractable and acceptance rates high (often above 90%), even for conditionals that are not log-concave or are otherwise non-standard (Raim et al., 21 Sep 2025).
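The VWS construction can be sketched for a special case: an Exponential(1) base density $g$ on $[0, \infty)$ and a unimodal weight $w$, so that the strip majorants $\bar{w}_j$ and minorants $\underline{w}_j$ of $w$ on each strip $D_j$ have closed forms. The function names and the weight are hypothetical. Refinement here uses a fixed grid rather than the adaptive or self-tuned schemes described above. The returned bound comes from the elementary inequality $P(\text{reject}) = 1 - \int w g / \sum_j \bar{w}_j G(D_j) \leq \sum_j (\bar{w}_j - \underline{w}_j) G(D_j) / \sum_j \bar{w}_j G(D_j)$, where $G(D_j)$ is the base probability of strip $D_j$.

```python
import math
import random


def vws_exponential_base(w, w_sup, w_inf, knots, n, rng):
    """Exact sampling from f(x) proportional to w(x) exp(-x) on [0, inf) with weighted strips.

    Base density g: Exponential(1). Strips are [k_i, k_{i+1}) for consecutive
    knots (knots[0] must be 0), plus a final strip [k_last, inf).
    w_sup(a, b) / w_inf(a, b) must bound w from above / below on [a, b).
    Returns (samples, rejection_bound, observed_rejection_rate).
    """
    edges = list(knots) + [math.inf]
    strips = list(zip(edges[:-1], edges[1:]))
    # Base probability G(D_j) of each strip under Exponential(1).
    base = [math.exp(-a) - (0.0 if b == math.inf else math.exp(-b)) for a, b in strips]
    upper = [w_sup(a, b) for a, b in strips]
    lower = [w_inf(a, b) for a, b in strips]
    mix = [u * g for u, g in zip(upper, base)]  # unnormalized mixture weights
    bound = sum((u - l) * g for u, l, g in zip(upper, lower, base)) / sum(mix)
    samples, proposals = [], 0
    while len(samples) < n:
        j = rng.choices(range(len(strips)), weights=mix)[0]
        a, b = strips[j]
        # Inverse CDF of Exponential(1) truncated to [a, b).
        span = 1.0 if b == math.inf else -math.expm1(-(b - a))
        x = a - math.log1p(-rng.random() * span)
        proposals += 1
        if rng.random() * upper[j] < w(x):  # accept with probability w(x) / upper_j
            samples.append(x)
    return samples, bound, 1.0 - n / proposals


def w(x):
    return math.exp(-(x - 2.0) ** 2)  # unimodal weight, mode at x = 2


def w_sup(a, b):
    return w(min(max(2.0, a), b))  # maximum of a unimodal w on [a, b]


def w_inf(a, b):
    return min(w(a), w(b))  # minimum of a unimodal w on [a, b]; w(inf) = 0


rng = random.Random(9)
results = {}
for step in (1.0, 0.25):
    knots = [i * step for i in range(int(5 / step) + 1)]
    samples, bound, observed = vws_exponential_base(w, w_sup, w_inf, knots, 20_000, rng)
    results[step] = (bound, observed)
    print(f"strip width {step}: bound {bound:.3f}, observed rejection {observed:.3f}")
```

With $w(x) = e^{-(x-2)^2}$, the target is a normal distribution with mean 1.5 and variance 1/2, truncated to $[0, \infty)$ (mean about 1.53). The bound drops as the strip width shrinks from 1 to 0.25.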

5. Specialized Methods: Geometric, Divide-and-Conquer, and Data-Augmentation

Rejection sampling also underlies advances in geometrically structured problems and enables tractable simulation of complex models:

Curvature-Based Rejection Sampling (CURS): On Riemannian manifolds, the target may be radially symmetric around a point $o$, i.e., its density may depend only on the geodesic distance $d(o, x)$. In that case, volume comparison theorems (e.g., Bishop-Gromov) deliver global comparison envelopes. The proposal is derived from a model space of constant curvature, and integrals of the target's radial profile control the acceptance probability exactly (Maia et al., 28 Oct 2025).
Probabilistic Divide and Conquer (PDC): PDC algorithms improve classical rejection cost, sometimes from superlinear to constant, by factorizing the proposal and conditioning, recursively or non-recursively, on partial information about the sample (Arratia et al., 2011). In integer partition sampling, recursive PDC leverages self-similarity in generating functions and bit-peeling, yielding constant expected rejection cost, in contrast to naive rejection's polynomial growth.
Data Augmentation for Doubly-Intractable Posteriors: When data generation in a model involves rejection sampling, one can introduce explicit latent variables for the rejected proposals. This augmentation often simplifies the joint distribution and enables efficient MCMC for otherwise doubly-intractable posteriors. The sampler alternates between updating the model parameters and regenerating the rejected proposals by rerunning the rejection sampler (Rao et al., 2014).
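The divide-and-conquer idea can be illustrated with the simplest PDC scheme for integer partitions. This is a "deterministic second half" variant, not the recursive bit-peeling algorithm mentioned above. Under a Boltzmann model, the number $Z_i$ of parts of size $i$ is geometric with $P(Z_i = k) = (1 - x^i)x^{ik}$. Conditioning on $\sum_i iZ_i = n$ gives a uniform partition of $n$. Naive rejection draws every $Z_i$ and waits for the sum to hit $n$. PDC instead draws $Z_2, \dots, Z_n$, sets $Z_1$ to the remainder $k$, and accepts with probability $P(Z_1 = k)/\max_j P(Z_1 = j) = x^k$. The function names and the tuning $x = e^{-\pi/\sqrt{6n}}$ are assumptions of this sketch.

```python
import math
import random


def geometric(q, rng):
    """Draw Z with P(Z = k) = (1 - q) * q**k for k = 0, 1, 2, ..."""
    return int(math.log(1.0 - rng.random()) / math.log(q))


def naive_partition(n, x, rng):
    """Boltzmann rejection: draw every Z_i ~ Geom(x**i), keep if sum i*Z_i == n."""
    tries = 0
    while True:
        tries += 1
        z = [geometric(x ** i, rng) for i in range(1, n + 1)]
        if sum(i * zi for i, zi in enumerate(z, start=1)) == n:
            return z, tries


def pdc_partition(n, x, rng):
    """PDC, deterministic second half: draw Z_2..Z_n; then Z_1 is forced to k."""
    tries = 0
    while True:
        tries += 1
        rest = [geometric(x ** i, rng) for i in range(2, n + 1)]
        k = n - sum(i * zi for i, zi in enumerate(rest, start=2))
        # Accept with P(Z_1 = k) / max_j P(Z_1 = j) = x**k.
        if k >= 0 and rng.random() < x ** k:
            return [k] + rest, tries


n = 30
x = math.exp(-math.pi / math.sqrt(6 * n))  # Boltzmann tuning for partitions of n
rng = random.Random(2)
naive = [naive_partition(n, x, rng) for _ in range(300)]
pdc = [pdc_partition(n, x, rng) for _ in range(300)]
# Each result is a list of part counts: z[i - 1] = number of parts of size i.
print("naive mean tries:", sum(t for _, t in naive) / 300)
print("PDC mean tries:", sum(t for _, t in pdc) / 300)
```

Let $T = \sum_i iZ_i$ and $T' = \sum_{i \geq 2} iZ_i$. Because $P(T = n) = (1 - x)\sum_{k \geq 0} P(T' = n - k)\,x^k$, the PDC acceptance probability is exactly $1/(1 - x)$ times the naive one. That factor is roughly 4.8 for $n = 30$.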

6. Extensions: Universal Probabilistic Programming and Rejection in MCMC

In universal probabilistic programming systems, explicit rejection-sampling loops inside user programs can cause variance pathologies for amortized importance-sampling estimators. Amortized rejection-sampling estimators avoid this by collapsing each rejection loop into a single importance weight, corrected by estimates of the relevant acceptance probabilities (Naderiparizi et al., 2019). The resulting estimators are unbiased and have finite variance, however many iterations the loop runs. The approach is implemented in open-source platforms and enables rigorous, automated inference in programs with explicit user rejection loops.

In Markov chain Monte Carlo (MCMC), locally optimal control of rejection rates can also have dramatic algorithmic effects. In discrete-variable MCMC, tuning the rejection rate through a one-parameter family of transition kernels yields an exponential speedup in autocorrelation time. The same work makes the direct relationship between rejection probability and sampling efficiency explicit and exploits it (Suwa, 2022).
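The one-parameter kernel of the cited work is not reproduced here. Instead, as a related illustration of a rejection-minimizing kernel for a discrete variable, the sketch implements the earlier Suwa-Todo geometric-allocation rule. This rule assigns probability flows $v_{ij}$ between states so that each row sums to $w_i$ and each column to $w_j$, which makes the distribution proportional to $w$ stationary without detailed balance. It also keeps the self-transition mass $v_{ii}$ small. After sorting so that $w_1$ is the largest weight, write $S_i = \sum_{k \leq i} w_k$ with $S_0 = S_n$. The flows are then $v_{ij} = \max(0, \min(\Delta_{ij}, w_i + w_j - \Delta_{ij}, w_i, w_j))$, with $\Delta_{ij} = S_i - S_{j-1} + w_1$. The comparison baseline is Metropolis with a uniform proposal over the other states. All function names are hypothetical.

```python
def suwa_todo_flows(w):
    """Stochastic flows of the Suwa-Todo geometric allocation for weights w.

    States are reordered so the largest weight comes first. Returns (order, v):
    v[i][j] is the probability flow from state order[i] to state order[j], and
    the transition probability is v[i][j] / w[order[i]].
    """
    order = sorted(range(len(w)), key=lambda i: -w[i])
    ws = [float(w[i]) for i in order]
    n = len(ws)
    S = [0.0]
    for wi in ws:
        S.append(S[-1] + wi)  # S[i] = w_1 + ... + w_i (1-based cumulative sums)
    S[0] = S[n]  # cyclic convention S_0 = S_n
    v = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            delta = S[i + 1] - S[j] + ws[0]  # Delta_{ij} in 1-based notation
            v[i][j] = max(0.0, min(delta, ws[i] + ws[j] - delta, ws[i], ws[j]))
    return order, v


def rejection_rate_suwa_todo(w):
    """Stationary probability of staying put: sum_i v_ii / sum(w)."""
    _, v = suwa_todo_flows(w)
    return sum(v[i][i] for i in range(len(w))) / sum(w)


def rejection_rate_metropolis(w):
    """Metropolis with a uniform proposal over the other n - 1 states."""
    n, total = len(w), sum(w)
    rate = 0.0
    for i in range(n):
        move = sum(min(1.0, w[j] / w[i]) for j in range(n) if j != i) / (n - 1)
        rate += w[i] / total * (1.0 - move)
    return rate


for w in ([3, 2, 1], [5, 1, 1]):
    print(w, rejection_rate_metropolis(w), rejection_rate_suwa_todo(w))
```

For weights $(3, 2, 1)$, the Suwa-Todo flows are rejection-free, while this Metropolis variant rejects with probability $1/3$. For $(5, 1, 1)$, some rejection is unavoidable because the largest weight exceeds the sum of the others; the rates are $3/7$ versus $4/7$.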

The field of probabilistic rejection sampling continues to integrate advances from nonparametric density estimation, geometric analysis, computational theory, and adaptive MCMC, delivering both theoretical optimality (in minimax and entropy senses) and state-of-the-art practical performance across complex simulation and inference contexts. Each method brings provable guarantees under explicit regularity, smoothness, or combinatorial conditions, and the growing ecosystem of adaptive, partitioned, and learned-proposal strategies expands the applicability of rejection sampling for contemporary statistical and computational challenges.