Pliable rejection sampling

Published 24 Apr 2026 in stat.ML and cs.LG | (2604.22385v1)

Abstract: Rejection sampling is a technique for sampling from difficult distributions. However, its use is limited due to a high rejection rate. Common adaptive rejection sampling methods either work only for very specific distributions or without performance guarantees. In this paper, we present pliable rejection sampling (PRS), a new approach to rejection sampling, where we learn the sampling proposal using a kernel estimator. Since our method builds on rejection sampling, the samples obtained are with high probability i.i.d. and distributed according to f. Moreover, PRS comes with a guarantee on the number of accepted samples.


Authors (4)

Summary

The paper introduces a novel rejection sampling method that builds adaptive proposals using nonparametric kernel density estimation.
It provides high-probability performance guarantees to yield i.i.d. samples with improved rejection rates for multimodal, non-log-concave densities.
It enhances scalability in moderate dimensions by incorporating optimization-based localization strategies to focus on high-density regions.

Pliable Rejection Sampling: A High-Probability Guarantee for Adaptive Proposal Learning

Introduction

The paper "Pliable rejection sampling" (2604.22385) introduces a sampling methodology that addresses the inefficiency of standard rejection sampling (SRS) when the proposal distribution is poorly matched to the target density $f$. By leveraging nonparametric kernel density estimation, pliable rejection sampling (PRS) produces an adaptive proposal that approximates the target and provides explicit high-probability guarantees on the sample yield per budgeted evaluation of $f$. This approach avoids the limiting assumptions required by classical and modern adaptive rejection sampling methods and yields i.i.d. samples with high efficiency.

Context: Limitations of Existing Methods

Classical SRS relies on hand-crafted proposals $g$ and envelope constants $M$ such that $Mg \geq f$, resulting in prohibitive rejection rates when $f$ is complex or multimodal. Adaptive rejection sampling (ARS) methods (e.g., [Gilks et al., 1992]) efficiently adapt proposals but are limited to log-concave targets. Extensions such as Adaptive Rejection Metropolis Sampling (ARMS) and A* sampling relax these constraints but sacrifice i.i.d. sampling or require non-trivial problem structure (e.g., a Gumbel-Max decomposition).

Moreover, existing adaptive rejection samplers either impose strong structural assumptions on $f$ (e.g., log-concavity) or lack finite-sample performance guarantees, and often require iterative refinement that adds to computational overhead.
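As a point of reference, the envelope condition $Mg \geq f$ can be sketched in a few lines. This is a minimal illustration of plain SRS, not code from the paper: the target `f`, the uniform proposal, and the constant `M = 1.0` are hypothetical choices made so that the envelope condition holds by construction.

```python
import numpy as np

def standard_rejection_sampling(f, g_sample, g_pdf, M, n_proposals, rng):
    """Draw n_proposals points from g; keep x with probability f(x) / (M * g(x))."""
    x = g_sample(n_proposals, rng)
    u = rng.uniform(size=n_proposals)
    # Accept when u * M * g(x) < f(x); this is valid only if M * g >= f everywhere.
    return x[u * M * g_pdf(x) < f(x)]

def f(x):
    # Hypothetical unnormalized target: a narrow bump on [0, 1] with sup f = 1.
    return np.exp(-0.5 * ((x - 0.5) / 0.05) ** 2)

rng = np.random.default_rng(0)
n_proposals = 10_000
samples = standard_rejection_sampling(
    f,
    g_sample=lambda n, rng: rng.uniform(0.0, 1.0, size=n),
    g_pdf=lambda x: np.ones_like(x),   # uniform density on [0, 1]
    M=1.0,                             # g = 1 and sup f = 1, so M * g >= f
    n_proposals=n_proposals,
    rng=rng,
)
acceptance_rate = samples.size / n_proposals
```

With a normalized proposal, the expected acceptance rate is $\int f / M$, so a flat proposal under a narrow target discards most draws. That mismatch is what an adaptive proposal is meant to reduce.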

Methodology

PRS builds the proposal function in an adaptive, one-shot manner using kernel regression on $N$ uniformly sampled evaluations of $f$ over the domain $[0,A]^d$. Key innovations include:

Kernel Estimation for Proposal Construction: A kernel estimator $\hat f$ is constructed from evaluations of $f$ at $N$ uniformly sampled points, with bandwidth selection governed by the smoothness parameter of $f$.
Uniform Error Control: The estimator comes with a high-probability sup-norm error guarantee that holds uniformly over $[0,A]^d$, based on modern empirical process results for kernel estimators, as detailed in Theorem 1.
Pliable Proposal Formation: The final proposal is a mixture of the kernel estimate and a uniform base, with an explicit additive error term included so that the resulting envelope dominates $f$ everywhere. Critically, the normalizing constant and rejection threshold are explicitly data-driven.

This construction is nonparametric and only assumes that $f$ is bounded and locally Hölder-smooth (with a constraint on the smoothness exponent). This encompasses a much broader function class than log-concavity, covering densities in Besov balls.
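The three steps above can be sketched in one dimension as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the Gaussian kernel, the hand-picked bandwidth `h`, and the additive error term `r` are all hypothetical stand-ins, whereas the paper derives its bandwidth and error term from the smoothness of $f$ and the budget. The envelope is valid only if `r` really bounds the estimation error on $[0,A]$.

```python
import numpy as np

def gaussian_kernel(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def prs_sketch(f, A, N, n_proposals, h, r, rng):
    """1-D sketch: learn an envelope from N evaluations of f, then reject."""
    # Step 1: evaluate f at N uniform points on [0, A].
    centers = rng.uniform(0.0, A, size=N)
    weights = f(centers)

    def f_hat(x):
        # Kernel estimate (A / (N h)) * sum_i f(X_i) K((x - X_i) / h).
        z = (x[:, None] - centers[None, :]) / h
        return (A / (N * h)) * (gaussian_kernel(z) @ weights)

    # Step 2: unnormalized envelope f_hat(x) + r on [0, A].
    kde_mass = A * weights.mean()   # integral of f_hat over the real line
    uniform_mass = r * A            # integral of the constant r over [0, A]
    p_kde = kde_mass / (kde_mass + uniform_mass)

    # Step 3: sample from the normalized envelope, a two-component mixture.
    from_kde = rng.uniform(size=n_proposals) < p_kde
    idx = rng.choice(N, size=n_proposals, p=weights / weights.sum())
    x = np.where(
        from_kde,
        centers[idx] + h * rng.standard_normal(n_proposals),  # kernel component
        rng.uniform(0.0, A, size=n_proposals),                # uniform component
    )

    # Rejection step: the target is zero outside [0, A], so those draws are rejected.
    inside = (x >= 0.0) & (x <= A)
    envelope = f_hat(x) + r * inside
    target = np.where(inside, f(np.clip(x, 0.0, A)), 0.0)
    accept = rng.uniform(size=n_proposals) * envelope < target
    return x[accept], accept.mean()

def f(x):
    # Hypothetical unnormalized bimodal target on [0, 10].
    return (np.exp(-0.5 * ((x - 3.0) / 0.5) ** 2)
            + 0.5 * np.exp(-0.5 * ((x - 7.0) / 0.8) ** 2))

rng = np.random.default_rng(0)
accepted, rate = prs_sketch(f, A=10.0, N=1000, n_proposals=2000, h=0.2, r=0.3, rng=rng)
```

The kernel component can be sampled exactly because the estimate is a weighted sum of Gaussians: pick a center with probability proportional to its `f` value, then add Gaussian noise of scale `h`. A smaller `r` raises the acceptance rate but risks an invalid envelope, which is the trade-off the paper's error control is designed to settle.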

Algorithmic Performance and Guarantees

The principal theoretical result (Theorem 2) proves that, for a budget of $n$ evaluations of $f$, the expected number of i.i.d. samples produced by PRS is, with high probability, at least $n$ times one minus a rejection rate that vanishes as $n$ grows.

This rejection rate, decaying polynomially in $n$ with an exponent set by the smoothness of $f$ and the dimension $d$, is provably negligible for large budgets under mild smoothness conditions, contrasting sharply with SRS, whose acceptance rate is determined by the often intractable global sup-norm bound on $f/g$. In empirical comparison to SRS and A* sampling, PRS consistently provides higher or competitive acceptance rates, especially when only black-box access to $f$ is given, and does not require the structural or decomposition information that A* demands.

In contrast to MCMC and ARMS methods, PRS yields i.i.d. samples up to a quantifiable approximation error, and accommodates non-log-concave, multimodal, and unnormalized densities.
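A short calculation shows why the rejection rate tracks the estimation error. The notation here is an assumption made for illustration and is not taken from the paper: $r$ denotes a uniform bound on $|\hat f - f|$ over $[0,A]^d$, the unnormalized envelope is $\hat f + r$, and $\hat f$ is treated as supported on $[0,A]^d$. Since $f \leq \hat f + r$, the envelope is valid, and since $\int \hat f \leq \int f + r A^d$,

$$
\Pr(\text{accept}) = \frac{\int f}{\int (\hat f + r)} \geq \frac{\int f}{\int f + 2 r A^d},
\qquad
\Pr(\text{reject}) \leq \frac{2 r A^d}{\int f}.
$$

Under these assumptions the rejection probability is at most proportional to $r$, so any rate at which the uniform error of the kernel estimator shrinks carries over to the rejection rate.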

Extensions and High-Dimensional Regimes

The paper addresses the high-dimensional degradation common to all rejection sampling schemes. Uniform initial sampling becomes exponentially inefficient in $d$, but PRS proposes recourse to optimization-based localization strategies:

If the bulk of the mass of $f$ is concentrated on a small region, and $f$ (or a transformation of $f$) is convex outside a high-density support, first-stage random initialization followed by gradient-based (or Hessian-accelerated) optimization locates regions of non-negligible mass, with a step count that depends on whether gradient or Hessian information is used.
This localization enables construction of the kernel estimator in an informative region, improving scalability in moderate dimensions.
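The localization idea can be sketched as random restarts followed by gradient ascent. This is an illustrative sketch only: the function `localize`, the Gaussian-shaped toy target, the step size, and the box half-width are hypothetical choices, and the paper's procedure and its step-count guarantees are not reproduced here.

```python
import numpy as np

def localize(log_f, grad_log_f, A, d, n_starts, n_steps, step, rng):
    """Random restarts plus gradient ascent on log f; returns the best point found."""
    x = rng.uniform(0.0, A, size=(n_starts, d))
    for _ in range(n_steps):
        # Ascent step, projected back onto the domain [0, A]^d.
        x = np.clip(x + step * grad_log_f(x), 0.0, A)
    return x[np.argmax(log_f(x))]

# Hypothetical target: mass concentrated around mu in [0, 10]^3.
A, d, sigma = 10.0, 3, 0.3
mu = np.array([7.0, 2.0, 5.0])

def log_f(x):
    return -0.5 * np.sum((x - mu) ** 2, axis=1) / sigma**2

def grad_log_f(x):
    return -(x - mu) / sigma**2

rng = np.random.default_rng(0)
center = localize(log_f, grad_log_f, A, d, n_starts=20, n_steps=50, step=0.05, rng=rng)

# Restrict the uniform first-stage sampling to a box around the located region.
half_width = 1.0  # hypothetical; should cover the bulk of the mass
lo = np.clip(center - half_width, 0.0, A)
hi = np.clip(center + half_width, 0.0, A)
```

Sampling the kernel estimator's evaluation points from the box `[lo, hi]` instead of all of $[0,A]^d$ spends the budget where $f$ is non-negligible, which is the purpose of the localization step.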

An extension for unbounded-support densities is discussed: by truncating the sampling domain adaptively or applying two-stage rejection with sub-Gaussian proposals, one can still obtain explicit scaling bounds.

Numerical Results

Comprehensive empirical results validate PRS:

On a synthetic "peaky" unimodal target, PRS achieves superior acceptance rates to both SRS and A* sampling, especially as budget increases and for moderate peakiness.
On two-dimensional multimodal targets, PRS acceptance rates approach those of A* and significantly exceed SRS.
On the challenging "clutter" posterior, PRS consistently outperforms SRS and tracks A* (which is given additional structural information).
In all cases, the number of i.i.d. samples per evaluation of $f$ is close to optimal, as predicted by theory.

Implications and Future Directions

The PRS framework provides a new paradigm for rejection sampling in settings where the target density is smooth, potentially multimodal, unnormalized, or lacking restrictive structure. Strong performance guarantees and high empirical efficiency support its deployment in Bayesian inference, probabilistic modeling, and any domain where sample i.i.d.-ness is critical and function evaluations are expensive.

While PRS is fundamentally limited by the curse of dimensionality, the proposed localization heuristics and extension to convex-transformed densities offer promising directions. Future work could integrate iterative proposal refinement, online kernel bandwidth adaptation, or exploit structure in $f$ (e.g., via score estimation or Stein operators) to further reduce rejection rates and scale towards higher $d$.

Conclusion

Pliable rejection sampling merges kernel-based density estimation with rejection sampling to yield an adaptive, nonparametric proposal mechanism with explicit high-probability efficiency guarantees. This approach removes much of the ad hoc or restrictive nature of previous adaptive samplers, provides i.i.d. sampling with minimal budget wastage, and is broadly applicable in settings of black-box smooth target densities. Remaining challenges include high-dimensional scalability and online adaptation, which present fertile directions for continued research.
