- The paper introduces a novel rejection sampling method that builds adaptive proposals using nonparametric kernel density estimation.
- It yields i.i.d. samples with high-probability performance guarantees and lower rejection rates for multimodal, non-log-concave densities.
- It enhances scalability in moderate dimensions by incorporating optimization-based localization strategies to focus on high-density regions.
Pliable Rejection Sampling: A High-Probability Guarantee for Adaptive Proposal Learning
Introduction
The paper "Pliable rejection sampling" (2604.22385) introduces a novel sampling methodology that addresses the inefficiencies of standard rejection sampling (SRS) in scenarios where the proposal distribution is poorly matched to the target density f. By leveraging nonparametric kernel density estimation, pliable rejection sampling (PRS) produces an adaptive proposal that approximates the target and provides explicit high-probability performance guarantees on sample yield per budgeted evaluation of f. This approach avoids the limiting assumptions required by classical and modern adaptive rejection sampling methods and yields i.i.d. samples with high efficiency.
Context: Limitations of Existing Methods
Classical SRS relies on hand-crafted proposals g and envelope constants M such that Mg≥f, resulting in prohibitive rejection rates when f is complex or multimodal. Adaptive rejection sampling (ARS) methods (e.g., [Gilks et al., 1992]) efficiently adapt proposals but are limited to log-concave targets. Extensions such as Adaptive Rejection Metropolis Sampling (ARMS) and A* sampling relax these constraints but sacrifice i.i.d. sampling or require non-trivial problem structure (e.g., a Gumbel-Max decomposition).
Moreover, existing adaptive rejection samplers either impose strong structural assumptions on f (e.g., log-concavity) or lack finite-sample performance guarantees, and often require iterative refinement that adds to computational overhead.
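To make the SRS baseline concrete, here is a minimal sketch of standard rejection sampling with a uniform proposal. The target `f` (a narrow bump on [0, 1]), the envelope constant `M = 1.0`, and all function names are illustrative assumptions, not taken from the paper. It shows how a poorly matched proposal wastes most evaluations of f.

```python
import numpy as np

def standard_rejection_sample(f, g_sample, g_pdf, M, n_proposals, rng):
    """Draw n_proposals points from g and keep x with probability f(x) / (M g(x))."""
    x = g_sample(n_proposals, rng)
    u = rng.uniform(size=n_proposals)
    return x[u * M * g_pdf(x) < f(x)]

def f(x):
    """Hypothetical unnormalized target: a narrow bump on [0, 1] with sup f = 1."""
    return np.exp(-0.5 * ((x - 0.5) / 0.05) ** 2)

def uniform_sample(n, rng):
    return rng.uniform(0.0, 1.0, size=n)

def uniform_pdf(x):
    return np.ones_like(x)

rng = np.random.default_rng(0)
n_proposals = 10_000
# M = 1.0 satisfies M * g >= f here because g = 1 on [0, 1] and sup f = 1.
samples = standard_rejection_sample(f, uniform_sample, uniform_pdf, 1.0, n_proposals, rng)
acceptance_rate = samples.size / n_proposals
print(f"SRS acceptance rate: {acceptance_rate:.3f}")
```

With this target, the acceptance rate equals the integral of f over [0, 1] divided by M, so roughly seven out of eight evaluations are discarded.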
Methodology
PRS builds the proposal function in an adaptive, one-shot manner using kernel regression on $N$ uniformly sampled evaluations of $f$ over the domain $[0,A]^d$. Key innovations include:
- Kernel Estimation for Proposal Construction: A kernel estimator $\hat{f}$ is constructed from evaluations of $f$ at $N$ uniformly sampled points, with the bandwidth chosen as a function of the smoothness parameter $s$ of $f$.
- Uniform Error Control: The estimator comes with a high-probability sup-norm error guarantee, uniform over $[0,A]^d$, based on modern empirical process results for kernel estimators, as detailed in Theorem 1.
- Pliable Proposal Formation: The final proposal is a mixture of the kernel density estimate and a uniform base, scaled by an explicit additive error term to guarantee that the resulting envelope dominates $f$ everywhere. Critically, the normalizing constant and rejection threshold are explicitly data-driven.
This construction is nonparametric and only assumes that $f$ is bounded and locally Hölder-smooth (with smoothness parameter $s$). This encompasses a much broader function class than log-concavity, covering densities in Besov balls.
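The construction described above can be sketched in one dimension as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: it uses a Gaussian kernel, a hand-picked bandwidth `h` and error margin `r` (the paper derives these from its theory), and a hypothetical bimodal target `f`. The envelope is the kernel estimate plus `r` on [0, A]; if `r` is too small for the envelope to dominate f somewhere, the acceptance test below silently accepts there and the output is only approximately distributed as f.

```python
import numpy as np

def f(x):
    """Hypothetical unnormalized bimodal target on [0, 1]."""
    return (np.exp(-0.5 * ((x - 0.3) / 0.05) ** 2)
            + 0.5 * np.exp(-0.5 * ((x - 0.7) / 0.08) ** 2))

def build_proposal(f, A, N, h, r, rng):
    """Fit a Gaussian-kernel estimate of f from N uniform points on [0, A]."""
    centers = rng.uniform(0.0, A, size=N)
    weights = A * f(centers) / N        # f_hat = sum_i weights[i] * Gaussian(centers[i], h)
    kde_mass = weights.sum()            # integral of f_hat over the real line
    total_mass = kde_mass + r * A       # integral of the envelope

    def envelope(x):
        # f_hat(x) + r on [0, A]; only f_hat(x) outside the domain.
        z = (x[:, None] - centers[None, :]) / h
        f_hat = (weights * np.exp(-0.5 * z ** 2)).sum(axis=1) / (h * np.sqrt(2.0 * np.pi))
        return f_hat + r * ((x >= 0.0) & (x <= A))

    def sample(n):
        # Mixture draw: uniform part with probability r*A/total_mass, else a kernel bump.
        use_uniform = rng.uniform(size=n) < r * A / total_mass
        idx = rng.choice(N, size=n, p=weights / kde_mass)
        x = centers[idx] + h * rng.standard_normal(n)
        x[use_uniform] = rng.uniform(0.0, A, size=int(use_uniform.sum()))
        return x

    return sample, envelope

def pliable_rejection_sample(f, A, N, n, h, r, rng):
    """Spend N evaluations on the estimate, then n on accept/reject proposals."""
    sample, envelope = build_proposal(f, A, N, h, r, rng)
    x = sample(n)
    inside = (x >= 0.0) & (x <= A)
    fx = np.where(inside, f(x), 0.0)    # the target is zero outside [0, A]
    accept = rng.uniform(size=n) * envelope(x) < fx
    return x[accept]

rng = np.random.default_rng(0)
n_proposals = 2000
samples = pliable_rejection_sample(f, A=1.0, N=4000, n=n_proposals, h=0.02, r=0.3, rng=rng)
acceptance_rate = samples.size / n_proposals
print(f"Acceptance rate in the rejection phase: {acceptance_rate:.3f}")
```

For comparison, a uniform proposal with envelope constant 1 would accept a fraction of proposals equal to the integral of this f, about 0.23. The margin `r` controls the trade-off: a larger value makes the envelope safer but adds uniform mass that is mostly rejected.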
The principal theoretical result (Theorem 2) proves that, for a budget of $n$ evaluations of $f$, PRS produces, with probability at least $1-\delta$, at least $n(1-\varepsilon_n)$ i.i.d. samples, where $\varepsilon_n$ denotes the rejection rate.
This rejection rate, which decays polynomially in $n$ with an exponent that depends on the smoothness $s$ of $f$, is provably negligible for large budgets and mild smoothness, contrasting sharply with SRS, whose acceptance rate is determined by the often intractable global sup-norm bound on $f$. In empirical comparisons with SRS and A* sampling, PRS consistently provides higher or competitive acceptance rates, especially when only black-box access to $f$ is available, and it does not require the structural or decomposition information that A* demands.
In contrast to MCMC and ARMS methods, PRS yields i.i.d. samples up to a quantifiable approximation error, and accommodates non-log-concave, multimodal, and unnormalized densities.
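A short calculation illustrates why a uniform error bound on the estimator translates into a small rejection rate. It is a simplified argument under stated assumptions, not the paper's proof: $f$ is taken to be normalized on $[0,A]^d$, the envelope is $\hat{f} + r$, and $r$ is notation introduced here for the uniform error level; the constants in the paper's theorems differ.

```latex
\[
\sup_{x \in [0,A]^d} \bigl|\hat{f}(x) - f(x)\bigr| \le r
\;\Longrightarrow\;
0 \le f(x) \le \hat{f}(x) + r
\quad \text{for all } x \in [0,A]^d ,
\]
\[
\Pr(\text{accept})
= \frac{\int_{[0,A]^d} f(x)\,dx}{\int_{[0,A]^d} \bigl(\hat{f}(x) + r\bigr)\,dx}
\ge \frac{1}{1 + 2 r A^d}
\ge 1 - 2 r A^d .
\]
```

The first inequality in the second line uses $\hat{f} \le f + r$ and $\int f = 1$, so the denominator is at most $1 + 2rA^d$. The rejection probability is therefore at most $2rA^d$, which shrinks as the estimation error $r$ decreases with the number of evaluations.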
Extensions and High-Dimensional Regimes
The paper addresses the high-dimensional degradation common to all rejection sampling schemes. Uniform initial sampling becomes exponentially inefficient in the dimension $d$, and the paper proposes optimization-based localization strategies as a remedy:
- If the bulk of the mass of $f$ is concentrated on a small region, and $f$ (or a transformation of $f$) is convex outside a high-density support, first-stage random initialization followed by gradient-based (or Hessian-accelerated) optimization locates regions of non-negligible mass, in a number of steps that depends on which of the two optimizers is used.
- This localization enables construction of the kernel estimator in an informative region, improving scalability in moderate dimensions.
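The localization idea above can be sketched as random restarts followed by gradient ascent on log f, after which the kernel estimator would be fitted only inside a box around the end points. Everything specific here is an assumption for illustration: the Gaussian target, the finite-difference gradient (used because f is treated as a black box), the fixed step size, and the box half-width of four standard deviations.

```python
import numpy as np

def numerical_grad(fun, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (fun(x + step) - fun(x - step)) / (2.0 * eps)
    return grad

def localize(log_f, d, A, n_starts, n_steps, lr, rng):
    """Gradient ascent on log f from uniform random starts in [0, A]^d."""
    points = rng.uniform(0.0, A, size=(n_starts, d))
    for _ in range(n_steps):
        for k in range(n_starts):
            update = points[k] + lr * numerical_grad(log_f, points[k])
            points[k] = np.clip(update, 0.0, A)   # stay inside the domain
    return points

def bounding_box(points, half_width, A):
    """Axis-aligned box around the end points, clipped to [0, A]^d."""
    lo = np.clip(points.min(axis=0) - half_width, 0.0, A)
    hi = np.clip(points.max(axis=0) + half_width, 0.0, A)
    return lo, hi

# Hypothetical target: an isotropic Gaussian bump in d = 5 centered at 0.5.
d, A, mu, sigma = 5, 1.0, 0.5, 0.05

def log_f(x):
    return -0.5 * np.sum((x - mu) ** 2) / sigma ** 2

rng = np.random.default_rng(0)
modes = localize(log_f, d, A, n_starts=4, n_steps=60, lr=0.5 * sigma ** 2, rng=rng)
lo, hi = bounding_box(modes, half_width=4.0 * sigma, A=A)
volume_fraction = np.prod(hi - lo) / A ** d
print(f"Box covers {volume_fraction:.4f} of the domain volume")
```

In this toy case the box covers about one percent of the domain, so uniform design points drawn inside it are far more informative than points drawn over all of $[0,A]^d$. A multimodal target would need one box per located mode, or a box enclosing all of them.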
An extension for unbounded-support densities is discussed: by truncating the sampling domain adaptively or applying two-stage rejection with sub-Gaussian proposals, one can obtain corresponding bounds for this setting.
Numerical Results
Comprehensive empirical results validate PRS:
- On a synthetic "peaky" unimodal target, PRS achieves superior acceptance rates to both SRS and A* sampling, especially as budget increases and for moderate peakiness.
- On two-dimensional multimodal targets, PRS acceptance rates approach those of A* and significantly exceed SRS.
- On the challenging "clutter" posterior, PRS consistently outperforms SRS and tracks A* (which is given additional structural information).
- In all cases, the number of i.i.d. samples per evaluation of $f$ is close to optimal, as predicted by theory.
Implications and Future Directions
The PRS framework provides a new paradigm for rejection sampling in settings where the target density is smooth, potentially multimodal, unnormalized, or lacking restrictive structure. Strong performance guarantees and high empirical efficiency support its deployment in Bayesian inference, probabilistic modeling, and any domain where i.i.d. samples are critical and function evaluations are expensive.
While PRS is fundamentally limited by the curse of dimensionality, the proposed localization heuristics and extension to convex-transformed densities offer promising directions. Future work could integrate iterative proposal refinement, online kernel bandwidth adaptation, or exploit structure in $f$ (e.g., via score estimation or Stein operators) to further reduce rejection rates and scale towards higher dimensions $d$.
Conclusion
Pliable rejection sampling merges kernel-based density estimation with rejection sampling to yield an adaptive, nonparametric proposal mechanism with explicit high-probability efficiency guarantees. This approach removes much of the ad hoc or restrictive nature of previous adaptive samplers, provides i.i.d. sampling with minimal budget wastage, and is broadly applicable in settings of black-box smooth target densities. Remaining challenges include high-dimensional scalability and online adaptation, which present fertile directions for continued research.