Adaptive Rejection Sampling (ARS)
- Adaptive Rejection Sampling (ARS) is a method for simulating samples from univariate distributions by adaptively refining an envelope based on the target density, particularly effective for log-concave cases.
- It employs tangent lines to construct exponential envelopes and has been extended with techniques like the ratio-of-uniforms and MCMC-integrated ARMS to address non-log-concave, multimodal, and heavy-tailed densities.
- Recent innovations focus on managing computational complexity through fixed support point methods (like CARS and PARS) and integrating gradient-based optimization for improved efficiency in Bayesian and deep learning contexts.
Adaptive Rejection Sampling (ARS) is a class of algorithms for simulating independent samples from univariate probability density functions (pdfs) via rejection sampling whose proposal distribution is sequentially and adaptively refined. The ARS framework has played a crucial role in computational statistics, Bayesian estimation, Monte Carlo methods, and a broad range of scientific domains, and has spawned a large family of algorithmic variants suited to both log-concave and non-log-concave densities, multimodal targets, and modern machine learning practice.
1. Fundamentals of ARS and the Log-Concave Case
Canonical ARS algorithms address the problem of generating a sample from a target density $p(x)$, known up to a normalization constant, by constructing an envelope or proposal density $\pi_t(x)$ that dominates $p(x)$. The classical ARS scheme [Gilks and Wild, 1992] applies when $p(x)$ is log-concave—i.e., $\log p(x)$ is a concave function—enabling the use of piecewise linear tangent lines to build an exponential envelope for $p(x)$. At each ARS iteration:
- A set of support points $\mathcal{S}_t = \{s_1, \dots, s_{m_t}\}$ is maintained; tangent lines to $\log p(x)$ at these points construct the envelope;
- Candidates $x'$ are drawn from the current proposal and accepted with probability $p(x')/\pi_t(x')$, which never exceeds $1$ since $\pi_t(x) \geq p(x)$ everywhere;
- Rejected candidates augment the set of support points, tightening the envelope and raising the acceptance rate.
Log-concavity ensures the envelope always dominates $p(x)$; with each rejection, the proposal adapts and the acceptance probability increases, approaching $1$ asymptotically. The computational cost, however, grows with the number of support points, so balancing envelope tightness against envelope complexity is paramount.
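To make the construction concrete, here is a compact, self-contained sketch of classical ARS (a minimal sketch, assuming the log-density $h$ and its derivative are available, the initial support points bracket the mode, no tangent slope is exactly zero, and $\exp(h)$ does not overflow; the squeeze/lower-hull test of Gilks and Wild is omitted for brevity):

```python
import numpy as np

def ars(h, dh, x_init, n, rng=None):
    """Draw n samples from p(x) proportional to exp(h(x)), h strictly concave.

    x_init must bracket the mode (dh > 0 at the leftmost point, dh < 0 at
    the rightmost) so the tangent envelope has finite mass; the k -> 0
    (zero-slope) limit is not handled.
    """
    rng = np.random.default_rng(rng)
    support = sorted(x_init)
    out = []
    while len(out) < n:
        x = np.asarray(support)
        hx, k = h(x), dh(x)                  # hull heights and slopes
        # Breakpoints: intersections of consecutive tangent lines.
        z = (hx[1:] - hx[:-1] + x[:-1] * k[:-1] - x[1:] * k[1:]) / (k[:-1] - k[1:])
        lo = np.concatenate(([-np.inf], z))  # segment j lives on (lo_j, hi_j)
        hi = np.concatenate((z, [np.inf]))
        e_lo = np.exp(hx + k * (lo - x))     # exp-envelope at segment ends
        e_hi = np.exp(hx + k * (hi - x))
        mass = (e_hi - e_lo) / k             # integral of exp(tangent_j) per segment
        j = rng.choice(len(x), p=mass / mass.sum())
        # Invert the piecewise-exponential CDF within segment j.
        r = rng.uniform() * mass[j]
        t = x[j] + (np.log(e_lo[j] + k[j] * r) - hx[j]) / k[j]
        u_t = hx[j] + k[j] * (t - x[j])      # upper hull at the candidate
        if np.log(rng.uniform()) <= h(t) - u_t:
            out.append(t)                    # accept
        else:
            support = sorted(support + [t])  # reject: tighten the envelope
    return np.array(out)

# Example: standard normal via h(x) = -x^2/2.
draws = ars(lambda x: -0.5 * x**2, lambda x: -x, [-1.0, 1.0], 1000)
```

Note that the normalizing constant of $p(x)$ is never needed: only the ratio $p(x')/\pi_t(x')$ enters the acceptance test.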
2. Extensions Beyond Log-Concave Targets
A major limitation of standard ARS is its restriction to log-concave densities. Many practical distributions—those that are multimodal, admit non-concave regions, or possess log-convex tails—do not fit this criterion. Two notable developments generalizing ARS beyond this setting are:
- Marginal-Potential Decomposition ARS: The target's potential $V(x) = -\log p(x)$ is decomposed as $V(x) = \sum_{i=1}^{n} \bar{V}_i(g_i(x))$, with each nonlinearity $g_i$ convex and each marginal potential $\bar{V}_i$ convex or concave. One selects a dominant term (typically corresponding to a tractable prior), forms a proposal from this piece, and adaptively upper-bounds the remainder by piecewise constants within partitions of the domain, refining these as rejected samples are accrued (Martino et al., 2011).
- Adaptive Ratio-of-Uniforms (RoU) ARS: The RoU transformation constructs the region $\mathcal{A} = \{(v, u) : 0 < u \leq \sqrt{p(v/u)}\}$ and produces samples from $p(x)$ by drawing $(v, u)$ uniformly from $\mathcal{A}$ and returning $x = v/u$. Practical implementation covers $\mathcal{A}$ with a union of adaptively-refined triangles, using upper bounds for $\sqrt{p(x)}$ and $|x|\sqrt{p(x)}$ over intervals partitioned by support points. This approach admits multimodal and heavy-tailed densities, even escaping the need for explicit analytic derivatives (Martino et al., 2011); see the sketch below.
Both strategies allow application to univariate targets with arbitrarily-shaped tails, multimodality, or non-differentiability, expanding the utility of ARS to models such as non-Gaussian hierarchical Bayes, stochastic volatility filtering, and other intractable settings.
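As a point of reference for the adaptive RoU scheme above, the following minimal sketch implements the plain (non-adaptive) ratio-of-uniforms method with a single bounding rectangle; the adaptive variant replaces this rectangle with a union of refined triangles. The bounds `u_max`, `v_min`, `v_max` are the standard RoU quantities $\sup_x \sqrt{p(x)}$, $\inf_x x\sqrt{p(x)}$, and $\sup_x x\sqrt{p(x)}$:

```python
import numpy as np

def rou_sample(p, u_max, v_min, v_max, n, rng=None):
    """Draw n samples from a (possibly unnormalized) density p via RoU."""
    rng = np.random.default_rng(rng)
    out = []
    while len(out) < n:
        u = u_max * (1.0 - rng.uniform())   # uniform on (0, u_max]
        v = rng.uniform(v_min, v_max)
        if u <= np.sqrt(p(v / u)):          # (v, u) lies in the RoU region A
            out.append(v / u)               # then x = v/u is a draw from p
    return np.array(out)

# Example: standard normal, p(x) = exp(-x^2/2); here sup sqrt(p) = 1 and
# sup |x| sqrt(p) = sqrt(2/e).
p = lambda x: np.exp(-0.5 * x**2)
draws = rou_sample(p, 1.0, -np.sqrt(2 / np.e), np.sqrt(2 / np.e), 1000)
```

The acceptance rate of this rectangle-based sampler is the area of $\mathcal{A}$ divided by the rectangle's area; the adaptive triangle covering exists precisely to push that ratio toward $1$.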
3. ARS Within Markov Chain Monte Carlo and Metropolis Extensions
ARS has been integrated within Markov Chain Monte Carlo (MCMC) methods to facilitate the generation of valid proposals in high-dimensional spaces. The Adaptive Rejection Metropolis Sampling (ARMS) extension addresses the log-concave limitation by:
- Allowing envelopes that can fall below the log density in some regions;
- Introducing a Metropolis–Hastings (MH) correction after the rejection–sampling test, ensuring samples are ultimately distributed according to the target even if the envelope is imperfect (Martino et al., 2012);
- Support points are added only when the RS test rejects a candidate (which can occur only where the envelope exceeds the target), so adaptation can stagnate in regions where the proposal falls below the target.
Recent works have identified and fixed a crucial deficiency in the ARMS adaptation rule: when $\pi_t(x) < p(x)$, support points are never added, causing the proposal to stagnate in subdominant regions. Improved algorithms such as A²RMS and IA²RMS stochastically add support points even in these regions (based on a secondary uniform test), ensuring domain-wide adaptation and guaranteeing convergence ("diminishing adaptation," "bounded convergence") (Martino et al., 2012).
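The control flow of a single ARMS-type transition, together with the post-MH adaptation that A²RMS/IA²RMS add, can be sketched as follows (a minimal sketch: `proposal` is a hypothetical adaptive object exposing `sample()`, `pdf(x)`, and `add_support(x)`, and the final adaptation probability paraphrases the secondary-test idea rather than reproducing the exact published rule):

```python
import numpy as np

def arms_step(p, proposal, x_curr, rng):
    """One ARMS transition for an unnormalized target p, with extra adaptation."""
    while True:
        x_new = proposal.sample()
        pi_new = proposal.pdf(x_new)
        # Stage 1: rejection-sampling test against the (imperfect) envelope.
        if rng.uniform() <= min(1.0, p(x_new) / pi_new):
            break                        # passed; proceed to the MH correction
        proposal.add_support(x_new)      # classic ARMS adapts only here
    # Stage 2: Metropolis-Hastings correction, needed wherever pi < p.
    pi_curr = proposal.pdf(x_curr)
    alpha = min(1.0, (p(x_new) * min(p(x_curr), pi_curr))
                     / (p(x_curr) * min(p(x_new), pi_new)))
    x_next = x_new if rng.uniform() <= alpha else x_curr
    # A2RMS/IA2RMS-style fix: adapt after the MH stage too, with probability
    # growing with the envelope deficiency, so regions where pi(x) < p(x)
    # (where Stage 1 always passes) still trigger refinement.
    if rng.uniform() < max(0.0, 1.0 - pi_new / p(x_new)):
        proposal.add_support(x_new)
    return x_next
```

The Stage 2 ratio is the standard ARMS acceptance probability; without the final stochastic adaptation, the proposal would never be refined at points where it underestimates the target.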
4. Algorithmic Variants and Efficiency-Oriented Methods
Algorithmic developments have targeted the computational overhead that arises as envelopes become increasingly refined:
- Cheap Adaptive Rejection Sampling (CARS): Maintains a fixed number of support points throughout sampling. After a rejection, CARS considers swapping the nearest support point with the rejected sample only if this reduces the proposal's normalizing constant, thus adaptively optimizing the envelope at controllable complexity. Acceptance rates are slightly reduced relative to unconstrained ARS, but the per-sample cost remains constant, enabling superior performance for large sample sizes (Martino et al., 2015).
- Parsimonious ARS (PARS): Introduces a user-tunable threshold (the acceptance probability below which support points are augmented). This method balances acceptance rate and envelope complexity—limiting support growth to genuinely suboptimal regions and reducing evaluation overhead (Martino, 2017); the rule is sketched below.
These envelope-complexity–controlled ARS schemes substantially reduce runtime for large-scale simulation tasks, striking a pragmatic trade-off between acceptance efficiency and proposal sampling cost.
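The parsimonious rule at the heart of PARS can be stated in a few lines (a minimal sketch following the description above; the threshold default and function names are illustrative):

```python
def maybe_add_support(p, envelope, x_rej, support, eta=0.9):
    """Add a rejected point to the support set only where it pays off."""
    ratio = p(x_rej) / envelope(x_rej)   # local acceptance probability (<= 1)
    if ratio < eta:                      # envelope is genuinely loose here
        support.append(x_rej)
        support.sort()
    return support
```

Setting `eta` close to $1$ recovers classical ARS behavior (adapt on nearly every rejection), while smaller values keep the envelope coarse and cheap.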
5. Minimax-Optimal and Data-Driven ARS
Theoretical analysis of ARS has produced both lower bounds—quantifying inherent limitations—and adaptive algorithms attaining near-minimax performance:
- Nearest Neighbor Adaptive Rejection Sampling (NNARS): Eschews restrictive assumptions such as log-concavity, requiring only Hölder continuity of the target density. NNARS constructs piecewise-constant envelopes using nearest neighbor estimators plus a confidence margin, adaptively refining proposals while guaranteeing validity (the proposal dominates the target at each round). Its rejection rate matches the first minimax lower bound (up to logarithmic factors) over the class of $s$-Hölder densities (Achdou et al., 2018).
- Gradient-Refined Proposals and Autodiff-Enabled ARS: Advanced approaches automate envelope optimization via differentiable loss minimization. For example, the ERS method uses gradient descent to optimize a Gaussian mixture model envelope, minimizing the maximum target-to-proposal ratio $p(x)/q_\theta(x)$ over observed samples, achieving significantly improved acceptance rates without user derivation or specification of bounds (Raff et al., 2023). Autodifferentiation also enables efficient parameter estimation via smooth reweighting of rejection-sampled events, facilitating gradient-based learning in simulation-based inference tasks (Heller et al., 4 Nov 2024). Both ideas are sketched below.
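Two brief sketches make these ideas concrete. First, the smoothness-only envelope underlying NNARS: if $p$ is $s$-Hölder with constant $L$, then $p(x) \leq p(x_i) + L|x - x_i|^s$ for every previously evaluated point $x_i$, so the pointwise minimum of these bounds is a valid dominating envelope (NNARS uses the nearest-neighbor version of this bound with a confidence margin and samples from a piecewise-constant proposal built on it; names here are illustrative):

```python
import numpy as np

def holder_envelope(pts, p_vals, L, s):
    """Dominating envelope for an s-Hölder density from evaluated points."""
    pts, p_vals = np.asarray(pts), np.asarray(p_vals)
    def env(x):
        # p(x) <= p(x_i) + L * |x - x_i|**s for each i; take the tightest.
        return np.min(p_vals + L * np.abs(x - pts) ** s)
    return env
```

Second, a toy version of gradient-refined envelope optimization in the spirit of ERS. ERS descends a differentiable loss on a Gaussian-mixture envelope using autodiff; the sketch below instead uses a single Gaussian $q_{\mu,\sigma}$ with its analytic gradient to stay dependency-free, minimizing the worst-case log-ratio $\max_i [\log p(x_i) - \log q(x_i)]$, whose exponential estimates the envelope constant $M$ (acceptance rate roughly $1/M$). The batch, step size, and iteration count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(scale=1.5, size=512)       # batch of observed points
log_p = lambda x: -0.5 * x**2              # toy (unnormalized) target

mu, log_sig = 0.5, np.log(2.0)             # proposal parameters theta
for _ in range(200):
    sig = np.exp(log_sig)
    log_q = -0.5 * ((xs - mu) / sig) ** 2 - log_sig
    i = np.argmax(log_p(xs) - log_q)       # worst-covered point
    # Subgradient of the max-ratio loss: -grad_theta log q at the argmax.
    g_mu = -(xs[i] - mu) / sig**2
    g_ls = 1.0 - ((xs[i] - mu) / sig) ** 2
    mu, log_sig = mu - 0.05 * g_mu, log_sig - 0.05 * g_ls
```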
6. Adaptive Rejection Sampling in Practice and Specialized Contexts
Recent ARS research has produced a variety of specialized variants for modern computational problems:
- Multimodal/Heavy-Tailed and Multivariate Distributions: RoU-based ARS and hit-and-run ARMS methodologies enable sampling from target distributions that are multimodal or exhibit difficult tail behavior, including via random direction searches to avoid isolated subspaces in high dimensions (Martino et al., 2011, Zhang et al., 2015).
- Likelihood-Free and Likelihood-Tempered Inference: Multilevel Rejection Sampling (MLMC-ABC) combines independent rejection sampling with MLMC variance reduction and coupling, efficiently generating i.i.d. samples even when the likelihood is intractable, as in approximate Bayesian computation (Warne et al., 2017).
- Applications in Privacy, Information Theory, and Controlled Generation:
  - Privacy-aware ARS quantifies and corrects the privacy cost arising from the runtime of rejection samplers under (Rényi) differential privacy, with modifications to ensure constant-time or noise-perturbed sampling (Awan et al., 2021).
  - Adaptive Greedy Rejection Sampling (AGRS) supports simulators for communication channels, data compression, and relative entropy coding, adapting the proposal distribution dynamically and operating in general (even non-log-concave) probability spaces (Flamich et al., 2023).
  - Vertical Weighted Strips (VWS) ARS constructs proposal mixtures majorizing a weight function instead of the full target density, permitting efficient sampling beyond log-concavity and providing explicit bounds on rejection rates, e.g., for the von Mises-Fisher distribution (Raim et al., 18 Jan 2024).
  - Modern constrained language-model generation leverages ARS-inspired adaptive token selection with importance weighting to efficiently and correctly generate outputs under complex constraints, outperforming token-masking and other constrained decoding paradigms (Lipkin et al., 7 Apr 2025).
7. Connections to Variational Inference, Deep Models, and Theory
The ARS principle has influenced recent advances in flexible posterior inference and deep generative models:
- Learned Accept/Reject Sampling (LARS): Learned accept/reject functions (parameterized as neural networks) transform simple priors into flexible, expressive posteriors within variational autoencoders, effectively generalizing ARS principles to settings with implicit or intractable target densities (Bauer et al., 2018). Truncation strategies ensure computational tractability, while learnable proposals enhance expressiveness.
- Refined Divergence-Based Inference: Variational approximations based on α-divergences (RDVI) can be systematically improved by integrating ARS, leveraging the connection between the α-divergence and the minimal envelope constant for rejection sampling. This produces a two-stage variational-inference scheme that provably reduces divergence and improves empirical estimation in challenging settings (Sharma et al., 2019).
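Concretely, for normalized densities $p$ and $q$ the tightest valid envelope constant is the exponentiated Rényi divergence of order $\infty$:
$$M^{*} = \sup_x \frac{p(x)}{q(x)} = \exp\big(D_{\infty}(p \,\|\, q)\big), \qquad \Pr[\text{accept}] = \frac{1}{M^{*}},$$
so reducing the limiting α-divergence between the variational approximation and the target directly raises the acceptance rate of the resulting rejection sampler.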
These connections highlight both the theoretical significance of the ARS envelope construction (e.g., minimax lower bounds, divergence minimization) and the versatility of ARS-inspired adaptation in deep learning and simulation-based inference.
Summary Table: Selected ARS Algorithmic Contributions
| Algorithm/Method | Key Features | Reference |
|---|---|---|
| Classical ARS | Log-concave targets, tangent envelope, adaptivity | [Gilks and Wild, 1992], (Martino et al., 2011) |
| Marginal-Potential ARS | Non-log-concave, decomposed potential, piecewise bounds | (Martino et al., 2011) |
| Ratio-of-Uniforms ARS | Robust to log-convex tails, geometric covering | (Martino et al., 2011) |
| ARMS/A²RMS/IA²RMS | MCMC setting, domain-wide adaptation, convergence guarantees | (Martino et al., 2012) |
| Hit-and-Run ARMS | Multimodal/multivariate, arbitrary-direction updates | (Zhang et al., 2015) |
| CARS/PARS | Fixed or threshold-controlled envelope complexity, faster sampling | (Martino et al., 2015; Martino, 2017) |
| NNARS | Near minimax-optimal, Hölder classes, data-driven | (Achdou et al., 2018) |
| ERS | Gradient-refined, autodiff, no hand-tuned proposal | (Raff et al., 2023) |
| VWS | Weighted mixture envelope, adaptivity, explicit bounds | (Raim et al., 18 Jan 2024) |
| AGRS | Channel simulation, info-theoretic coding | (Flamich et al., 2023) |
| AWRS (LM generation) | Constraint-aware decoding, exact sampling | (Lipkin et al., 7 Apr 2025) |
In conclusion, adaptive rejection sampling constitutes a foundational methodology at the intersection of computational statistics, Bayesian inference, and modern simulation-based learning. From its seminal log-concave origins through nonparametric, multimodal, and deep-learning extensions, ARS methodology continues to evolve—yielding principled, efficient, and widely applicable sampling tools.