Adaptive Rejection Metropolis Sampling (ARMS)
- Adaptive Rejection Metropolis Sampling (ARMS) is an adaptive MCMC method that builds a piecewise linear envelope over the log-density to sample efficiently from both log-concave and non-log-concave targets.
- ARMS enhances classical adaptive rejection sampling by incorporating a Metropolis-Hastings correction step, thus extending its applicability to multidimensional and multimodal problems.
- Extensions like HARARMS, A²RMS, and IA²RMS have demonstrated reduced bias, improved convergence, and efficient mode exploration in complex Bayesian models.
Adaptive Rejection Metropolis Sampling (ARMS) is an adaptive Markov Chain Monte Carlo (MCMC) method designed for efficient sampling from univariate and multivariate target densities, particularly those that are not log-concave. ARMS combines adaptive proposal construction—via piecewise linear envelopes over the log-density—with a Metropolis-Hastings (MH) correction, thereby extending the range and efficiency of Adaptive Rejection Sampling (ARS) to non-log-concave and multidimensional settings. Extensions of ARMS, such as Hit-and-Run ARMS (HARARMS), address slow mixing and mode-trapping in high-dimensional, multimodal targets by introducing random direction proposals (Zhang et al., 2015). Further advances have improved ARMS’ adaptivity and convergence properties in univariate contexts (Martino et al., 2012).
1. Classical ARMS: Principles and Workflow
ARMS operates on the principle of constructing and iteratively refining an adaptive envelope function $W_n(x)$ intended to satisfy $\exp\{W_n(x)\} \geq p(x)$ for all $x$ (guaranteed only for log-concave targets), where $p(x)$ is the unnormalized target density with $V(x) = \log p(x)$. The envelope is initialized using a set of support points $S_n = \{x_1 < x_2 < \cdots < x_n\}$. Piecewise linear segments (secants) through consecutive support points are used, and on each interval $[x_i, x_{i+1}]$ the envelope is formed from these secants:
$$W_n(x) = \max\left\{L_{i,i+1}(x),\ \min\left(L_{i-1,i}(x),\, L_{i+1,i+2}(x)\right)\right\}, \quad x \in [x_i, x_{i+1}],$$
where $L_{i,j}(x)$ denotes the line connecting $(x_i, V(x_i))$ and $(x_j, V(x_j))$.
This envelope defines an instrumental density:
$$q_n(x) \propto \exp\{W_n(x)\}.$$
The ARMS iteration consists of:
- Envelope–Reject Step: Draw $x' \sim q_n(x)$ and $u_1 \sim \mathcal{U}(0,1)$. If $u_1 \leq p(x')/\exp\{W_n(x')\}$, $x'$ is an ARS candidate; otherwise, insert $x'$ into $S_n$, update $W_n$ and $q_n$, and repeat.
- Metropolis–Hastings Step: From the current state $x_t$, calculate acceptance probability
$$\alpha = \min\left\{1,\ \frac{p(x')\,\tilde{p}_n(x_t)}{p(x_t)\,\tilde{p}_n(x')}\right\},$$
where $\tilde{p}_n(x) = \min\{p(x), \exp\{W_n(x)\}\}$. With probability $\alpha$, accept $x_{t+1} = x'$; otherwise, retain $x_{t+1} = x_t$.
This approach allows for adaptive proposal densities and robust sampling from targets with complex structure (Zhang et al., 2015, Martino et al., 2012).
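The workflow above can be sketched in code. The following is an illustrative, simplified implementation on a bounded support region: for brevity the envelope is taken as the piecewise-linear interpolation of $\log p$ at the support points rather than the full secant construction (the MH correction keeps the chain valid for any such proposal). All names and the interface are illustrative, not from the cited papers.

```python
import math
import random

def arms_sample(log_p, support, n_samples, x0, rng=random.Random(0)):
    """Simplified ARMS sketch on a bounded interval. The envelope W is the
    piecewise-linear interpolation of log p at the support points; the
    proposal q is the corresponding piecewise-exponential density."""
    S = sorted(support)       # support points; must bracket the region of interest
    x_t, samples = x0, []

    def segments():
        # (a, b, W(a), W(b)) for each linear piece of the envelope
        V = [log_p(x) for x in S]
        return [(S[i], S[i + 1], V[i], V[i + 1]) for i in range(len(S) - 1)]

    def seg_mass(a, b, wa, wb):
        # integral of exp(W) over [a, b] for linear W
        s = (wb - wa) / (b - a)
        if abs(s) < 1e-12:
            return math.exp(wa) * (b - a)
        return (math.exp(wb) - math.exp(wa)) / s

    def draw_from_envelope():
        # inverse-CDF sampling from the piecewise-exponential proposal
        segs = segments()
        masses = [seg_mass(*sg) for sg in segs]
        u = rng.random() * sum(masses)
        for (a, b, wa, wb), m in zip(segs, masses):
            if u <= m:
                s = (wb - wa) / (b - a)
                if abs(s) < 1e-12:
                    x = a + u / math.exp(wa)
                else:
                    x = a + math.log(u * s / math.exp(wa) + 1.0) / s
                return x, wa + s * (x - a)
            u -= m
        return b, wb  # numerical fallback

    def W(x):
        for a, b, wa, wb in segments():
            if a <= x <= b:
                return wa + (wb - wa) * (x - a) / (b - a)
        return -math.inf

    while len(samples) < n_samples:
        # Envelope-Reject step: rejection refines the envelope and retries
        x_new, w_new = draw_from_envelope()
        if rng.random() > math.exp(log_p(x_new) - w_new):
            S = sorted(S + [x_new])
            continue
        # MH correction with p_tilde = min{p, exp(W)}, in log space
        def log_p_tilde(x):
            return min(log_p(x), W(x))
        log_alpha = (log_p(x_new) + log_p_tilde(x_t)) - (log_p(x_t) + log_p_tilde(x_new))
        if math.log(rng.random()) < min(0.0, log_alpha):
            x_t = x_new
        samples.append(x_t)
    return samples
```

Note that where the interpolated envelope falls below the target, the RS test always passes and no refinement occurs there, which is precisely the adaptation gap discussed in Section 3.2.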
2. Limitations in Multivariate and Non-Log-Concave Settings
ARMS, when applied in a coordinate-wise fashion (e.g., embedded in Gibbs sampling), suffers from two principal limitations in high-dimensional and multimodal problems:
- Trapping in Local Subspaces: If target modes lie off coordinate axes, Gibbs–ARMS can become confined to a subset of these modes, missing others entirely. For example, in a four-component mixture of bivariate normals, Gibbs–ARMS failed to sample from a "diagonal" mode over 10,000 iterations (Zhang et al., 2015).
- Lethargy and Slow Mixing in Nonlinear Regression: In free-knot spline regression, the posterior on knot locations is highly multimodal with "lethargy"—i.e., samplers struggle to move between basins separated by high ridges. Axis-aligned proposals exacerbate this, yielding slow convergence.
A key structural deficiency is that ARMS updates only along coordinate axes; thus, in high dimensions or complex landscapes, the algorithm may never escape certain subspaces or modes (Zhang et al., 2015).
3. Extensions: Hit-and-Run ARMS (HARARMS) and Adaptive Schemes
3.1 Hit-and-Run ARMS (HARARMS)
HARARMS integrates Hit-and-Run random direction proposals with the ARMS envelope/MH correction scheme. At each iteration:
- Random Direction Selection: From the current state $x_t$, generate a random direction $e_t$ uniformly on the unit sphere, e.g., $e_t = z/\|z\|$ with $z \sim \mathcal{N}(0, I_d)$.
- Line-Based ARMS: Consider the univariate target $\rho_t(\lambda) \propto p(x_t + \lambda e_t)$ along the line $\{x_t + \lambda e_t : \lambda \in \mathbb{R}\}$. Build the ARMS envelope $W_t(\lambda)$ for $\log \rho_t$.
- Envelope–Reject and MH Correction in $\lambda$: Sample $\lambda'$ using ARMS on $\rho_t$. Set $x' = x_t + \lambda' e_t$; accept $x'$ with MH probability
$$\alpha = \min\left\{1,\ \frac{\rho_t(\lambda')\,\min\{\rho_t(0), \exp\{W_t(0)\}\}}{\rho_t(0)\,\min\{\rho_t(\lambda'), \exp\{W_t(\lambda')\}\}}\right\}.$$
- State Update: $x_{t+1} = x'$ if accepted; otherwise $x_{t+1} = x_t$.
This procedure allows proposals in arbitrary directions, greatly increasing the probability of visiting isolated modes and facilitating efficient exploration of multimodal, non-convex landscapes (Zhang et al., 2015).
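To make the direction mechanism concrete, here is a minimal sketch of a single hit-and-run update. For brevity, the one-dimensional move uses a plain symmetric random-walk MH kernel in $\lambda$ where HARARMS would run the full ARMS envelope machinery on $\rho(\lambda)$; the function name and `step` parameter are illustrative.

```python
import math
import random

def hit_and_run_step(log_p, x, rng, step=2.0):
    """One hit-and-run update: draw a uniform random direction e on the
    unit sphere, then move along the line x + lam * e. HARARMS would
    sample lam via ARMS on rho(lam) = p(x + lam * e); here a symmetric
    random-walk proposal in lam stands in for that step."""
    d = len(x)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in z))
    e = [c / norm for c in z]                  # uniform direction on the sphere
    lam = rng.gauss(0.0, step)                 # proposal for the line coordinate
    x_new = [xi + lam * ei for xi, ei in zip(x, e)]
    # MH acceptance for a symmetric proposal along the chosen line
    if math.log(rng.random()) < min(0.0, log_p(x_new) - log_p(x)):
        return x_new
    return x
```

Because the direction is not axis-aligned, a single step can cross between modes that coordinate-wise updates would never connect.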
3.2 Improved Adaptive Proposals: A²RMS and IA²RMS
Classical ARMS only adds support points when a candidate is rejected in the envelope–reject test, which can occur only where $\exp\{W(x)\} > p(x)$—i.e., in regions where the envelope dominates the target. If $\exp\{W(x)\} \leq p(x)$ on an interval, the envelope never adapts there, potentially missing entire modes or structure.
Two improved schemes address this:
- A²RMS (Augmented ARMS): Upon a RS pass ($u_1 \leq p(x')/\exp\{W(x')\}$), draw a second uniform $u_2 \sim \mathcal{U}(0,1)$; if $u_2 > \exp\{W(x')\}/p(x')$, add $x'$ to $S$. This ensures adaptation even where $\exp\{W(x)\} < p(x)$, while preserving the diminishing adaptation property for ergodicity.
- IA²RMS (Independent Adaptive ARMS): Maintain proposals independent of the current state by conditioning on "loser" points of the MH trial for proposal adaptation, ensuring theoretical convergence properties of adaptive independent MH (Martino et al., 2012).
Both schemes demonstrably reduce bias, variance, and autocorrelation, leading to improved performance (see Section 4 for empirical results).
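The extra A²RMS test described above amounts to one additional uniform draw. A minimal sketch in log space (function and argument names are illustrative):

```python
import math

def a2rms_should_adapt(log_p_x, W_x, u2):
    """A²RMS auxiliary adaptation test: after a candidate x passes the RS
    check, add x to the support set when u2 > exp(W(x)) / p(x), i.e. with
    probability 1 - exp(W(x) - log p(x)). Where the envelope dominates
    (W(x) >= log p(x)) this probability is zero, so adaptation there stays
    driven by RS rejections; where the envelope lies below the target it
    now adapts too, and the probability shrinks as the envelope converges
    (diminishing adaptation)."""
    return u2 > math.exp(W_x - log_p_x)
```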
4. Empirical Evidence and Performance Comparison
Multiple empirical studies demonstrate the performance improvements yielded by HARARMS and improved adaptive ARMS methods:
- HARARMS in Multivariate Mode Recovery: In a four-component bivariate normal mixture, HARARMS (unlike Gibbs–ARMS) sampled all four modes in correct proportions within the same computational budget (10,000 steps), confirming the benefit of random direction updates (Zhang et al., 2015).
- Free-Knot Spline Regression: In both linear and quadratic settings, HARARMS consistently recovered the true number and location of knots, with the log-likelihood exhibiting a sharp elbow at the correct number of knots $k$, and credible intervals for knot locations being tight around the ground truth (Zhang et al., 2015).
- Classic ARMS vs. Improved Schemes: For a multimodal univariate mixture, A²RMS and IA²RMS reduced bias (from 0.161 to 0.124), standard deviation (from 0.613 to 0.219), and lag-1 autocorrelation (from 0.77 to 0.02), while decreasing the $L_1$-distance between proposal and target (from 8.05 to 0.056), compared to classical ARMS (Martino et al., 2012).
| Method | Bias | Std | Lag-1 Autocorr. | $L_1$ Dist. |
|---|---|---|---|---|
| ARMS | 0.161 | 0.613 | 0.77 | 8.05 |
| A²RMS | 0.138 | 0.226 | 0.02 | 0.058 |
| IA²RMS | 0.124 | 0.219 | 0.02 | 0.056 |
A²RMS and IA²RMS maintain computational costs comparable to ARMS—the pseudocode requires only a single extra uniform draw and comparison per iteration—while providing significant gains in mixing and accuracy (Martino et al., 2012).
5. Theoretical Properties and Applicability
Both HARARMS and improved ARMS schemes retain desirable properties under appropriate regularity conditions:
- Ergodicity and Convergence: HARARMS inherits the uniform ergodicity and rapid mixing rates of hit-and-run samplers; A²RMS and IA²RMS satisfy the diminishing adaptation condition necessary for ergodic convergence (Zhang et al., 2015, Martino et al., 2012).
- Computational Complexity: The dominant iteration cost is $\mathcal{O}(d + m)$, where $d$ is the dimension (for random direction generation) and $m$ is the number of envelope segments. Typically, a small number of envelope insertions occur per accept/reject cycle.
- General Applicability: Methods apply to any continuous density where pointwise evaluation is feasible, including non-log-concave, multimodal, and constrained models relevant to Bayesian regression, mixture modeling, model selection, and change-point analysis (Zhang et al., 2015).
6. Context, Impact, and Future Directions
ARMS and its extensions address critical challenges in MCMC for non-log-concave, multimodal, and high-dimensional targets. By combining adaptive envelope construction, Metropolis–Hastings correction, and—in HARARMS—random direction proposal generation, these methods achieve efficient exploration without requiring target-specific tuning or restrictive assumptions.
Common misconceptions include the belief that ARMS is limited to log-concave or univariate targets; in fact, while classical ARMS’ efficiency degrades in higher dimensions, HARARMS and improved adaptation strategies enable tractable, robust sampling in broader contexts.
A plausible implication is that these strategies may be extended to constrained or high-dimensional problems beyond those explored in regression and mixture settings, subject to further investigation of scalability and adaptation mechanics in very high dimensional spaces.
Key references: (Zhang et al., 2015, Martino et al., 2012).