Acceptance-Rejection Reparameterization
- Acceptance-Rejection Reparameterization is a method that makes non-differentiable accept/reject decisions differentiable, enabling gradient-based optimization in probabilistic models.
- It reduces variance in gradient estimators by using pathwise reparameterization and analytic differentiation, thereby enhancing efficiency in variational inference and MCMC.
- Practical instantiations include Gamma and Dirichlet variational families, and tuning the acceptance threshold trades off approximation fidelity against computational cost.
Acceptance-rejection reparameterization refers to a class of techniques that enable gradient-based learning and efficient sampling in the presence of acceptance-rejection mechanisms, particularly in variational inference (VI) and Markov Chain Monte Carlo (MCMC). These methods reparameterize the stochastic non-differentiable accept/reject decisions to provide differentiability or improve sampler efficiency, thus extending the toolkit for probabilistic modeling where standard reparameterization or straightforward MCMC falls short. This article covers foundational constructions, algorithmic elaboration, variance reduction properties, and representative applications, referencing key developments in variational inference and MCMC.
1. Acceptance-Rejection in Variational Inference
Classic variational inference methods use parametric variational families that are often limited in expressivity. Variational rejection sampling (VRS) (Jankowiak et al., 2023) expands this space by defining the variational density through a smoothed acceptance-rejection process. The VRS density is given by
$$q_{\phi,T}(z) \;=\; \frac{q_\phi(z)\, a_{\phi,T}(z)}{Z_{\phi,T}}, \qquad a_{\phi,T}(z) \;=\; \sigma\!\big(\log \tilde p(x,z) - \log q_\phi(z) + T\big),$$
where $\tilde p(x,z)$ is the unnormalized target density, $\sigma(\cdot)$ is the logistic function, $T$ is a scalar threshold, and $Z_{\phi,T} = \int q_\phi(z)\, a_{\phi,T}(z)\, dz$ is the normalization constant. Here, samples are proposed from $q_\phi(z)$ and accepted probabilistically with probability $a_{\phi,T}(z)$. This construction yields a continuous, nonparametric variational family that reflects both the proposal and target densities, allowing controlled interpolation between variational and exact inference by tuning $T$.
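The construction can be sketched in a few lines of NumPy. The bimodal target, Gaussian proposal, and threshold values below are illustrative choices rather than settings from the cited papers; the snippet proposes from $q_\phi$, accepts with probability $a_{\phi,T}(z)$, and reports the empirical acceptance rate as a Monte Carlo estimate of $Z_{\phi,T}$.

```python
# Minimal sketch of the smoothed accept/reject construction above (1-D example).
# The target, proposal, and thresholds are illustrative, not from the cited papers.
import numpy as np

rng = np.random.default_rng(0)

def log_p_tilde(z):
    # Unnormalized target: a mixture of two Gaussians centered at -2 and +2.
    return np.logaddexp(-0.5 * (z - 2.0) ** 2, -0.5 * (z + 2.0) ** 2)

def log_q(z, mu=0.0, sigma=3.0):
    # Gaussian proposal q_phi(z).
    return -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_vrs(T, n_accept=5000):
    """Draw from q_{phi,T}(z) proportional to q_phi(z) * sigma(log p~ - log q_phi + T)."""
    accepted, n_proposed = [], 0
    while len(accepted) < n_accept:
        z = rng.normal(0.0, 3.0, size=1000)            # propose z ~ q_phi
        a = sigmoid(log_p_tilde(z) - log_q(z) + T)     # smoothed acceptance probability
        keep = rng.uniform(size=z.shape) < a
        accepted.extend(z[keep])
        n_proposed += z.size
    rate = n_accept / n_proposed                       # Monte Carlo estimate of Z_{phi,T}
    return np.array(accepted[:n_accept]), rate

for T in (5.0, 0.0, -5.0):
    samples, rate = sample_vrs(T)
    print(f"T={T:+.1f}  acceptance rate ~ {rate:.3f}  sample std ~ {samples.std():.2f}")
```

Lowering the threshold sculpts the samples toward the bimodal target while the acceptance rate, and hence the number of accepted samples per proposal, drops.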
2. Pathwise Reparameterization for Low-Variance Gradients
A major challenge in optimizing objectives involving acceptance-rejection is the high variance of naïve score-function (REINFORCE) estimators. VRS originally relied on a covariance-based score-function estimator of the form
$$\nabla_\phi \mathcal{L} \;=\; \mathrm{Cov}_{q_{\phi,T}}\!\Big(\nabla_\phi \log\big(q_\phi(z)\,a_{\phi,T}(z)\big),\; \log \tilde p(x,z) - \log\big(q_\phi(z)\,a_{\phi,T}(z)\big)\Big),$$
with the covariance taken over accepted samples $z \sim q_{\phi,T}$. The acceptance-rejection reparameterization, introduced in Reparameterized VRS (RVRS) (Jankowiak et al., 2023) and RS-VI (Naesseth et al., 2016), instead leverages the existence of a deterministic, differentiable transform $z = h(\varepsilon, \phi)$ with $\varepsilon \sim s(\varepsilon)$ to obtain pathwise gradients. By marginalizing over the accept/reject randomness and analytically differentiating through the smoothed or marginalized acceptance function, one obtains low-variance unbiased estimators suitable for black-box variational inference.
The key RVRS gradient estimator (Proposition 1, Eq. 9 in (Jankowiak et al., 2023)) combines the pathwise (reparameterization) component with analytic derivatives of the smoothed acceptance function, leading to a practical implementation via automatic differentiation and variance reduction by an order of magnitude or more relative to score-function baselines.
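To see why the pathwise route helps, the following toy sketch (not the RVRS estimator itself) compares the empirical variance of pathwise and score-function gradients for an objective of the form $\mathbb{E}_{q_\phi}[a(z)f(z)]$, using an illustrative logistic acceptance factor $a(z) = \sigma(\ell_0 - z)$, test function $f(z) = z^2$, and Gaussian $q_\phi = \mathcal{N}(\mu, 1)$; all of these choices are assumptions made for the demonstration.

```python
# Toy illustration of why pathwise gradients of E_{q_phi}[ a(z) f(z) ] have much
# lower variance than score-function (REINFORCE) gradients. Here a(z) plays the
# role of a smoothed acceptance function; both estimators are unbiased for d/dmu.
import numpy as np

rng = np.random.default_rng(1)
mu, l0, n = 1.0, 2.0, 100_000

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

eps = rng.normal(size=n)
z = mu + eps                      # pathwise reparameterization z = h(eps, mu)
a = sigmoid(l0 - z)               # smoothed "acceptance" factor
f = z ** 2

# Pathwise estimator: differentiate a(z) f(z) through z = mu + eps analytically.
da_dz = -a * (1.0 - a)            # d/dz sigma(l0 - z)
grad_pathwise = da_dz * f + a * 2.0 * z

# Score-function estimator: a(z) f(z) * d/dmu log N(z; mu, 1).
grad_score = a * f * (z - mu)

print("mean  pathwise:", grad_pathwise.mean(), " score:", grad_score.mean())
print("var   pathwise:", grad_pathwise.var(),  " score:", grad_score.var())
```

Both estimators agree in expectation, but the pathwise version exhibits a much smaller empirical variance, which is the effect the acceptance-rejection reparameterization exploits.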
3. Reparameterization through Acceptance-Rejection Samplers
Many variational distributions, such as the Gamma or Dirichlet, are intrinsically tied to acceptance-rejection samplers. Traditional reparameterization tricks fail due to the non-differentiable accept/reject logic. RS-VI (Naesseth et al., 2016) circumvents this by marginalizing out the accept/reject variable, thereby defining a smooth density $\pi(\varepsilon;\theta)$ for the accepted auxiliary variable $\varepsilon$. The gradient of the ELBO with respect to the variational parameters decomposes as
$$\nabla_\theta \mathcal{L}(\theta) \;=\; \underbrace{\mathbb{E}_{\pi(\varepsilon;\theta)}\!\big[\nabla_\theta \log p\big(x, h(\varepsilon,\theta)\big)\big]}_{g_{\mathrm{rep}}} \;+\; \underbrace{\mathbb{E}_{\pi(\varepsilon;\theta)}\!\Big[\log p\big(x, h(\varepsilon,\theta)\big)\,\nabla_\theta \log \frac{q\big(h(\varepsilon,\theta);\theta\big)}{r\big(h(\varepsilon,\theta);\theta\big)}\Big]}_{g_{\mathrm{cor}}} \;+\; \nabla_\theta \mathcal{H}\big[q(z;\theta)\big],$$
where $h(\varepsilon,\theta)$ is the proposal-to-sample transform, $r(z;\theta)$ is the proposal density, $q(z;\theta)$ is the target (variational) density, $\pi(\varepsilon;\theta)$ is the marginal density of the accepted $\varepsilon$, and $\mathcal{H}[q(z;\theta)]$ is the entropy term. This approach has been instantiated for a wide array of common distributions, with closed-form derivatives for the correction term $g_{\mathrm{cor}}$ in the Gamma and Dirichlet settings.
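As a concrete sketch of the Gamma case, the snippet below uses the Marsaglia–Tsang transform $h(\varepsilon,\alpha) = (\alpha - \tfrac{1}{3})\big(1 + \varepsilon/\sqrt{9\alpha - 3}\big)^3$ as the proposal for Gamma$(\alpha,1)$ with $\alpha \ge 1$ and estimates only the reparameterization term $g_{\mathrm{rep}}$ for $f(z) = \log z$; the correction term $g_{\mathrm{cor}}$ and the entropy term are omitted, so the estimate is only close to the exact gradient $\psi_1(\alpha)$ (the trigamma function). The choice of $f$, $\alpha$, and sample sizes is illustrative.

```python
# Sketch of the reparameterization piece g_rep for a Gamma(alpha, 1) family,
# assuming the Marsaglia-Tsang rejection sampler as the proposal. The correction
# term g_cor is omitted; for this sampler it is small because acceptance is high.
import numpy as np
from scipy.special import polygamma

rng = np.random.default_rng(2)

def h(eps, alpha):
    """Marsaglia-Tsang transform: eps ~ N(0,1) -> candidate Gamma(alpha, 1) sample."""
    d = alpha - 1.0 / 3.0
    c = 1.0 / np.sqrt(9.0 * d)
    return d * (1.0 + c * eps) ** 3

def dh_dalpha(eps, alpha):
    """Analytic derivative of h with respect to the shape parameter alpha."""
    d = alpha - 1.0 / 3.0
    c = 1.0 / np.sqrt(9.0 * d)
    return (1.0 + c * eps) ** 3 - eps * (1.0 + c * eps) ** 2 / (2.0 * np.sqrt(d))

def sample_gamma_eps(alpha, n):
    """Rejection-sample Gamma(alpha, 1) for alpha >= 1, returning the accepted eps."""
    d = alpha - 1.0 / 3.0
    out = []
    while len(out) < n:
        eps = rng.normal(size=n)
        v = h(eps, alpha) / d
        u = rng.uniform(size=n)
        ok = (v > 0) & (np.log(u) < 0.5 * eps**2 + d - d * v
                        + d * np.log(np.where(v > 0, v, 1.0)))
        out.extend(eps[ok])
    return np.array(out[:n])

alpha, n = 2.5, 200_000
eps = sample_gamma_eps(alpha, n)
z = h(eps, alpha)

# g_rep for f(z) = log z: E[ f'(z) * dh/dalpha ]; exact gradient is trigamma(alpha).
g_rep = np.mean(dh_dalpha(eps, alpha) / z)
print("g_rep ~", g_rep, "  exact d/dalpha E[log z] =", polygamma(1, alpha))
```

The accepted $\varepsilon$ retain a differentiable path to $z$ through $h$, which is what makes the pathwise term available even though the sampler itself uses accept/reject logic.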
4. Variance, Computational Cost, and Tradeoffs
Both RVRS (Jankowiak et al., 2023) and RS-VI (Naesseth et al., 2016) demonstrate significant variance reduction in gradient estimation compared to score-function estimators and previous reparameterization-based approaches. Empirical results confirm that RVRS can reduce variance by an order of magnitude or more, especially as dimensionality increases. The expected computational cost per accepted sample is inversely proportional to the acceptance rate $Z_{\phi,T}$. As $T$ is lowered (in VRS/RVRS), $q_{\phi,T}$ is sculpted closer to the exact posterior, improving ELBO tightness but decreasing $Z_{\phi,T}$ and increasing per-sample cost. Increasing $T$ (i.e., $T \to \infty$) recovers standard variational inference with lower computational cost but a coarser approximation. The table and sketch below summarize this tradeoff.
| Method | Variance Reduction | Computational Cost per Sample |
|---|---|---|
| Score-fn | High variance | Moderate (no accept/reject) |
| RVRS/RS-VI | 1–2 orders lower variance | $\sim 1/Z_{\phi,T}$ proposals per accepted sample |
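A quick numerical illustration of the cost column: with acceptance rate $Z$, the number of proposals needed per accepted sample is geometric with mean $1/Z$. The rates below are illustrative values, not results from the cited papers.

```python
# Small numerical check: mean number of proposals per accepted sample is 1/Z.
import numpy as np

rng = np.random.default_rng(3)

for rate in (0.8, 0.3, 0.05):
    # Number of proposals until the first acceptance, averaged over many accepts.
    draws = rng.geometric(p=rate, size=100_000)
    print(f"acceptance rate {rate:.2f}: mean proposals per accept "
          f"{draws.mean():.2f} (expected {1/rate:.2f})")
```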
5. Acceptance-Rejection Reparameterization in MCMC
Acceptance-rejection reparameterization has also been leveraged for Markov Chain Monte Carlo. Neal (Neal, 2020) analyzes the standard Metropolis-Hastings (MH) accept/reject step, in which a Uniform(0,1) random variable $u$ determines acceptance. By augmenting the Markov state to include $u$ and updating $u$ non-reversibly rather than resampling it each iteration, one obtains a non-reversible chain that preserves the target marginal. The deterministic update of $u$ through a translation with wrap-around, or through persistent additive noise, can reduce random-walk behavior and improve sampling efficiency, especially for algorithms with persistent momentum or in mixed discrete-continuous models.
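A minimal sketch of such a chain is given below, assuming a symmetric random-walk proposal and a standard-normal target (both illustrative choices): $u$ is translated with wrap-around each iteration, and after the accept/reject decision it is rescaled ($u/a$ on acceptance, $(u-a)/(1-a)$ on rejection), which is one standard construction that keeps the joint distribution $\pi(x)\times\mathrm{Uniform}(0,1)$ invariant. Step sizes are illustrative, not recommendations from (Neal, 2020).

```python
# Sketch of a Metropolis chain with a persistent, non-reversibly updated uniform u
# for the accept/reject decision, in the spirit of (Neal, 2020). Symmetric proposal
# assumed; the target and step sizes are illustrative.
import numpy as np

rng = np.random.default_rng(4)

def log_pi(x):
    # Illustrative target: standard normal.
    return -0.5 * x ** 2

def persistent_u_metropolis(n_iter=50_000, step=2.0, delta=0.07):
    x, u = 0.0, rng.uniform()
    xs = np.empty(n_iter)
    for t in range(n_iter):
        # Non-reversible update of u: translation with wrap-around (preserves Uniform(0,1)).
        u = (u + delta) % 1.0
        # Standard symmetric random-walk proposal.
        x_prop = x + step * rng.normal()
        a = min(1.0, np.exp(log_pi(x_prop) - log_pi(x)))
        if u < a:
            x, u = x_prop, u / a          # accept; rescale u back to Uniform(0,1)
        else:
            u = (u - a) / (1.0 - a)       # reject; rescale the other branch
        xs[t] = x
    return xs

xs = persistent_u_metropolis()
print("sample mean", xs.mean(), "sample var", xs.var())  # should be ~0 and ~1
```

Because $u$ drifts deterministically rather than being redrawn, the chain tends to produce runs of acceptances and rejections, which is the mechanism behind the reduced random-walk behavior described above.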
Empirical results indicate that for persistent Langevin and hybrid Gibbs/Langevin settings, this approach achieves up to a factor of two in sampling efficiency over HMC and reduces autocorrelation times for statistics of interest. Improvements for basic random-walk Metropolis in high dimension are more modest (10–20%).
6. Practical Implementation and Applications
For variational inference, RVRS and RS-VI directly enable black-box variational inference for continuous latent variable models and for variational families defined via rejection samplers. Implementation involves drawing samples via the reparameterized proposal, applying the acceptance probability or marginalized density, and employing automatic differentiation through the acceptance-rejection logic, with gradient centering for additional variance reduction. Notable practical instantiations include Gamma and Dirichlet variational families and shape-augmentation tricks; a numerical check of the shape-augmentation identity is sketched below.
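As an example of the shape-augmentation trick, the sketch below numerically checks the identity that a Gamma$(\alpha,1)$ draw can be written as a Gamma$(\alpha+B,1)$ draw (where the rejection sampler is efficient even when $\alpha < 1$) multiplied by $B$ uniform factors $u_i^{1/(\alpha+i)}$, keeping the whole path differentiable in $\alpha$. The values of $\alpha$ and $B$ are illustrative.

```python
# Numerical check of the Gamma shape-augmentation identity:
# Gamma(alpha, 1) =d Gamma(alpha + B, 1) * prod_{i=0}^{B-1} U_i^{1/(alpha + i)}.
import numpy as np

rng = np.random.default_rng(5)
alpha, B, n = 0.4, 4, 200_000

z_aug = rng.gamma(alpha + B, size=n)
for i in range(B):
    z_aug *= rng.uniform(size=n) ** (1.0 / (alpha + i))

z_ref = rng.gamma(alpha, size=n)
print("augmented mean/var:", z_aug.mean(), z_aug.var())
print("reference mean/var:", z_ref.mean(), z_ref.var())  # Gamma(0.4,1): mean 0.4, var 0.4
```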
For MCMC, integrating the non-reversibly updated auxiliary variable into Metropolis or Langevin chains provides improved performance, notably when interleaving other updates (e.g., Gibbs for discrete variables) is important or when persistent momentum is used. The method is broadly applicable to any MH step and is compatible with external Gibbs or HMC moves.
7. Significance, Limitations, and Future Directions
Acceptance-rejection reparameterization techniques have expanded the reach of reparameterization-based stochastic optimization and improved MCMC mixing in various settings. The approach allows practitioners to employ flexible, nonparametric posterior approximations in VI and to optimize or sample efficiently from models otherwise recalcitrant to gradient methods. The main tradeoff is between the fidelity of the approximation and computational cost, controlled via acceptance rate. For MCMC, the non-reversible augmentation yields more efficient exploration for complex or hybrid models, though the gains for standard high-dimensional Metropolis–Hastings are relatively modest.
The methodology remains under active development, with ongoing extensions to broader variational families, more sophisticated samplers, and integration in automatic inference frameworks. The geometric interpretation of smoothed rejection and the modularity of the gradient estimation procedure position acceptance-rejection reparameterization as a foundational tool for scalable Bayesian inference (Jankowiak et al., 2023, Naesseth et al., 2016, Neal, 2020).