Variational Rejection Sampling (VRS)

Updated 22 April 2026
  • Variational Rejection Sampling is a hybrid method that refines variational proposals with an acceptance-rejection mechanism to better approximate complex posterior distributions.
  • The algorithm employs both soft and hard acceptance thresholds alongside α-divergence objectives to balance computational cost and approximation fidelity.
  • Empirical results show improved log-likelihood and reduced gradient variance, demonstrating VRS’s efficacy in enhancing latent variable model inference.

Variational Rejection Sampling (VRS) is a hybrid framework for approximate inference that refines a parametric variational proposal by integrating a rejection sampling mechanism. The approach systematically enhances the fidelity of variational approximations to complex target distributions, particularly in latent-variable models. By combining properties of traditional rejection sampling with variational inference objectives (often α-divergence minimization or ELBO maximization), VRS bridges the gap between efficient proposal construction and principled sample-based correction. Extensions such as Refined α-Divergence Rejection Sampling (α-DRS) and Reparameterized Variational Rejection Sampling (RVRS) further broaden the practical and theoretical impact of this scheme in probabilistic modeling (Grover et al., 2018, Sharma et al., 2019, Jankowiak et al., 2023).

1. Theoretical Foundations and Model Setup

VRS addresses inference for latent variable models, where the joint distribution takes the form $p_\theta(x, z) = p_\theta(z)\,p_\theta(x \mid z)$, with $x$ observed and $z$ latent. The key challenge is constructing an efficient, high-fidelity posterior approximation $q_\phi(z \mid x)$, especially when the true posterior $p_\theta(z \mid x)$ exhibits complex, multimodal, or heavy-tailed structure that standard variational families fail to capture (Grover et al., 2018, Jankowiak et al., 2023).

The essential idea is to build an improved variational family by "resampling" or "refining" $q_\phi$ via an accept-reject process. This yields a new density $r(z \mid x) \propto q_\phi(z \mid x)\,a(z \mid x)$, where the acceptance function $a(z \mid x)$ depends on both the model and proposal densities, ensuring that as the acceptance criterion is tightened, the refined $r(z \mid x)$ approaches $p_\theta(z \mid x)$ (Grover et al., 2018).

2. Core Algorithm and Mathematical Framework

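The canonical VRS acceptance probability follows the (hard) rejection sampling prescription $a(z \mid x) = \frac{p_\theta(x, z)}{M\, q_\phi(z \mid x)}$, where $M$ is a normalization constant bounding the ratio $p_\theta(x, z) / q_\phi(z \mid x)$. In practice, to circumvent the need for a strict bound, VRS employs a "soft" acceptance threshold $T$, yielding

$$a_{\theta,\phi}(z \mid x, T) = \exp\left(-\left[\, l_{\theta,\phi}(z \mid x) - T \,\right]^{+}\right),$$

with $l_{\theta,\phi}(z \mid x) = \log q_\phi(z \mid x) - \log p_\theta(x, z)$ and $[\xi]^{+} = \log(1 + e^{\xi})$ (softplus). The resulting resampled proposal is then normalized as

$$r_{\theta,\phi}(z \mid x, T) = \frac{q_\phi(z \mid x)\, a_{\theta,\phi}(z \mid x, T)}{Z_R(x, T)}, \qquad Z_R(x, T) = \mathbb{E}_{q_\phi(z \mid x)}\left[a_{\theta,\phi}(z \mid x, T)\right].$$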

This procedure ensures a continuous, tunable trade-off between computational cost and closeness to the true posterior (Grover et al., 2018, Jankowiak et al., 2023).
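The soft accept-reject step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the bimodal `log_joint`, the Gaussian proposal parameters `Q_MU` and `Q_SIGMA`, and the function names are all hypothetical, and the acceptance is taken to be the sigmoid of $\log p_\theta(x, z) - \log q_\phi(z \mid x) + T$, so that a lower threshold means stricter acceptance.

```python
import math
import random

def log_normal(z, mu, sigma):
    """Log-density of a univariate Gaussian."""
    return -0.5 * ((z - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def log_joint(z):
    """Hypothetical stand-in for log p(x, z): an equal mixture of N(-2, 0.5^2) and N(2, 0.5^2)."""
    a = log_normal(z, -2.0, 0.5) + math.log(0.5)
    b = log_normal(z, 2.0, 0.5) + math.log(0.5)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

# Assumed proposal q(z | x): a single broad Gaussian.
Q_MU, Q_SIGMA = 0.0, 2.5

def log_q(z):
    return log_normal(z, Q_MU, Q_SIGMA)

def log_accept(z, T):
    """log a(z | x, T) = -softplus(l(z) - T), where l = log q - log p."""
    u = log_q(z) - log_joint(z) - T
    # Numerically stable softplus: log(1 + e^u).
    return -(max(u, 0.0) + math.log1p(math.exp(-abs(u))))

def sample_resampled(n, T, rng):
    """Draw n samples from r(z | x, T) and report the empirical acceptance rate."""
    samples, proposed = [], 0
    while len(samples) < n:
        z = rng.gauss(Q_MU, Q_SIGMA)
        proposed += 1
        if rng.random() < math.exp(log_accept(z, T)):
            samples.append(z)
    return samples, n / proposed

if __name__ == "__main__":
    rng = random.Random(0)
    for T in (-2.0, 2.0):
        _, rate = sample_resampled(200, T, rng)
        print(f"T={T:+.1f}  acceptance rate ~ {rate:.2f}")
```

In this toy setup a lower `T` rejects more proposals and concentrates the accepted samples on the two modes, while a higher `T` accepts most proposals and stays close to the original Gaussian.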

3. Relation to α-Divergence and the α-DRS Scheme

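The "Refined α-Divergence Variational Inference via Rejection Sampling" framework (α-DRS) generalizes VRS by introducing Rényi α-divergence as an objective. For $\alpha > 0$, $\alpha \neq 1$, the Rényi divergence between target $p$ and proposal $q$ is

$$D_\alpha(p \,\|\, q) = \frac{1}{\alpha - 1} \log \mathbb{E}_{z \sim q}\left[\left(\frac{p(z)}{q(z)}\right)^{\alpha}\right].$$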

The α-DRS algorithm proceeds in two stages (Sharma et al., 2019):

  • Stage 1: Optimize $q_\phi$ by minimizing a Monte Carlo estimate of $D_\alpha(p \,\|\, q_\phi)$.
  • Stage 2: Use the learned $q_\phi$ and an (approximate) optimal rejection sampling constant (or quantile-based surrogate) to perform rejection sampling, generating a refined sample-based approximation.

The key theoretical link is that as $\alpha \to \infty$, $D_\alpha(p \,\|\, q) \to \log M^{*}$, with $M^{*} = \sup_z p(z)/q(z)$ the tightest rejection sampling constant for the pair $(p, q)$. Crucially, it is established that the rejection step cannot increase $D_\alpha$, i.e.,

$$D_\alpha(p \,\|\, r) \le D_\alpha(p \,\|\, q_\phi),$$

which guarantees improvement (or at least non-degradation) in the variational approximation after rejection sampling (Sharma et al., 2019).
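Both properties can be checked numerically on a small discrete example. The sketch below is illustrative only: the three-point distributions `P` and `Q`, the hard acceptance rule $\min(1, p / (M q))$, and the function names are assumptions chosen for the example, and a numerical check on one case is not a proof.

```python
import math

def renyi(p, q, alpha):
    """D_alpha(p || q) for discrete distributions with full support (alpha > 0, alpha != 1)."""
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1.0)

def refine(p, q, M):
    """Distribution of accepted draws when proposing from q and accepting with min(1, p / (M q))."""
    w = [qi * min(1.0, pi / (M * qi)) for pi, qi in zip(p, q)]
    Z = sum(w)  # overall acceptance probability
    return [wi / Z for wi in w]

# Hypothetical target and proposal on three states.
P = [0.5, 0.3, 0.2]
Q = [0.2, 0.3, 0.5]
M_STAR = max(pi / qi for pi, qi in zip(P, Q))  # tightest rejection constant

if __name__ == "__main__":
    print("log M* =", math.log(M_STAR))
    print("D_200(p || q) =", renyi(P, Q, 200.0))
    for M in (1.0, 1.5, M_STAR):
        r = refine(P, Q, M)
        print(f"M={M:.2f}  D_2(p||q)={renyi(P, Q, 2.0):.4f}  D_2(p||r)={renyi(P, r, 2.0):.4f}")
```

With the exact constant `M_STAR` the refined distribution coincides with the target; with a smaller, approximate constant the divergence still drops in this example, though not to zero.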

4. Variational Objectives, Gradient Estimation, and Reparameterization

VRS can be cast within a variational inference framework, where the Evidence Lower Bound (ELBO) under the resampled proposal is

$$\mathcal{L}_R = \mathbb{E}_{z \sim r(z \mid x)}\left[\log p_\theta(x, z) - \log r(z \mid x)\right].$$
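To make the structure of this objective explicit, one can substitute the resampled density into the bound. The expansion below is a sketch that assumes $r(z \mid x) = q_\phi(z \mid x)\,a(z \mid x) / Z_R$ with $Z_R = \mathbb{E}_{q_\phi(z \mid x)}[a(z \mid x)]$; the symbols $\mathcal{L}_R$ and $Z_R$ are notation adopted here for illustration.

```latex
\begin{aligned}
\mathcal{L}_R
  &= \mathbb{E}_{z \sim r}\left[\log p_\theta(x, z) - \log r(z \mid x)\right] \\
  &= \mathbb{E}_{z \sim r}\left[\log p_\theta(x, z) - \log q_\phi(z \mid x) - \log a(z \mid x)\right] + \log Z_R, \\
\log p_\theta(x) - \mathcal{L}_R
  &= \mathrm{KL}\left(r(z \mid x) \,\|\, p_\theta(z \mid x)\right) \ge 0 .
\end{aligned}
```

The $\log Z_R$ term is the log of the mean acceptance probability, which is why the normalizer of the resampled proposal enters both the objective and its gradients.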

Taking advantage of the structure of $r(z \mid x)$, Grover et al. derive low-variance gradient estimators involving covariances under the resampled proposal. For reparameterizable base proposals (e.g., Gaussian), Jankowiak & Phan introduce a "pathwise" (low-variance) gradient for the parameters $\phi$ of $q_\phi$, via an identity that combines the (smooth) acceptance probability, a function of the log-density ratios, and the Jacobian from reparameterization (Jankowiak et al., 2023). This estimator exhibits substantially reduced variance compared to REINFORCE-style alternatives, enabling scalable and robust training.

5. Cost–Fidelity Trade-offs and Algorithmic Structure

The expected cost per accepted sample is inversely proportional to the mean acceptance probability, $Z_R = \mathbb{E}_{q_\phi(z \mid x)}[a(z \mid x)]$. Lowering the acceptance threshold tightens the approximation (reducing bias/KL-divergence) but concomitantly reduces $Z_R$, thus increasing computational cost. The variational gap $\mathrm{KL}(r(z \mid x) \,\|\, p_\theta(z \mid x))$ can be bounded under suitable tail conditions on the densities involved, reinforcing that accuracy improves at the expense of sampling effort (Jankowiak et al., 2023). The full algorithm typically involves an inner sample-reject loop embedded within standard SGD updates, with the threshold $T$ either dynamically tuned (e.g., to match a target quantile acceptance) or set via theoretical criteria.
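One simple way to tune the threshold dynamically is to solve for the value that yields a chosen mean acceptance probability on a batch of proposal samples. The sketch below uses bisection and is a hypothetical illustration rather than the rule used in the cited papers: it assumes a sigmoid acceptance $a = \sigma(\log p_\theta(x, z) - \log q_\phi(z \mid x) + T)$, and the names `mean_acceptance` and `tune_threshold` are invented for the example.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def mean_acceptance(log_ratios, T):
    """Monte Carlo estimate of the mean acceptance probability.

    log_ratios[i] = log p(x, z_i) - log q(z_i | x) for z_i drawn from q.
    """
    return sum(sigmoid(lr + T) for lr in log_ratios) / len(log_ratios)

def tune_threshold(log_ratios, target, lo=-50.0, hi=50.0, iters=60):
    """Bisection on T; valid because mean acceptance is increasing in T."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_acceptance(log_ratios, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic log-density ratios standing in for a real model and proposal.
    log_ratios = [rng.gauss(0.0, 2.0) for _ in range(1000)]
    for target in (0.1, 0.25, 0.5):
        T = tune_threshold(log_ratios, target)
        cost = 1.0 / mean_acceptance(log_ratios, T)
        print(f"target={target:.2f}  T={T:+.3f}  expected proposals per sample ~ {cost:.1f}")
```

The reciprocal of the mean acceptance is the expected number of proposals per accepted sample, so a target of 0.25 corresponds to roughly four proposal draws per sample.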

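Pseudocode Structure for α-DRS

The α-DRS procedure has a two-stage structure: first fit the proposal $q_\phi$ by minimizing a Monte Carlo estimate of the Rényi α-divergence to the target, then run rejection sampling with the learned $q_\phi$ and an approximate (e.g., quantile-based) rejection constant to obtain refined samples (Sharma et al., 2019).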
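A toy rendering of a two-stage scheme of this kind is sketched below. It is not the pseudocode from Sharma et al.: the Gaussian-mixture target, the zero-mean Gaussian proposal family, the grid search used in place of gradient-based optimization, the 0.99 quantile, and every function name are assumptions made for illustration.

```python
import math
import random

ALPHA = 2.0  # divergence order, chosen arbitrarily for this example

def normal_pdf(z, mu, sigma):
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def target_pdf(z):
    """Hypothetical normalized target: two-component Gaussian mixture."""
    return 0.5 * normal_pdf(z, -1.0, 0.5) + 0.5 * normal_pdf(z, 1.0, 0.5)

def renyi_estimate(sigma, eps, alpha=ALPHA):
    """Monte Carlo estimate of D_alpha(p || q) for q = N(0, sigma^2).

    Uses z_i = sigma * eps_i with fixed standard-normal noise eps, so the
    same noise is shared across candidate values of sigma.
    """
    total = 0.0
    for e in eps:
        z = sigma * e
        total += (target_pdf(z) / normal_pdf(z, 0.0, sigma)) ** alpha
    return math.log(total / len(eps)) / (alpha - 1.0)

def stage1_fit(eps, grid):
    """Stage 1: choose the proposal scale with the smallest estimated divergence."""
    return min(grid, key=lambda s: renyi_estimate(s, eps))

def stage2_sample(sigma, n, rng, quantile=0.99, n_pilot=2000):
    """Stage 2: rejection sampling with a quantile-based surrogate for the constant M."""
    pilot = [rng.gauss(0.0, sigma) for _ in range(n_pilot)]
    ratios = sorted(target_pdf(z) / normal_pdf(z, 0.0, sigma) for z in pilot)
    M = ratios[int(quantile * (n_pilot - 1))]
    samples = []
    while len(samples) < n:
        z = rng.gauss(0.0, sigma)
        accept_prob = min(1.0, target_pdf(z) / (M * normal_pdf(z, 0.0, sigma)))
        if rng.random() < accept_prob:
            samples.append(z)
    return samples, M

if __name__ == "__main__":
    rng = random.Random(0)
    eps = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    grid = [round(0.8 + 0.1 * k, 2) for k in range(18)]
    sigma = stage1_fit(eps, grid)
    samples, M = stage2_sample(sigma, 500, rng)
    print(f"fitted sigma={sigma}, surrogate M={M:.3f}, refined samples={len(samples)}")
```

Because the surrogate constant is a quantile rather than a true upper bound, the acceptance probability is clipped at one, so the refined samples only approximate the target.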

6. Empirical Results and Applications

Empirical evidence demonstrates significant improvements using VRS-based methods:

  • Grover et al. report that on sigmoid belief networks trained on MNIST, VRS yields average improvements of 3.71 nats (single-sample) and 0.21 nats (multi-sample) in marginal log-likelihood over state-of-the-art baselines (Grover et al., 2018).
  • α-DRS provides substantial reductions in the α-divergence to the target, with marked fidelity improvements (e.g., posterior mode recovery in mixture models, improvement in Bayesian neural network regression) (Sharma et al., 2019).
  • Jankowiak & Phan observe that RVRS achieves lower gradient variance and superior or competitive posterior fidelity compared to normalizing flows, importance weighting, and hybrid MCMC/variational schemes. For instance, RVRS with a simple Gaussian proposal outpaces IWAE and normalizing flow VI on several inference tasks, with empirical speedups and improved negative ELBOs (Jankowiak et al., 2023).

7. Practical Considerations, Limitations, and Extensions

Principal algorithmic considerations include the tuning of acceptance thresholds (either through quantile-based rules or theoretical approximations), selection of the divergence order $\alpha$ (α-DRS), and proposal family expressivity. Overly aggressive rejection increases variance and computational cost, while soft acceptance thresholds facilitate efficient trade-offs.

Limitations include:

  • Increased complexity from rejection loops, leading to variable per-sample compute.
  • Added hyperparameter tuning (thresholds, quantiles, or $\alpha$).
  • For non-reparameterizable proposals, standard VRS relies on higher-variance gradient estimators.

VRS variants extend to richer variational families (normalizing flows, hierarchical structures), per-layer factorized resampling, and potentially reinforcement learning/policy search (Grover et al., 2018, Jankowiak et al., 2023).

VRS and its descendants stand as a flexible, model-agnostic enhancement to variational inference pipelines: they explicitly use model densities to refine approximate inference and offer principled mechanisms to balance fidelity and cost in probabilistic modeling (Grover et al., 2018, Sharma et al., 2019, Jankowiak et al., 2023).
