
Refined α-Divergence Variational Inference (RDVI)

Updated 2 June 2026
  • The paper introduces a novel two-stage RDVI framework that integrates rejection sampling to rigorously reduce α-divergence for improved variational posterior approximation.
  • The method provides precise control over mode-seeking and mass-covering properties through a tunable α parameter and establishes theoretical bounds on divergence reduction.
  • Empirical studies on synthetic mixtures and Bayesian neural network regression demonstrate RDVI’s enhanced sampling efficiency and predictive performance.

Refined α-Divergence Variational Inference (RDVI) is an advanced framework for approximate Bayesian inference that generalizes classic variational inference by replacing the standard Kullback–Leibler objective with the Rényi α-divergence. RDVI further integrates rejection sampling to minimize worst-case density ratio discrepancies between the target and variational distributions, yielding improved empirical and theoretical performance, especially in complex or multimodal posterior landscapes (Sharma et al., 2019). This method unifies and extends prior α-divergence-based VI schemes, permitting rigorous control over mode-seeking, mass-covering, and robustness properties through the parameter α, and introduces a two-stage optimization-refinement procedure that is theoretically guaranteed to improve the fit.

1. Formal Definition of RDVI and α-Divergence

RDVI is grounded in the minimization of the Rényi α-divergence, which for two densities $p(x)$ (target, possibly unnormalized) and $q_\theta(x)$ (variational approximation), is given as

$$D_\alpha(p\|q_\theta) = \frac{1}{\alpha-1} \log \int q_\theta(x) \left[ \frac{\tilde{p}(x)}{q_\theta(x)} \right]^\alpha dx - \frac{\alpha}{\alpha-1} \log Z_p,$$

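where $p(x)=\tilde{p}(x)/Z_p$ and $\alpha>0$, $\alpha\neq1$ (Sharma et al., 2019). In the limit α→1, this recovers the standard KL divergence $\mathrm{KL}(p\|q_\theta)$. The parameter α gives continuous control over the trade-off between mode-seeking and mass-covering behavior, and provides a unified variational inference framework (Li et al., 2016). With the target in the first argument, as written here, larger α penalizes regions where $p(x)/q_\theta(x)$ is large more heavily and so favors mass-covering fits, while smaller α favors mode-seeking fits; the association is reversed under the $D_\alpha(q_\theta\|p)$ convention of Li et al. (2016).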
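As a minimal sketch of the definition, the divergence can be evaluated exactly for small discrete distributions. The function names and the three-point distributions below are illustrative assumptions, not part of the original method; the second function uses the two-term form with an unnormalized target, whose normalizer is only tractable because the example is tiny.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Renyi alpha-divergence D_alpha(p || q) for discrete distributions
    with full support (alpha > 0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if alpha == 1.0:
        # Limiting case alpha -> 1: KL(p || q).
        return float(np.sum(p * np.log(p / q)))
    return float(np.log(np.sum(q * (p / q) ** alpha)) / (alpha - 1.0))

def renyi_divergence_unnormalized(p_tilde, q, alpha):
    """Same quantity from an unnormalized target p_tilde = Z_p * p,
    using the two-term form of the definition (alpha != 1)."""
    p_tilde = np.asarray(p_tilde, dtype=float)
    q = np.asarray(q, dtype=float)
    log_Z = np.log(p_tilde.sum())  # tractable only in this toy setting
    first = np.log(np.sum(q * (p_tilde / q) ** alpha)) / (alpha - 1.0)
    return float(first - alpha / (alpha - 1.0) * log_Z)

# Hypothetical toy distributions.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

for alpha in (0.5, 1.0, 2.0, 10.0):
    print(f"alpha={alpha:>4}: D = {renyi_divergence(p, q, alpha):.5f}")
```

Evaluating at α slightly above 1 gives a value close to the KL branch, which is a quick way to see the α→1 limit numerically.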

2. Two-Stage RDVI: Hybridization with Rejection Sampling

A key innovation of RDVI is embedding a rejection sampling (RS) refinement stage. The crucial observation is that

$$\lim_{\alpha\to\infty} D_\alpha(p\|q_\theta) = \log \max_x \frac{p(x)}{q_\theta(x)},$$

which is equivalent to the log of the optimal rejection-sampler constant $M(\theta)$ for $q_\theta(x)$. This leads to the two-stage α-Divergence Rejection Sampling (α-DRS) algorithm (Sharma et al., 2019):

  • Stage 1: Learn $q_\theta$ by minimizing a Monte Carlo estimate of $D_\alpha(p\|q_\theta)$ for finite $\alpha$, resulting in a proposal distribution well-matched to $p$.
  • Stage 2: Estimate an (approximate) acceptance threshold $M$ (linked to $M(\theta)$), and perform rejection sampling with acceptance probability

$$a(x) = \min\left\{1, \frac{\tilde{p}(x)}{M\, q_\theta(x)}\right\}.$$

This yields a sample-based refined approximation that provably reduces the α-divergence to the target.
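The α→∞ limit that motivates the two stages can be checked numerically. The sketch below, on hypothetical three-point distributions, evaluates the divergence from log-probabilities with a max-shifted log-sum-exp (an implementation choice made here so that large α does not overflow) and compares it with the log of the largest density ratio.

```python
import numpy as np

def renyi_divergence_log(log_p, log_q, alpha):
    """D_alpha(p || q) from log-probabilities, using a max-shifted
    log-sum-exp so that large alpha does not overflow."""
    log_terms = log_q + alpha * (log_p - log_q)
    shift = log_terms.max()
    log_sum = shift + np.log(np.exp(log_terms - shift).sum())
    return float(log_sum / (alpha - 1.0))

# Hypothetical toy distributions.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
log_p, log_q = np.log(p), np.log(q)

# Log of the smallest valid rejection-sampling constant for this pair.
log_max_ratio = float(np.max(log_p - log_q))

alphas = [2.0, 10.0, 100.0, 1e4]
values = [renyi_divergence_log(log_p, log_q, a) for a in alphas]
for a, d in zip(alphas, values):
    print(f"alpha={a:>8}: D = {d:.5f}   (log max ratio = {log_max_ratio:.5f})")
```

The printed values increase with α and approach the log of the largest ratio from below, which is why driving the divergence down at large α also keeps the rejection constant small.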

Stages in Algorithmic Form

| Stage | Steps | Output |
|---|---|---|
| 1 (RDVI) | Optimize a Monte Carlo estimate of $D_\alpha(p\|q_\theta)$ over $\theta$ using gradient-based updates | Proposal $q_\theta$ |
| 2 (RS) | Compute the acceptance threshold (empirical quantile in high dimension, or set directly in low dimension); draw $x \sim q_\theta$ and accept with the Stage 2 acceptance probability until sufficient samples are collected | Empirical distribution of the accepted samples |

Rigorous analysis guarantees that the α-divergence from the target to the refined distribution is no larger than $D_\alpha(p\|q_\theta)$ for any finite α, and that, as the threshold approaches the exact rejection-sampling constant, the refined distribution converges to $p$ (Sharma et al., 2019).
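The guarantee can be illustrated exactly in a discrete setting, where the distribution of accepted draws has a closed form. This is a sketch under assumptions: the acceptance rule min(1, p/(M q)) with a normalized target, the symbol `M` for the threshold, and the toy distributions are all choices made here for illustration.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """D_alpha(p || q) for discrete distributions with full support, alpha != 1."""
    return float(np.log(np.sum(q * (p / q) ** alpha)) / (alpha - 1.0))

def refine(p, q, M):
    """Distribution of accepted draws when x ~ q is accepted with
    probability min(1, p(x) / (M * q(x)))."""
    accept = np.minimum(1.0, p / (M * q))
    unnormalized = q * accept
    return unnormalized / unnormalized.sum()

# Hypothetical toy distributions.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
M_exact = float(np.max(p / q))  # 1.25 for this pair

for M in (0.5, 0.8, 1.0, M_exact):
    r = refine(p, q, M)
    print(f"M={M:.2f}: D_2(p||q)={renyi_divergence(p, q, 2.0):.5f}  "
          f"D_2(p||r)={renyi_divergence(p, r, 2.0):.5f}")
```

A threshold small enough to accept everything leaves the proposal unchanged, intermediate thresholds lower the divergence, and the exact constant recovers the target.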

3. Theoretical Properties and Bounds

The analysis establishes that the RS-based refinement never increases, and typically decreases, the α-divergence to the target $p$ relative to the unrefined approximation $q_\theta$. As the acceptance criterion becomes stricter, the refined distribution becomes increasingly concentrated on high-density regions of $p$, bridging the gap between variational and sampling-based approaches. The result holds for all α>0 and all proposal choices. In the high-dimensional regime, quantile-based thresholds for the acceptance step mitigate the curse of dimensionality in naive RS (Sharma et al., 2019).
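To see why a quantile-based threshold helps in high dimension, the sketch below compares it with a threshold set at the largest observed density ratio. The 50-dimensional Gaussian pair, the use of the (1−γ) quantile of the log density ratio, and the acceptance rule min(1, ratio/threshold) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n, gamma = 50, 5000, 0.1

# Hypothetical toy pair: target N(0, I) (left unnormalized), proposal N(0, 1.5^2 I).
sigma_q = 1.5
x = sigma_q * rng.standard_normal((n, dim))
log_p_tilde = -0.5 * np.sum(x ** 2, axis=1)
log_q = (-0.5 * np.sum((x / sigma_q) ** 2, axis=1)
         - dim * np.log(sigma_q) - 0.5 * dim * np.log(2.0 * np.pi))
log_w = log_p_tilde - log_q  # log density ratio at each proposal draw

def acceptance_probs(log_w, log_threshold):
    """min(1, w / threshold), evaluated in log space."""
    return np.exp(np.minimum(0.0, log_w - log_threshold))

# Naive choice: the largest ratio seen, mimicking the exact RS constant.
rate_max = acceptance_probs(log_w, log_w.max()).mean()

# Quantile choice: the top `gamma` fraction of draws is accepted outright.
log_threshold_q = np.quantile(log_w, 1.0 - gamma)
rate_quantile = acceptance_probs(log_w, log_threshold_q).mean()

print(f"acceptance with max threshold:      {rate_max:.4f}")
print(f"acceptance with quantile threshold: {rate_quantile:.4f}")
```

With the quantile rule the acceptance rate on these draws cannot fall below γ, whereas the max-based threshold is dictated by a single extreme ratio; the price is that the accepted samples only approximate the target.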

4. Empirical Performance and Case Studies

Empirical studies in Sharma et al. (2019) demonstrate the practical efficacy of the RDVI+RS approach:

  • Synthetic Gaussian Mixture: After Stage 1, $q_\theta$ may miss certain posterior modes; after RS refinement, the accepted sample set successfully recovers all target modes, indicated by a stark drop in the α-divergence of the refined distribution relative to $D_\alpha(p\|q_\theta)$.
  • Bayesian Neural Network Regression (UCI): On benchmarks, α-DRS achieves lower RMSE and higher log-likelihood than standalone RDVI or other α-divergence minimization baselines, especially with acceptance rates γ≈0.1 in RS.

These results highlight that the hybrid scheme yields both tighter fits and improved predictive accuracy compared to variational-only routines.

5. Algorithmic Implementation and Optimization Strategy

The practical RDVI+RS routine proceeds as follows (Sharma et al., 2019):

  1. Input: Unnormalized target $\tilde{p}(x)$, α>0, acceptance hyperparameter γ.
  2. Variational phase: Randomly initialize $\theta$; iteratively sample minibatches from $q_\theta$, estimate the α-divergence, and perform stochastic gradient updates to minimize the divergence.
  3. Threshold estimation: For low dimension, set the acceptance threshold from the rejection-sampling constant; for high dimension, compute it as the empirical quantile of the density ratio $\tilde{p}(x)/q_\theta(x)$ over samples from $q_\theta$.
  4. Refinement: Repeatedly draw $x \sim q_\theta$, accept using the acceptance probability, and accumulate accepted samples until the required number of samples is collected.

This algorithm is amenable to stochastic optimization, mini-batching, and reparameterization-trick based variance reduction.
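The steps above can be sketched end to end on a one-dimensional toy problem. Everything concrete here is an assumption made for illustration rather than taken from Sharma et al. (2019): the two-bump target `log_p_tilde`, the Gaussian proposal with `theta = (mean, log_std)`, the finite-difference gradient with step halving (standing in for stochastic gradient updates), and the acceptance rule min(1, p̃/(M q_θ)) with a (1−γ)-quantile threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, gamma = 2.0, 0.1
eps = rng.standard_normal(2000)  # fixed base noise for reparameterization

def log_p_tilde(x):
    """Hypothetical unnormalized target: two Gaussian bumps at -2 and +2."""
    return np.logaddexp(-0.5 * ((x + 2.0) / 0.5) ** 2,
                        -0.5 * ((x - 2.0) / 0.5) ** 2)

def log_q(x, theta):
    """Gaussian proposal with theta = (mean, log_std)."""
    mu, log_std = theta
    return (-0.5 * ((x - mu) / np.exp(log_std)) ** 2
            - log_std - 0.5 * np.log(2.0 * np.pi))

def objective(theta):
    """Monte Carlo estimate of the theta-dependent term of D_alpha(p || q_theta),
    with x = mean + std * eps (reparameterization trick)."""
    x = theta[0] + np.exp(theta[1]) * eps
    log_terms = alpha * (log_p_tilde(x) - log_q(x, theta))
    shift = log_terms.max()
    return (shift + np.log(np.mean(np.exp(log_terms - shift)))) / (alpha - 1.0)

def grad(theta, h=1e-4):
    """Central finite differences; an autodiff library would normally do this."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = h
        g[i] = (objective(theta + step) - objective(theta - step)) / (2.0 * h)
    return g

# Stage 1: gradient descent with step halving so the objective never increases.
theta = np.array([0.5, 0.0])
initial_loss = objective(theta)
for _ in range(200):
    g, lr = grad(theta), 0.1
    current = objective(theta)
    while lr > 1e-6 and not objective(theta - lr * g) <= current:
        lr *= 0.5
    if lr > 1e-6:
        theta = theta - lr * g
final_loss = objective(theta)

# Stage 2: quantile threshold on log ratios, then accept/reject fresh draws.
def sample_q(n):
    return theta[0] + np.exp(theta[1]) * rng.standard_normal(n)

pilot = sample_q(5000)
log_threshold = np.quantile(log_p_tilde(pilot) - log_q(pilot, theta), 1.0 - gamma)

x = sample_q(20000)
log_accept = np.minimum(0.0, log_p_tilde(x) - log_q(x, theta) - log_threshold)
refined = x[np.log(rng.uniform(size=x.size)) < log_accept]

print(f"objective: {initial_loss:.3f} -> {final_loss:.3f}")
print(f"accepted {refined.size} of {x.size} proposal draws")
```

Holding `eps` fixed makes the Stage 1 objective deterministic, so step halving guarantees it never increases; drawing fresh minibatch noise at each step, as in the description above, would trade that guarantee for cheaper, noisier updates.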

6. Significance, Comparisons, and Broader Implications

RDVI with rejection sampling serves as a principled, provably improved bridge between purely variational and sampling-based inference. It incorporates the flexibility of α-divergence minimization, enabling continuous interpolation between ELBO/VB, mode-seeking, and mass-covering behaviors (Li et al., 2016, Hernández-Lobato et al., 2015). The explicit connection to the $\alpha\to\infty$ limit, $\log \max_x p(x)/q_\theta(x)$, endows the method with strong guarantees on worst-case density ratio control, rendering it particularly robust to mode misspecification and tail coverage issues that plague naive VI.

The framework's modularity allows seamless adaptation to high-dimensional and complex latent variable models; it is also extensible to mixture optimization, mirror descent, and neural variational representations, interfacing cleanly with geometric and black-box VI improvements (Saha et al., 2017, Birrell et al., 2020, Daudel et al., 2021, Daudel et al., 2021).

A plausible implication is that future work can explore further meta-learned or adaptive thresholding in Stage 2, richer proposal families, and joint refinement of both the variational parameters and the acceptance threshold, offering avenues for closing the gap with gold-standard MCMC while retaining variational efficiency.

7. Summary Statement

Refined α-Divergence Variational Inference via rejection sampling (α-DRS) leverages the $\alpha\to\infty$ asymptotics of the Rényi divergence and a two-stage procedure to obtain a sample-based approximation that never worsens, and typically sharply improves, the closeness of the variational posterior to the target, compared to standalone VI. This unites advances in mode/covering control, robust optimization, and sample-based refinement into a cohesive, theoretically justified, and empirically validated variational inference framework (Sharma et al., 2019).
