Refined α-Divergence Variational Inference (RDVI)

Updated 2 June 2026

The paper introduces a novel two-stage RDVI framework that integrates rejection sampling to rigorously reduce α-divergence for improved variational posterior approximation.
The method provides precise control over mode-seeking and mass-covering properties through a tunable α parameter and establishes theoretical bounds on divergence reduction.
Empirical studies on synthetic mixtures and Bayesian neural network regression demonstrate RDVI’s enhanced sampling efficiency and predictive performance.

Refined α-Divergence Variational Inference (RDVI) is an advanced framework for approximate Bayesian inference that generalizes classic variational inference by replacing the standard Kullback–Leibler objective with the Rényi α-divergence. RDVI further integrates rejection sampling to minimize worst-case density ratio discrepancies between the target and variational distributions, yielding improved empirical and theoretical performance, especially in complex or multimodal posterior landscapes (Sharma et al., 2019). This method unifies and extends prior α-divergence-based VI schemes, permitting rigorous control over mode-seeking, mass-covering, and robustness properties through the parameter α, and introduces a two-stage optimization-refinement procedure that is theoretically guaranteed to improve the fit.

1. Formal Definition of RDVI and α-Divergence

RDVI is grounded in the minimization of the Rényi α-divergence, which for two densities $p(x)$ (target, possibly unnormalized) and $q_\theta(x)$ (variational approximation), is given as

$D_\alpha(p\|q_\theta) = \frac{1}{\alpha-1} \log \int q_\theta(x) \left[ \frac{\tilde{p}(x)}{q_\theta(x)} \right]^\alpha dx - \frac{\alpha}{\alpha-1} \log Z_p,$

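where $p(x)=\tilde{p}(x)/Z_p$ and $\alpha>0$, $\alpha\neq1$ (Sharma et al., 2019). In the limit α→1, this recovers the KL divergence $\mathrm{KL}(p\|q_\theta)$. The parameter α gives continuous control over the trade-off between mode-seeking and mass-covering behavior and provides a unified variational inference framework (Li et al., 2016). With the argument order used here, larger α penalizes regions where $q_\theta$ under-covers $p$ more heavily, so the fit becomes more mass-covering as α grows; the common labeling of α<1 as mass-covering and α>1 as mode-seeking refers to the reversed order $D_\alpha(q_\theta\|p)$ used by Li et al. (2016).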
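As a minimal sketch of how the integral term can be estimated from samples of $q_\theta$, the snippet below averages $(p/q_\theta)^\alpha$ in log space and compares the result with the closed form for two equal-variance Gaussians, $\alpha(\mu_p-\mu_q)^2/(2\sigma^2)$. The function names and the toy densities are illustrative assumptions, not taken from the source, and the estimator (a log of a sample mean) is consistent but biased for finite sample sizes.

```python
import math
import random


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def log_normal_pdf(x, mean, std):
    """Log density of a univariate Gaussian."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def renyi_divergence_mc(log_p, log_q, sample_q, alpha, n=100_000, seed=0):
    """Monte Carlo estimate of D_alpha(p || q) for a normalized p (Z_p = 1)."""
    rng = random.Random(seed)
    xs = [sample_q(rng) for _ in range(n)]
    # alpha * log-ratio per sample; the estimator averages exp(...) under q.
    scaled = [alpha * (log_p(x) - log_q(x)) for x in xs]
    return log_mean_exp(scaled) / (alpha - 1.0)


# Toy check (assumed setup): p = N(0, 1), q = N(0.5, 1), alpha = 2.
alpha = 2.0
estimate = renyi_divergence_mc(
    log_p=lambda x: log_normal_pdf(x, 0.0, 1.0),
    log_q=lambda x: log_normal_pdf(x, 0.5, 1.0),
    sample_q=lambda rng: rng.gauss(0.5, 1.0),
    alpha=alpha,
)
closed_form = alpha * 0.5 ** 2 / 2.0
print(f"MC estimate {estimate:.3f} vs closed form {closed_form:.3f}")
```

For an unnormalized target, the same average gives the first term of the definition, and the $\log Z_p$ term is a constant with respect to $\theta$.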

2. Two-Stage RDVI: Hybridization with Rejection Sampling

A key innovation of RDVI is embedding a rejection sampling (RS) refinement stage. The crucial observation is that

$\lim_{\alpha\to\infty} D_\alpha(p\|q_\theta) = \log \max_x \frac{p(x)}{q_\theta(x)},$

which is equivalent to the log of the optimal rejection-sampler constant $M(\theta)$ for $q_\theta(x)$ . This leads to the two-stage α-Divergence Rejection Sampling (α-DRS) algorithm (Sharma et al., 2019):
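The limit can be checked with a standard norm argument. The following is a short sketch, writing $w(x) = p(x)/q_\theta(x)$ (notation introduced here for convenience) and assuming $w$ is bounded:

$D_\alpha(p\|q_\theta) = \frac{1}{\alpha-1} \log \mathbb{E}_{q_\theta}\left[ w(x)^\alpha \right] = \frac{\alpha}{\alpha-1} \log \|w\|_{L^\alpha(q_\theta)} \;\xrightarrow{\alpha\to\infty}\; \log \|w\|_{L^\infty(q_\theta)} = \log \sup_x \frac{p(x)}{q_\theta(x)}.$

Rejection sampling with proposal $q_\theta$ requires a constant $M$ with $M q_\theta(x) \ge p(x)$ for all $x$, and the smallest such constant is exactly this supremum. If the constant is instead defined through the unnormalized $\tilde{p}$, it is scaled by $Z_p$, which shifts the logarithm by $\log Z_p$.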

Stage 1: Learn $q_\theta$ by minimizing a Monte Carlo estimate of $D_\alpha(p\|q_\theta)$ for finite $\alpha$, resulting in a proposal distribution well-matched to $p$.
Stage 2: Estimate an (approximate) acceptance threshold $\tilde{M}$ (linked to $M(\theta)$), and perform rejection sampling with acceptance probability

$a(x) = \min\left\{ 1, \frac{\tilde{p}(x)}{\tilde{M}\, q_\theta(x)} \right\}.$

This yields a sample-based refined approximation $r(x) \propto q_\theta(x)\, a(x)$ that provably reduces the α-divergence to the target.
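The refinement stage can be sketched as an ordinary accept/reject loop. The snippet below is an illustration under stated assumptions rather than the authors' implementation: it assumes the truncated acceptance rule $a(x) = \min\{1, \tilde{p}(x)/(M q_\theta(x))\}$, a toy two-mode target, a fixed Gaussian proposal standing in for the Stage 1 output, and a threshold found by a grid search. All names are hypothetical.

```python
import math
import random

Q_MEAN, Q_STD = 0.0, 3.0  # assumed proposal, standing in for a learned q_theta


def log_p_tilde(x):
    """Unnormalized two-mode target: unit-variance bumps at -2 and +2."""
    a = -0.5 * (x + 2.0) ** 2
    b = -0.5 * (x - 2.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_q(x):
    """Log density of the Gaussian proposal."""
    z = (x - Q_MEAN) / Q_STD
    return -0.5 * z * z - math.log(Q_STD) - 0.5 * math.log(2.0 * math.pi)


def accept_prob(x, log_m):
    """a(x) = min(1, p_tilde(x) / (M q(x))), evaluated in log space."""
    return math.exp(min(0.0, log_p_tilde(x) - log_q(x) - log_m))


def refine(n_samples, log_m, seed=0):
    """Draw from q and keep each draw with probability a(x)."""
    rng = random.Random(seed)
    accepted, proposed = [], 0
    while len(accepted) < n_samples:
        x = rng.gauss(Q_MEAN, Q_STD)
        proposed += 1
        if rng.random() < accept_prob(x, log_m):
            accepted.append(x)
    return accepted, len(accepted) / proposed


# Threshold from a grid search over the density ratio (feasible only in 1D).
grid = [-10.0 + 0.01 * i for i in range(2001)]
log_m = max(log_p_tilde(x) - log_q(x) for x in grid)
samples, rate = refine(5000, log_m)
mean_abs = sum(abs(x) for x in samples) / len(samples)
print(f"acceptance rate {rate:.2f}, mean |x| {mean_abs:.2f}")
```

With the threshold at the maximum of the ratio, the accepted draws follow the target itself, so the acceptance rate is approximately $Z_p / M$; a smaller threshold accepts more draws but only approximates the target.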

Stages in Algorithmic Form

Stage	Steps	Output
1 (RDVI)	Optimize Monte Carlo estimate of $D_\alpha(p\|q_\theta)$ over $\theta$ using gradient-based updates	Proposal $q_\theta$
2 (RS)	Compute threshold $\tilde{M}$ (quantile or $\max_x \tilde{p}(x)/q_\theta(x)$); draw $x \sim q_\theta$ and accept with probability $a(x) = \min\{1, \tilde{p}(x)/(\tilde{M} q_\theta(x))\}$ until sufficient samples collected	Empirical distribution $r$

Rigorous analysis guarantees that $D_\alpha(p\|r) \le D_\alpha(p\|q_\theta)$ for any finite α, where $r$ denotes the refined distribution, and in the limit where the threshold $\tilde{M}$ reaches the optimal constant $M(\theta)$, $r$ converges to $p$ (Sharma et al., 2019).
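The inequality can be checked numerically on a one-dimensional grid. The sketch below is an assumption-laden illustration: it uses α = 2, a toy two-mode target, a Gaussian proposal, the truncated acceptance rule $a(x) = \min\{1, \tilde{p}(x)/(M q(x))\}$, and Riemann sums in place of exact integrals. It evaluates the divergence from the target to the refined density for thresholds set to a fraction of the largest density ratio on the grid.

```python
import math

ALPHA = 2.0
DX = 0.01
GRID = [-12.0 + DX * i for i in range(2401)]


def p_tilde(x):
    """Unnormalized two-mode target (toy assumption)."""
    return math.exp(-0.5 * (x + 2.0) ** 2) + math.exp(-0.5 * (x - 2.0) ** 2)


def q_pdf(x, std=2.0):
    """Gaussian proposal density (toy assumption)."""
    return math.exp(-0.5 * (x / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def normalize(values):
    """Rescale grid values so that their Riemann sum equals one."""
    total = sum(values) * DX
    return [v / total for v in values]


def renyi(p, q, alpha=ALPHA):
    """Riemann-sum approximation of D_alpha(p || q) on the grid."""
    integral = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q)) * DX
    return math.log(integral) / (alpha - 1.0)


pt = [p_tilde(x) for x in GRID]
p = normalize(pt)
q = normalize([q_pdf(x) for x in GRID])
m_opt = max(a / b for a, b in zip(pt, q))  # largest density ratio on the grid

d_q = renyi(p, q)
results = {}
for frac in (0.1, 0.5, 1.0):
    m = frac * m_opt
    # Refined density r(x) proportional to q(x) * min(1, p_tilde(x) / (m q(x))).
    r = normalize([qi * min(1.0, a / (m * qi)) for a, qi in zip(pt, q)])
    results[frac] = renyi(p, r)

print(f"D(p||q) = {d_q:.4f}")
for frac, value in results.items():
    print(f"threshold = {frac:.1f} * max ratio -> D(p||r) = {value:.4f}")
```

In this setup the refined divergence does not exceed the proposal's divergence for any of the thresholds, and it vanishes (up to rounding) when the threshold equals the largest ratio, since the refined density then coincides with the target on the grid.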

3. Theoretical Properties and Bounds

RDVI establishes that the RS-based refinement never increases, and typically decreases, the α-divergence to the target $p$ when moving from the proposal $q_\theta$ to the refined distribution $r$. As the acceptance threshold is tightened toward the optimal constant, $r$ becomes increasingly concentrated on high-density regions of $p$, bridging the gap between variational and sampling-based approaches. The result holds for all α>0 and all proposal choices. In the high-dimensional regime, quantile-based thresholds for $\tilde{M}$ mitigate the curse of dimensionality in naive RS (Sharma et al., 2019).

4. Empirical Performance and Case Studies

Empirical studies in (Sharma et al., 2019) demonstrate the practical efficacy of the RDVI+RS approach:

Synthetic Gaussian Mixture: After Stage 1, $q_\theta$ may miss certain posterior modes; after RS refinement, the sample set drawn from $r$ successfully recovers all target modes, indicated by a stark drop in $D_\alpha(p\|r)$ relative to $D_\alpha(p\|q_\theta)$.
Bayesian Neural Network Regression (UCI): On benchmarks, α-DRS achieves lower RMSE and higher log-likelihood than standalone RDVI or other $\alpha$-divergence minimization baselines, especially with γ≈0.1 acceptance rates in RS.

These results highlight that the hybrid scheme yields both tighter fits and improved predictive accuracy compared to variational-only routines.

5. Algorithmic Implementation and Optimization Strategy

The practical RDVI+RS routine proceeds as follows (Sharma et al., 2019):

Input: Unnormalized target $\tilde{p}(x)$, α>0, acceptance hyperparameter γ.
Variational phase: Randomly initialize $\theta$; iteratively sample minibatches from $q_\theta$, estimate the α-divergence, and perform stochastic gradient updates to minimize the divergence.
Threshold estimation: For low dimension, set $\tilde{M} = \max_x \tilde{p}(x)/q_\theta(x)$; for high dimension, compute $\tilde{M}$ as the empirical quantile of $\tilde{p}(x)/q_\theta(x)$ over samples $x \sim q_\theta$.
Refinement: Repeatedly draw $x \sim q_\theta$, accept using $a(x) = \min\{1, \tilde{p}(x)/(\tilde{M} q_\theta(x))\}$, and accumulate accepted samples until enough samples from the refined distribution $r$ have been collected.
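The threshold-estimation step can be illustrated on a toy product model where the exact maximum of the density ratio is known. This is a sketch under assumptions, not the authors' procedure: the source only states that an empirical quantile is used, so taking the (1 − γ) quantile of the sampled log-ratios is a choice made here for illustration, and the target, proposal, and function names are hypothetical.

```python
import math
import random

DIM = 10
Q_STD = 2.0
# Per-dimension log of p_tilde(x)/q(x) at x = 0, for p_tilde(x) = exp(-x^2 / 2).
LOG_RATIO_AT_ZERO = math.log(Q_STD) + 0.5 * math.log(2.0 * math.pi)


def log_ratio(x):
    """log p_tilde(x) - log q(x) for a product target and product proposal."""
    total = 0.0
    for xi in x:
        total += -0.5 * xi * xi + 0.5 * (xi / Q_STD) ** 2 + LOG_RATIO_AT_ZERO
    return total


def empirical_quantile(values, level):
    """Nearest-rank empirical quantile for 0 <= level <= 1."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(level * len(ordered)) - 1))
    return ordered[index]


def mean_acceptance(log_ratios, log_m):
    """Average of a(x) = min(1, exp(log_ratio - log M)) over proposal draws."""
    return sum(math.exp(min(0.0, lr - log_m)) for lr in log_ratios) / len(log_ratios)


rng = random.Random(0)
draws = [[rng.gauss(0.0, Q_STD) for _ in range(DIM)] for _ in range(5000)]
log_ratios = [log_ratio(x) for x in draws]

gamma = 0.1
log_m_exact = DIM * LOG_RATIO_AT_ZERO  # ratio is maximized at the origin
log_m_quantile = empirical_quantile(log_ratios, 1.0 - gamma)  # assumed rule

rate_max = mean_acceptance(log_ratios, log_m_exact)
rate_quantile = mean_acceptance(log_ratios, log_m_quantile)
print(f"max-based threshold: acceptance {rate_max:.5f}")
print(f"quantile threshold:  acceptance {rate_quantile:.3f}")
```

In this example the exact constant gives an acceptance rate near $2^{-10}$, which shrinks geometrically with dimension, while the quantile threshold keeps the rate at or above γ at the cost of a refined distribution that only approximates the target.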

This algorithm is amenable to stochastic optimization, mini-batching, and reparameterization-trick based variance reduction.
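A minimal sketch of the variational phase with reparameterized minibatches is given below. It assumes a one-dimensional Gaussian proposal $q_\theta = \mathcal{N}(\mu, \sigma^2)$ with $\theta = (\mu, \log\sigma)$, a toy Gaussian target, α = 2, and hand-derived gradients in place of automatic differentiation; none of these choices come from the source. The gradient of the estimated objective is $\frac{\alpha}{\alpha-1}$ times a weighted average of per-sample gradients of $\log\tilde{p}(x) - \log q_\theta(x)$, with weights proportional to $(\tilde{p}/q_\theta)^\alpha$.

```python
import math
import random

ALPHA = 2.0
TARGET_MEAN, TARGET_STD = 1.0, 1.0  # toy unnormalized Gaussian target


def log_p_tilde(x):
    """Unnormalized log density of the toy target."""
    return -0.5 * ((x - TARGET_MEAN) / TARGET_STD) ** 2


def grad_log_p_tilde(x):
    """Derivative of log_p_tilde with respect to x."""
    return -(x - TARGET_MEAN) / TARGET_STD ** 2


def stage1(steps=1000, batch=256, lr=0.05, seed=0):
    """Stochastic gradient descent on a Monte Carlo estimate of D_alpha(p || q)."""
    rng = random.Random(seed)
    mu, log_sigma = 0.0, math.log(2.0)
    scale = ALPHA / (ALPHA - 1.0)
    for _ in range(steps):
        sigma = math.exp(log_sigma)
        eps = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        xs = [mu + sigma * e for e in eps]  # reparameterized draws from q
        # log p_tilde(x) - log q(x), up to a constant that cancels below.
        log_w = [log_p_tilde(x) + 0.5 * e * e + log_sigma for x, e in zip(xs, eps)]
        top = max(log_w)
        weights = [math.exp(ALPHA * (lw - top)) for lw in log_w]
        total = sum(weights)
        # Self-normalized weighted averages of the per-sample gradients.
        g_mu = scale * sum(w * grad_log_p_tilde(x) for w, x in zip(weights, xs)) / total
        g_ls = scale * sum(
            w * (grad_log_p_tilde(x) * sigma * e + 1.0)
            for w, x, e in zip(weights, xs, eps)
        ) / total
        mu -= lr * g_mu
        log_sigma -= lr * g_ls
    return mu, math.exp(log_sigma)


mu, sigma = stage1()
print(f"learned proposal: mean {mu:.2f}, std {sigma:.2f}")
```

Because the toy target is itself Gaussian, the learned proposal should approach the target's mean and standard deviation; for a multimodal target the same loop would return a single Gaussian that Stage 2 then refines.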

6. Significance, Comparisons, and Broader Implications

RDVI with rejection sampling serves as a principled, provably improved bridge between purely variational and sampling-based inference. It incorporates the flexibility of α-divergence minimization, enabling continuous interpolation between ELBO/VB, mode-seeking, and mass-covering behaviors (Li et al., 2016, Hernández-Lobato et al., 2015). The explicit connection to $D_\infty(p\|q_\theta)$ endows the method with strong guarantees on worst-case density ratio control, rendering it particularly robust to mode misspecification and tail coverage issues that plague naive VI.

The framework's modularity allows seamless adaptation to high-dimensional and complex latent variable models; it is also extensible to mixture optimization, mirror descent, and neural variational representations, interfacing cleanly with geometric and black-box VI improvements (Saha et al., 2017, Birrell et al., 2020, Daudel et al., 2021, Daudel et al., 2021).

A plausible implication is that future work can explore further meta-learned or adaptive thresholding in Stage 2, richer proposal families, and joint refinement of both the proposal $q_\theta$ and the acceptance threshold, offering avenues for closing the gap with gold-standard MCMC while retaining variational efficiency.

7. Summary Statement

Refined α-Divergence Variational Inference via rejection sampling (α-DRS) leverages the $\alpha\to\infty$ asymptotics of Rényi divergence and a two-stage procedure to obtain a sample-based approximation that never worsens, and typically sharply improves, the closeness of the variational posterior to the target, compared to standalone VI. This unites advances in mode/covering control, robust optimization, and sample-based refinement into a cohesive, theoretically justified, and empirically validated variational inference framework (Sharma et al., 2019).