Reverse AIS Estimator (RAISE)
- The paper introduces RAISE, a stochastic estimator that reliably computes conservative, lower-bound log-likelihoods for complex undirected models.
- RAISE reframes likelihood estimation by computing an augmented partition function via reverse annealed importance sampling, blending ideas from AIS and CSL.
- Empirical evaluations demonstrate that RAISE and AIS together tightly bracket the true log-likelihood with far smaller bias than CSL, though RAISE comes with increased computational cost.
The Reverse AIS Estimator (RAISE) is a stochastic estimator designed to provide conservative (lower bound) estimates of log-likelihoods for undirected graphical models such as Restricted Boltzmann Machines (RBMs), Deep Boltzmann Machines (DBMs), and Deep Belief Networks (DBNs). Its primary utility lies in reliably evaluating models for which the partition function is intractable, where standard Annealed Importance Sampling (AIS) may yield over-optimistic model evaluations due to a tendency to underestimate the partition function. RAISE blends ideas from AIS and Conservative Sampling-based Likelihood (CSL) to produce test log-likelihood lower bounds that are both practical and accurate for complex generative models (Upadhya et al., 2015, Burda et al., 2014).
1. Problem Setting and Motivation
Undirected probabilistic graphical models like RBMs define densities of the form $p(\mathbf{v}) = f(\mathbf{v})/Z$, where the unnormalized probability $f(\mathbf{v})$ is typically straightforward to compute, but the partition function $Z = \sum_{\mathbf{v}} f(\mathbf{v})$ is intractable for high-dimensional $\mathbf{v}$. Assessing model fit on held-out test data requires estimating the average test log-likelihood $\frac{1}{N}\sum_{i=1}^{N} \log p(\mathbf{v}^{(i)}) = \frac{1}{N}\sum_{i=1}^{N} \log f(\mathbf{v}^{(i)}) - \log Z$. Since $Z$ is unknown, one relies on stochastic estimators.
AIS is widely used for partition function estimation. Although its estimate $\hat{Z}_{\text{AIS}}$ is unbiased for $Z$, by Jensen’s inequality $\mathbb{E}[\log \hat{Z}_{\text{AIS}}] \le \log Z$, causing AIS to typically underestimate $\log Z$, and hence the estimated log-likelihood $\log f(\mathbf{v}) - \log \hat{Z}_{\text{AIS}}$ becomes an overestimate—making AIS a non-conservative, potentially misleading estimator in practice (Burda et al., 2014).
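This Jensen effect is easy to reproduce numerically. The following toy sketch (the discrete model, sample sizes, and all names are assumptions for the demo, not from either paper) builds an unbiased importance-sampling estimator of $Z$ and shows that the average of its logarithm still falls below $\log Z$:

```python
# Toy demo: an unbiased Z_hat still satisfies E[log Z_hat] <= log Z by
# Jensen's inequality, so log f(v) - log Z_hat overestimates on average.
import numpy as np

rng = np.random.default_rng(0)

xs = np.arange(10)
f = np.exp(-((xs - 3.0) ** 2) / 4.0)  # unnormalized target f(x)
Z = f.sum()                           # exact partition function (tractable here)

def z_hat(n_samples=5):
    """Importance sampling with a uniform proposal q(x) = 1/10: unbiased for Z."""
    idx = rng.integers(0, 10, size=n_samples)
    return np.mean(f[idx] / (1.0 / 10.0))

reps = np.array([z_hat() for _ in range(20000)])
print(f"E[Z_hat]     ~ {reps.mean():.3f}   (exact Z = {Z:.3f})")
print(f"E[log Z_hat] ~ {np.log(reps).mean():.3f}  <  log Z = {np.log(Z):.3f}")
```

The sample mean of the replicates recovers $Z$, while the mean of their logs sits measurably below $\log Z$; the gap grows with the estimator's variance.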
2. RAISE: Formulation and Theoretical Properties
RAISE reframes the estimation of $p(\mathbf{v})$ as the computation of a partition function of an augmented joint distribution, enabling the use of AIS in a reverse mode to yield a stochastic lower bound on $\log p(\mathbf{v})$. For RBMs, the marginalized likelihood is
$$p(\mathbf{v}) = \sum_{\mathbf{h}} \frac{e^{-E(\mathbf{v},\mathbf{h})}}{Z},$$
where $\mathbf{v}$ are the visible units and $\mathbf{h}$ the hidden. Identifying the summand as an unnormalized distribution over $\mathbf{h}$ (with $\mathbf{v}$ fixed), its sum over $\mathbf{h}$ gives the partition function for this "augmented" distribution, which equals $p(\mathbf{v})$ itself (Upadhya et al., 2015).
A sequence of intermediate distributions $f_0, f_1, \dots, f_K$ is specified as
$$f_k(\mathbf{x}) \propto f_0(\mathbf{x})^{1-\beta_k} f_K(\mathbf{x})^{\beta_k}, \qquad 0 = \beta_0 < \beta_1 < \cdots < \beta_K = 1,$$
where $f_0$ is a tractable proposal (e.g., uniform or base-rate) with known normalizer $Z_0$, and $f_K$ is the unnormalized target (Upadhya et al., 2015, Burda et al., 2014).
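As a concrete sketch of this geometric path (on a toy discrete state space rather than the paper's RBM; the proposal, target, and schedule are illustrative assumptions):

```python
# Geometric annealing path: f_k(x) = f0(x)^(1-beta_k) * fK(x)^beta_k.
import numpy as np

xs = np.arange(8)
f0 = np.ones_like(xs, dtype=float)  # uniform proposal: Z0 = 8, known exactly
fK = np.exp(-((xs - 5.0) ** 2))     # unnormalized target density

betas = np.linspace(0.0, 1.0, 11)   # linear beta-schedule, beta_0=0 .. beta_K=1
path = [f0 ** (1.0 - b) * fK ** b for b in betas]

# Endpoints recover the proposal and the target exactly.
assert np.allclose(path[0], f0) and np.allclose(path[-1], fK)

for b, fk in zip(betas, path):
    p = fk / fk.sum()  # normalized intermediate distribution
    print(f"beta={b:.1f}  P(x=5) = {p[5]:.3f}")
```

Each intermediate distribution is a compromise between the flat proposal and the peaked target, so adjacent $f_{k-1}, f_k$ pairs overlap well and the per-step importance ratios stay well-behaved.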
The RAISE estimator for $p(\mathbf{v})$ after a single reverse chain is
$$\hat{p}(\mathbf{v}) = \frac{f_K(\mathbf{v})}{Z_0} \prod_{k=1}^{K} \frac{f_{k-1}(\mathbf{x}_{k-1})}{f_k(\mathbf{x}_{k-1})},$$
where $Z_0$ is the known normalizer for $f_0$, and the sequence $\mathbf{x}_K, \mathbf{x}_{K-1}, \dots, \mathbf{x}_0$ is constructed by running a Markov chain backwards from the observed $\mathbf{v}$ through the annealing path (Upadhya et al., 2015).
The estimator satisfies $\mathbb{E}[\hat{p}(\mathbf{v})] = p(\mathbf{v})$, and by Jensen’s inequality $\mathbb{E}[\log \hat{p}(\mathbf{v})] \le \log p(\mathbf{v})$, ensuring a conservative (lower bound) estimate (Burda et al., 2014).
3. Algorithmic Description
A single-chain version of RAISE for RBMs proceeds as follows (Upadhya et al., 2015):
```
Input:  test point v, model parameters θ, proposal f₀, β-schedule {β₀,…,β_K}, steps K
Output: estimate p̂(v)

1. Precompute: Z₀ ← partition function of proposal f₀ (analytic)
2. Initialize: x_K ← v;  w ← f_K(v)/Z₀
3. For k = K downto 1:
   a) Sample x_{k−1} ∼ T_k(· | x_k)        (transitions keep f_k invariant)
   b) Compute r ← f_{k−1}(x_{k−1}) / f_k(x_{k−1})
   c) Update w ← w × r
4. Return: p̂(v) = w
```
For tractable posteriors (e.g., RBMs), the reverse chain can sample directly from $p(\mathbf{h} \mid \mathbf{v})$ at initialization. For intractable cases (e.g., DBMs), additional “heating” transitions are necessary as described in (Burda et al., 2014).
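The single-chain procedure above can be sketched end-to-end on a tiny binary RBM, small enough that the exact log-likelihood is available by brute-force enumeration. This is a hedged illustration, not the authors' implementation: the model sizes, uniform proposal $f_0 \equiv 1$, linear β-schedule, and all helper names are assumptions made for the demo.

```python
# Minimal single-chain RAISE sketch for a tiny binary RBM.
import itertools
import numpy as np

rng = np.random.default_rng(1)
nv, nh, K = 3, 2, 200

W = rng.normal(0.0, 0.5, size=(nv, nh))  # weights
b = rng.normal(0.0, 0.1, size=nv)        # visible biases
c = rng.normal(0.0, 0.1, size=nh)        # hidden biases

def score(v, h):
    """log f_K(v, h) = -E(v, h) for the RBM."""
    return v @ W @ h + b @ v + c @ h

def log_fK_marginal(v):
    """log sum_h f_K(v, h); tractable for RBMs (negative free energy)."""
    return b @ v + np.sum(np.logaddexp(0.0, v @ W + c))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_step(v, h, beta):
    """One Gibbs sweep leaving f_k ∝ exp(beta * score) invariant
    (with uniform f0, tempering just scales the conditionals by beta)."""
    h = (rng.random(nh) < sigmoid(beta * (v @ W + c))).astype(float)
    v = (rng.random(nv) < sigmoid(beta * (W @ h + b))).astype(float)
    return v, h

def raise_log_weight(v_test):
    """One reverse chain: log of the single-chain RAISE weight."""
    betas = np.linspace(0.0, 1.0, K + 1)
    h = (rng.random(nh) < sigmoid(v_test @ W + c)).astype(float)  # h ~ p(h|v)
    v = v_test.copy()
    log_w = log_fK_marginal(v_test) - (nv + nh) * np.log(2.0)  # log f_K(v)/Z0
    for k in range(K, 0, -1):
        v, h = gibbs_step(v, h, betas[k])                  # x_{k-1} ~ T_k
        log_w += (betas[k - 1] - betas[k]) * score(v, h)   # log f_{k-1}/f_k
    return log_w

# Ground truth by brute force over all 2^(nv+nh) joint states.
states = [np.array(s, dtype=float)
          for s in itertools.product([0, 1], repeat=nv + nh)]
logZ = np.logaddexp.reduce(np.array([score(s[:nv], s[nv:]) for s in states]))
v_test = np.array([1.0, 0.0, 1.0])
exact = log_fK_marginal(v_test) - logZ

est = np.mean([raise_log_weight(v_test) for _ in range(200)])
print(f"RAISE mean log-weight: {est:.3f}   exact log p(v): {exact:.3f}")
```

On this toy model with $K = 200$, the mean log-weight lands at or just below the exact value; shrinking $K$ or using a cruder proposal widens the gap.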
4. Hyperparameters and Estimation Quality
The principal hyperparameters include:
- Number of intermediate distributions ($K$): Increasing $K$ reduces bias and variance at linear computational cost. Large $K$ is critical for ensuring conservativeness, especially with a uniform proposal.
- $\beta$-schedule: Linear spacing is standard, but denser schedules in regions of rapid distributional change may reduce variance.
- Proposal $f_0$: A uniform proposal is safest for conservativeness but requires large $K$. A base-rate RBM proposal can yield smaller bias but may slightly overestimate unless $K$ is very large.
- Reverse chain runs per datum: While a single reverse chain can suffice for large $K$, multiple chains further reduce variance (Upadhya et al., 2015).
- Variance reduction: Subtracting a control variate from the per-datum log-weights substantially reduces estimator variance over the test set (Burda et al., 2014).
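A β-schedule along the lines of the second bullet might be sketched as follows; `linear_schedule`, `warped_schedule`, and the `power` warping are hypothetical helpers, and packing points near $\beta = 1$ is just one example of non-uniform spacing, not a recommendation from the papers.

```python
# Hypothetical schedule helpers: a standard linear beta-schedule, plus a
# warped variant whose spacing shrinks as beta -> 1 (for power > 1).
import numpy as np

def linear_schedule(K):
    return np.linspace(0.0, 1.0, K + 1)

def warped_schedule(K, power=2.0):
    """Concentrate intermediate distributions near beta = 1."""
    u = np.linspace(0.0, 1.0, K + 1)
    return 1.0 - (1.0 - u) ** power

lin, war = linear_schedule(10), warped_schedule(10)
print(np.round(lin, 3))
print(np.round(war, 3))
```

Both schedules start at $\beta_0 = 0$ and end at $\beta_K = 1$; only the density of intermediate points differs.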
5. Empirical Performance and Computational Cost
RAISE, AIS, and CSL were compared by (Upadhya et al., 2015) on MNIST using RBMs with 20, 200, and 500 hidden units. Representative log-likelihood results (average over 500 test points) are shown below:
| n hidden | AIS | CSL | RAISE (uniform) | RAISE (base-rate) |
|---|---|---|---|---|
| 20 | –142.38 | –143.58 | –145.99 | –144.14 |
| 200 | –112.96 | –142.64 | –112.46 | –109.01 |
| 500 | –116.46 | –154.76 | –118.02 | –112.04 |
Interpretation:
- AIS generally provides the tightest (but potentially over-optimistic) estimates due to its tendency to underestimate $\log Z$.
- CSL is highly conservative with significant downward bias unless very large numbers of Gibbs samples are used.
- RAISE is a lower-bound estimator; for large $K$, RAISE's log-likelihood estimates closely approach AIS and the ground truth (exactly computable for small hidden-layer sizes), outperforming CSL by a significant margin. A base-rate proposal may slightly overestimate unless $K$ is high, while the uniform proposal is strictly conservative (but slower).
- Computational cost: RAISE, requiring one full reverse chain per test datum, is two to three orders of magnitude more expensive on MNIST than AIS (which needs chains only per model, not per test point), and more costly than CSL (Upadhya et al., 2015, Burda et al., 2014).
More extensive experiments on RBMs, DBMs, and DBNs, including Omniglot and larger models (Burda et al., 2014), confirm that RAISE typically underestimates test log-likelihood by less than 1 nat relative to AIS, demonstrating that AIS and RAISE tightly bracket the true value. Empirical plots show AIS leveling off to a possibly optimistic value, while RAISE's lower bound rises monotonically to converge just below AIS (Burda et al., 2014).
6. Relationships to CSL and AIS
RAISE synthesizes elements of both CSL and AIS:
- CSL: Estimates $p(\mathbf{v})$ by Monte Carlo averaging of $p(\mathbf{v} \mid \mathbf{h})$ over sampled hidden states, but suffers strong downward (conservative) bias due to Jensen's inequality.
- AIS: Provides an unbiased estimate of $Z$, yielding stochastic lower bounds on $\log Z$, but upper bounds on $\log p(\mathbf{v})$.
- RAISE: By casting $p(\mathbf{v})$ as a partition function and applying AIS “in reverse,” RAISE produces an unbiased estimator of $p(\mathbf{v})$. Jensen’s inequality assures conservativeness: $\mathbb{E}[\log \hat{p}(\mathbf{v})] \le \log p(\mathbf{v})$, i.e., a true stochastic lower bound (Upadhya et al., 2015, Burda et al., 2014).
This lower-bound property holds for any finite $K$; as $K \to \infty$, the bound converges to the true log-likelihood of the model.
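A quick numeric check of this convergence (a toy model of the estimator, not from the papers): treat $\hat{p}(\mathbf{v})$ as the true value times lognormal noise with mean one, so it stays unbiased; shrinking the noise variance plays the role of increasing $K$, and the Jensen gap closes accordingly.

```python
# Toy check: unbiased p_hat gives E[log p_hat] <= log p(v), with a gap
# of sigma^2/2 for lognormal noise that vanishes as the variance shrinks.
import numpy as np

rng = np.random.default_rng(2)
log_p = -10.0  # stand-in "true" log-likelihood

gaps = []
for sigma in (1.0, 0.3, 0.1):
    # lognormal with E[noise] = 1 requires mean = -sigma^2/2 in log-space
    noise = rng.lognormal(mean=-sigma**2 / 2.0, sigma=sigma, size=100_000)
    gap = log_p - np.mean(log_p + np.log(noise))
    gaps.append(gap)
    print(f"sigma={sigma:.1f}  E[log p_hat] below log p(v) by ~{gap:.4f} "
          f"(theory: {sigma**2 / 2.0:.4f})")
```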
7. Practical Recommendations
RAISE requires only the MCMC transition kernels used in standard AIS, making implementation straightforward for anyone with an existing AIS codebase. Gibbs sampling is commonly used for these transitions. Using a data base-rate proposal improves accuracy and reduces error for both AIS and RAISE, and a control variate can dramatically reduce variance in empirical settings (Burda et al., 2014).
For RBMs with moderate numbers of hidden units, RAISE is feasible on modern hardware, but for datasets with many test examples or expensive forward/reverse chains, the cost may be substantial. In models with intractable posteriors, additional “diagnostic” heating chains are required for correctness (Burda et al., 2014).
RAISE thus provides a robust, conservative methodology for evaluating generative undirected models, complementing standard AIS with rigorous lower-bound guarantees on log-likelihood estimation (Upadhya et al., 2015, Burda et al., 2014).