Reverse AIS Estimator (RAISE)

Updated 16 March 2026
  • The paper introduces RAISE, a stochastic estimator that reliably computes conservative (lower-bound) log-likelihood estimates for complex undirected models.
  • RAISE reframes likelihood estimation as the computation of an augmented partition function via reverse annealed importance sampling, blending ideas from AIS and CSL.
  • Empirical evaluations demonstrate that RAISE and AIS together tightly bracket the true log-likelihood, with far smaller bias than CSL, though RAISE comes at increased computational cost.

The Reverse AIS Estimator (RAISE) is a stochastic estimator designed to provide conservative (lower bound) estimates of log-likelihoods for undirected graphical models such as Restricted Boltzmann Machines (RBMs), Deep Boltzmann Machines (DBMs), and Deep Belief Networks (DBNs). Its primary utility lies in reliably evaluating models for which the partition function is intractable, where standard Annealed Importance Sampling (AIS) may yield over-optimistic model evaluations due to a tendency to underestimate the partition function. RAISE blends ideas from AIS and Conservative Sampling-based Likelihood (CSL) to produce test log-likelihood lower bounds that are both practical and accurate for complex generative models (Upadhya et al., 2015, Burda et al., 2014).

1. Problem Setting and Motivation

Undirected probabilistic graphical models like RBMs define densities of the form $p(x) = f(x)/Z$, where $f(x)$ is typically straightforward to compute, but the partition function $Z = \sum_{x} f(x)$ is intractable for high-dimensional $x$. Assessing model fit on held-out test data requires estimating the average test log-likelihood $\mathcal{L} = \frac{1}{N}\sum_{i=1}^N \log p(x^{(i)})$. Since $Z$ is unknown, one relies on stochastic estimators.

AIS is widely used for partition function estimation. By Jensen's inequality, $E[\log \hat Z_{\text{AIS}}] \leq \log Z$, so AIS typically underestimates $\log Z$; the estimated log-likelihood $\log \hat p(x) = \log f(x) - \log \hat Z$ is therefore an overestimate, making AIS a non-conservative, potentially misleading estimator in practice (Burda et al., 2014).
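The direction of this bias is easy to reproduce: the logarithm of any unbiased, noisy estimator of $Z$ is biased downward by Jensen's inequality. A minimal sketch, using a toy distribution and plain importance sampling rather than the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete target over {0,...,9} with unnormalized mass f(x); the state
# space is tiny, so the exact partition function Z is available for comparison.
xs = np.arange(10)
f = np.exp(-0.5 * (xs - 4.0) ** 2)
Z = f.sum()

# Simple importance sampling from a uniform proposal q: Z_hat = mean(f(x)/q(x))
# is unbiased for Z, but log(Z_hat) is biased downward (Jensen's inequality).
q = np.full(10, 0.1)
n_runs, n_samples = 2000, 5
log_Z_hats = np.empty(n_runs)
for i in range(n_runs):
    idx = rng.choice(10, size=n_samples, p=q)
    z_hat = np.mean(f[idx] / q[idx])      # E[z_hat] = Z exactly
    log_Z_hats[i] = np.log(z_hat)

print("log Z:", np.log(Z), " mean log Z_hat:", log_Z_hats.mean())
```

The same mechanism makes AIS-based log-likelihoods over-optimistic: an underestimated $\log Z$ enters the log-likelihood with a negative sign.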

2. RAISE: Formulation and Theoretical Properties

RAISE reframes the estimation of $p(x)$ as the computation of a partition function of an augmented joint distribution, enabling the use of AIS in a reverse mode to yield a stochastic lower bound on $\log p(x)$. For RBMs, the marginalized likelihood is

$$p(v) = \sum_h p(h)\,p(v \mid h),$$

where $v$ are the visible units and $h$ the hidden. Identifying the summand as $f(h) = p(h)\,p(v \mid h)$, its sum over $h$ gives the partition function for this "augmented" distribution (Upadhya et al., 2015).

A sequence of $K+1$ intermediate distributions is specified as

$$f_k(x) \propto f_0(x)^{1-\beta_k}\, f_K(x)^{\beta_k}, \qquad 0 = \beta_0 < \cdots < \beta_K = 1,$$

where $f_0$ is a tractable proposal (e.g., uniform or base-rate), and $f_K$ is the target (Upadhya et al., 2015, Burda et al., 2014).
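In the log domain this geometric path is a simple convex combination; a brief sketch (the function and schedule names here are illustrative, not from the papers):

```python
import numpy as np

def log_f_k(log_f0, log_fK, beta):
    # Geometric bridge f_k ∝ f_0^(1-beta) * f_K^beta, computed in log space
    return (1.0 - beta) * log_f0 + beta * log_fK

# A linear beta-schedule with K = 10 steps: beta_0 = 0, ..., beta_K = 1
betas = np.linspace(0.0, 1.0, 11)
print(log_f_k(0.0, -3.0, betas[5]))
```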

The RAISE estimator for $p(v)$ after a single reverse chain is

$$\hat p(v) = \frac{f_K(v)}{Z_0} \prod_{k=1}^{K} \frac{f_{k-1}(x_k)}{f_k(x_k)},$$

where $Z_0$ is the known normalizer for $f_0$, and the sequence $\{x_k\}$ is constructed by running a Markov chain backwards from the observed $v$ through the annealing path (Upadhya et al., 2015).

The estimator satisfies $E[\hat p(v)] = p(v)$, and by Jensen's inequality $E[\log \hat p(v)] \leq \log p(v)$, ensuring a conservative (lower-bound) estimate (Burda et al., 2014).

3. Algorithmic Description

A single-chain version of RAISE for RBMs proceeds as follows (Upadhya et al., 2015):

Input:  test point v, model parameters θ, proposal f₀, β-schedule {β₀,…,β_K}
Output: estimate p̂(v)

1. Precompute: Z₀ ← partition function of proposal f₀ (analytic)
2. Initialize: x_K ← v;  w ← f_K(v)/Z₀
3. For k = K downto 1:
     a) Sample x_{k−1} ∼ T_k(· | x_k)       (T_k leaves f_k invariant)
     b) Compute r ← f_{k−1}(x_{k−1}) / f_k(x_{k−1})
     c) Update w ← w × r
4. Return: p̂(v) = w
Running this estimator for each test datum and averaging yields an estimate of the average log-likelihood (Upadhya et al., 2015).
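As a concrete illustration, the following sketch runs the single-chain procedure on a deliberately tiny RBM whose partition function can be computed by brute-force enumeration, so the estimate can be checked against the exact $\log p(v)$. All sizes, parameter values, and function names are illustrative assumptions, not from the papers; the transitions are tempered Gibbs sweeps that leave each $f_k \propto f_K^{\beta_k}$ invariant (with a uniform proposal, $f_0 \equiv 1$ and $Z_0 = 2^{D+M}$).

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny RBM (illustrative sizes): 4 visible, 3 hidden units, weak weights.
D, M = 4, 3
W = 0.5 * rng.standard_normal((D, M))
b = 0.1 * rng.standard_normal(D)
c = 0.1 * rng.standard_normal(M)

def neg_energy(v, h):
    # log f_K(v, h): unnormalized log-probability of the RBM
    return v @ W @ h + b @ v + c @ h

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_fK_v(v):
    # log sum_h f_K(v, h): tractable free-energy formula for an RBM
    return b @ v + np.sum(np.log1p(np.exp(c + v @ W)))

def raise_chain(v, K):
    # Single reverse chain; returns log p_hat(v).
    betas = np.linspace(0.0, 1.0, K + 1)
    # Initialize from the exact (factorial) posterior p_K(h | v)
    x_h = (rng.random(M) < sigmoid(c + v @ W)).astype(float)
    x_v = v.copy()
    log_w = log_fK_v(v) - (D + M) * np.log(2.0)   # log f_K(v) - log Z_0
    for k in range(K, 0, -1):
        # One Gibbs sweep leaving f_k = f_K^{beta_k} invariant
        x_h = (rng.random(M) < sigmoid(betas[k] * (c + x_v @ W))).astype(float)
        x_v = (rng.random(D) < sigmoid(betas[k] * (b + W @ x_h))).astype(float)
        # Accumulate the log ratio: log f_{k-1}(x) - log f_k(x)
        log_w += (betas[k - 1] - betas[k]) * neg_energy(x_v, x_h)
    return log_w

# Exact log p(v) by enumerating all 2^D * 2^M joint states (toy sizes only)
states_v = np.array([[(i >> d) & 1 for d in range(D)] for i in range(2 ** D)], float)
states_h = np.array([[(j >> m) & 1 for m in range(M)] for j in range(2 ** M)], float)
all_log_f = np.array([neg_energy(sv, sh) for sv in states_v for sh in states_h])
log_Z = np.logaddexp.reduce(all_log_f)
v_test = states_v[5]
true_log_p = log_fK_v(v_test) - log_Z

estimates = np.array([raise_chain(v_test, K=200) for _ in range(100)])
print("true:", true_log_p, " RAISE mean:", estimates.mean())
```

With $K$ this large relative to the toy model, the averaged estimate typically lands just below the exact value, mirroring the bracketing behavior reported for AIS and RAISE.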

For tractable posteriors (e.g., RBMs), the reverse chain can sample directly from $p_K(h \mid v)$ at initialization. For intractable cases (e.g., DBMs), additional "heating" transitions are necessary as described in (Burda et al., 2014).
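For an RBM this initialization is exact because the posterior factorizes over hidden units; a minimal sketch (array shapes and names are illustrative assumptions):

```python
import numpy as np

def sample_posterior_h(v, W, c, rng):
    # RBM posterior: p(h_j = 1 | v) = sigmoid(c_j + v · W[:, j]), independent
    # across hidden units, so an exact sample is one Bernoulli draw per unit.
    probs = 1.0 / (1.0 + np.exp(-(c + v @ W)))
    return (rng.random(c.shape[0]) < probs).astype(float)

rng = np.random.default_rng(0)
W = np.zeros((4, 3))
c = np.array([50.0, 50.0, -50.0])   # extreme biases force near-deterministic h
h = sample_posterior_h(np.ones(4), W, c, rng)
print(h)
```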

4. Hyperparameters and Estimation Quality

The principal hyperparameters include:

  • Number of intermediate distributions ($K$): increasing $K$ reduces bias and variance at linear computational cost. Large $K$ is critical for ensuring conservativeness, especially with a uniform proposal.
  • $\beta$-schedule: linear spacing is standard, but denser schedules in regions of rapid distributional change may reduce variance.
  • Proposal $f_0$: a uniform proposal is safest for conservativeness but requires large $K$. A base-rate RBM proposal can yield smaller bias but may slightly overestimate unless $K$ is very large.
  • Reverse chain runs per datum: while a single reverse chain can suffice for large $K$, multiple chains further reduce variance (Upadhya et al., 2015).
  • Variance reduction: subtracting $\log f(x)$ as a control variate substantially reduces estimator variance over the test set (Burda et al., 2014).
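The control-variate idea can be sketched in isolation. Per-datum estimates contain $\log f(x)$, which varies widely across test points but is cheap to evaluate on the entire test set; subtracting it per datum and adding its exact test-set mean back removes that spread from the Monte Carlo variance. The noise model below is a hypothetical stand-in, not the papers' setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: log f(x) is known cheaply for all N test points, but the
# expensive per-datum chain estimate log_phat = log f(x) + offset + noise is
# only run on a small subsample of size S.
N, S = 1000, 50
log_f = rng.normal(-100.0, 5.0, size=N)   # log f varies widely across data
offset = -10.0                            # stand-in for the -log Z contribution

naive, controlled = [], []
for _ in range(500):
    idx = rng.choice(N, size=S, replace=False)
    log_phat = log_f[idx] + offset + rng.normal(0.0, 1.0, size=S)
    naive.append(log_phat.mean())
    # Control variate: subtract log f per datum, add its exact mean back
    controlled.append((log_phat - log_f[idx]).mean() + log_f.mean())

print("var naive:", np.var(naive), " var controlled:", np.var(controlled))
```

Both estimators target the same average; only the variance of the controlled version is smaller.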

5. Empirical Performance and Computational Cost

RAISE, AIS, and CSL were compared by Upadhya et al. (2015) on MNIST using RBMs with 20, 200, and 500 hidden units. Representative log-likelihood results (averaged over 500 test points) are shown below:

| Hidden units | AIS | CSL | RAISE (uniform) | RAISE (base-rate) |
|---|---|---|---|---|
| 20 | –142.38 | –143.58 | –145.99 | –144.14 |
| 200 | –112.96 | –142.64 | –112.46 | –109.01 |
| 500 | –116.46 | –154.76 | –118.02 | –112.04 |

Interpretation:

  • AIS generally provides the tightest (but potentially over-optimistic) estimates due to its tendency to underestimate $\log Z$.
  • CSL is highly conservative, with significant downward bias unless very large numbers of Gibbs samples are used.
  • RAISE is a lower-bound estimator; for $K \geq 10{,}000$, RAISE's log-likelihood estimates closely approach AIS and the ground truth (known exactly for small $n$), outperforming CSL by a significant margin. A base-rate proposal may slightly overestimate unless $K$ is high, while uniform is strictly conservative (but slower).
  • Computational cost: RAISE, requiring one full reverse chain per test datum, is two to three orders of magnitude more expensive on MNIST than AIS (which needs only a few chains per model), and more costly than CSL (Upadhya et al., 2015, Burda et al., 2014).

More extensive experiments on RBMs, DBMs, and DBNs (including Omniglot and larger models, (Burda et al., 2014)) confirm that RAISE typically underestimates test log-likelihood by less than 1 nat relative to AIS, demonstrating that AIS and RAISE tightly bracket the true value. Empirical plots show AIS leveling off to a possibly optimistic value, while RAISE's lower bound rises monotonically to converge just below AIS (Burda et al., 2014).

6. Relationships to CSL and AIS

RAISE synthesizes elements of both CSL and AIS:

  • CSL: estimates $p(v)$ by Monte Carlo, but suffers strong downward (conservative) bias due to Jensen's inequality.
  • AIS: provides an unbiased $\hat Z$, yielding stochastic lower bounds on $\log Z$ but upper bounds on $\mathcal{L}$.
  • RAISE: by casting $\sum_h p(h)\,p(v \mid h)$ as a partition function and applying AIS "in reverse," RAISE produces an unbiased estimator of $p(v)$. Jensen's inequality assures conservativeness: $E[\log \hat p(v)] \leq \log p(v)$, i.e., a true stochastic lower bound (Upadhya et al., 2015, Burda et al., 2014).

This lower-bound property holds for any finite $K$; as $K \to \infty$, the bound converges to the true log-likelihood of the model.

7. Practical Recommendations

RAISE requires only the MCMC transition kernels used in standard AIS, making implementation straightforward for those with an existing AIS codebase. Gibbs sampling is commonly used for these transitions. Using a data base-rate proposal improves accuracy and reduces error for both AIS and RAISE. Subtracting $\log f(x)$ as a control variate dramatically reduces variance in empirical settings (Burda et al., 2014).

For RBMs with moderate numbers of hidden units, RAISE is feasible on modern hardware, but for datasets with many test examples or expensive forward/reverse chains, the cost may be substantial. In models with intractable posteriors, additional “diagnostic” heating chains are required for correctness (Burda et al., 2014).

RAISE thus provides a robust, conservative methodology for evaluating generative undirected models, complementing standard AIS with rigorous lower-bound guarantees on log-likelihood estimation (Upadhya et al., 2015, Burda et al., 2014).
