Flow Annealed Importance Sampling Bootstrap

Updated 5 August 2025
  • FAB is a methodology that integrates normalizing flows with annealed importance sampling via a mass-covering α-divergence objective (α = 2), overcoming the mode-seeking bias that arises when fitting complex target distributions.
  • It employs a bootstrapped approach with AIS chains and optional replay buffers to generate low-variance, representative samples for updating generative models.
  • Empirical results demonstrate that FAB significantly enhances sampling efficiency and accuracy in applications such as molecular simulation, particle physics, and Bayesian inference.

Flow Annealed Importance Sampling Bootstrap (FAB) is a methodology for training flexible generative models—primarily normalizing flows—on complex target distributions defined by unnormalized densities, such as Boltzmann distributions, Bayesian posteriors, or analytically specified likelihoods in high-dimensional spaces. FAB is designed to overcome the mode-seeking bias and high variance that often plague both standard flow-based maximum likelihood training and traditional importance sampling. The method achieves this by coupling flow models to annealed importance sampling (AIS) via a bootstrapped, mass-covering α-divergence loss (usually with α = 2), utilizing intermediate densities and, when appropriate, replay-buffer techniques for computational efficiency. FAB and its variants have been applied successfully to problems in molecular simulation, particle physics, and latent variable modeling.

1. Foundations: Mass-Covering α-Divergence and Normalizing Flows

At the core of FAB is the use of a mass-covering α-divergence objective, with α = 2 being the standard choice. The α-divergence is defined as

$$D_{(\alpha)}(p \Vert q_\theta) = -\frac{1}{\alpha(1-\alpha)} \int p(x)^{\alpha}\, q_\theta(x)^{1-\alpha}\, dx.$$

For α = 2, this reduces to

$$D_{2}(p \Vert q_\theta) = \frac{1}{2} \int \frac{p(x)^2}{q_\theta(x)}\, dx.$$

This divergence penalizes under-coverage by the flow (regions where p ≫ q_θ) far more severely than the reverse KL divergence does, yielding flows that are less prone to collapsing onto individual modes of the target (Midgley et al., 2021).

Normalizing flows serve as the parametric family q_θ(x), providing a tractable, invertible transformation between a simple base measure and a flexible surrogate for the target. However, flows trained with the reverse KL divergence (on their own samples) are mode-seeking and often miss important regions of the target, while forward-KL (maximum likelihood) training requires target samples that may be unavailable. This necessitates techniques that can discover and assign mass to previously underrepresented modes.
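
To make the mass-covering property concrete, the following self-contained Python sketch (purely illustrative; the bimodal target and Gaussian proposals are invented for this example) compares D₂ for a broad, mass-covering proposal against a mode-collapsed one. It also shows why a naive Monte Carlo estimate computed from the collapsed proposal's own samples fails to register the missing mode, the blind spot that AIS is introduced to correct.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def p_pdf(x):
    # Hypothetical bimodal target: equal-weight Gaussians at -3 and +3.
    return 0.5 * gauss_pdf(x, -3.0, 0.5) + 0.5 * gauss_pdf(x, 3.0, 0.5)

def d2_grid(q_mu, q_sigma):
    # "Ground truth" D_2 = 0.5 * integral of p^2/q via a dense Riemann sum.
    # (With this convention, D_2 = 0.5 when p = q exactly.)
    xs = np.linspace(-12.0, 12.0, 200_001)
    return 0.5 * np.sum(p_pdf(xs) ** 2 / gauss_pdf(xs, q_mu, q_sigma)) * (xs[1] - xs[0])

def d2_naive_mc(q_mu, q_sigma, n=100_000, seed=0):
    # Naive estimate 0.5 * E_q[(p/q)^2] using samples from q itself.
    x = np.random.default_rng(seed).normal(q_mu, q_sigma, n)
    w = p_pdf(x) / gauss_pdf(x, q_mu, q_sigma)
    return 0.5 * np.mean(w ** 2)

# Broad proposal covering both modes: grid and MC estimates roughly agree.
print(d2_grid(0.0, 4.0), d2_naive_mc(0.0, 4.0))
# Mode-collapsed proposal: the true D_2 is astronomically large, yet the
# naive estimate stays small because q never visits the missing mode.
print(d2_grid(3.0, 0.5), d2_naive_mc(3.0, 0.5))
```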

2. Methodology: Annealed Importance Sampling Bootstrap

FAB integrates flow models with AIS to overcome the difficulty flows have in discovering regions that are important under the target but poorly covered by the current flow, as well as the challenge of directly sampling from the optimal importance proposal:

  • Flow initialization: Begin with samples from the current flow q_θ(x).
  • Bridging distributions: Construct a sequence of densities bridging q_θ(x) to a surrogate target proportional to p(x)²/q_θ(x), motivated by the variance-minimization property of the α = 2 divergence (formalized in the sketch after this list).
  • AIS procedure: Run an AIS chain, typically using MCMC (e.g., HMC when gradients of p are available), to "push" flow samples through intermediate annealed distributions toward the surrogate target. At each step, assign an incremental importance weight, so that at the end of the chain, the overall AIS weight reflects the correction required for imperfect flow samples.
  • Loss formulation: Use the AIS-transformed samples and their importance weights to estimate the α-divergence loss for flow parameter updates. The standard loss is

$$\mathcal{S}(\theta) = -\sum_{i} \frac{w^{AIS}(x_i)}{\sum_j w^{AIS}(x_j)} \log q_\theta(x_i),$$

where gradients are typically not back-propagated through the AIS chain (stopping gradients at the sampled xᵢ and weights) (Midgley et al., 2021, Midgley et al., 2022, Kofler et al., 25 Nov 2024).

  • Replay buffer (optional): Store AIS samples and weights for reuse in multiple gradient steps, greatly reducing the number of target p(x) and ∇p(x) evaluations in settings where these are expensive, as in particle physics or molecular simulations (Kofler et al., 25 Nov 2024).
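
For concreteness, one standard instantiation of the bridging distributions and weights (a sketch following Neal-style AIS with the default geometric interpolation discussed in Section 4; the schedule β_t and the notation below are illustrative) is

$$\pi_t(x) \propto q_\theta(x)^{1-\beta_t} \left( \frac{p(x)^2}{q_\theta(x)} \right)^{\beta_t}, \qquad 0 = \beta_0 < \beta_1 < \cdots < \beta_T = 1,$$

where each x_t is drawn from a π_t-invariant MCMC transition started at x_{t-1}, and the overall AIS weight is accumulated incrementally as

$$w^{AIS} = \prod_{t=1}^{T} \frac{\tilde{\pi}_t(x_{t-1})}{\tilde{\pi}_{t-1}(x_{t-1})},$$

with $\tilde{\pi}_t$ denoting the unnormalized form of $\pi_t$.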

The key innovation is the bootstrapping loop: as the flow improves, it provides better initialization for AIS, whose transitions in turn generate lower-variance, more representative samples for further flow training.
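
As a concrete illustration of the update, here is a minimal PyTorch sketch of the self-normalized loss above. The `flow` object with a `log_prob` method is a hypothetical stand-in for any normalizing-flow implementation; the AIS samples and log-weights enter as constants, matching the stop-gradient convention.

```python
import torch

def fab_loss(flow, x_ais: torch.Tensor, log_w_ais: torch.Tensor) -> torch.Tensor:
    """Self-normalized importance-weighted negative log-likelihood.

    x_ais and log_w_ais are outputs of the AIS chain; gradients are
    stopped so that only the flow's log-density is differentiated.
    """
    with torch.no_grad():
        # Normalized weights w_i / sum_j w_j, computed stably in log space.
        w = torch.softmax(log_w_ais, dim=0)
    return -(w * flow.log_prob(x_ais)).sum()
```

A gradient step then proceeds as with any likelihood-style objective: compute `fab_loss(...)`, call `.backward()`, and step the optimizer.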

3. Practical Algorithmic Features

FAB requires a careful choice and implementation of several components, including the number and spacing of the intermediate AIS distributions, the MCMC transition kernel (e.g., HMC when gradients of p are available), the minibatch size, and the optional replay buffer.

The training loop alternates between bootstrap sample generation (via AIS) and parameter updates using the importance-weighted loss, typically with minibatch stochastic optimization and replay buffers, as sketched below.
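
The sketch below shows one way this alternation might be organized around a buffer. It is deliberately simplified and hypothetical: the cited works use prioritized buffers with additional weight corrections, whereas this version only replays samples in proportion to their importance weights.

```python
import numpy as np

class AISReplayBuffer:
    """Minimal FIFO buffer of AIS samples and log-weights (simplified;
    not the exact prioritized scheme of the cited works)."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.x, self.log_w = [], []
        self.rng = np.random.default_rng(seed)

    def add(self, x_batch, log_w_batch):
        for x, lw in zip(x_batch, log_w_batch):
            if len(self.x) >= self.capacity:
                self.x.pop(0)            # evict the oldest entry
                self.log_w.pop(0)
            self.x.append(x)
            self.log_w.append(lw)

    def sample(self, n: int):
        # Draw indices in proportion to the importance weights, so that
        # high-weight samples are replayed more often.
        p = np.exp(np.array(self.log_w) - max(self.log_w))
        idx = self.rng.choice(len(self.x), size=n, p=p / p.sum())
        return [self.x[i] for i in idx], [self.log_w[i] for i in idx]

# Sketch of the outer loop (flow, run_ais, and fab_loss are placeholders):
#   x0 = flow.sample(batch_size)             # 1) flow proposes
#   x_ais, log_w = run_ais(x0)               # 2) AIS corrects toward p^2/q
#   buffer.add(x_ais, log_w)                 # 3) store for reuse
#   for _ in range(k):                       # 4) several cheap gradient steps
#       xb, lwb = buffer.sample(batch_size)  #    per expensive AIS run
#       loss = fab_loss(flow, xb, lwb)
```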

4. Theoretical Properties and Complexity

The statistical efficiency of FAB, in tandem with the underlying AIS mechanism, is grounded in variance reduction and mass-coverage properties proven for α-divergences. Theoretical results detail the trade-offs and dependencies:

  • Complexity scaling: For estimating normalizing constants Z or free energies F = –log Z to relative accuracy ε, the oracle complexity of AIS (and therefore FAB when combined with MCMC-based transitions) is

$$\widetilde{O}\left(\frac{d \beta^2 \mathcal{A}^2}{\varepsilon^4}\right),$$

where d is the dimension, β is a smoothness parameter of log p, ε is the relative error, and 𝒜 is the action integral along the interpolation curve connecting the initial and target distributions (Guo et al., 7 Feb 2025).

  • Path design: The action 𝒜 reflects how "difficult" the annealing is—lower 𝒜 implies lower complexity. Optimal transport and reverse diffusion paths (as opposed to the default geometric mean path) can greatly reduce action and therefore the required computational effort. Recent analyses have shown that reverse diffusion-based annealing paths can substantially improve normalizing constant estimation, especially in multimodal or rough landscapes (Guo et al., 7 Feb 2025).
  • Variance reduction: Using α-divergence objectives with α = 2 ensures that importance weight variance is minimized, enhancing the effective sample size and reducing the risk of weight degeneracy even when the flow q_θ(x) is initially a poor surrogate (Midgley et al., 2021, Midgley et al., 2022); a small effective-sample-size diagnostic is sketched after this list.
  • Bias properties: When using Rao-Blackwellization or marginalization (as in mAIS), additional variance and bias reductions can be achieved, with direct implications for structured models or flow designs that can exploit tractable subspaces (Yasuda et al., 2022).
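
Since effective sample size is the standard diagnostic behind the variance-reduction point above, here is a minimal, self-contained computation (the standard ESS formula, not specific to any cited paper):

```python
import numpy as np

def effective_sample_size(log_w) -> float:
    """ESS = (sum_i w_i)^2 / sum_i w_i^2, computed stably in log space."""
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())   # rescaling leaves the ESS unchanged
    return float(w.sum() ** 2 / (w ** 2).sum())

print(effective_sample_size(np.zeros(1000)))         # uniform weights -> 1000.0
print(effective_sample_size([0.0] + [-50.0] * 999))  # one dominant weight -> ~1.0
```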

5. Applications and Empirical Results

FAB and related methods have demonstrated effectiveness across a wide variety of domains:

  • Physical and molecular simulation: FAB learns Boltzmann-type distributions directly from unnormalized target densities, without requiring samples from molecular dynamics simulations. On the alanine dipeptide example, FAB achieves accurate Ramachandran plots and low-bias free energy estimates with two orders of magnitude fewer target evaluations than maximum likelihood on MD samples (Midgley et al., 2022).
  • High-energy physics and differentiable matrix elements: FAB substantially reduces the number of costly matrix element evaluations necessary to achieve high sampling efficiency for event generation in high dimensions. The combination of normalizing flows, Hamiltonian Monte Carlo, and prioritized replay buffer results in high effective sample sizes and unbiased surrogates for physically meaningful distributions (Kofler et al., 25 Nov 2024).
  • Bayesian inference and latent variable models: In GPLVMs and variational inference, annealed flow-based sampling produces tighter variational lower bounds and lower reconstruction error than mean-field or importance weighted VI, especially in high-dimensional latent spaces (Xu et al., 13 Aug 2024).
  • Ising models and Boltzmann machines: By merging AIS and variance-reducing conditional integration, FAB-style methods improve expectation estimates in complex or low-temperature statistical mechanical models (Yasuda et al., 2020).

Across these tasks, empirical results consistently highlight substantial improvements in effective sample size, lower RMSE, reduced mode collapse, and unbiased estimator properties compared to standard maximum likelihood, vanilla AIS, or stochastic normalizing flow methods.

6. Related Flow-Augmented Sampling Methods

FAB is part of a continuum of flow-augmented importance sampling and SMC algorithms:

  • Annealed Flow Transport Monte Carlo (AFT) provides a rigorous SMC/AIS framework with learned normalizing flows as intermediate transport maps, theoretically underpinning FAB's approach (Arbel et al., 2021).
  • CRAFT (Continual Repeated Annealed Flow Transport MC) refines the annealing and flow learning procedures with sequential KL minimization over each transport, effectively decomposing the bridging task and improving gradient estimation (Matthews et al., 2022).
  • Optimization of Annealed Importance Sampling Hyperparameters introduces learnable intermediary distributions and schedule tuning, which can be directly incorporated into the FAB workflow to reduce required annealing steps without sacrificing estimation accuracy (Goshtasbpour et al., 2022).
  • Constant Rate AIS (CR-AIS) and "q-paths" provide frameworks for analytically optimizing the annealing path and schedule, adaptable for flow-based methods to further control variance and computational cost (Goshtasbpour et al., 2023, Brekelmans et al., 2020).
  • Reverse Diffusion Samplers for normalizing constant estimation offer a promising direction for defining smoother flow/annealing paths with reduced "action," crucial for high-dimensional and multimodal distributions (Guo et al., 7 Feb 2025).

7. Limitations, Challenges, and Future Directions

Open research questions and technical challenges include:

  • Path design and schedule optimization: While geometric mean paths are standard, optimal or data-driven interpolation curves (e.g., reverse diffusion or q-paths) may further reduce the action and thus the complexity of FAB (Guo et al., 7 Feb 2025). Adaptive or learned schedules can balance sample quality against the number of intermediate steps (Goshtasbpour et al., 2022).
  • Replay buffer biases: The replay buffer accelerates training by reducing redundant target evaluations, but may introduce bias unless managed with care (for example, by enforcing proper sample weighting or buffer resets).
  • Differentiation through AIS: Most implementations block gradients through the AIS chain; differentiable AIS or meta-learning of transition kernels can potentially improve convergence and further reduce variance (Midgley et al., 2022).
  • Integration with advanced sampling methods: Embedding score-based generative models or leveraging advanced samplers (e.g., Suwa-Todo, belief-propagation-guided MCMC) could enhance performance, especially when basic MCMC transitions are insufficient (Doucet et al., 2022, Yasuda et al., 2020).
  • Scalability and structure exploitation: Marginalized AIS or Rao-Blackwellized flow architectures can be crucial for problems with partially tractable structure or high-dimensional latent spaces (Yasuda et al., 2022, Xu et al., 13 Aug 2024).

A plausible implication is that future FAB variants will rely increasingly on adaptive schedules, learned or data-dependent annealing paths, and integration with both parametric and nonparametric transport frameworks, expanding applicability to new domains such as lattice field theory, large-scale scientific computing, and structured probabilistic modeling.


In summary, Flow Annealed Importance Sampling Bootstrap (FAB) is a principled and empirically effective framework uniting normalizing flows, annealed/sequential importance sampling, and mass-covering divergence objectives. By iteratively bootstrapping flow models using variance-reducing importance weights from AIS transitions, FAB can produce accurate density surrogates and expectation estimates for intractable, multimodal, or physically constrained target distributions, achieving high sampling efficiency, scalability to high dimensions, and minimal dependence on expensive external sampling or precomputed target data. The approach is underpinned by recent complexity analyses and admits natural incorporation of optimization, marginalization, adaptive scheduling, and reverse-diffusion concepts, making it a contemporary archetype in flow-enhanced Monte Carlo inference (Midgley et al., 2021, Midgley et al., 2022, Kofler et al., 25 Nov 2024, Guo et al., 7 Feb 2025).