
GFlowNet Multi-Mode Sampling

Updated 17 March 2026
  • The paper reviews GFlowNet multi-mode sampling, detailing theoretical principles and algorithmic strategies to achieve reward-proportional, multi-modal sampling.
  • It explains ensemble techniques like Boosted GFlowNets and exploration methods such as Loss-Guided and Thompson Sampling to mitigate self-reinforcing updates and vanishing gradients.
  • Empirical evaluations on tasks like peptide and molecule design confirm significant improvements in mode discovery and distribution accuracy, reducing L1 errors and enhancing diversity.

Generative Flow Networks (GFlowNets) are a family of probabilistic models capable of sampling from highly multimodal distributions over combinatorial or structured spaces by learning policies that yield samples in proportion to a user-specified non-negative reward function. The challenge of effective multi-mode sampling—simultaneously discovering, covering, and proportionally sampling from multiple high-reward basins, especially when some are distant or difficult to reach—has motivated significant advancements in the GFlowNet literature. This article surveys the algorithmic landscape of GFlowNet multi-mode sampling, emphasizing theoretical principles, ensemble and exploration strategies, and empirical benchmarks.

1. GFlowNet Multi-Mode Sampling: Principle and Limitations

The GFlowNet framework models sequential construction of objects using a directed acyclic graph (DAG) of states (partial objects), with a unique source and multiple terminal (complete) states, each assigned a non-negative reward $R(x)$ (Bengio et al., 2021). The forward policy $P_F$ is trained such that the marginal probability of sampling a terminal state $x$ satisfies $P_F^\top(x) \propto R(x)$. Flow-matching, detailed-balance, and trajectory-balance objectives all guarantee, in the limit of zero training loss, that every high-reward mode (region with $R(x) > 0$) will be sampled in proportion to its reward mass (Zhang et al., 2022).

However, practical challenges arise due to:

  • Self-reinforcing updates: Early, easy-to-reach modes dominate both trajectory distribution and credit assignment, starving remote or rare modes of gradient signal.
  • Vanishing gradients: As some basins are saturated, newly available modes receive negligible flow updates, leading to mode collapse (concentration on a subset of accessible modes) (Dall'Antonia et al., 12 Nov 2025, Malek et al., 21 May 2025).

Multi-mode sampling thus requires algorithmic and architectural techniques that systematically mitigate this inherent exploration bias, ensuring robust discovery of all significant high-reward basins.
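
To make the zero-loss guarantee from this section concrete, the sketch below evaluates the standard trajectory-balance residual, $(\log Z + \sum \log P_F - \log R(x) - \sum \log P_B)^2$, on a toy DAG. The two-terminal environment, the policy values, and all function names are illustrative assumptions, not taken from any of the cited papers.

```python
import math

def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, reward):
    """Squared trajectory-balance residual for one complete trajectory."""
    forward = log_z + sum(log_pf_steps)
    backward = math.log(reward) + sum(log_pb_steps)
    return (forward - backward) ** 2

# Hypothetical toy DAG: one source with two terminal children "a" and "b".
rewards = {"a": 1.0, "b": 3.0}
z = sum(rewards.values())

# Reward-proportional forward policy: P_F(x) = R(x) / Z.
matched_policy = {x: r / z for x, r in rewards.items()}
# Collapsed policy that almost ignores the lower-reward terminal state.
collapsed_policy = {"a": 0.01, "b": 0.99}

def total_loss(policy):
    # Each terminal state has a single parent, so log P_B = 0 on every step.
    return sum(
        trajectory_balance_loss(math.log(z), [math.log(policy[x])], [0.0], r)
        for x, r in rewards.items()
    )

matched_loss = total_loss(matched_policy)
collapsed_loss = total_loss(collapsed_policy)
print(matched_loss, collapsed_loss)
```

The reward-proportional policy drives the loss to (numerically) zero, while the collapsed policy is penalized mainly through the trajectory ending in the neglected state. In on-policy training that trajectory is rarely sampled, which is one way to read the self-reinforcing bias described above.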

2. Ensemble and Residual Approaches: Boosted GFlowNets

Boosted GFlowNets (BGFNs) address the primary pitfall of self-reinforcing training by sequentially training an ensemble of GFlowNet models, each specializing in under-covered regions via residual rewards (Dall'Antonia et al., 12 Nov 2025). The method operates as follows:

  • At the $t$-th stage, the residual reward for $x$ is defined as $R_\text{res}^{(t)}(x) = R(x) - R_\text{old}^{(t)}(x)$, where $R_\text{old}^{(t)}(x) = \sum_{i=1}^{t-1} \widehat{R}_i(x)$ is the cumulative estimated flow of the previous $t-1$ models.
  • Each booster is trained with a variant trajectory-balance loss, focusing updates exclusively on unsaturated reward mass.
  • The key no-degradation theorem guarantees that boosters add mass only where the previous ensemble left positive residual, provably maintaining or improving coverage.

Sampling is performed by mixing over the ensemble: first select a trained booster proportionally to its partition function, then sample from its forward policy.
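
The residual reward and the two-step ensemble sampling rule can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary-based reward estimates, the clipping of the residual at zero, and the constant "samplers" standing in for trained forward policies are all hypothetical and are not the reference BGFN implementation.

```python
import random

def residual_reward(reward, booster_estimates, x):
    """R(x) minus the summed reward estimates of earlier boosters."""
    r_old = sum(estimate.get(x, 0.0) for estimate in booster_estimates)
    # Clipping at zero is an assumption made here to keep the target non-negative.
    return max(reward[x] - r_old, 0.0)

def sample_from_ensemble(partition_functions, samplers, rng):
    """Pick a booster proportionally to its partition function, then sample from it."""
    index = rng.choices(range(len(samplers)), weights=partition_functions, k=1)[0]
    return samplers[index](rng)

# Hypothetical two-mode reward; the first booster only covered mode_1.
reward = {"mode_1": 4.0, "mode_2": 4.0}
first_booster_estimate = {"mode_1": 4.0}
residual = {x: residual_reward(reward, [first_booster_estimate], x) for x in reward}

# Stand-in samplers: booster 1 emits mode_1, booster 2 (trained on the residual) emits mode_2.
partition_functions = [4.0, 4.0]
samplers = [lambda rng: "mode_1", lambda rng: "mode_2"]

rng = random.Random(0)
draws = [sample_from_ensemble(partition_functions, samplers, rng) for _ in range(2000)]
mode_2_fraction = draws.count("mode_2") / len(draws)
print(residual, mode_2_fraction)
```

In this toy setting the second booster sees reward only on the mode the first one missed, and the mixture recovers both modes in roughly equal proportion, matching their equal reward mass.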

Empirically, BGFN discovers substantially more unique modes and reduces $L_1$ error to the true reward-proportional distribution on toy multimodal grids (e.g., reducing $L_1$ by an order of magnitude on Ring tasks) and on peptide generation benchmarks (e.g., $> 10\times$ increase in unique high-activity peptides found), with additional boosters ceasing to affect already-covered modes (Dall'Antonia et al., 12 Nov 2025).

3. Exploration Strategies and Auxiliary Guidance

Loss-guided and uncertainty-guided exploration techniques further enhance multi-mode discovery by directly driving exploration toward under-learned or uncertain regions.

  • Loss-Guided GFlowNets (LGGFN) train an auxiliary agent whose reward is augmented with the main agent’s instantaneous trajectory-balance loss: $R_{\rm aux}(\tau) = R(\tau) + \lambda \mathcal{L}_{\rm main}(\tau)$. This repeatedly focuses the auxiliary policy on high-loss, poorly understood regions and supplies the main agent with informative trajectories (Malek et al., 21 May 2025). Benchmark results report $>40\times$ more unique valid modes and a $99\%$ reduction in exploration error on bit sequence generation relative to on-policy training.
  • Thompson Sampling GFlowNets (TS-GFN) maintain and sample from an ensemble of bootstrap heads, with stochastic head updates, effectively implementing posterior sampling over policies (Rector-Brooks et al., 2023). TS-GFN allocates more sampling to epistemically uncertain regions, accelerating coverage of distinct modes: it finds $\sim 60\%$ more modes in long bit-sequence tasks and converges nearly $2\times$ faster than standard on-policy sampling.

These approaches mitigate mode starvation by either explicitly incentivizing policies to visit high-loss regions or adaptively balancing exploration and exploitation based on uncertainty.
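
As a small illustration of the loss-guided idea, the sketch below applies the augmented reward $R_{\rm aux}(\tau) = R(\tau) + \lambda \mathcal{L}_{\rm main}(\tau)$ to two hypothetical trajectories. The trajectory names, reward values, loss values, and the choice $\lambda = 1$ are made up for the example.

```python
def auxiliary_reward(reward, main_loss, lam=1.0):
    """Reward seen by the auxiliary agent: task reward plus weighted main-agent loss."""
    return reward + lam * main_loss

# Hypothetical trajectories with their task reward and the main agent's current loss.
trajectories = {
    "well_learned_high_reward": {"reward": 5.0, "main_loss": 0.1},
    "poorly_learned_low_reward": {"reward": 1.0, "main_loss": 9.0},
}

aux = {
    name: auxiliary_reward(t["reward"], t["main_loss"], lam=1.0)
    for name, t in trajectories.items()
}

# The auxiliary agent is drawn toward the trajectory the main agent models worst.
preferred = max(aux, key=aux.get)
print(aux, preferred)
```

With these numbers the poorly learned trajectory receives the larger auxiliary reward even though its task reward is lower. As the main agent's loss on it shrinks, the bonus fades and the auxiliary reward approaches the task reward.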

4. Theoretical Extensions: Controlling Exploration via α-GFNs

A theoretical interpretation of GFlowNet objectives as enforcing Markov chain reversibility has catalyzed $\alpha$-GFN variants, which interpolate between pure forward- and backward-policy updates and allow direct tuning of the exploration–exploitation tradeoff (Chen et al., 2 Feb 2026).

  • Standard GFlowNet objectives (e.g., subtrajectory balance) enforce reversibility with respect to the equally-mixed policy $P_{0.5} = 0.5\, P_F + 0.5\, P_B$.
  • The $\alpha$-GFN generalizes to $P_\alpha = \alpha P_F + (1-\alpha) P_B$, with the $\alpha$-SubTB loss weighted accordingly.
  • For $\alpha < 0.5$, training emphasizes exploration by flattening the forward policy; $\alpha > 0.5$ sharpens exploitation, focusing mass on learned high-reward areas.
  • Two-stage annealing schedules (exploration-focused in early training, then annealing to $\alpha = 0.5$) maximize mode discovery before converging to the correct reward-matching stationary flow.
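
A two-stage schedule of the kind described in the last bullet might look like the sketch below. The starting value of 0.3, the 50/50 split between phases, and the linear ramp are assumptions chosen for illustration; the source only specifies an exploration-focused phase followed by annealing to $\alpha = 0.5$.

```python
def alpha_schedule(step, total_steps, alpha_start=0.3, explore_frac=0.5):
    """Hold alpha below 0.5 during exploration, then anneal linearly to 0.5."""
    explore_steps = int(explore_frac * total_steps)
    if step < explore_steps:
        return alpha_start
    remaining = max(total_steps - explore_steps, 1)
    progress = min((step - explore_steps) / remaining, 1.0)
    return alpha_start + (0.5 - alpha_start) * progress

# Inspect the schedule at a few points of a hypothetical 100-step run.
checkpoints = {step: alpha_schedule(step, 100) for step in (0, 49, 50, 75, 100)}
print(checkpoints)
```

The value returned at each step would then be used to weight the loss for that update, so that training ends at the standard equally-mixed objective.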

On benchmark problems (set generation, bit sequence, and molecule design), $\alpha$-GFN objectives consistently outperform standard GFlowNet variants, discovering up to $10\times$ more modes while preserving sample diversity and reward matching (Chen et al., 2 Feb 2026).

5. Architectural Enhancements: Factorized and Subspace-Aware Models

Architectural modifications to credit assignment and action space structure can substantially enhance multi-mode coverage:

  • Bifurcated GFlowNets (BN) decompose the edge flow into state-value and edge allocation components, $F(s \to s') = F(s) A(s'|s)$, using separate neural heads (2406.01901). The value head propagates credit efficiently to all outgoing edges, while the allocation head determines the transition policy. This factorization, together with strict state flow-matching, enables faster and more robust coverage, especially in large action spaces, and empirically achieves near-perfect multimodal coverage on hypergrid, RNA sequence design, and large-scale molecule generation tasks.
  • Combinatorial Multi-Armed Bandit GFlowNets (CMAB-GFN) partition the action space into compact subspaces via arm-pruning and treat subspace selection as a combinatorial bandit problem (Yu et al., 12 Feb 2026). By cycling across multiple subspaces, CMAB-GFN accelerates discovery of high-reward modes while maintaining diversity, outperforming greedy GFlowNet variants on sequence, molecule, and RNA structure tasks.
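
The state-edge factorization $F(s \to s') = F(s) A(s'|s)$ from the first bullet can be sketched numerically. The softmax parameterization of the allocation head and the example logits are assumptions for illustration; the actual Bifurcated GFlowNet heads are neural networks.

```python
import math

def edge_flows(log_state_flow, allocation_logits):
    """Split a state's flow across its children: F(s -> s') = F(s) * A(s' | s)."""
    # Softmax over children (an assumed parameterization), stabilized by the max logit.
    max_logit = max(allocation_logits.values())
    exps = {child: math.exp(v - max_logit) for child, v in allocation_logits.items()}
    total = sum(exps.values())
    state_flow = math.exp(log_state_flow)
    return {child: state_flow * e / total for child, e in exps.items()}

# Hypothetical state with total flow 6 and three children.
log_f_s = math.log(6.0)
logits = {"child_a": 0.0, "child_b": math.log(2.0), "child_c": math.log(3.0)}

flows = edge_flows(log_f_s, logits)
policy = {child: f / math.exp(log_f_s) for child, f in flows.items()}
print(flows, policy)
```

Because the allocation sums to one, the outgoing edge flows always sum to the state flow by construction. A change to the value head's output rescales every outgoing edge at once, which is the credit-propagation property noted above.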

6. Extensions to Continuous and Compositional Domains

In continuous spaces, exploratory challenges are amplified by the local connectivity and the risk of missing distant basins:

  • MetaGFN utilizes adapted metadynamics (via kernel density estimation and local biasing in a collective variable space) to fill explored basins and encourage departures to distant, previously unseen areas (Phillips et al., 2024). Integrating the metadynamics bias into GFlowNet training results in improved mode coverage (e.g., lower $L_1$ error, more distant modes discovered) compared to on-policy, noisy, or uncertainty-driven methods in both synthetic and molecular environments.
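
The basin-filling intuition can be illustrated with a generic one-dimensional metadynamics bias built from Gaussian kernels. This is a sketch of the general metadynamics idea only: the class name, kernel height and width, and the scalar collective variable are assumptions, and it does not reproduce how MetaGFN adapts or integrates the bias.

```python
import math

class MetadynamicsBias:
    """Accumulates Gaussian kernels at visited collective-variable values."""

    def __init__(self, height=1.0, width=0.5):
        self.height = height
        self.width = width
        self.centers = []

    def deposit(self, z):
        # Record a visit; each visit raises the bias near z.
        self.centers.append(z)

    def potential(self, z):
        # Sum of Gaussian kernels centered on all previous visits.
        return sum(
            self.height * math.exp(-((z - c) ** 2) / (2.0 * self.width ** 2))
            for c in self.centers
        )

bias = MetadynamicsBias(height=1.0, width=0.5)
for visited in (0.0, 0.1, -0.1, 0.05):  # repeated visits to one basin near z = 0
    bias.deposit(visited)

bias_in_basin = bias.potential(0.0)
bias_far_away = bias.potential(3.0)
print(bias_in_basin, bias_far_away)
```

The bias grows where the sampler has already spent time and stays near zero elsewhere, so penalizing it pushes exploration away from filled basins toward unvisited regions.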

In compositional latent variable models, GFlowNets are used for amortized inference, sampling from complex posteriors with many modes. Mechanisms such as subtrajectory balance, sleep-phase reverse maximum likelihood, and explorer-on-policy mixtures ensure coverage of all modes, outperforming mean-field variational approaches (Hu et al., 2023).

7. Empirical Evaluation and Best Practices

The effectiveness of GFlowNet multi-mode sampling approaches is evaluated using $L_1$ error between the learned and target reward-proportional distributions, the cumulative number of distinct high-reward modes discovered, and mode diversity metrics (e.g., Tanimoto similarity in molecules) (Dall'Antonia et al., 12 Nov 2025, Malek et al., 21 May 2025, 2406.01901, Chen et al., 2 Feb 2026).
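
Two of these metrics are easy to compute on an enumerable toy space, as in the sketch below. The unnormalized sum form of the $L_1$ error, the reward threshold defining a "mode", and the sample sets are assumptions; individual papers may normalize or define modes differently.

```python
from collections import Counter

def l1_error(samples, reward):
    """Sum of |empirical frequency - R(x)/Z| over all terminal states."""
    z = sum(reward.values())
    counts = Counter(samples)
    n = len(samples)
    return sum(abs(counts.get(x, 0) / n - r / z) for x, r in reward.items())

def count_modes(samples, reward, threshold):
    """Number of distinct sampled states whose reward meets the threshold."""
    return len({x for x in samples if reward.get(x, 0.0) >= threshold})

# Hypothetical two-state reward and two sets of 100 samples.
reward = {"a": 1.0, "b": 3.0}
proportional_samples = ["a"] * 25 + ["b"] * 75
collapsed_samples = ["b"] * 100

proportional_l1 = l1_error(proportional_samples, reward)
collapsed_l1 = l1_error(collapsed_samples, reward)
proportional_modes = count_modes(proportional_samples, reward, threshold=1.0)
collapsed_modes = count_modes(collapsed_samples, reward, threshold=1.0)
print(proportional_l1, collapsed_l1, proportional_modes, collapsed_modes)
```

The collapsed sampler misses one of the two states and is penalized on both metrics, which is why the two are typically reported together.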

Variants such as prioritized replay, substructure-guided trajectory balance, and explicit entropy regularization further improve sample efficiency and coverage (Shen et al., 2023). Practical guidance includes dynamic scheduling of exploration budgets (e.g., $\alpha$-annealing), architectural inductive biases (state–edge factorization), and the strategic combination of on/off-policy updates, auxiliary losses, and exploration heuristics.

A summary of key empirical findings is presented below:

| Method | Mode Discovery Improvement | Notable Benchmarks |
|---|---|---|
| BGFN | $10\times$ | Peptide, 2D grids |
| LGGFN | $40\times$ | Bit sequences, Bayesian structures |
| $\alpha$-GFN | Up to $10\times$ | Set, sequence, molecule |
| BN | $2\times$ (molecules) | HyperGrid, RNA, molecule |
| MetaGFN | Best $L_1$/mode-finding | Continuous grid, molecular landscapes |
| CMAB-GFN | $2\times$ (RNA), $10/10$ | Bit-sequence, molecule, RNA |

These results indicate that residual, auxiliary, exploration-tuned, and factorized techniques each contribute to state-of-the-art multi-mode sampling in GFlowNets. Continued benchmark improvements and new application domains suggest ongoing opportunities for algorithmic refinement.
