GFlowNet Multi-Mode Sampling

Updated 17 March 2026

This article reviews GFlowNet multi-mode sampling, detailing theoretical principles and algorithmic strategies to achieve reward-proportional, multi-modal sampling.
It explains ensemble techniques like Boosted GFlowNets and exploration methods such as Loss-Guided and Thompson Sampling to mitigate self-reinforcing updates and vanishing gradients.
Empirical evaluations on tasks like peptide and molecule design confirm significant improvements in mode discovery and distribution accuracy, reducing L1 errors and enhancing diversity.

Generative Flow Networks (GFlowNets) are a family of probabilistic models capable of sampling from highly multimodal distributions over combinatorial or structured spaces by learning policies that yield samples in proportion to a user-specified non-negative reward function. The challenge of effective multi-mode sampling—simultaneously discovering, covering, and proportionally sampling from multiple high-reward basins, especially when some are distant or difficult to reach—has motivated significant advancements in the GFlowNet literature. This article surveys the algorithmic landscape of GFlowNet multi-mode sampling, emphasizing theoretical principles, ensemble and exploration strategies, and empirical benchmarks.

1. GFlowNet Multi-Mode Sampling: Principle and Limitations

The GFlowNet framework models sequential construction of objects using a directed acyclic graph (DAG) of states (partial objects), with a unique source and multiple terminal (complete) states, each terminal state being assigned a non-negative reward $R(x)$ (Bengio et al., 2021). The forward policy $P_F$ is trained such that the marginal probability of sampling a terminal state $x$ satisfies $P_F^\top(x) \propto R(x)$. Flow-matching, detailed-balance, and trajectory-balance objectives all guarantee, in the limit of zero training loss, that every high-reward mode (region with $R(x) > 0$) will be sampled in proportion to its reward mass (Zhang et al., 2022).
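As a minimal sketch of the zero-loss claim above, the snippet below evaluates the standard trajectory-balance residual, $\log Z + \sum \log P_F - \log R(x) - \sum \log P_B$, on a hypothetical one-step environment. The state names, rewards, and the helper `trajectory_balance_loss` are illustrative assumptions and do not come from any particular GFlowNet library.

```python
import math

def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, reward):
    """Squared trajectory-balance residual for one complete trajectory.

    residual = log Z + sum(log P_F) - log R(x) - sum(log P_B)
    """
    residual = log_z + sum(log_pf_steps) - math.log(reward) - sum(log_pb_steps)
    return residual ** 2

# Hypothetical toy problem: one source state with three terminal children.
rewards = {"x1": 1.0, "x2": 2.0, "x3": 5.0}
z_true = sum(rewards.values())

# Reward-proportional forward policy; each terminal has one parent, so log P_B = 0.
ideal_pf = {x: r / z_true for x, r in rewards.items()}
ideal_losses = {
    x: trajectory_balance_loss(math.log(z_true), [math.log(ideal_pf[x])], [0.0], rewards[x])
    for x in rewards
}

# A uniform policy is not reward-proportional, so its residual is non-zero.
uniform_loss = trajectory_balance_loss(
    math.log(z_true), [math.log(1 / 3)], [0.0], rewards["x1"]
)

print(ideal_losses)
print(uniform_loss)
```

With the reward-proportional policy the residual vanishes (up to floating-point error) for every terminal state, whereas the uniform policy leaves a clearly positive loss.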

However, practical challenges arise due to:

Self-reinforcing updates: Early, easy-to-reach modes dominate both trajectory distribution and credit assignment, starving remote or rare modes of gradient signal.
Vanishing gradients: As some basins are saturated, newly available modes receive negligible flow updates, leading to mode collapse (concentration on a subset of accessible modes) (Dall'Antonia et al., 12 Nov 2025, Malek et al., 21 May 2025).

Multi-mode sampling thus requires algorithmic and architectural techniques that systematically mitigate this inherent exploration bias, ensuring robust discovery of all significant high-reward basins.

2. Ensemble and Residual Approaches: Boosted GFlowNets

Boosted GFlowNets (BGFNs) address the primary pitfall of self-reinforcing training by sequentially training an ensemble of GFlowNet models, each specializing in under-covered regions via residual rewards (Dall'Antonia et al., 12 Nov 2025). The method operates as follows:

At the $t$-th stage, the residual reward for $x$ is defined as $R_\text{res}^{(t)}(x) = R(x) - R_\text{old}^{(t)}(x)$, where $R_\text{old}^{(t)}(x) = \sum_{i=1}^{t-1} \widehat{R}_i(x)$ is the cumulative estimated flow of the previous $t-1$ models.
Each booster is trained with a variant trajectory-balance loss, focusing updates exclusively on unsaturated reward mass.
The key no-degradation theorem guarantees that boosters add mass only where the previous ensemble left positive residual, provably maintaining or improving coverage.

Sampling is performed by mixing over the ensemble: first select a trained booster proportionally to its partition function, then sample from its forward policy.
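The residual reward and the ensemble sampling step described above can be sketched as follows. This is an illustration under stated assumptions rather than the BGFN implementation: the clipping at zero, the toy rewards, and the premise that the second booster's partition function equals the residual mass are all hypothetical choices made for the example.

```python
import random

def residual_reward(reward, previous_estimates):
    """R_res^(t)(x) = R(x) - sum of earlier boosters' estimated flows at x.

    Clipping at zero is an assumption of this sketch, not taken from the source.
    """
    return max(reward - sum(previous_estimates), 0.0)

def sample_booster(partition_functions, rng):
    """Pick a booster index with probability proportional to its partition function."""
    indices = range(len(partition_functions))
    return rng.choices(indices, weights=partition_functions, k=1)[0]

# Hypothetical two-mode target; the first booster saturates A but under-covers B.
true_reward = {"A": 4.0, "B": 4.0}
first_booster_estimate = {"A": 4.0, "B": 0.5}

residuals = {
    x: residual_reward(true_reward[x], [first_booster_estimate[x]])
    for x in true_reward
}

# Assume the second booster fits the residual exactly, so its mass is sum(residuals).
partition_functions = [sum(first_booster_estimate.values()), sum(residuals.values())]

rng = random.Random(0)
draws = [sample_booster(partition_functions, rng) for _ in range(20000)]
fraction_first = draws.count(0) / len(draws)

print(residuals)
print(fraction_first)
```

The second booster receives reward only at the under-covered mode B, and the mixture picks the first booster in roughly 4.5 / 8 of the draws.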

Empirically, BGFN discovers substantially more unique modes and reduces $L_1$ error to the true reward-proportional distribution on toy multimodal grids (e.g., reducing $L_1$ by an order of magnitude on Ring tasks) and on peptide generation benchmarks (e.g., $> 10 \times$ increase in unique high-activity peptides found), with additional boosters ceasing to affect already-covered modes (Dall'Antonia et al., 12 Nov 2025).

3. Exploration Strategies and Auxiliary Guidance

Loss-guided and uncertainty-guided exploration techniques further enhance multi-mode discovery by directly driving exploration toward under-learned or uncertain regions.

Loss-Guided GFlowNets (LGGFN) train an auxiliary agent whose reward is augmented with the main agent’s instantaneous trajectory-balance loss: $R_{\rm aux}(\tau) = R(\tau) + \lambda \mathcal{L}_{\rm main}(\tau)$. This repeatedly steers the auxiliary policy toward high-loss, poorly understood regions and supplies the main agent with informative trajectories (Malek et al., 21 May 2025). Benchmark results report $>40\times$ more unique valid modes and a $99\%$ reduction in exploration error on bit sequence generation relative to on-policy training.
Thompson Sampling GFlowNets (TS-GFN) maintain and sample from an ensemble of bootstrap heads, with stochastic head updates, effectively implementing posterior sampling over policies (Rector-Brooks et al., 2023). TS-GFN allocates more sampling to epistemically uncertain regions, accelerating coverage of distinct modes—finding $\sim 60\%$ more modes in long bit-sequence tasks and converging nearly $2\times$ faster than standard on-policy sampling.
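To make the loss-guided auxiliary reward concrete, the sketch below applies $R_{\rm aux}(\tau) = R(\tau) + \lambda \mathcal{L}_{\rm main}(\tau)$ to three hypothetical trajectories with equal reward. The loss values and $\lambda = 0.5$ are assumptions chosen only for illustration.

```python
def auxiliary_reward(reward, main_loss, lam):
    """R_aux(tau) = R(tau) + lambda * L_main(tau)."""
    return reward + lam * main_loss

# Hypothetical trajectories with equal reward but different main-agent losses.
rewards = [1.0, 1.0, 1.0]
main_losses = [0.0, 0.1, 4.0]
lam = 0.5

aux = [auxiliary_reward(r, loss, lam) for r, loss in zip(rewards, main_losses)]

# Target sampling proportions for an auxiliary agent that matches R_aux.
aux_target = [a / sum(aux) for a in aux]

print(aux)
print(aux_target)
```

Although all three trajectories have the same base reward, the one the main agent fits worst receives more than half of the auxiliary agent's target mass.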

These approaches mitigate mode starvation by either explicitly incentivizing policies to visit high-loss regions or adaptively balancing exploration and exploitation based on uncertainty.
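The bookkeeping behind a bootstrapped ensemble of policy heads can be sketched as below. This is a generic bootstrapped-ensemble pattern under stated assumptions, not the TS-GFN code: the uniform head draw, the Bernoulli update mask, and the names `pick_head` and `bootstrap_mask` are hypothetical, and the actual rollout and gradient steps are replaced by counters.

```python
import random

def pick_head(num_heads, rng):
    """Draw one head uniformly; this plays the role of a posterior sample."""
    return rng.randrange(num_heads)

def bootstrap_mask(num_heads, keep_prob, rng):
    """Decide independently for each head whether it trains on this trajectory."""
    return [rng.random() < keep_prob for _ in range(num_heads)]

rng = random.Random(0)
num_heads, keep_prob = 4, 0.5
rollout_counts = [0] * num_heads
update_counts = [0] * num_heads

for _ in range(4000):
    head = pick_head(num_heads, rng)  # this head's policy would generate the trajectory
    rollout_counts[head] += 1
    for k, use in enumerate(bootstrap_mask(num_heads, keep_prob, rng)):
        if use:
            update_counts[k] += 1  # head k would take a gradient step here

print(rollout_counts)
print(update_counts)
```

Because each head sees a different random subset of trajectories, the heads disagree where data is scarce, and sampling a head per rollout turns that disagreement into exploration.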

4. Theoretical Extensions: Controlling Exploration via α-GFNs

A theoretical interpretation of GFlowNet objectives as enforcing Markov chain reversibility has catalyzed $\alpha$-GFN variants, which interpolate between pure forward- and backward-policy updates and allow direct tuning of the exploration–exploitation tradeoff (Chen et al., 2 Feb 2026).

Standard GFlowNet objectives (e.g., subtrajectory balance) enforce reversibility with respect to the equally-mixed policy $P_{0.5} = 0.5\, P_F + 0.5\, P_B$.
The $\alpha$-GFN generalizes to $P_\alpha = \alpha P_F + (1-\alpha) P_B$, with the $\alpha$-SubTB loss weighted accordingly.
For $\alpha < 0.5$, training emphasizes exploration by flattening the forward policy; $\alpha > 0.5$ sharpens exploitation, focusing mass on learned high-reward areas.
Two-stage annealing schedules (exploration-focused in early training, then annealing to $\alpha = 0.5$) maximize mode discovery before converging to the correct reward-matching stationary flow.
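The mixture $P_\alpha = \alpha P_F + (1-\alpha) P_B$ and a two-stage schedule can be sketched as follows. The linear annealing shape, the starting value $\alpha = 0.2$, the step counts, and the state names are assumptions for illustration; the source does not specify them, and the $\alpha$-SubTB loss itself is not reproduced here.

```python
def alpha_schedule(step, explore_steps, anneal_steps, alpha_explore=0.2, alpha_final=0.5):
    """Hold alpha_explore, then anneal linearly to alpha_final (needs anneal_steps > 0)."""
    if step < explore_steps:
        return alpha_explore
    progress = min((step - explore_steps) / anneal_steps, 1.0)
    return alpha_explore + progress * (alpha_final - alpha_explore)

def mixed_kernel(p_forward, p_backward, alpha):
    """P_alpha(s' | s) = alpha * P_F(s' | s) + (1 - alpha) * P_B(s' | s)."""
    states = set(p_forward) | set(p_backward)
    return {
        s: alpha * p_forward.get(s, 0.0) + (1 - alpha) * p_backward.get(s, 0.0)
        for s in states
    }

# Hypothetical intermediate state with two children and one parent.
p_forward = {"child_1": 0.7, "child_2": 0.3}
p_backward = {"parent_1": 1.0}

alpha_early = alpha_schedule(step=0, explore_steps=100, anneal_steps=100)
alpha_mid = alpha_schedule(step=150, explore_steps=100, anneal_steps=100)
alpha_late = alpha_schedule(step=1000, explore_steps=100, anneal_steps=100)

kernel = mixed_kernel(p_forward, p_backward, alpha_early)
print(alpha_early, alpha_mid, alpha_late)
print(kernel)
```

Viewed as a Markov kernel over the state graph, the mixture takes a forward step with probability $\alpha$ and a backward step otherwise, so the schedule directly controls how much weight the forward policy carries at each stage of training.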

On benchmark problems (set generation, bit sequence, and molecule design), $\alpha$-GFN objectives consistently outperform standard GFlowNet variants, discovering up to $10\times$ more modes while preserving sample diversity and reward matching (Chen et al., 2 Feb 2026).

5. Architectural Enhancements: Factorized and Subspace-Aware Models

Architectural modifications to credit assignment and action space structure can substantially enhance multi-mode coverage:

Bifurcated GFlowNets (BN) decompose the edge flow into state-value and edge allocation components, $F(s \to s') = F(s) A(s'|s)$, using separate neural heads (2406.01901). The value head propagates credit efficiently to all outgoing edges, while the allocation head determines the transition policy. This factorization, together with strict state flow-matching, enables faster and more robust coverage, especially in large action spaces, and empirically achieves near-perfect multimodal coverage on hypergrid, RNA sequence design, and large-scale molecule generation tasks.
Combinatorial Multi-Armed Bandit GFlowNets (CMAB-GFN) partition the action space into compact subspaces via arm-pruning and treat subspace selection as a combinatorial bandit problem (Yu et al., 12 Feb 2026). By cycling across multiple subspaces, CMAB-GFN accelerates discovery of high-reward modes while maintaining diversity, outperforming greedy GFlowNet variants on sequence, molecule, and RNA structure tasks.
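The state–edge factorization $F(s \to s') = F(s) A(s'|s)$ used by Bifurcated GFlowNets can be illustrated with a few lines of arithmetic. The softmax parameterization of the allocation head and the numeric values are assumptions of this sketch; the two "heads" are plain numbers standing in for neural network outputs.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def edge_flows(log_state_flow, allocation_logits):
    """F(s -> s') = F(s) * A(s' | s), with A given by a softmax over children."""
    state_flow = math.exp(log_state_flow)
    return [state_flow * a for a in softmax(allocation_logits)]

# Hypothetical state with flow 6 and three children.
flows = edge_flows(log_state_flow=math.log(6.0), allocation_logits=[0.0, 1.0, -1.0])
print(flows, sum(flows))
```

Because the allocation sums to one, the outgoing edge flows always add up to the state flow, and the forward policy is simply $P_F(s'|s) = A(s'|s)$.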

6. Extensions to Continuous and Compositional Domains

In continuous spaces, exploratory challenges are amplified by the local connectivity and the risk of missing distant basins:

MetaGFN utilizes adapted metadynamics—via kernel density estimation and local biasing in a collective variable space—to fill explored basins and encourage departures to distant, previously unseen areas (Phillips et al., 2024). Integrating the metadynamics bias into GFlowNet training results in improved mode coverage (e.g., lower $L_1$ error, more distant modes discovered) compared to on-policy, noisy, or uncertainty-driven methods in both synthetic and molecular environments.
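The basin-filling idea can be sketched with a generic one-dimensional metadynamics bias: Gaussian kernels are deposited at visited collective-variable values, and the negative gradient of the accumulated bias pushes exploration away from them. This is textbook metadynamics under assumed kernel height and width, not the adapted scheme used in MetaGFN.

```python
import math

def bias_potential(z, centers, height=1.0, width=0.5):
    """Sum of Gaussian kernels deposited at previously visited positions."""
    return sum(height * math.exp(-((z - c) ** 2) / (2 * width ** 2)) for c in centers)

def bias_force(z, centers, height=1.0, width=0.5):
    """Negative gradient of the bias, pointing away from visited regions."""
    return sum(
        height * (z - c) / width ** 2 * math.exp(-((z - c) ** 2) / (2 * width ** 2))
        for c in centers
    )

# Hypothetical visit history concentrated in a basin around z = 0.
visited = [0.0, 0.1, -0.1, 0.05]

print(bias_potential(0.0, visited), bias_potential(3.0, visited))
print(bias_force(0.5, visited), bias_force(-0.5, visited))
```

The bias is large inside the visited basin and negligible far from it, and the force changes sign across the basin, so a biased explorer is driven outward on either side.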

In compositional latent variable models, GFlowNets are used for amortized inference, sampling from complex posteriors with many modes. Mechanisms such as subtrajectory balance, sleep-phase reverse maximum likelihood, and explorer-on-policy mixtures ensure coverage of all modes, outperforming mean-field variational approaches (Hu et al., 2023).

7. Empirical Evaluation and Best Practices

The effectiveness of GFlowNet multi-mode sampling approaches is evaluated using $L_1$ error between the learned and target reward-proportional distributions, the cumulative number of distinct high-reward modes discovered, and mode diversity metrics (e.g., Tanimoto similarity in molecules) (Dall'Antonia et al., 12 Nov 2025, Malek et al., 21 May 2025, 2406.01901, Chen et al., 2 Feb 2026).
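The two headline metrics can be computed from a batch of sampled terminal states as sketched below. The unnormalized sum of absolute differences and the reward threshold used to define a mode are assumptions; individual papers may average over states or define modes differently.

```python
from collections import Counter

def l1_error(samples, rewards):
    """Sum over states of |empirical frequency - R(x) / Z|."""
    total = sum(rewards.values())
    counts = Counter(samples)
    n = len(samples)
    support = set(rewards) | set(counts)
    return sum(abs(counts.get(x, 0) / n - rewards.get(x, 0.0) / total) for x in support)

def modes_discovered(samples, rewards, threshold):
    """Number of distinct sampled states whose reward meets the threshold."""
    return len({x for x in samples if rewards.get(x, 0.0) >= threshold})

# Hypothetical target and a small batch that misses state "b".
rewards = {"a": 1.0, "b": 1.0, "c": 2.0}
samples = ["a", "c", "c", "c"]

error = l1_error(samples, rewards)
found = modes_discovered(samples, rewards, threshold=1.0)
print(error, found)
```

Here the sampler over-represents "c" and never reaches "b", which shows up both as a non-zero $L_1$ error and as a mode count of two out of three.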

Variants such as prioritized replay, substructure-guided trajectory balance, and explicit entropy regularization further improve sample efficiency and coverage (Shen et al., 2023). Practical guidance includes dynamic scheduling of exploration budgets (e.g., $\alpha$-annealing), architectural inductive biases (state–edge factorization), and the strategic combination of on/off-policy updates, auxiliary losses, and exploration heuristics.

A summary of key empirical findings is presented below:

Method	Mode Discovery Improvement	Notable Benchmarks
BGFN	$>10\times$	Peptide, 2D grids
LGGFN	$>40\times$	Bit sequences, Bayesian structures
$\alpha$-GFN	Up to $10\times$	Set, sequence, molecule
BN	$2\times$ (molecules)	HyperGrid, RNA, molecule
MetaGFN	Best $L_1$/mode-finding	Continuous grid, molecular landscapes
CMAB-GFN	$2\times$ (RNA), $10/10$	Bit-sequence, molecule, RNA

These results indicate that residual, auxiliary, exploration-tuned, and factorized techniques are central to state-of-the-art multi-mode sampling in GFlowNets. Continued benchmark improvements and new application domains suggest ongoing opportunities for algorithmic refinement.