Generative Flow Networks (GFlowNets)

Updated 5 August 2025
  • Generative Flow Networks are a probabilistic modeling framework that incrementally builds structures by traversing a directed acyclic graph to sample objects proportional to a reward function.
  • They unify concepts from reinforcement learning, energy-based modeling, and MCMC by enforcing local flow consistency and enabling rapid, one-shot generation of diverse candidates.
  • GFlowNets facilitate scalable inference and active learning by using detailed balance and flow-matching constraints, which ensures each generated object reflects its associated reward.

Generative Flow Networks (GFlowNets) are a probabilistic modeling framework designed to represent and sample complex structured objects—such as sets, graphs, or sequences—proportionally to a user-specified nonnegative reward function. Unlike common generative models or Markov Chain Monte Carlo (MCMC) samplers, GFlowNets construct objects incrementally by traversing a directed acyclic graph (DAG) of states, learning a policy that amortizes the sampling procedure and enables one-pass generation of diverse high-reward candidates. The framework enforces local flow consistency constraints that guarantee, in the ideal case, that the marginal distribution over generated terminal objects is exactly proportional to the reward. GFlowNets unify concepts from reinforcement learning, probabilistic inference, and energy-based modeling, with extensions to structured output spaces, marginalization, entropy estimation, and active learning.

1. Theoretical Foundations and Flow-Matching

At the core of GFlowNets is the concept of “flows” defined on a DAG. Each node (state) in the DAG represents a partial or complete object, edges represent constructive transitions (e.g., adding an element to a set), and a unique source and sink designate the initial and terminal states. The model maintains:

  • A flow function F defined over trajectories, with associated state flows F(s) and edge flows F(s \to s').
  • The “flow-matching” (conservation) constraint at every non-terminal state,

F(s) = \sum_{s' \in \text{Child}(s)} F(s \to s')

ensuring the total incoming flow matches the total outgoing flow at s.

  • For Markovian flows, the trajectory probability factors as P(\tau) = \prod_{t} P_F(s_{t+1} \mid s_t).
  • The boundary condition at terminal states,

F(s \to s_f) = R(s)

yields the terminal distribution

P_T(s) = \frac{F(s \to s_f)}{F(s_0)} \propto R(s)

These constraints can be enforced via several equivalent formulations, including edge flows, forward/backward transition probabilities, or constraints directly on complete trajectories. The detailed balance condition provides a local constraint:

F(s) \cdot P_F(s' \mid s) = F(s') \cdot P_B(s \mid s')

This balance connects forward and backward policies analogously to detailed balance in MCMC, but within a generative, amortized context.
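These constraints can be checked numerically on a small hand-built DAG. The sketch below uses hypothetical toy states, rewards, and hand-set edge flows (not from the source) to verify flow-matching at interior states, the boundary condition at terminals, and the resulting reward-proportional terminal distribution:

```python
# Toy DAG: s0 -> {a, b}; a -> {x, y}; b -> {y, z}; x, y, z are terminal.
# Rewards and edge flows are hand-picked so the constraints hold exactly.
R = {"x": 1.0, "y": 3.0, "z": 2.0}

F_edge = {
    ("s0", "a"): 2.5,   # 1.0 (to x) + 1.5 (to y)
    ("s0", "b"): 3.5,   # 1.5 (to y) + 2.0 (to z)
    ("a", "x"): 1.0,
    ("a", "y"): 1.5,
    ("b", "y"): 1.5,
    ("b", "z"): 2.0,
}

def in_flow(s):
    return sum(f for (u, v), f in F_edge.items() if v == s)

def out_flow(s):
    return sum(f for (u, v), f in F_edge.items() if u == s)

# Flow-matching (conservation) at non-terminal, non-source states:
for s in ("a", "b"):
    assert abs(in_flow(s) - out_flow(s)) < 1e-9

# Boundary condition: flow into each terminal state equals its reward.
for s, r in R.items():
    assert abs(in_flow(s) - r) < 1e-9

# Terminal distribution P_T(s) = R(s) / Z, with Z = F(s0).
Z = out_flow("s0")
P_T = {s: in_flow(s) / Z for s in R}
print(P_T)  # proportional to R: x -> 1/6, y -> 1/2, z -> 1/3
```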

2. Applications to Structured and Probabilistic Objects

GFlowNets generalize to any compositional object whose construction can be mapped onto a DAG. Salient examples include:

  • Sets and graphs: Sequential construction corresponds to set addition or edge/node insertions.
  • Joint distributions: GFlowNets can model distributions over multivariate objects, allowing computation of joint, marginal, and conditional probabilities (e.g., probability of a superset given a subset).
  • Marginalization and free energy estimation: The flow at intermediate states corresponds to summing (or integrating, in continuous variants) over descendant terminal rewards, enabling estimation of partition functions and free energies.

These features make GFlowNets attractive for generative modeling, approximate inference, and scenarios requiring exploration of diverse high-quality candidates, such as active molecule or sequence design. Because amortized sampling replaces slow MCMC passes with forward generation, GFlowNets can scale to propose large candidate batches efficiently.
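As a minimal sketch of what amortized, one-shot generation looks like (hypothetical toy DAG and edge flows, not from the source): once flows are learned, the forward policy P_F(s' | s) = F(s -> s') / F(s) turns sampling into a single forward pass, with no chain to mix.

```python
import random

# Hypothetical edge flows F(s -> s'); states absent from the dict are terminal.
F_edge = {
    "s0": {"a": 2.0, "b": 4.0},
    "a":  {"x": 2.0},
    "b":  {"x": 1.0, "y": 3.0},
}

def sample_trajectory(rng=random):
    """One-shot ancestral sampling under P_F(s'|s) = F(s->s') / F(s)."""
    s, traj = "s0", ["s0"]
    while s in F_edge:  # stop when a terminal state is reached
        children, flows = zip(*F_edge[s].items())
        s = rng.choices(children, weights=flows)[0]
        traj.append(s)
    return traj

print(sample_trajectory())  # e.g. ['s0', 'b', 'y'] -- a single generative pass
```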

3. Comparison with MCMC and Reinforcement Learning

GFlowNets depart from canonical MCMC and reinforcement learning methods in several respects:

| Aspect | MCMC | RL | GFlowNet |
|---|---|---|---|
| Diversity | Favors mode mixing, may stall | Greedy; seeks the max-reward mode | Samples proportionally to reward; diverse |
| Sampling | Iterative Markov chains | Maximizes cumulative reward | One-shot generative trajectory |
| Convergence | Slow for remote modes | Single solution ("mode collapse") | Multimodal, proportional coverage |
| Efficiency | Many steps per sample | Often fails to sample diversity | Amortized; produces diverse batches quickly |

Unlike RL, which seeks reward-maximizing (often deterministic) policies, GFlowNets learn stochastic policies to proportionally cover all high-reward modes. Against MCMC, GFlowNets produce each sample via a single generative trajectory, bypassing the slow mixing and mode-hopping barriers typical in high-dimensional spaces.
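The contrast between reward maximization and proportional coverage can be made concrete on toy rewards (illustrative values only):

```python
# Hypothetical terminal rewards over three modes.
R = {"x": 1.0, "y": 3.0, "z": 2.0}

# A reward-maximizing RL policy collapses onto the single best mode:
rl_choice = max(R, key=R.get)  # 'y'

# A GFlowNet instead targets proportional coverage of *all* modes:
Z = sum(R.values())
gfn_target = {s: r / Z for s, r in R.items()}

print(rl_choice)   # 'y'
print(gfn_target)  # x -> 1/6, y -> 1/2, z -> 1/3
```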

4. Extensions: Entropy, Mutual Information, Pareto Frontiers

The base GFlowNet framework admits rich extensions:

  • Estimation of entropy and mutual information: by transforming rewards using -R(s) \log R(s) and learning auxiliary flows, GFlowNets can compute the entropy H[S] and conditional entropy H[S \mid T] of the distribution over terminal objects.
  • Pareto GFlowNets: Multi-objective sampling is achieved by conditioning flows on weight vectors representing trade-offs, enabling direct exploration of Pareto frontiers for diverse candidate generation.
  • Conditional flows: learning state-conditional or reward-conditional flows generalizes GFlowNets to model distributions and marginalizations involving subsets or supersets (e.g., P_T(s' \mid s)).
  • Support for stochastic environments and continuous or hybrid state/action spaces is attained by generalizing sums to integrals and utilizing continuous Markov kernels, with flow constraints replaced by measure-theoretic analogues.

Intermediate rewards can be incorporated using “return-augmented” GFlowNets, blurring the distinction between terminal and path-based reward structures and generalizing further toward energy-based modeling.
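As a numeric sanity check of the entropy transformation above (toy rewards, assumed values): when P_T(s) = R(s)/Z, the identity H[S] = log Z - (1/Z) \sum_s R(s) log R(s) shows why an auxiliary flow whose terminal quantity is R log R is enough to recover the entropy.

```python
import math

# Hypothetical toy rewards; P_T(s) = R(s) / Z.
R = {"x": 1.0, "y": 3.0, "z": 2.0}
Z = sum(R.values())

# Auxiliary quantity a GFlowNet would estimate with a second flow:
aux = sum(r * math.log(r) for r in R.values())  # sum of R(s) log R(s)

H_direct = -sum((r / Z) * math.log(r / Z) for r in R.values())
H_from_flows = math.log(Z) - aux / Z

assert abs(H_direct - H_from_flows) < 1e-12
print(H_direct)
```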

5. Mathematical Formalism

Key mathematical formulations include:

  • State and edge flows:

F(s) = \sum_{\tau : s \in \tau} F(\tau), \qquad F(s \to s') = \sum_{\tau : s \to s' \in \tau} F(\tau)

  • Terminating probability:

P_T(s) = \frac{F(s \to s_f)}{Z}, \quad \text{where } Z = F(s_0)

  • Markov factorization:

P(\tau) = \prod_t P_F(s_{t+1} \mid s_t)

  • Detailed balance loss (for each edge):

L_{DB} = \left( \log \frac{\delta + F(s)\,P_F(s' \mid s)}{\delta + F(s')\,P_B(s \mid s')} \right)^2

with \delta a small constant ensuring numerical stability.
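A minimal sketch of this per-edge loss in plain Python, using hand-picked, hypothetical flow and policy values:

```python
import math

def db_loss(F_s, PF, F_sp, PB, delta=1e-8):
    """Squared log-ratio detailed-balance loss for a single edge s -> s'."""
    return math.log((delta + F_s * PF) / (delta + F_sp * PB)) ** 2

# When detailed balance holds, F(s) P_F(s'|s) == F(s') P_B(s|s') and the
# loss vanishes; otherwise it is strictly positive.
print(db_loss(F_s=6.0, PF=0.5, F_sp=3.0, PB=1.0))      # balanced edge -> 0.0
print(db_loss(F_s=6.0, PF=0.5, F_sp=2.0, PB=1.0) > 0)  # imbalanced -> True
```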

These formulations enable multiple learning objectives: local flow-matching, detailed balance, and trajectory-level constraints, all connected to the guarantee that sampling under the learned policy is proportional to reward.

6. Connections to Policies, Active Learning, and Inference

GFlowNets’ generative policies differ fundamentally from conventional RL agents. By training to match probability to reward (not maximizing it), GFlowNets create diverse candidate sets instead of focusing on a single optimal solution. In policy terms:

  • Forward transitions define the generative process via P_F(s' \mid s).
  • Backward transitions enable credit assignment via detailed balance, playing a role analogous to the Bellman equation in RL (though without an explicit value function).
  • Active learning: GFlowNets can propose diverse candidates even when reward proxies are imperfect, increasing robustness to model misspecification compared to greedy RL or MCMC sampling.

In energy-based and variational inference frameworks, GFlowNets serve as “amortized inference engines.” For example, given a log-unnormalized energy or reward function, the GFlowNet approximates the posterior distribution over structured objects.

7. Impact, Limitations, and Future Prospects

The GFlowNet paradigm provides a theoretically and algorithmically unified framework for probabilistic sampling, generative modeling, and amortized inference over structured domains. Its advantages include conservation-based generative sampling, support for arbitrary composite object spaces, efficient handling of multimodality, and direct links to entropy and information-theoretic quantities.

However, training involves enforcing local or trajectory-level constraints, which can be more involved than running a single MCMC chain or performing policy-gradient updates. The framework generalizes naturally to continuous and hybrid spaces, modular energy functions, and stochastic environments.

Extensions under development include risk-sensitive objectives, meta-learning, proxy-free offline GFlowNet training, and integration with advanced optimization and exploration strategies. Future work may focus on scaling to complex, high-dimensional domains, improving computational efficiency and sample complexity, and broadening the theory to encompass cycles, arbitrary measurable spaces, and hybrid generative settings.


Generative Flow Networks, as introduced and established in “GFlowNet Foundations” (Bengio et al., 2021), represent a principled and flexible approach to probabilistic modeling and generative sampling over structured, multimodal spaces, with broad applications in scientific discovery, variational inference, and active learning.
