Comparative performance of GFlowNets vs state-of-the-art RL post-training methods

Determine how GFlowNet-based approaches to post-training diffusion text-to-image models perform relative to the state-of-the-art methods Flow-GRPO and DanceGRPO, using comparable evaluation protocols and metrics for reward-guided alignment and image quality.

Background

GFlowNets have been proposed as an alternative MDP-based training framework that aims to sample trajectories with probability proportional to their reward; they are conceptually related to KL-regularized RL.
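
The link to KL-regularized RL can be made concrete on a toy discrete problem. The following is a minimal sketch, not the method of any cited paper: the four-outcome sample space, the reference distribution `pi_ref`, the rewards `r`, and the temperature `beta` are all made-up values chosen for illustration. It checks that the closed-form maximizer of E[r] - beta * KL(pi || pi_ref) coincides with sampling proportionally to a reward of the form R(x) = pi_ref(x) * exp(r(x) / beta), which is the kind of target a reward-proportional sampler would aim for.

```python
import math

def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def kl_objective(pi, pi_ref, r, beta):
    """KL-regularized objective: E_pi[r] - beta * KL(pi || pi_ref)."""
    expected_reward = sum(p * ri for p, ri in zip(pi, r))
    kl = sum(p * math.log(p / q) for p, q in zip(pi, pi_ref) if p > 0)
    return expected_reward - beta * kl

# Hypothetical toy setup (assumed values, not taken from any paper).
pi_ref = [0.4, 0.3, 0.2, 0.1]   # reference (pre-trained) distribution
r = [0.0, 1.0, 2.0, 0.5]        # reward of each outcome
beta = 0.5                      # strength of the KL penalty

# Reward-proportional target: R(x) = pi_ref(x) * exp(r(x) / beta).
gfn_reward = [q * math.exp(ri / beta) for q, ri in zip(pi_ref, r)]
pi_star = normalize(gfn_reward)

# Compare against two other policies on the same objective.
uniform = [0.25] * 4
best = kl_objective(pi_star, pi_ref, r, beta)
ref_value = kl_objective(pi_ref, pi_ref, r, beta)
uniform_value = kl_objective(uniform, pi_ref, r, beta)

# At the optimum, the objective equals beta * log(sum_x R(x)).
log_partition_value = beta * math.log(sum(gfn_reward))
```

In this toy case `best` matches `log_partition_value` and exceeds both `ref_value` and `uniform_value`. This only illustrates that the two targets agree; it says nothing about how GFlowNet training compares empirically with Flow-GRPO or DanceGRPO, which is the open question here.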

While Flow-GRPO and DanceGRPO currently represent strong baselines for RL post-training of diffusion models, the authors note that the comparative performance of GFlowNets against these methods has not been established.

References

However, performance vs. mainstream SOTA (Flow-GRPO, DanceGRPO) is unknown.

Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models (2603.12893 - McAllister et al., 13 Mar 2026) in Section 2, Previous Work