Open questions in diffusion‑policy reinforcement learning: robustness, scaling, and optimization landscape

Develop reinforcement‑learning algorithms with diffusion policies that are robust to diverse environment characteristics; develop methods that enable diffusion‑policy reinforcement learning to scale to long‑horizon and sparse‑reward tasks; and characterize the optimization landscape of diffusion policies trained with reinforcement learning.

Background

The paper proposes a unified taxonomy for reinforcement learning with diffusion policies (DPRL), introduces a modular JAX-based codebase, and presents standardized benchmarks across Gym-Locomotion, DeepMind Control Suite, and IsaacLab. Despite these contributions, the authors emphasize that fundamental questions remain unresolved.

Empirically, no single DPRL method dominates across all tasks, and certain approaches degrade under high action dimensionality or an increased number of diffusion steps. On-policy diffusion-style methods can exhibit training instability. These observations motivate the need for robustness across diverse environments, scalability to long-horizon and sparse-reward settings, and a deeper theoretical and empirical understanding of the optimization landscape of diffusion-policy training.
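To make the notion of a "diffusion step" concrete, the following minimal JAX sketch (written to match the paper's JAX-based setting, but not taken from the FlowRL codebase) samples an action by DDPM-style reverse diffusion conditioned on the state. Here num_steps is the step count whose increase the benchmarks show can degrade some methods; the linear variance schedule, the toy epsilon-network toy_eps_net, and all parameter names are illustrative assumptions.

```python
# Minimal illustrative sketch of diffusion-policy action sampling (DDPM-style).
# All names and the toy network are assumptions, not the FlowRL implementation.
import jax
import jax.numpy as jnp

def make_schedule(num_steps, beta_min=1e-4, beta_max=2e-2):
    """Linear variance schedule; alphas_bar is the cumulative product of alphas."""
    betas = jnp.linspace(beta_min, beta_max, num_steps)
    alphas = 1.0 - betas
    alphas_bar = jnp.cumprod(alphas)
    return betas, alphas, alphas_bar

def sample_action(key, eps_net, params, state, action_dim, num_steps):
    """Reverse diffusion: start from Gaussian noise, denoise conditioned on state."""
    betas, alphas, alphas_bar = make_schedule(num_steps)
    key, sub = jax.random.split(key)
    a = jax.random.normal(sub, (action_dim,))  # a_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        eps = eps_net(params, state, a, jnp.array(t))  # predicted noise
        coef = betas[t] / jnp.sqrt(1.0 - alphas_bar[t])
        mean = (a - coef * eps) / jnp.sqrt(alphas[t])   # DDPM posterior mean
        key, sub = jax.random.split(key)
        noise = jax.random.normal(sub, a.shape) if t > 0 else 0.0
        a = mean + jnp.sqrt(betas[t]) * noise           # sigma_t = sqrt(beta_t)
    return a

# Toy epsilon-network: a linear map of (state, noisy action, timestep).
def toy_eps_net(params, state, a, t):
    x = jnp.concatenate([state, a, jnp.atleast_1d(t / 10.0)])
    return params["W"] @ x + params["b"]

key = jax.random.PRNGKey(0)
state_dim, action_dim, num_steps = 3, 2, 10
params = {
    "W": 0.01 * jax.random.normal(key, (action_dim, state_dim + action_dim + 1)),
    "b": jnp.zeros(action_dim),
}
action = sample_action(key, toy_eps_net, params, jnp.ones(state_dim), action_dim, num_steps)
print(action)
```

Each additional step adds one more network evaluation to action sampling (and to any policy-gradient path that backpropagates through it), which is one plausible source of the step-count sensitivity noted above.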

References

Important open questions remain in the field of DPRL, including designing algorithms robust to diverse environment characteristics, scaling to long-horizon and sparse-reward tasks, and developing a thorough understanding of the diffusion policy optimization landscape.

FlowRL: A Taxonomy and Modular Framework for Reinforcement Learning with Diffusion Policies (2603.27450 - Gao et al., 29 Mar 2026) in Closing Remarks (Section 6)