Open questions in diffusion‑policy reinforcement learning: robustness, scaling, and optimization landscape
Develop reinforcement‑learning algorithms with diffusion policies that are robust to diverse environment characteristics; develop methods that let diffusion‑policy reinforcement learning scale to long‑horizon and sparse‑reward tasks; and characterize the optimization landscape that arises when training diffusion policies in reinforcement learning.
References
Important open questions remain in the field of DPRL, including designing algorithms robust to diverse environment characteristics, scaling to long-horizon and sparse-reward tasks, and developing a thorough understanding of the diffusion policy optimization landscape.
— FlowRL: A Taxonomy and Modular Framework for Reinforcement Learning with Diffusion Policies
(2603.27450 - Gao et al., 29 Mar 2026) in Closing Remarks (Section 6)