Comparative performance of fully asynchronous versus backward-synchronized policy gradient methods
Determine whether, under a fixed compute budget, fully asynchronous policy gradient algorithms that decouple training from data collection outperform methods that synchronize the backward pass, once policy lag effects are accounted for.
References
This makes it unclear whether fully asynchronous methods outperform approaches that do synchronize the backward pass, given a fixed compute budget.
— CaRL: Learning Scalable Planning Policies with Simple Rewards
(2504.17838 - Jaeger et al., 24 Apr 2025) in Appendix, Section 'Related work', subsubsection 'Asynchronous collection'