Comparative performance of fully asynchronous versus backward-synchronized policy gradient methods

Determine whether fully asynchronous policy gradient algorithms with decoupled training and data collection outperform backward-pass-synchronized methods under a fixed compute budget, accounting for policy lag effects.

Background

The paper notes that fully asynchronous policy gradient methods can train faster, but analyses of such methods often omit policy lag: when training is decoupled from data collection, batches are generated by stale policy versions, which introduces off-policy effects. This leaves the true performance benefit over synchronized approaches uncertain.
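To make the failure mode concrete, here is a minimal sketch (hypothetical code, not from the paper) of a decoupled actor/learner loop: the actor tags each batch with the policy version that generated it, and the learner measures how many updates behind that version is by training time; this gap is the policy lag that induces off-policy effects.

```python
import queue
import threading
import time

# Hypothetical sketch, not the paper's code: one actor collects rollouts
# with whatever policy version it last snapshotted while the learner
# updates the policy concurrently. The gap between the learner's version
# and the version that generated a batch is the policy lag.

rollouts = queue.Queue(maxsize=8)   # batches tagged with their policy version
policy_version = 0                  # incremented by the learner
stop = threading.Event()
lock = threading.Lock()

def actor():
    while not stop.is_set():
        with lock:
            behaviour_version = policy_version  # snapshot current weights
        time.sleep(0.05)                        # stand-in for an env rollout
        rollouts.put(behaviour_version)

def learner(num_updates=20):
    global policy_version
    for _ in range(num_updates):
        behaviour_version = rollouts.get()      # train on whatever is ready
        lag = policy_version - behaviour_version
        print(f"update on batch from version {behaviour_version}, lag = {lag}")
        time.sleep(0.01)                        # stand-in for a gradient step
        with lock:
            policy_version += 1
    stop.set()

threading.Thread(target=actor, daemon=True).start()
learner()
```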

The authors’ AC-PPO keeps the backward pass synchronized to avoid policy lag, and they explicitly note that it remains unclear whether fully asynchronous methods perform better under an equal compute budget.
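For contrast, here is a hedged sketch of the backward-synchronized pattern the text attributes to AC-PPO (illustrative only, not the authors' implementation): collection may run in parallel threads, but the gradient step waits at a barrier until every batch was produced by the current policy, so the lag is always zero.

```python
import threading

# Hedged sketch, not the authors' AC-PPO implementation: actors step their
# environments in parallel, but the backward pass is gated by a barrier so
# every sample in the update came from the current policy (lag = 0).

NUM_ACTORS = 4
barrier = threading.Barrier(NUM_ACTORS + 1)  # all actors plus the learner

def actor(actor_id, version):
    # Stand-in for collecting a rollout with the current policy weights.
    print(f"actor {actor_id}: rollout with policy version {version}")
    barrier.wait()  # report completion before any gradient step runs

for version in range(3):  # a few synchronized iterations
    threads = [threading.Thread(target=actor, args=(i, version))
               for i in range(NUM_ACTORS)]
    for t in threads:
        t.start()
    barrier.wait()   # learner blocks until all on-policy batches are in
    for t in threads:
        t.join()
    print(f"backward pass on version {version} data (policy lag = 0)")
    # the update would produce policy version `version + 1` here
```

Whether the lag-free updates of the second pattern outweigh the higher throughput of the first under equal compute is precisely the open question.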

References

"This makes it unclear whether fully asynchronous methods outperform approaches that do synchronize the backward pass, given a fixed compute budget."

CaRL: Learning Scalable Planning Policies with Simple Rewards (Jaeger et al., 24 Apr 2025, arXiv:2504.17838), Appendix, Section 'Related work', subsubsection 'Asynchronous collection'.