
Effectiveness of Shampoo and Apollo Optimizers in Online Deep RL

Determine whether the Shampoo and Apollo optimizers can be configured to achieve strong performance in online deep reinforcement learning, and identify which optimizer properties and design choices enable effectiveness in this regime.


Background

Beyond the proposed Kronecker-factored optimizer, the authors evaluated the state-of-the-art optimizers Shampoo and Apollo, both of which have been successful in large-scale supervised learning. Despite extensive hyperparameter tuning, neither optimizer yielded strong performance in the online deep RL setting.
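For context on what is being tuned, the sketch below illustrates the core of Shampoo's preconditioning for a single 2-D weight matrix: left and right second-moment statistics are accumulated from the gradient, and the update is preconditioned by their inverse fourth roots. This is a simplified, illustrative NumPy rendering of the published Shampoo update rule, not the implementation evaluated in the paper; the class and function names are hypothetical.

```python
import numpy as np

def matrix_power(mat, p, eps=1e-6):
    """Power of a symmetric PSD matrix via eigendecomposition,
    with eigenvalues clamped at eps for numerical stability."""
    w, v = np.linalg.eigh(mat)
    w = np.maximum(w, eps)
    return (v * w**p) @ v.T

class ShampooSketch:
    """Minimal full-matrix Shampoo for one 2-D parameter (illustrative only)."""

    def __init__(self, shape, lr=1e-2):
        self.lr = lr
        # Kronecker-factored preconditioner statistics:
        self.L = np.zeros((shape[0], shape[0]))  # left (row-space) stats
        self.R = np.zeros((shape[1], shape[1]))  # right (column-space) stats

    def step(self, param, grad):
        # Accumulate gradient outer products on each side.
        self.L += grad @ grad.T
        self.R += grad.T @ grad
        # Precondition: L^{-1/4} G R^{-1/4}.
        pre = matrix_power(self.L, -0.25) @ grad @ matrix_power(self.R, -0.25)
        return param - self.lr * pre
```

In practice, production Shampoo variants add grafting, update intervals, and blocking for large layers; which of these knobs (if any) matter in the online RL regime is exactly the open question posed here.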

This reported failure raises a concrete open question: whether and how these optimizers can be adapted to online deep RL, and which properties or design choices are necessary for their effectiveness.

References

Despite extensive hyperparameter tuning for both methods, we were unable to achieve strong performance in the online deep RL setting. This suggests that further investigation is needed to understand the key properties required for these optimizers to be effective in this regime.

Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning (2506.15544 - Castanyer et al., 18 Jun 2025) in Appendix, Subsection: Architecture and Optimizer Ablations