Effectiveness of Shampoo and Apollo Optimizers in Online Deep RL
Establish whether the Shampoo and Apollo optimizers can be configured to achieve strong performance in online deep reinforcement learning, and identify the optimizer properties and configurations that make them effective in this regime.
References
Despite extensive hyperparameter tuning for both methods, we were unable to achieve strong performance in the online deep RL setting. This suggests that further investigation is needed to understand the key properties required for these optimizers to be effective in this regime.
— Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning
(2506.15544 - Castanyer et al., 18 Jun 2025) in Appendix, Subsection: Architecture and Optimizer Ablations