Further Scaling of Deep RL Architectures

Determine whether deep reinforcement learning network architectures can be scaled to depths and widths beyond those evaluated in this study while maintaining stable training and strong performance when equipped with multi-skip residual connections, Layer Normalization, and Kronecker-factored optimization, and characterize the limits of such scaling under practical computational constraints.

Background

The paper diagnoses gradient pathologies that intensify with depth and width in non-stationary regimes such as deep reinforcement learning and proposes two interventions—multi-skip residual connections and Kronecker-factored optimization—to stabilize gradient flow. These interventions consistently improve performance across several agents and environments at the scales tested.
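For illustration only, the sketch below shows one plausible realization of such a torso in PyTorch: a dense multi-skip scheme in which each block receives skip connections from all earlier blocks, with pre-layer Layer Normalization. The class name, dimensions, and exact wiring are assumptions for this sketch and are not taken from the paper; a Kronecker-factored optimizer (e.g., K-FAC) would be plugged in as the training optimizer and is not shown.

```python
# Minimal sketch (not the authors' code) of a multi-skip residual MLP torso
# with LayerNorm. The dense-skip wiring here is one possible interpretation
# of "multi-skip residual connections"; the paper's architecture may differ.
import torch
import torch.nn as nn


class MultiSkipResidualMLP(nn.Module):
    """Each block aggregates the outputs of all earlier blocks (multi-skip),
    applying LayerNorm before its linear layer (pre-norm)."""

    def __init__(self, in_dim: int, hidden_dim: int, depth: int):
        super().__init__()
        self.input_proj = nn.Linear(in_dim, hidden_dim)
        self.norms = nn.ModuleList(nn.LayerNorm(hidden_dim) for _ in range(depth))
        self.layers = nn.ModuleList(nn.Linear(hidden_dim, hidden_dim) for _ in range(depth))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = [self.input_proj(x)]           # outputs of all earlier blocks
        for norm, layer in zip(self.norms, self.layers):
            h = sum(outputs)                     # multi-skip: sum all earlier outputs
            h = torch.relu(layer(norm(h)))       # pre-norm residual branch
            outputs.append(h)
        return sum(outputs)                      # representation fed to the agent's heads


# Hypothetical usage: widening/deepening this torso is the kind of scaling at issue.
torso = MultiSkipResidualMLP(in_dim=512, hidden_dim=1024, depth=8)
features = torso(torch.randn(32, 512))
```

The open question is whether training such torsos remains stable as `hidden_dim` and `depth` grow well beyond the scales evaluated in the paper.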

However, the authors note computational constraints that limited exploration of larger architectures and explicitly state that whether further scaling is feasible and effective remains an open question, motivating investigation into scaling limits and behaviors beyond the evaluated sizes.

References

Our study is constrained by computational resources, which limited our ability to explore architectures beyond a certain size. While our interventions show consistent improvements across agents and environments, further scaling remains an open question.

Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning (2506.15544 - Castanyer et al., 18 Jun 2025) in Section 7 (Discussion), Limitations