
The Dormant Neuron Phenomenon in Deep Reinforcement Learning

Published 24 Feb 2023 in cs.LG | (2302.12902v2)

Abstract: In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network expressivity. We demonstrate the presence of this phenomenon across a variety of algorithms and environments, and highlight its effect on learning. To address this issue, we propose a simple and effective method (ReDo) that Recycles Dormant neurons throughout training. Our experiments demonstrate that ReDo maintains the expressive power of networks by reducing the number of dormant neurons and results in improved performance.

Summary

  • The paper introduces ReDo, a method to recycle dormant neurons and enhance learning by preserving network capacity in deep RL.
  • It finds that non-stationary targets and high replay ratios foster widespread neuron dormancy across diverse RL algorithms and environments.
  • The study demonstrates that reactivating inactive neurons maintains network expressivity, offering promising avenues for optimizing reinforcement learning models.

Introduction

Advances in deep neural networks have been pivotal in improving the performance of reinforcement learning (RL) agents on complex decision-making tasks. Despite this success, the paper identifies what it terms the dormant neuron phenomenon: a growing number of neurons in an agent's network become inactive, or dormant, over the course of training, which can reduce the network's expressivity and the agent's ability to learn. This summary covers the paper's findings, their implications, and the proposed solution, Recycling Dormant neurons (ReDo), which mitigates the phenomenon in order to maintain network expressivity and improve agent performance.

Dormant Neuron Phenomenon

The dormant neuron phenomenon is characterized by an increasing number of neurons within a network that show little to no activation during training. This under-utilization of the network's capacity is linked to the unique training dynamics of RL, particularly the non-stationarity of its data, which contrasts with the relatively stable data landscape seen in supervised learning settings.

  • Presence Across Algorithms and Domains: This phenomenon is not confined to a particular algorithm or environment but is widespread across various algorithms (e.g., DQN, DrQ(ε), SAC) and environments (Arcade Learning Environment, MuJoCo suite).
  • Exacerbation by Non-Stationarity and High Replay Ratio: Investigations suggest that the phenomenon is exacerbated by the non-stationary nature of targets in RL and by higher replay ratios. A higher replay ratio increases the rate at which neurons become dormant, subsequently leading to decreased performance.
  • Dormancy Leads to Reduced Learning Ability: A growing number of dormant neurons implies a loss of capacity to learn or adapt to new tasks, as evidenced by the network's degraded ability to fit new data or targets compared to a freshly initialized network.
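
The summary above does not spell out how dormancy is measured. As a minimal sketch, assume a neuron's score is its mean absolute activation over a batch, divided by the average of that quantity across its layer, and that a neuron counts as dormant when the score is at or below a threshold τ. The function names, the threshold value 0.1, and the toy layer below are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def dormancy_scores(activations):
    """Per-neuron score for one layer (assumed definition, see lead-in).

    activations: array of shape (batch, num_neurons) holding the layer's
    post-activation outputs on a batch of inputs.
    """
    mean_abs = np.abs(activations).mean(axis=0)   # average |activation| per neuron
    layer_mean = mean_abs.mean()                  # average of that over the layer
    return mean_abs / (layer_mean + 1e-9)         # small constant avoids 0/0

def dormant_mask(activations, tau=0.1):
    """Boolean mask of neurons whose score is at or below tau."""
    return dormancy_scores(activations) <= tau

# Toy layer: 8 inputs, 16 ReLU units. A large negative bias (hypothetical
# setup) forces the first four units to output zero on every input.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))
w = rng.normal(size=(8, 16))
b = np.zeros(16)
b[:4] = -100.0
h = np.maximum(x @ w + b, 0.0)

mask = dormant_mask(h, tau=0.1)
fraction_dormant = mask.mean()   # share of the layer flagged as dormant
print(mask[:4], fraction_dormant)
```

Normalizing by the layer average makes the score comparable across layers with different activation scales, which is why a single threshold can be applied to the whole network in this sketch.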

Proposed Solution: Recycling Dormant Neurons (ReDo)

ReDo is a simple yet effective technique that tackles dormant neurons by periodically recycling them during training. The process identifies τ-dormant neurons (neurons whose normalized activation falls at or below a threshold τ), reinitializes their incoming weights, and sets their outgoing weights to zero. Because the zeroed outgoing weights remove the recycled neurons' immediate contribution to the next layer, the network's output is not significantly altered and the learned knowledge is preserved, while the fresh incoming weights let the neurons become useful again.
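
The recycling step described above can be sketched for a single hidden layer as follows. This is a hedged illustration rather than the authors' implementation: the function name `redo_step`, the scaled-Gaussian reinitialization, and the bias reset are assumptions made for the example, since the summary only states that incoming weights are reinitialized and outgoing weights are zeroed.

```python
import numpy as np

def redo_step(w_in, b_in, w_out, dormant, rng):
    """Recycle the hidden neurons flagged in `dormant` (hypothetical helper).

    w_in:    (in_dim, hidden)  incoming weights of the hidden layer
    b_in:    (hidden,)         incoming biases
    w_out:   (hidden, out_dim) outgoing weights to the next layer
    dormant: (hidden,)         boolean mask of neurons to recycle
    """
    w_in, b_in, w_out = w_in.copy(), b_in.copy(), w_out.copy()
    idx = np.flatnonzero(dormant)
    # Assumption: redraw incoming weights from a scaled Gaussian and reset
    # the bias; the summary does not specify the initializer.
    scale = 1.0 / np.sqrt(w_in.shape[0])
    w_in[:, idx] = rng.normal(scale=scale, size=(w_in.shape[0], idx.size))
    b_in[idx] = 0.0
    # Zero outgoing weights so recycled neurons do not affect the output yet.
    w_out[idx, :] = 0.0
    return w_in, b_in, w_out

def forward(x, w_in, b_in, w_out):
    return np.maximum(x @ w_in + b_in, 0.0) @ w_out

# Toy network in which the first four hidden units never fire.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))
w_in = rng.normal(size=(8, 16))
b_in = np.zeros(16)
b_in[:4] = -100.0
w_out = rng.normal(size=(16, 3))

h = np.maximum(x @ w_in + b_in, 0.0)
mean_abs = np.abs(h).mean(axis=0)
dormant = mean_abs / (mean_abs.mean() + 1e-9) <= 0.1   # illustrative tau = 0.1

out_before = forward(x, w_in, b_in, w_out)
new_w_in, new_b_in, new_w_out = redo_step(w_in, b_in, w_out, dormant, rng)
out_after = forward(x, new_w_in, new_b_in, new_w_out)
print(np.allclose(out_before, out_after))
```

In this toy case the flagged neurons had exactly zero activation, so the output is identical before and after recycling. With τ > 0, a recycled neuron may have had a small nonzero contribution, and the output can shift slightly.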

  • Efficacy in Reducing Dormant Neurons: ReDo demonstrated a significant reduction in the number of dormant neurons across various settings, thereby maintaining the network's capacity.
  • Improved Performance: By mitigating the dormant neuron phenomenon, ReDo has shown improved performance across diverse algorithms and environments, underlining the effectiveness of recycling dormant neurons in enhancing the expressivity and learning capability of RL networks.

Theoretical Implications and Future Directions

Identifying and addressing the dormant neuron phenomenon has several theoretical implications. It points to the need for a more detailed understanding of how deep neural networks behave under the training dynamics of RL, especially with respect to neuron utilization and network expressivity. It also motivates further work on network architectures and optimization techniques tailored to reinforcement learning.

Furthermore, while ReDo is a promising approach to recycling dormant neurons, future research could explore adaptive thresholds for identifying dormant neurons, incorporate neuron recycling directly into the optimization process, and analyze in depth the relationship between network architecture complexity, task complexity, and the dormant neuron phenomenon.

Conclusion

The dormant neuron phenomenon is a significant obstacle to using neural networks effectively in reinforcement learning. By combining empirical evidence of the phenomenon with a practical remedy, ReDo, this work advances our understanding of network dynamics in RL and opens new avenues for research on more efficient and expressive RL agents.
