The Dormant Neuron Phenomenon in Deep Reinforcement Learning

Published 24 Feb 2023 in cs.LG | (2302.12902v2)

Abstract: In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network expressivity. We demonstrate the presence of this phenomenon across a variety of algorithms and environments, and highlight its effect on learning. To address this issue, we propose a simple and effective method (ReDo) that Recycles Dormant neurons throughout training. Our experiments demonstrate that ReDo maintains the expressive power of networks by reducing the number of dormant neurons and results in improved performance.

Summary

The paper introduces ReDo, a method to recycle dormant neurons and enhance learning by preserving network capacity in deep RL.
It finds that non-stationary targets and high replay ratios foster widespread neuron dormancy across diverse RL algorithms and environments.
The study demonstrates that reactivating inactive neurons maintains network expressivity, offering promising avenues for optimizing reinforcement learning models.

Introduction

Deep neural networks have been pivotal in improving the performance of reinforcement learning (RL) agents on complex decision-making tasks. Despite this success, the paper identifies what it terms the dormant neuron phenomenon: a growing number of neurons in an agent's network become inactive, or dormant, over the course of training, which can reduce the network's expressivity and the agent's ability to learn. This summary covers the paper's findings, their implications, and the proposed remedy, Recycling Dormant neurons (ReDo), which aims to mitigate the phenomenon, maintain network expressivity, and improve agent performance.

Dormant Neuron Phenomenon

The dormant neuron phenomenon is characterized by an increasing number of neurons within a network that show little to no activation during training. This under-utilization of the network's capacity is linked to the unique training dynamics of RL, particularly the non-stationarity of its data, which contrasts with the relatively stable data landscape seen in supervised learning settings.
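
The notion of "little to no activation" can be made precise with a normalized activation score. As a sketch, with notation assumed here for illustration (it does not appear in this summary): let $h_i^{\ell}(x)$ be the activation of neuron $i$ in layer $\ell$ on an input $x$ drawn from a data distribution $D$, and let $H^{\ell}$ be the number of neurons in that layer.

$$
s_i^{\ell} = \frac{\mathbb{E}_{x \in D}\,\lvert h_i^{\ell}(x)\rvert}{\frac{1}{H^{\ell}} \sum_{k=1}^{H^{\ell}} \mathbb{E}_{x \in D}\,\lvert h_k^{\ell}(x)\rvert}
$$

A neuron is then called $\tau$-dormant when $s_i^{\ell} \le \tau$. Because the score is normalized by the layer average, the scores within a layer average to 1, so the same threshold $\tau$ can be compared across layers whose activations have different scales.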

Presence Across Algorithms and Domains: The phenomenon is not confined to a particular algorithm or environment. It appears across several algorithms (e.g., DQN, DrQ($\epsilon$), SAC) and environments (the Arcade Learning Environment and the MuJoCo suite).
Exacerbation by Non-Stationarity and High Replay Ratio: The experiments suggest that the phenomenon is exacerbated by the non-stationary targets in RL and by higher replay ratios. A higher replay ratio increases the rate at which neurons become dormant, which in turn degrades performance.
Dormancy Leads to Reduced Learning Ability: A growing number of dormant neurons implies a loss of capacity to learn or adapt to new tasks. This shows up as a degraded ability to fit new data or targets compared with a freshly initialized network.
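
To make the measurement concrete, the following is a minimal NumPy sketch of how the fraction of dormant neurons in one layer might be tracked. It assumes a neuron's score is its mean absolute activation over a batch divided by the layer-wide average of that quantity, and that a neuron counts as dormant when the score is at most a threshold `tau`. The function names, the toy layer, and the value `tau=0.1` are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np


def dormancy_scores(activations: np.ndarray) -> np.ndarray:
    """Normalized activation score per neuron.

    activations: array of shape (batch, num_neurons) holding the
    post-activation outputs of a single layer.
    """
    mean_abs = np.abs(activations).mean(axis=0)  # per-neuron mean |h_i(x)|
    layer_mean = mean_abs.mean()                 # average over the layer
    return mean_abs / (layer_mean + 1e-9)        # small epsilon avoids 0/0


def dormant_mask(activations: np.ndarray, tau: float) -> np.ndarray:
    """Boolean mask of neurons whose score is at most tau."""
    return dormancy_scores(activations) <= tau


# Toy layer (hypothetical): 8 inputs, 6 ReLU units, one unit forced dead.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))
w = rng.normal(size=(8, 6))
b = np.zeros(6)
b[2] = -100.0                       # large negative bias: ReLU never fires
h = np.maximum(x @ w + b, 0.0)

mask = dormant_mask(h, tau=0.1)     # tau chosen arbitrarily for illustration
dormant_fraction = mask.mean()      # share of dormant neurons in this layer
print(mask, dormant_fraction)
```

Logging `dormant_fraction` at regular intervals during training is one way to observe the trend described above, for example when comparing runs with different replay ratios.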

Proposed Solution: Recycling Dormant Neurons (ReDo)

ReDo is a simple technique that tackles dormant neurons by periodically recycling them during training. It identifies $\tau$-dormant neurons, that is, neurons whose normalized activation score is at most a threshold $\tau$, reinitializes their incoming weights, and sets their outgoing weights to zero. Because a recycled neuron's outgoing weights start at zero, it initially contributes nothing to the next layer, so the network's output is not significantly altered and the learned knowledge is preserved while the neuron becomes trainable again.
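
The recycling step can be sketched for a single hidden layer as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name `recycle_dormant`, the uniform reinitialization range, and the choice to reset the bias of recycled neurons are all assumptions made for the example.

```python
import numpy as np


def recycle_dormant(w_in, b_in, w_out, activations, tau, rng):
    """Return copies of the weights with tau-dormant hidden neurons recycled.

    w_in: (in_dim, hidden), b_in: (hidden,), w_out: (hidden, out_dim)
    activations: (batch, hidden) post-activation outputs of the hidden layer.
    """
    mean_abs = np.abs(activations).mean(axis=0)
    scores = mean_abs / (mean_abs.mean() + 1e-9)
    dormant = scores <= tau

    w_in, b_in, w_out = w_in.copy(), b_in.copy(), w_out.copy()
    n = int(dormant.sum())
    if n:
        in_dim = w_in.shape[0]
        bound = 1.0 / np.sqrt(in_dim)   # assumed init scheme for this sketch
        w_in[:, dormant] = rng.uniform(-bound, bound, size=(in_dim, n))
        b_in[dormant] = 0.0             # assumption: bias is reset as well
        w_out[dormant, :] = 0.0         # zero outgoing weights
    return w_in, b_in, w_out, dormant


def forward(x, w_in, b_in, w_out):
    h = np.maximum(x @ w_in + b_in, 0.0)  # ReLU hidden layer
    return h, h @ w_out


# Toy network (hypothetical) with hidden unit 2 forced dead.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))
w_in = rng.normal(size=(8, 6))
b_in = np.zeros(6)
b_in[2] = -100.0
w_out = rng.normal(size=(6, 3))

h_before, y_before = forward(x, w_in, b_in, w_out)
new_w_in, new_b_in, new_w_out, dormant = recycle_dormant(
    w_in, b_in, w_out, h_before, tau=0.0, rng=rng
)
h_after, y_after = forward(x, new_w_in, new_b_in, new_w_out)

print(dormant)                          # which hidden units were recycled
print(np.allclose(y_before, y_after))   # output preserved after recycling
```

With `tau=0.0` only neurons that never activate are recycled, so the output is unchanged. With a positive `tau`, a recycled neuron may have been contributing a small amount before, so the output can shift slightly. In a full training loop this step would run periodically, using activations from a batch of training data.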

Efficacy in Reducing Dormant Neurons: ReDo significantly reduces the number of dormant neurons across the settings tested, thereby maintaining the network's capacity.
Improved Performance: By mitigating the dormant neuron phenomenon, ReDo improves performance across diverse algorithms and environments, which supports recycling dormant neurons as a way to preserve the expressivity and learning capability of RL networks.

Theoretical Implications and Future Directions

Identifying and addressing the dormant neuron phenomenon has several theoretical implications. It points to the need for a more nuanced understanding of how deep neural networks behave under the training dynamics specific to RL, especially with respect to neuron utilization and network expressivity. It also motivates further work on network architectures and optimization techniques tailored to reinforcement learning.

While ReDo is a promising approach to recycling dormant neurons, future research could explore adaptive thresholds for identifying dormant neurons, integrating neuron recycling directly into the optimization process, and a thorough analysis of the relationship between network architecture complexity, task complexity, and the dormant neuron phenomenon.

Conclusion

The dormant neuron phenomenon is a significant challenge when using neural networks for reinforcement learning. By combining empirical evidence of the phenomenon with a simple remedy, ReDo, this work improves our understanding of network dynamics in RL and opens new avenues for research into more efficient and expressive RL agents.
