
The State of Sparse Training in Deep Reinforcement Learning (2206.10369v1)

Published 17 Jun 2022 in cs.LG and cs.AI

Abstract: The use of sparse neural networks has seen rapid growth in recent years, particularly in computer vision. Their appeal stems largely from the reduced number of parameters required to train and store, as well as from an increase in learning efficiency. Somewhat surprisingly, there have been very few efforts exploring their use in Deep Reinforcement Learning (DRL). In this work we perform a systematic investigation into applying a number of existing sparse training techniques on a variety of DRL agents and environments. Our results corroborate the findings from sparse training in the computer vision domain - sparse networks perform better than dense networks for the same parameter count - in the DRL domain. We provide detailed analyses on how the various components in DRL are affected by the use of sparse networks and conclude by suggesting promising avenues for improving the effectiveness of sparse training methods, as well as for advancing their use in DRL.

Sparse Training in Deep Reinforcement Learning

The focus of this paper is on sparse neural networks within the domain of Deep Reinforcement Learning (DRL). Sparse networks, characterized by having fewer connections compared to dense networks, have gained recognition for their efficiency and reduced resource requirements. Traditionally, sparse networks have shown success in computer vision, achieving performance levels comparable to dense ones with significantly fewer parameters. This paper explores whether these advantages translate to DRL, a field that primarily focuses on algorithmic improvements rather than architectural advancements.

Dynamic sparse training methods, such as Rigging the Lottery (RigL) and Sparse Evolutionary Training (SET), keep the network sparse throughout training while periodically adjusting its connectivity: weak connections are dropped and new ones are grown. Pruning, by contrast, starts from a dense network and gradually removes connections based on weight magnitude; it serves as the benchmark that fully sparse methods aim to match without ever requiring dense computation.
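As a rough sketch of the two regimes described above (the helper names are illustrative, not from the paper; NumPy only): magnitude pruning removes the smallest weights from a dense matrix, while a SET-style update drops the weakest active connections and regrows the same number at random inactive positions, keeping total sparsity fixed.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction of weights (illustrative helper)."""
    k = int(sparsity * w.size)                 # number of weights to remove
    if k == 0:
        return w.copy()
    thresh = np.sort(np.abs(w), axis=None)[k - 1]
    pruned = w.copy()
    pruned[np.abs(w) <= thresh] = 0.0
    return pruned

def set_drop_grow(w, mask, drop_frac, rng):
    """One SET-style update: drop the smallest active weights, regrow at random.

    Note: regrowth may re-select a just-dropped position; real implementations
    often exclude those, which this sketch omits for brevity.
    """
    active = np.flatnonzero(mask)
    n_drop = int(drop_frac * active.size)
    # Drop: the smallest-magnitude active connections.
    drop_idx = active[np.argsort(np.abs(w.flat[active]))[:n_drop]]
    mask.flat[drop_idx] = 0
    w.flat[drop_idx] = 0.0
    # Grow: the same number of new connections at random inactive positions.
    inactive = np.flatnonzero(mask == 0)
    grow_idx = rng.choice(inactive, size=n_drop, replace=False)
    mask.flat[grow_idx] = 1
    return w, mask
```

Because the drop and grow counts match, a SET update preserves the layer's sparsity level while letting its topology evolve over training.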

Key Findings

  1. Performance Superiority: At matched parameter counts, sparse networks consistently outperform their dense counterparts across a variety of DRL settings. Networks trained with dynamic sparse training methods outperform static sparse setups, although pruning from a dense network remains the strongest performer overall.
  2. Degree of Sparsity: Sparse networks can be trained effectively with 80-90% fewer connections without compromising performance relative to dense networks. This holds across a range of environments, though performance at high sparsity levels varies with environment complexity.
  3. Gradient Signal Utilization: RigL, which grows new connections where gradient magnitudes are largest, proves less effective in DRL than in computer vision tasks. This suggests that the noisier gradient signals of RL training, with their lower signal-to-noise ratio, hamper gradient-based connection growth.
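The gradient-based growth criterion in finding 3 can be sketched as follows (an illustrative NumPy version, not the paper's implementation): unlike SET's random regrowth, RigL ranks the currently inactive positions by the magnitude of the dense gradient and activates the top ones, initializing them to zero.

```python
import numpy as np

def rigl_update(w, mask, grad, drop_frac):
    """One RigL-style step: drop smallest active weights, grow where the
    dense gradient magnitude is largest. Sketch only; real RigL also
    anneals drop_frac over training and excludes just-dropped positions."""
    active = np.flatnonzero(mask)
    n_drop = int(drop_frac * active.size)
    # Drop: smallest-magnitude active connections.
    drop_idx = active[np.argsort(np.abs(w.flat[active]))[:n_drop]]
    mask.flat[drop_idx] = 0
    w.flat[drop_idx] = 0.0
    # Grow: inactive positions with the largest dense-gradient magnitude.
    inactive = np.flatnonzero(mask == 0)
    grow_idx = inactive[np.argsort(-np.abs(grad.flat[inactive]))[:n_drop]]
    mask.flat[grow_idx] = 1
    w.flat[grow_idx] = 0.0   # regrown weights start at zero
    return w, mask
```

The paper's observation is that when `grad` is dominated by noise, as is common in RL, this ranking carries little useful signal, which helps explain RigL's reduced advantage over random growth in DRL.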

Implications for Architecture

Sparse neural networks suggest new ways to optimize DRL architectures for both performance and resource efficiency. Notably, distributing sparsity unevenly across network layers improves performance, for example via Erdős-Rényi Kernel (ERK) distributions, which keep smaller layers denser. In actor-critic setups, the allocation of parameters between the actor and critic networks is also critical: critic networks tend to require a larger share of the parameter budget to realize the agent's full capacity.
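The ERK-style allocation mentioned above can be sketched as follows (a simplified illustration under the assumption of fully connected layers; the original method also caps any layer's density at 1, which this sketch omits): each layer's density is made proportional to the sum of its dimensions divided by their product, then scaled so the total parameter budget matches the target global density. Small layers therefore stay relatively dense while large layers absorb most of the sparsity.

```python
import numpy as np

def erk_densities(layer_shapes, global_density):
    """Per-layer densities under an Erdos-Renyi-Kernel-style scaling (sketch).

    Each layer's raw score is sum(shape) / prod(shape); scores are scaled
    so that the weighted-average density equals `global_density`.
    """
    sizes = np.array([np.prod(s) for s in layer_shapes], dtype=float)
    raw = np.array([np.sum(s) / np.prod(s) for s in layer_shapes])
    # Scale factor so that sum(density_l * size_l) == global_density * sum(size_l).
    eps = global_density * sizes.sum() / (raw * sizes).sum()
    return eps * raw
```

For a small MLP with shapes such as `(256, 64)`, `(64, 64)`, `(64, 4)`, the final, smallest layer receives a substantially higher density than the large input layer, while the overall budget stays at the requested global density.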

Practical Sensitivities & Robustness

Weight decay and the choice of initialization strategy have limited impact on sparse-network outcomes; the sparsity distribution matters far more. Sparse networks are robust to noise and can outperform dense architectures in scenarios with observational uncertainty. Interestingly, smaller sparse networks display consistent resilience, a valuable property in noisy DRL environments.

Future Research Opportunities

The exploration of sparse networks in DRL opens avenues for improving network topologies and training algorithms under resource constraints. Promising directions include methods that reduce reliance on dense initializations, improve the quality of the growth signal when adding sparse connections, and optimize how sparsity is distributed across layers. Better understanding of how sparse networks generalize in noisy environments would further strengthen DRL's applicability to real-world problems.

In summary, the paper illuminates the potential for sparse networks to significantly alter the landscape of DRL research and application. By reducing computational burden while maintaining or enhancing performance, sparse architectures pave the way for more efficient and scalable reinforcement learning systems, broadening future research horizons.

Authors (4)
  1. Laura Graesser (13 papers)
  2. Utku Evci (25 papers)
  3. Erich Elsen (28 papers)
  4. Pablo Samuel Castro (54 papers)
Citations (30)