Differentiable Architecture Search for Reinforcement Learning (2106.02229v4)

Published 4 Jun 2021 in cs.LG, cs.AI, and cs.CV

Abstract: In this paper, we investigate the fundamental question: To what extent are gradient-based neural architecture search (NAS) techniques applicable to RL? Using the original DARTS as a convenient baseline, we discover that the discrete architectures found can achieve up to 250% performance compared to manual architecture designs on both discrete and continuous action space environments across off-policy and on-policy RL algorithms, at only 3x more computation time. Furthermore, through numerous ablation studies, we systematically verify that not only does DARTS correctly upweight operations during its supernet phase, but also gradually improves resulting discrete cells up to 30x more efficiently than random search, suggesting DARTS is, surprisingly, an effective tool for improving architectures in RL.

Citations (4)

Summary

  • The paper investigates applying Differentiable Architecture Search (DARTS) to reinforcement learning (RL), addressing challenges like RL's non-stationary data and loss dynamics.
  • Empirical evaluations show RL-DARTS achieves up to 250% of the performance of hand-tuned baselines at only 3x more computation time across various RL environments.
  • The work suggests NAS can optimize RL policies for efficiency and performance, paving the way for tailoring architectures in large-scale RL systems like robot learning and memory-augmented agents.

Differentiable Architecture Search for Reinforcement Learning

The paper "Differentiable Architecture Search for Reinforcement Learning" investigates the applicability of gradient-based neural architecture search (NAS) methods, particularly Differentiable Architecture Search (DARTS), in the domain of reinforcement learning (RL). Historically, NAS has shown significant success in supervised learning, driven by the intrinsic correlation between architectures and loss functions in static datasets. However, RL presents unique challenges due to its dynamic, non-stationary nature, which raises questions about the effectiveness of loss-driven architecture search.

Key Contributions and Methodology

  1. RL vs. SL Loss Dynamics: The paper identifies crucial differences between supervised learning (SL) and RL with respect to loss functions, emphasizing that RL's performance objective is not directly represented in the losses being minimized. Unlike SL, where loss minimization aligns strongly with metrics such as accuracy, optimizing an RL loss does not guarantee policy performance. Two main concerns are highlighted: the non-stationarity of RL training data, and the presence of auxiliary loss terms (e.g., value and entropy losses) whose minimization does not necessarily translate into higher returns (a loss sketch illustrating this follows the list).
  2. Algorithm Implementation: The authors present an RL-DARTS framework that integrates a supernet search into standard RL training pipelines. The method jointly optimizes network weights and architecture parameters (α) to find high-performing architectures, followed by post-training discretization of the supernet into a discrete cell for final evaluation. The workflow mirrors DARTS in SL but is adapted to RL's non-stationary training data (a minimal code sketch of this relaxation follows the list).
  3. Empirical Evaluations: Across multiple RL environments, including Procgen and DM-Control, the paper evaluates the discovered architectures against hand-tuned baselines and random search. Gains of up to 250% of the performance of manual architectures are reported, and the architecture search requires only 3x more computation than standard training.
  4. Ablation Studies: Extensive ablation analyses validate the effectiveness and robustness of DARTS in RL. These studies confirm that the supernet correctly upweights beneficial operations during training, and that the performance of the resulting discrete cells improves consistently over training iterations, up to 30x more efficiently than random search. The paper also acknowledges failure cases, particularly when the supernet does not converge to a good topology.
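
To make the loss/performance mismatch in item 1 concrete, a typical on-policy objective combines several terms, as in PPO; the structure below is the standard PPO-family objective rather than a loss taken from this paper:

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\text{policy}}(\theta)
  + c_1\,\mathcal{L}_{\text{value}}(\theta)
  - c_2\,\mathcal{H}\left[\pi_\theta\right]
```

Minimizing this combined loss reduces the surrogate policy loss and value error while encouraging exploration through the entropy term, but none of these terms is the episodic return itself, which is why a low loss does not guarantee a well-performing policy.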

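As a concrete illustration of the supernet in item 2, below is a minimal PyTorch-style sketch of a single DARTS mixed operation, showing the jointly trained architecture parameters (α) and the post-training discretization step. The candidate operation set, class names, and channel sizes are illustrative assumptions, not the paper's actual search space or code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_ops(channels):
    # Hypothetical candidate operations for one edge of the cell.
    return nn.ModuleList([
        nn.Identity(),                                # skip connection
        nn.Conv2d(channels, channels, 3, padding=1),  # 3x3 conv
        nn.Conv2d(channels, channels, 5, padding=2),  # 5x5 conv
        nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()),
    ])

class MixedOp(nn.Module):
    """Continuous relaxation of one edge: softmax-weighted sum of candidate ops."""
    def __init__(self, channels):
        super().__init__()
        self.ops = make_ops(channels)
        # Architecture parameters (alpha), optimized jointly with the weights.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

    def discretize(self):
        # Post-training discretization: keep only the highest-weighted op.
        return self.ops[int(self.alpha.argmax())]

# Usage inside an RL policy encoder (single edge shown for brevity):
edge = MixedOp(channels=16)
features = edge(torch.randn(8, 16, 64, 64))  # supernet forward pass
final_op = edge.discretize()                 # discrete cell for evaluation
```
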
Implications and Future Directions

The findings challenge previous assumptions about the limitations of NAS in RL, suggesting a paradigm where architecture search could play a significant role in tailoring RL policies for efficiency and performance. The paper speculates on opportunities to further enhance RL-DARTS through better supernet training strategies (e.g., hyperparameter tuning, data augmentation) and refined discretization processes to mitigate integrality gaps between continuous and discrete architecture representations.
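
For reference, the continuous relaxation and the discretization step that give rise to this integrality gap can be written as in the original DARTS formulation (notation follows that paper rather than this one):

```latex
\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}}
  \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)}\, o(x),
\qquad
o^{(i,j)}_{\star} = \operatorname*{arg\,max}_{o \in \mathcal{O}} \alpha_o^{(i,j)}
```

The integrality gap is then the performance difference between the supernet, which uses the softmax-weighted mixture on every edge, and the final discrete cell, which keeps only the highest-weighted operation on each edge.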

Looking forward, this work points toward applications in large-scale RL systems such as robot learning and memory-augmented agents, where NAS could optimize models for broader adaptability and generalization. Moreover, exploring other gradient-based NAS variants and integrating advanced regularization techniques could lead to even more robust automation of architecture discovery in reinforcement learning settings.

In conclusion, "Differentiable Architecture Search for Reinforcement Learning" provides substantial evidence that despite reinforcement learning's intricate nature, DARTS can effectively contribute to building better neural architectures, paving the way for future research and development in automated design within RL domains.
