
Neural Optimizer Search with Reinforcement Learning (1709.07417v2)

Published 21 Sep 2017 in cs.AI, cs.LG, and stat.ML

Abstract: We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain specific language that describes a mathematical update equation based on a list of primitive functions, such as the gradient, running average of the gradient, etc. The controller is trained with Reinforcement Learning to maximize the performance of a model after a few epochs. On CIFAR-10, our method discovers several update rules that are better than many commonly used optimizers, such as Adam, RMSProp, or SGD with and without Momentum on a ConvNet model. We introduce two new optimizers, named PowerSign and AddSign, which we show transfer well and improve training on a variety of different tasks and architectures, including ImageNet classification and Google's neural machine translation system.

Citations (378)

Summary

  • The paper introduces a method using a Reinforcement Learning-trained RNN controller to automatically search for and discover novel optimization algorithms for deep learning architectures.
  • This approach successfully discovers new optimizers, such as PowerSign and AddSign, which outperform traditional methods like Adam and SGD on various benchmarks and real-world tasks like ImageNet and NMT.
  • A key finding is the transferability of the discovered optimizers across different network architectures and tasks without requiring further adaptation, highlighting the potential for automated, adaptable optimization design.

Neural Optimizer Search with Reinforcement Learning

The paper, "Neural Optimizer Search with Reinforcement Learning," presents a methodological advancement in the automation of discovering optimization methods specifically tailored for deep learning architectures. The authors leverage a Recurrent Neural Network (RNN) controller, trained with Reinforcement Learning (RL), to generate mathematical update equations that potentially surpass several established optimization methods including Adam, RMSProp, and SGD with Momentum.

Core Methodology

The RNN controller forms the crux of the approach: it generates optimization update rules expressed as strings in a domain-specific language (DSL). Each string describes how to build an update equation from primitive operands, such as the gradient and running averages of the gradient, combined through unary and binary functions. The controller is trained with policy gradients, specifically Proximal Policy Optimization (PPO), to maximize the validation performance of a child model trained for a small, fixed number of epochs with each candidate update rule.
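To make the search space concrete, the sketch below samples a random candidate rule from a small, simplified grammar of operands and unary/binary functions. In the paper this sampling role is played by the RNN controller and is learned with PPO, and the actual grammar contains more primitives and allows deeper compositions; the operand and function sets here are an illustrative reduction, not the full DSL.

    import numpy as np

    # Simplified, illustrative subset of the search space: an update rule picks
    # two operands, applies a unary function to each, and combines them with a
    # binary function.
    OPERANDS = {
        "g":       lambda g, m: g,                 # gradient
        "g^2":     lambda g, m: g ** 2,
        "m":       lambda g, m: m,                 # running average of the gradient
        "sign(g)": lambda g, m: np.sign(g),
        "sign(m)": lambda g, m: np.sign(m),
        "1":       lambda g, m: np.ones_like(g),
    }

    UNARY = {
        "identity": lambda x: x,
        "negate":   lambda x: -x,
        "exp":      lambda x: np.exp(x),
        "clip":     lambda x: np.clip(x, -1.0, 1.0),
    }

    BINARY = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
    }

    def sample_update_rule(rng):
        """Sample one candidate update rule (the role the RNN controller plays)."""
        op1, op2 = rng.choice(list(OPERANDS), size=2)
        u1, u2 = rng.choice(list(UNARY), size=2)
        b = rng.choice(list(BINARY))

        def update(g, m, lr=1e-2):
            # Delta w = -lr * binary(unary1(operand1), unary2(operand2))
            return -lr * BINARY[b](UNARY[u1](OPERANDS[op1](g, m)),
                                   UNARY[u2](OPERANDS[op2](g, m)))

        return f"-lr * {b}({u1}({op1}), {u2}({op2}))", update

    rng = np.random.default_rng(0)
    desc, update_fn = sample_update_rule(rng)
    g = np.array([0.5, -0.2])
    m = np.array([0.3, -0.1])
    print("sampled rule:", desc)
    print("step:", update_fn(g, m))

In the paper, the reward for each sampled rule is the accuracy of a small child model trained for a few epochs, which the controller then uses to improve its sampling distribution.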

Through extensive experiments on the CIFAR-10 dataset with a ConvNet model, the method repeatedly discovers update rules that outperform several canonical optimizers. Notably, the paper introduces two such optimizers, PowerSign and AddSign, each defined by a compact update equation, and shows that they transfer well across neural network tasks and architectures. Both are evaluated on ImageNet classification models and Google's neural machine translation system, where they consistently improve over the baselines.
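Concretely, the paper reports that PowerSign scales the gradient by e^(sign(g)*sign(m)) and AddSign by (1 + sign(g)*sign(m)), where m is a moving average of past gradients: the step is enlarged when the gradient and its moving average agree in sign and shrunk when they disagree. The NumPy sketch below implements these base forms; the decayed variants studied in the paper and framework-level integration details are omitted, and the learning rate and decay constant are illustrative defaults.

    import numpy as np

    def powersign_step(w, g, m, lr=0.1, beta=0.9):
        """One PowerSign update: scale the gradient by e^(sign(g) * sign(m)).

        w: parameters, g: current gradient, m: moving average of past gradients.
        Returns the updated (w, m). Base form only; decayed variants omitted.
        """
        m = beta * m + (1.0 - beta) * g
        w = w - lr * np.exp(np.sign(g) * np.sign(m)) * g
        return w, m

    def addsign_step(w, g, m, lr=0.1, beta=0.9):
        """One AddSign update: scale the gradient by (1 + sign(g) * sign(m))."""
        m = beta * m + (1.0 - beta) * g
        w = w - lr * (1.0 + np.sign(g) * np.sign(m)) * g
        return w, m

The shared intuition is momentum-agreement scaling: steps are amplified along directions where the gradient is consistent over time and damped where it oscillates.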

Implications and Transferability

This research underscores the potential of automated optimizer design in machine learning, a departure from hand-crafted approaches that may not adequately address the non-convex optimization problems typical of neural networks. A particular strength of the method is its transferability: the discovered update equations can be applied to learning tasks well beyond their search environment without further adaptation or retraining.

Empirical Findings

The proposed method is subjected to extensive empirical testing. On CIFAR-10, the discovered update rules PowerSign and AddSign not only perform well in the search setting but also prove robust across tasks and architectures. PowerSign, for instance, yields competitive results on the Rosenbrock function, and trials with Wide ResNet architectures on CIFAR-10 show improvements over standard optimizers, reaching a best test accuracy of 94.4% without any learning rate decay.
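As a rough illustration of the Rosenbrock check mentioned above, the snippet below runs the base PowerSign update on the two-dimensional Rosenbrock function; the starting point, learning rate, decay constant, and step count are illustrative choices, not the paper's experimental settings.

    import numpy as np

    def rosenbrock(x, y, a=1.0, b=100.0):
        return (a - x) ** 2 + b * (y - x ** 2) ** 2

    def rosenbrock_grad(x, y, a=1.0, b=100.0):
        dx = -2.0 * (a - x) - 4.0 * b * x * (y - x ** 2)
        dy = 2.0 * b * (y - x ** 2)
        return np.array([dx, dy])

    # Run the base PowerSign rule on Rosenbrock; hyperparameters are illustrative.
    w = np.array([-1.2, 1.0])
    m = np.zeros_like(w)
    lr, beta = 1e-4, 0.9
    for step in range(50_000):
        g = rosenbrock_grad(*w)
        m = beta * m + (1.0 - beta) * g
        w = w - lr * np.exp(np.sign(g) * np.sign(m)) * g

    print("final point:", w, "loss:", rosenbrock(*w))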

Moreover, in real-world applications on state-of-the-art architectures such as MobileNet for ImageNet classification and GNMT for translation tasks, the proposed optimizers outperformed baseline configurations. Notably, in Google's neural machine translation application, PowerSign improved BLEU scores by a margin comparable to advancements typically seen in substantial model redesigns or data enhancements.

Future Directions

The findings suggest several directions for future work. Expanding the set of optimization primitives could broaden the space of discoverable optimizers, while further tailoring and scaling the reinforcement learning framework may improve both search efficiency and the quality of the resulting rules. Integrating these methods into constrained settings, such as on-device or memory-limited applications where efficiency is paramount, also holds significant promise.

Overall, the results articulate the potential for automating foundational machine learning processes through reinforcement learning, heralding optimizers that are not only high-performing but also adaptable across diverse learning landscapes. As research into neural optimizer search progresses, expanded application and tuning may unlock new efficiencies in computational learning systems.