- The paper introduces a novel approach to neural architecture search by framing it as a differentiable problem, enabling efficient gradient-based optimization.
- It achieves competitive performance on CIFAR-10 and Penn Treebank while cutting the search cost from thousands of GPU days to roughly 1.5 GPU days.
- The discovered architectures transfer well to larger datasets like ImageNet and WikiText-2, demonstrating the method’s scalability and effectiveness.
An Overview of DARTS: Differentiable Architecture Search
The paper "DARTS: Differentiable Architecture Search" by Hanxiao Liu, Karen Simonyan, and Yiming Yang introduces a novel, efficient method for neural architecture search (NAS) by framing the problem in a differentiable manner, allowing for gradient-based optimization.
Introduction and Background
Neural architecture search has traditionally relied on computationally intensive methods such as reinforcement learning (RL) and evolutionary algorithms to identify strong network architectures. These approaches are extremely resource-hungry: reaching state-of-the-art architectures for CIFAR-10 and ImageNet took about 2000 GPU days with RL and 3150 GPU days with evolution. Earlier attempts to speed up the search, such as imposing structural constraints, using performance predictors, or sharing weights across models, did not resolve the underlying scalability issue: architecture search was still treated as a discrete, non-differentiable optimization problem.
DARTS takes a different route by applying a continuous relaxation to the search space, making it amenable to gradient-based optimization. This sidesteps the inefficient black-box search paradigm: the architecture is optimized directly with respect to validation performance using gradient descent, yielding a substantial reduction in computational cost.
Methodology
Search Space and Continuous Relaxation
DARTS models a cell of the network as a directed acyclic graph (DAG): nodes represent latent feature representations, and each directed edge applies an operation (e.g., a convolution or pooling) to transform one node's output into a contribution to another. Instead of choosing a single operation per edge, DARTS relaxes the categorical choice into a softmax-weighted mixture over all candidate operations, so the architecture encoding (the mixing weights) and the network weights can be jointly optimized in a differentiable manner.
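To make the relaxation concrete, here is a minimal PyTorch-style sketch of a mixed operation on a single edge. The class name `MixedOp`, the candidate operations, and the tensor shapes are illustrative choices, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One edge of the cell: a softmax-weighted mixture of candidate operations."""
    def __init__(self, candidate_ops):
        super().__init__()
        self.ops = nn.ModuleList(candidate_ops)

    def forward(self, x, alpha):
        # alpha holds one unnormalized architecture parameter per candidate op;
        # the softmax turns the discrete choice into a continuous mixture.
        weights = F.softmax(alpha, dim=-1)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# Illustrative candidates for a 16-channel feature map (shapes chosen arbitrarily).
candidates = [
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
    nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
    nn.Identity(),  # skip connection
]
edge = MixedOp(candidates)
alpha = torch.zeros(len(candidates), requires_grad=True)  # architecture parameters
x = torch.randn(2, 16, 8, 8)
out = edge(x, alpha)  # differentiable w.r.t. both the op weights and alpha
```

Because `out` depends smoothly on `alpha`, gradients from any downstream loss flow back into the architecture parameters as well as the ordinary network weights.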
Bilevel Optimization
The optimization in DARTS is formulated as a bilevel problem: the lower level minimizes the training loss with respect to the network weights for a given architecture, while the upper level minimizes the validation loss with respect to the architecture parameters. Because solving the inner problem to convergence at every step would be prohibitively expensive, DARTS approximates the optimal weights with a single gradient step on the training loss, which makes the architecture gradient cheap to compute and enables an efficient alternating optimization of weights and architecture parameters.
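In the paper's notation, with $w$ the network weights, $\alpha$ the architecture parameters, and $\xi$ the learning rate of the inner step, the search objective and the one-step gradient approximation are:

$$
\min_{\alpha} \; \mathcal{L}_{\text{val}}\big(w^{*}(\alpha), \alpha\big)
\quad \text{s.t.} \quad
w^{*}(\alpha) = \arg\min_{w} \; \mathcal{L}_{\text{train}}(w, \alpha)
$$

$$
\nabla_{\alpha} \mathcal{L}_{\text{val}}\big(w^{*}(\alpha), \alpha\big)
\;\approx\;
\nabla_{\alpha} \mathcal{L}_{\text{val}}\big(w - \xi \, \nabla_{w} \mathcal{L}_{\text{train}}(w, \alpha), \alpha\big)
$$

Setting $\xi = 0$ recovers the cheaper first-order variant reported in the paper, which trades a small amount of accuracy for speed.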
Experimental Results
Image Classification on CIFAR-10
DARTS was tested extensively on the CIFAR-10 dataset, showing competitive performance with state-of-the-art methods while using significantly fewer computational resources. Specifically, a convolutional cell discovered by DARTS achieved a test error rate of 2.76% with 3.3M parameters, comparable to methods requiring thousands of GPU days. The search process for DARTS, by contrast, took only 1.5 GPU days.
Language Modeling on Penn Treebank
Similarly, for language modeling on the Penn Treebank dataset, DARTS discovered a recurrent cell that achieved a test perplexity of 55.7, outperforming extensively tuned LSTMs and other automatically searched architectures and demonstrating that DARTS can also identify high-performance recurrent structures efficiently.
Transferability to ImageNet and WikiText-2
The robustness of DARTS was further validated by transferring the discovered convolutional and recurrent cells to larger datasets: ImageNet for image classification and WikiText-2 for language modeling. The cells remained competitive, with the convolutional cell achieving a top-1 error of 26.7% on ImageNet, underscoring the generalizability of architectures found with DARTS.
Implications and Future Work
The success of DARTS in various tasks highlights the potential of differentiable architecture search to significantly reduce the computational burden traditionally associated with NAS, making it more accessible for resource-constrained environments. The methodology presented could inspire further research into continuous optimization techniques for other hyperparameter tuning tasks. Future work could explore enhanced mechanisms for discrete architecture derivation, potentially through annealing techniques or advanced performance-aware selection schemes.
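For context on what "discrete architecture derivation" refers to: after the search, DARTS replaces each continuous mixture by its strongest candidate operation (and keeps only the strongest incoming edges per node). The sketch below illustrates that argmax step; the operation names, the `alphas` layout, and the cell size are assumptions for illustration, not the paper's exact code.

```python
import torch

# One row of architecture parameters per edge, one column per candidate op.
op_names = ["conv_3x3", "max_pool_3x3", "skip_connect"]  # illustrative candidate set
alphas = torch.randn(4, len(op_names))                   # e.g. 4 edges in a small cell

weights = torch.softmax(alphas, dim=-1)
chosen = weights.argmax(dim=-1)  # pick the strongest op on each edge

discrete_cell = [op_names[i] for i in chosen.tolist()]
print(discrete_cell)  # e.g. ['skip_connect', 'conv_3x3', ...]
```

The future-work directions mentioned above (annealing the softmax, or selecting operations with explicit performance awareness) would replace this simple argmax projection with a less abrupt discretization.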
Conclusion
DARTS represents a significant advancement in the field of neural architecture search, demonstrating that continuous relaxation and gradient-based optimization can yield highly competitive neural architectures with a fraction of the computational expenditure required by previous methods. This approach opens new avenues for efficient and scalable NAS, potentially democratizing access to state-of-the-art neural network designs.