
Searching for A Robust Neural Architecture in Four GPU Hours

Published 10 Oct 2019 in cs.CV (arXiv:1910.04465v2)

Abstract: Conventional neural architecture search (NAS) approaches are based on reinforcement learning or evolutionary strategy, which take more than 3000 GPU hours to find a good model on CIFAR-10. We propose an efficient NAS approach learning to search by gradient descent. Our approach represents the search space as a directed acyclic graph (DAG). This DAG contains billions of sub-graphs, each of which indicates a kind of neural architecture. To avoid traversing all the possibilities of the sub-graphs, we develop a differentiable sampler over the DAG. This sampler is learnable and optimized by the validation loss after training the sampled architecture. In this way, our approach can be trained in an end-to-end fashion by gradient descent, named Gradient-based search using Differentiable Architecture Sampler (GDAS). In experiments, we can finish one searching procedure in four GPU hours on CIFAR-10, and the discovered model obtains a test error of 2.82% with only 2.5M parameters, which is on par with the state-of-the-art. Code is publicly available on GitHub: https://github.com/D-X-Y/NAS-Projects.

Citations (613)

Summary

  • The paper presents GDAS, a gradient-based NAS method that cuts search time from thousands to only four GPU hours.
  • It employs a differentiable DAG for efficient sampling, achieving a 2.82% test error on CIFAR-10 with just 2.5M parameters.
  • The approach enables effective transfer learning across datasets, democratizing neural architecture search for broader research use.

Overview of "Searching for A Robust Neural Architecture in Four GPU Hours"

The paper "Searching for A Robust Neural Architecture in Four GPU Hours" presents an efficient approach to Neural Architecture Search (NAS), significantly reducing the computational time required for discovering effective neural architectures. The authors propose a new method named Gradient-based search using Differentiable Architecture Sampler (GDAS) that leverages gradient descent to search for robust neural architectures within a mere four GPU hours, compared to the traditional methods which can take over 3000 GPU hours.

Methodology

GDAS represents the NAS search space as a Directed Acyclic Graph (DAG) in which each sub-graph corresponds to a candidate neural architecture. Its key component is a differentiable sampler over the DAG: rather than traversing every sub-graph, the sampler draws one architecture per iteration and is itself optimized by the validation loss of the sampled architecture. Because sampling is differentiable, the whole search is driven by gradient-based optimization instead of the reinforcement learning or evolutionary strategies used by earlier methods, making the search both more efficient and more targeted.
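Concretely, the sampler places learnable logits over the candidate operations on each DAG edge and draws one operation per forward pass while still letting gradients reach those logits. The PyTorch snippet below is a minimal sketch of such an edge under that reading; it is not the authors' implementation (see the linked repository for that), and the names `DifferentiableEdge`, `candidate_ops`, and `tau` (the softmax temperature) are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentiableEdge(nn.Module):
    """One DAG edge that samples a single candidate operation.

    Minimal sketch of a GDAS-style differentiable sampler; class and
    argument names are illustrative, not taken from the authors' code.
    """

    def __init__(self, candidate_ops):
        super().__init__()
        self.ops = nn.ModuleList(candidate_ops)  # e.g. conv, pooling, skip
        # Architecture logits: one learnable score per candidate operation.
        self.logits = nn.Parameter(1e-3 * torch.randn(len(candidate_ops)))

    def forward(self, x, tau=1.0):
        # Gumbel-softmax with hard=True yields a one-hot sample in the
        # forward pass while gradients flow through the soft probabilities
        # (straight-through estimator).
        probs = F.gumbel_softmax(self.logits, tau=tau, hard=True)
        index = int(probs.argmax(dim=-1))
        # Only the sampled operation is evaluated; scaling by probs[index]
        # (value 1.0 in the forward pass) keeps the logits in the autograd graph.
        return probs[index] * self.ops[index](x)
```

Because only the sampled operation is executed on each edge, every search iteration trains a single sub-graph rather than the whole super-network, which accounts for much of the reported speed-up.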

Key Insights

  • Differentiability and Efficiency: By making the sampling process differentiable, GDAS can be trained end-to-end with gradient descent, alternating weight updates on the training split with sampler updates driven by the validation loss (a minimal sketch of this loop follows the list). This provides immediate feedback and optimizes the search more efficiently than the delayed rewards of reinforcement learning or evolutionary algorithms.
  • Computational Cost: GDAS completes the search process in four GPU hours on CIFAR-10, a significant improvement over existing methods. This reduction in computational cost makes NAS accessible without massive computational resources, opening the door for broader adoption in the research community.
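The sketch below shows one such alternating iteration in PyTorch. It assumes the architecture logits and the network weights have already been split into two optimizers and that `model(x, tau)` forwards a freshly sampled sub-graph; the function and argument names are hypothetical, not taken from the paper or its repository.

```python
import torch

def search_step(model, train_batch, valid_batch, w_optimizer, a_optimizer, tau):
    """One search iteration: a minimal sketch, assuming `w_optimizer` holds
    the network weights and `a_optimizer` holds the architecture logits."""
    criterion = torch.nn.CrossEntropyLoss()

    # (1) Train the weights of the sampled sub-graph on the training split.
    x_tr, y_tr = train_batch
    w_optimizer.zero_grad()
    criterion(model(x_tr, tau), y_tr).backward()
    w_optimizer.step()

    # (2) Update the differentiable sampler on the validation split: the
    # validation loss back-propagates into the architecture logits through
    # the Gumbel-softmax relaxation.
    x_va, y_va = valid_batch
    a_optimizer.zero_grad()
    criterion(model(x_va, tau), y_va).backward()
    a_optimizer.step()
```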

Experimental Results

The experiments on CIFAR-10 show that GDAS discovers architectures competitive with state-of-the-art models, reaching a test error of 2.82% with only 2.5M parameters. In addition, the cells discovered on CIFAR-10 and PTB (Penn Treebank) transfer effectively to ImageNet and WikiText-2, respectively, demonstrating the robustness and generalization capabilities of the discovered architectures.

Implications and Future Work

The practical implications of GDAS are noteworthy as it democratizes neural architecture search by vastly reducing the barrier of computational expense. Theoretically, GDAS shifts the paradigm of NAS towards more scalable and efficient methods, potentially encouraging further innovations in differentiable NAS approaches.

Speculatively, future developments in AI could focus on refining the differentiability of NAS processes further, potentially integrating with continuous learning paradigms. Additionally, the possibility of applying GDAS directly to large-scale datasets like ImageNet without a pre-training stage on smaller datasets could be explored.

Conclusion

The paper successfully addresses the computational limitations of current NAS techniques by introducing GDAS, setting a new standard for efficiency in neural architecture discovery. This method not only achieves competitive accuracy on well-established benchmarks but also significantly reduces the resources required for NAS, highlighting its potential to impact both academic research and practical applications in the field of AI.
