Searching for A Robust Neural Architecture in Four GPU Hours (1910.04465v2)

Published 10 Oct 2019 in cs.CV

Abstract: Conventional neural architecture search (NAS) approaches are based on reinforcement learning or evolutionary strategy, which take more than 3000 GPU hours to find a good model on CIFAR-10. We propose an efficient NAS approach learning to search by gradient descent. Our approach represents the search space as a directed acyclic graph (DAG). This DAG contains billions of sub-graphs, each of which indicates a kind of neural architecture. To avoid traversing all the possibilities of the sub-graphs, we develop a differentiable sampler over the DAG. This sampler is learnable and optimized by the validation loss after training the sampled architecture. In this way, our approach can be trained in an end-to-end fashion by gradient descent, named Gradient-based search using Differentiable Architecture Sampler (GDAS). In experiments, we can finish one searching procedure in four GPU hours on CIFAR-10, and the discovered model obtains a test error of 2.82% with only 2.5M parameters, which is on par with the state-of-the-art. Code is publicly available on GitHub: https://github.com/D-X-Y/NAS-Projects.

Overview of "Searching for A Robust Neural Architecture in Four GPU Hours"

The paper "Searching for A Robust Neural Architecture in Four GPU Hours" presents an efficient approach to Neural Architecture Search (NAS), significantly reducing the computational time required for discovering effective neural architectures. The authors propose a new method named Gradient-based search using Differentiable Architecture Sampler (GDAS) that leverages gradient descent to search for robust neural architectures within a mere four GPU hours, compared to the traditional methods which can take over 3000 GPU hours.

Methodology

GDAS represents the search space as a Directed Acyclic Graph (DAG) in which each sub-graph corresponds to a candidate neural architecture. Its central component is a differentiable sampler over the DAG: the sampler draws one candidate operation per edge through a softmax-style relaxation, so the draw stays differentiable and the sampler's parameters can be updated directly from the validation loss of the sampled architecture. Unlike traditional methods that rely on reinforcement learning or evolutionary strategies, GDAS therefore carries out the entire search with gradient-based optimization, making the search more efficient and targeted.
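
The sampling mechanism can be illustrated with a short sketch. The snippet below is a minimal, simplified illustration of sampling one operation per DAG edge with PyTorch's straight-through Gumbel-softmax; it is not the authors' implementation (see https://github.com/D-X-Y/NAS-Projects for the official code), and the class and variable names are hypothetical.

```python
# Minimal sketch (hypothetical names): differentiable sampling of one candidate
# operation per edge of the cell DAG via a straight-through Gumbel-softmax.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SampledEdge(nn.Module):
    """One DAG edge: K candidate operations plus learnable sampling logits."""
    def __init__(self, candidate_ops, tau=10.0):
        super().__init__()
        self.ops = nn.ModuleList(candidate_ops)          # e.g. convs, pooling, skip
        self.logits = nn.Parameter(torch.zeros(len(candidate_ops)))
        self.tau = tau                                   # temperature (annealed during search)

    def forward(self, x):
        # hard=True yields a one-hot sample in the forward pass while routing
        # gradients through the soft relaxation (straight-through estimator).
        weights = F.gumbel_softmax(self.logits, tau=self.tau, hard=True)
        k = int(weights.argmax())
        # Only the sampled operation is evaluated, which keeps the search cheap.
        return weights[k] * self.ops[k](x)
```

In practice the temperature is typically annealed from a high value toward a small one so that sampling becomes increasingly discrete as the search progresses, while the logits themselves are trained with the validation loss of the sampled architecture.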

Key Insights

  • Differentiability and Efficiency: By making the sampling process differentiable, GDAS can be trained end-to-end with gradient descent. The validation loss provides immediate feedback to the sampler, making the search more efficient than methods that rely on the delayed rewards of reinforcement learning or evolutionary algorithms (a simplified training loop is sketched after this list).
  • Computational Cost: GDAS completes the search process in four GPU hours on CIFAR-10, a significant improvement over existing methods. This reduction in computational cost makes NAS accessible without massive computational resources, opening the door for broader adoption in the research community.
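
The end-to-end optimization referenced above can be sketched as an alternating loop: the weights of the currently sampled architecture are updated on the training split, and the sampler's logits are updated on the validation split. The code below is a hedged, simplified sketch under those assumptions; the helper `model.weight_parameters()`, the hyperparameters, and the data handling are illustrative rather than the paper's exact settings.

```python
# Simplified search loop (illustrative hyperparameters, hypothetical helpers):
# alternate between weight updates on training data and sampler updates on
# validation data, both by plain gradient descent.
import torch
import torch.nn.functional as F

def search(model, arch_logits, train_loader, valid_loader, epochs=50):
    # model.weight_parameters(): hypothetical accessor for the non-architecture weights
    w_opt = torch.optim.SGD(model.weight_parameters(), lr=0.025, momentum=0.9)
    a_opt = torch.optim.Adam(arch_logits, lr=3e-4)   # sampler (architecture) logits
    for _ in range(epochs):
        for (x_tr, y_tr), (x_va, y_va) in zip(train_loader, valid_loader):
            # 1) train the sampled architecture's weights on the training split
            w_opt.zero_grad()
            F.cross_entropy(model(x_tr), y_tr).backward()
            w_opt.step()
            # 2) update the differentiable sampler with the validation loss
            a_opt.zero_grad()
            F.cross_entropy(model(x_va), y_va).backward()
            a_opt.step()
    return model
```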

Experimental Results

The experiments on CIFAR-10 demonstrate that GDAS discovers architectures comparable to state-of-the-art models, achieving a test error of 2.82% with only 2.5M parameters. In addition, the architectures discovered on CIFAR-10 and PTB (Penn Treebank) transfer effectively to ImageNet and WikiText-2, respectively, showcasing the robustness and generalization capabilities of the discovered cells.

Implications and Future Work

The practical implications of GDAS are noteworthy: by vastly reducing the computational expense of neural architecture search, it makes NAS accessible to researchers without large GPU clusters. Theoretically, GDAS shifts the paradigm of NAS towards more scalable and efficient methods, potentially encouraging further innovations in differentiable NAS approaches.

Speculatively, future developments in AI could focus on refining the differentiability of NAS processes further, potentially integrating with continuous learning paradigms. Additionally, the possibility of applying GDAS directly to large-scale datasets like ImageNet without a pre-training stage on smaller datasets could be explored.

Conclusion

The paper successfully addresses the computational limitations of current NAS techniques by introducing GDAS, setting a new standard for efficiency in neural architecture discovery. This method not only achieves competitive accuracy on well-established benchmarks but also significantly reduces the resources required for NAS, highlighting its potential to impact both academic research and practical applications in the field of AI.

Authors (2)
  1. Xuanyi Dong (28 papers)
  2. Yi Yang (856 papers)
Citations (613)