Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective (2102.11535v4)

Published 23 Feb 2021 in cs.CV and cs.LG

Abstract: Neural Architecture Search (NAS) has been explosively studied to automate the discovery of top-performer neural networks. Current works require heavy training of supernet or intensive architecture evaluations, thus suffering from heavy resource consumption and often incurring search bias due to truncated training or approximations. Can we select the best neural architectures without involving any training and eliminate a drastic portion of the search cost? We provide an affirmative answer, by proposing a novel framework called training-free neural architecture search (TE-NAS). TE-NAS ranks architectures by analyzing the spectrum of the neural tangent kernel (NTK) and the number of linear regions in the input space. Both are motivated by recent theory advances in deep networks and can be computed without any training and any label. We show that: (1) these two measurements imply the trainability and expressivity of a neural network; (2) they strongly correlate with the network's test accuracy. Further on, we design a pruning-based NAS mechanism to achieve a more flexible and superior trade-off between the trainability and expressivity during the search. In NAS-Bench-201 and DARTS search spaces, TE-NAS completes high-quality search but only costs 0.5 and 4 GPU hours with one 1080Ti on CIFAR-10 and ImageNet, respectively. We hope our work inspires more attempts in bridging the theoretical findings of deep networks and practical impacts in real NAS applications. Code is available at: https://github.com/VITA-Group/TENAS.

Training-Free Neural Architecture Search: Evaluating Networks Without Training

The research paper "Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective" proposes TE-NAS, a training-free approach to neural architecture search (NAS) that markedly improves the efficiency of discovering high-performance neural networks. The approach departs from the conventional reliance on extensive training and evaluation, drastically reducing the resource demands of NAS.

NAS has emerged as a central method for discovering strong network structures within a predefined search space. Traditional approaches based on reinforcement learning or genetic algorithms often require extensive computational resources, making them impractical for many applications. This paper addresses that limitation by introducing a training-free approach that uses theoretically inspired indicators to assess network architectures without training any model.

Methodological Innovation and Approach

TE-NAS leverages two key theoretical insights to rank candidate architectures: the analysis of the neural tangent kernel (NTK) spectrum and the count of linear regions in the network’s input space. These metrics are chosen for their purported correlation with a network's trainability and expressivity, respectively. The NTK condition number measures a network's trainability by assessing how effectively it can be optimized using gradient descent. Meanwhile, the number of linear regions serves as a surrogate for the network's expressivity, reflecting the complexity of functions it can model.
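To make the trainability indicator concrete, the following is a minimal sketch of estimating the condition number of the empirical NTK on a small batch of random, unlabeled inputs. It is an illustrative PyTorch implementation under simplified assumptions (scalar network output, tiny batch), not the authors' released code.

```python
import torch
import torch.nn as nn

def ntk_condition_number(net: nn.Module, inputs: torch.Tensor) -> float:
    """Estimate kappa = lambda_max / lambda_min of the empirical NTK
    Theta_ij = <grad_w f(x_i), grad_w f(x_j)> on a small input batch."""
    params = [p for p in net.parameters() if p.requires_grad]
    grads = []
    for x in inputs:
        out = net(x.unsqueeze(0)).sum()           # reduce to a scalar output
        g = torch.autograd.grad(out, params)      # gradients w.r.t. all weights
        grads.append(torch.cat([gi.reshape(-1) for gi in g]))
    J = torch.stack(grads)                        # (batch, num_params)
    ntk = J @ J.t()                               # empirical NTK, (batch, batch)
    eigvals = torch.linalg.eigvalsh(ntk)          # eigenvalues in ascending order
    return (eigvals[-1] / eigvals[0]).item()      # condition number kappa

# Usage: a lower condition number suggests the candidate is easier to train.
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64),
                    nn.ReLU(), nn.Linear(64, 1))
x = torch.randn(8, 3, 32, 32)                     # random, label-free inputs
print(ntk_condition_number(net, x))
```

Because both the forward pass and the gradients are taken at initialization, the indicator needs no labels and no optimization steps.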

The crux of this research is to use these two measurements to evaluate and rank architectures in a label-free and training-free manner. The authors show that a low NTK condition number and a large number of linear regions each correlate strongly with higher test accuracy. On top of these indicators, they design a pruning-based search mechanism that gradually removes candidate operators according to how their removal affects the two measurements, achieving a flexible trade-off between trainability and expressivity during the search. Empirical validation on the NAS-Bench-201 and DARTS search spaces shows that TE-NAS completes the search in 0.5 GPU hours on CIFAR-10 and 4 GPU hours on ImageNet with a single 1080Ti, far less than traditional methods.
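As an illustration of the expressivity indicator, the sketch below approximates the number of linear regions by counting distinct ReLU activation patterns reached by random inputs at initialization. The function and variable names are illustrative rather than taken from the TENAS repository, and the counting scheme is a simplification of the paper's estimator.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def count_activation_patterns(net: nn.Module, inputs: torch.Tensor) -> int:
    """Count unique ReLU sign patterns over a batch of inputs; each distinct
    pattern corresponds to one linear region that the samples fall into."""
    patterns = []

    def hook(_module, _inp, out):
        # record the per-sample sign pattern of this ReLU layer
        patterns.append((out > 0).flatten(1).to(torch.uint8))

    handles = [m.register_forward_hook(hook)
               for m in net.modules() if isinstance(m, nn.ReLU)]
    net(inputs)                                    # one forward pass, no labels
    for h in handles:
        h.remove()

    codes = torch.cat(patterns, dim=1)             # (batch, total_relu_units)
    return torch.unique(codes, dim=0).shape[0]     # number of distinct regions hit

# Usage: a higher count, relative to other candidates on the same inputs,
# indicates a more expressive architecture.
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64),
                    nn.ReLU(), nn.Linear(64, 10))
x = torch.randn(64, 3, 32, 32)
print(count_activation_patterns(net, x))
```

In the paper, the two indicators are combined through their relative rankings across candidates rather than their raw values, so that neither metric's scale dominates the comparison.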

Experimental Results

The TE-NAS framework is evaluated on standard benchmarks, including CIFAR-10, CIFAR-100, and ImageNet. Remarkably, it identifies strong architectures without any training, reaching test accuracy on par with state-of-the-art NAS methods at a fraction of the computational expense. On NAS-Bench-201, for instance, TE-NAS attains competitive test accuracy while spending only about 1558 GPU seconds on the search, an efficiency well beyond that of training-based baselines.

Theoretical and Practical Implications

The implications of this research are significant both theoretically and practically. Theoretically, TE-NAS bridges a gap between deep learning theory and NAS practice by applying findings on the NTK and network expressivity directly to architecture search. Practically, the reduced resource requirements make NAS accessible to practitioners with limited computational capacity, broadening its adoption across domains.

Future Directions

The introduction of TE-NAS could catalyze further research into training-free NAS approaches, inviting exploration into other theoretical measures that might provide similar or better evaluation capabilities for neural architectures. Additionally, the scalability of this approach across different network forms, tasks, or domains poses intriguing questions for subsequent investigations.

In conclusion, by sidestepping the traditional heavy-training paradigm for NAS, this paper opens up new possibilities for efficient and effective architecture search, contributing valuable insights and a robust framework to the field of neural architecture optimization.

Authors (3)
  1. Wuyang Chen (32 papers)
  2. Xinyu Gong (21 papers)
  3. Zhangyang Wang (374 papers)
Citations (213)