Neural Architecture Search without Training (2006.04647v3)

Published 8 Jun 2020 in cs.LG, cs.CV, and stat.ML

Abstract: The time and effort involved in hand-designing deep neural networks is immense. This has prompted the development of Neural Architecture Search (NAS) techniques to automate this design. However, NAS algorithms tend to be slow and expensive; they need to train vast numbers of candidate networks to inform the search process. This could be alleviated if we could partially predict a network's trained accuracy from its initial state. In this work, we examine the overlap of activations between datapoints in untrained networks and motivate how this can give a measure which is usefully indicative of a network's trained performance. We incorporate this measure into a simple algorithm that allows us to search for powerful networks without any training in a matter of seconds on a single GPU, and verify its effectiveness on NAS-Bench-101, NAS-Bench-201, NATS-Bench, and Network Design Spaces. Our approach can be readily combined with more expensive search methods; we examine a simple adaptation of regularised evolutionary search. Code for reproducing our experiments is available at https://github.com/BayesWatch/nas-without-training.

Neural Architecture Search without Training

The paper "Neural Architecture Search without Training" explores an innovative approach to Neural Architecture Search (NAS) by proposing a methodology to evaluate neural networks at their initial state, thereby circumventing the traditional and computationally expensive network training process. This represents a significant advancement in making NAS more accessible and efficient.

Overview and Methodology

Neural Architecture Search has been instrumental in automating the design of neural network architectures, but existing NAS techniques are computationally intensive due to the necessity of training numerous candidate architectures. The authors address this limitation by proposing a novel approach that predicts the performance of architectures before they are trained. This is achieved by examining the activation patterns of untrained networks.

The fundamental premise of the paper is that the overlap of activations between datapoints in an untrained network correlates with that network's performance after training. The authors introduce a scoring mechanism based on the Hamming distance between binary activation patterns, which forms a kernel matrix denoted K_H. This kernel provides a measure indicative of a network's potential performance without requiring any training.
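
For concreteness, below is a minimal sketch of how such a score can be computed with PyTorch forward hooks. It follows the paper's formulation: each datapoint in a minibatch is mapped to a binary code of active ReLU units, K_H[i, j] counts the positions on which codes i and j agree (the number of units minus their Hamming distance), and the score is the log-determinant of K_H. The function name and the assumption that the network exposes nn.ReLU modules are illustrative, not the authors' exact implementation.

```python
import numpy as np
import torch
import torch.nn as nn


def naswot_score(model, minibatch):
    """Score an untrained network from its ReLU activation patterns.

    Sketch of the paper's measure: collect binary activation codes per
    datapoint, build the Hamming-based kernel K_H, and return log|det K_H|.
    """
    codes = []

    def hook(_module, _inp, out):
        # Binary indicator of which ReLU units are active, flattened per datapoint.
        codes.append((out > 0).flatten(1).float())

    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, nn.ReLU)]
    with torch.no_grad():
        model(minibatch)
    for h in handles:
        h.remove()

    c = torch.cat(codes, dim=1)  # shape: (batch_size, total ReLU units)
    # K_H[i, j] = number of matching code positions = N_A - Hamming distance.
    k_h = c @ c.t() + (1.0 - c) @ (1.0 - c).t()
    _sign, logdet = np.linalg.slogdet(k_h.cpu().numpy())
    return logdet
```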

Implementation and Results

The scoring system proposed by the authors was integrated into a simple search algorithm named NASWOT (Neural Architecture Search Without Training). The algorithm rapidly evaluates networks in diverse search spaces such as NAS-Bench-101, NAS-Bench-201, and NATS-Bench, using a single GPU. The results demonstrated that NASWOT can identify high-performing networks within seconds.
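
A sketch of that search loop under the same assumptions is shown below: candidate architectures are drawn at random from the search space, each untrained network is scored, and the highest-scoring candidate is returned. The `search_space.sample()` interface and the sample budget are hypothetical placeholders rather than the authors' API.

```python
def naswot_search(search_space, minibatch, n_samples=100):
    """Illustrative NASWOT-style search: score randomly sampled untrained
    networks and keep the best one, with no training in the loop."""
    best_model, best_score = None, float("-inf")
    for _ in range(n_samples):
        model = search_space.sample()          # random untrained candidate
        score = naswot_score(model, minibatch)  # score from initial state only
        if score > best_score:
            best_model, best_score = model, score
    return best_model
```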

The evaluation across various benchmarks showed positive correlations between the proposed score and trained accuracy, particularly in search spaces such as NAS-Bench-201 and NDS-DARTS. Additionally, ablation studies confirmed that the approach is robust to the choice of input minibatch and weight initialization.

The practicality of the methodology is further demonstrated by its integration with existing NAS algorithms in a strategy called the Assisted Regularised Evolutionary Algorithm (AREA). AREA uses the NASWOT score to select a stronger starting population for regularised evolutionary search, achieving results competitive with traditional techniques.
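
A sketch of how such a score-assisted initial population might be built, reusing the hypothetical helpers from the earlier sketches; the pool and population sizes are placeholders, not the paper's settings.

```python
def area_initial_population(search_space, minibatch,
                            pool_size=1000, population_size=100):
    """Seed regularised evolution with NASWOT-ranked candidates:
    sample a large random pool, rank by score, keep the top slice."""
    pool = [search_space.sample() for _ in range(pool_size)]
    ranked = sorted(pool,
                    key=lambda m: naswot_score(m, minibatch),
                    reverse=True)
    return ranked[:population_size]
```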

Implications and Future Work

The potential implications of this research are substantial. By eliminating the need for extensive training, the proposed approach offers a more resource-efficient and faster alternative to current NAS methodologies. This efficiency enables the real-time adaptation of architectures to different tasks and hardware requirements, broadening the applicability of NAS in practical scenarios.

While the scope of this work is presently confined to convolutional neural networks for image classification, it opens avenues for extending this methodology to other network types and tasks. Future work could explore more intricate network structures or further refine the scoring mechanisms to increase predictive accuracy.

In conclusion, this paper's insights present a transformative approach to neural architecture design, providing a foundation for further explorations into training-free NAS methodologies. The integration of these methods could significantly lower the cost and time barriers in neural network design, promoting widespread accessibility and usage.

Authors (4)
  1. Joseph Mellor (3 papers)
  2. Jack Turner (9 papers)
  3. Amos Storkey (75 papers)
  4. Elliot J. Crowley (27 papers)
Citations (342)