Overview of EAT-NAS: Elastic Architecture Transfer for Accelerating Large-scale Neural Architecture Search
The paper presents "EAT-NAS," a novel approach to the significant computational cost of large-scale Neural Architecture Search (NAS). While NAS has shown promise in automating neural architecture design, a key limitation remains: its reliance on immense computational resources, particularly when scaling from small to large datasets. The repeated training and evaluation of candidate architectures consumes excessive time and compute, which inhibits the practical use of NAS on large-scale datasets such as ImageNet. EAT-NAS addresses this with an "elastic architecture transfer" that efficiently carries architectures optimized on small datasets like CIFAR-10 over to larger ones.
Method and Framework
The core method in EAT-NAS involves a two-stage process:
- Initial Search on a Small Dataset: Neural architectures are first searched on a smaller dataset such as CIFAR-10. The search uses an evolutionary algorithm to identify promising architectures within a predefined search space spanning convolution operation types, kernel sizes, and network widths and depths.
- Transfer and Further Optimization on a Large Dataset: The best architecture from the first stage serves as the seed for a further search on a larger dataset such as ImageNet. This stage is expedited by an architecture perturbation function, which generates the new architecture population from the seed, thereby transferring the 'knowledge' gained on the small-scale task to the large-scale one (see the sketch after this list).
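The sketch below illustrates this two-stage flow in simplified Python. The per-block encoding (op, kernel, width, depth fields), the tournament-style evolution, and the function names (`random_architecture`, `evolutionary_search`, `elastic_transfer`) are illustrative assumptions rather than the paper's actual implementation; the `evaluate` callbacks stand in for training and validating a candidate on the small or large dataset.

```python
import copy
import random

# Illustrative architecture encoding: a list of block descriptors.
# Field names and value ranges are assumptions, not the paper's exact schema.
def random_architecture(num_blocks=5):
    ops = ["conv", "sep_conv", "mbconv"]
    return [
        {"op": random.choice(ops),
         "kernel": random.choice([3, 5, 7]),
         "width": random.choice([16, 24, 32, 64]),
         "depth": random.choice([1, 2, 3])}
        for _ in range(num_blocks)
    ]

def evolutionary_search(evaluate, population, mutate, generations=20):
    """Tournament-style evolution; `evaluate` returns a fitness score
    (e.g. validation accuracy after short training on the target dataset)."""
    scored = [(evaluate(a), a) for a in population]
    for _ in range(generations):
        parent = max(random.sample(scored, k=3), key=lambda sa: sa[0])[1]
        child = mutate(parent)
        scored.append((evaluate(child), child))
        scored.remove(min(scored, key=lambda sa: sa[0]))  # drop the weakest
    return max(scored, key=lambda sa: sa[0])[1]

def elastic_transfer(eval_small, eval_large, perturb, pop_size=16):
    # Stage 1: evolutionary search on the small dataset (e.g. CIFAR-10).
    init = [random_architecture() for _ in range(pop_size)]
    seed = evolutionary_search(eval_small, init, perturb)
    # Stage 2: the seed architecture initialises the population for the
    # large dataset (e.g. ImageNet) via the perturbation function.
    transferred = [perturb(copy.deepcopy(seed)) for _ in range(pop_size)]
    return evolutionary_search(eval_large, transferred, perturb)
```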
The paper's distinguishing contribution is this architecture-level transfer combined with evolutionary search, which keeps the search efficient without handcrafted adjustments. Moreover, EAT-NAS incorporates parameter sharing to explore and fine-tune architecture scale (depth and width), and its architecture perturbation function allows the architecture primitives to adapt dynamically, as sketched below.
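A minimal sketch of such a perturbation function, under the same illustrative encoding as above: each primitive is nudged with some probability, so offspring stay close to the seed architecture while still exploring its neighbourhood. The mutation choices are assumptions rather than the paper's exact operators; `perturb_architecture` could serve as the `perturb` argument in the earlier sketch.

```python
import copy
import random

def perturb_architecture(arch, prob=0.3):
    """Randomly nudge each block's primitives (operation, kernel size,
    width, depth) with probability `prob`."""
    ops = ["conv", "sep_conv", "mbconv"]
    new_arch = copy.deepcopy(arch)
    for block in new_arch:
        if random.random() < prob:
            block["op"] = random.choice(ops)
        if random.random() < prob:
            block["kernel"] = random.choice([3, 5, 7])
        if random.random() < prob:
            # widen or narrow slightly rather than resampling freely
            block["width"] = max(8, int(block["width"] * random.choice([0.75, 1.0, 1.25])))
        if random.random() < prob:
            block["depth"] = max(1, block["depth"] + random.choice([-1, 0, 1]))
    return new_arch
```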
Results and Efficacy
Two architectures, EATNet-A and EATNet-B, emerged from this process, achieving 74.7% and 74.2% top-1 accuracy on ImageNet, respectively. These results surpass comparable models produced by NAS methods that search from scratch on such datasets, while requiring orders of magnitude less compute: less than five days on 8 NVIDIA TITAN X GPUs, compared with over a month on hundreds of GPUs in earlier work.
Implications and Future Work
The implications of EAT-NAS are twofold:
- Practical Efficiency: This form of knowledge transfer is a practical step forward in applying NAS to large-scale datasets with significantly reduced computational demands. It suggests that architectures optimized on smaller datasets can scale effectively given an appropriate transfer mechanism, potentially broadening the adoption of NAS across domains.
- Theoretical Exploration: The elastic nature of the architecture transfer opens avenues for further study of NAS optimization. The approach could be extended to other search strategies, such as reinforcement learning or gradient-based methods, and to tasks beyond classification, including object detection and semantic segmentation.
In conclusion, EAT-NAS makes a compelling case for deploying NAS across datasets of varying scale, addressing one of the central obstacles to its practical use. Elastic architecture transfer marks a shift toward more resource-efficient machine learning practice; future work might integrate the mechanism into other NAS techniques and evaluate its performance on additional vision-centric tasks.