- The paper introduces SETN, a one-shot neural architecture search method that reduces computational costs by sharing parameters among candidates.
- It introduces an evaluator that estimates the probability of each candidate achieving a lower validation loss, enabling more selective candidate sampling than random strategies.
- Experimental results on CIFAR-10, CIFAR-100, and ImageNet demonstrate state-of-the-art performance with a search time of only 1.8 GPU days.
One-Shot Neural Architecture Search via Self-Evaluated Template Network: An Overview
The paper "One-Shot Neural Architecture Search via Self-Evaluated Template Network" addresses the computational inefficiencies commonly associated with neural architecture search (NAS) by introducing the Self-Evaluated Template Network (SETN). The authors propose a novel methodology to enhance the selection of promising architectures, aiming to reduce the time and resources typically required for NAS while maintaining or improving the performance of the sampled neural networks.
Motivation and Approach
Traditional NAS methods often rely on techniques such as reinforcement learning (RL) and evolutionary algorithms, demanding substantial computational resources, sometimes necessitating thousands of GPU days. These methods typically involve the exhaustive evaluation of numerous architectures, a process that is not feasible for many practical applications due to its high cost.
SETN circumvents these challenges through a one-shot search strategy: parameters are shared across all candidate architectures, eliminating the need to train each candidate independently from scratch. On top of this shared backbone, SETN adds an evaluator that predicts the probability that a given candidate will achieve a lower validation loss, allowing candidates to be sampled far more selectively than with the stochastic or random schemes of earlier approaches.
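The weight-sharing idea can be made concrete with a minimal PyTorch-style sketch. The names below (`MixedEdge`, `CANDIDATE_OPS`) are illustrative assumptions, not identifiers from the authors' code:

```python
import torch
import torch.nn as nn

# Illustrative candidate operations; the paper's actual search space differs.
CANDIDATE_OPS = {
    "conv3x3": lambda C: nn.Conv2d(C, C, 3, padding=1, bias=False),
    "conv5x5": lambda C: nn.Conv2d(C, C, 5, padding=2, bias=False),
    "avgpool": lambda C: nn.AvgPool2d(3, stride=1, padding=1),
    "skip":    lambda C: nn.Identity(),
}

class MixedEdge(nn.Module):
    """One edge of the template network: every candidate op keeps its own
    (shared) weights, but a forward pass runs only the op chosen by `index`."""
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList(f(channels) for f in CANDIDATE_OPS.values())

    def forward(self, x: torch.Tensor, index: int) -> torch.Tensor:
        return self.ops[index](x)  # only the sampled candidate is executed
```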
Methodological Insights
SETN is constructed around a template network that holds the shared parameters of every candidate architecture. The method consists of two primary components:
- Evaluator: This component learns to estimate the probability that a candidate architecture will achieve a lower validation loss. Concentrating sampling on high-probability candidates ensures that only promising architectures are passed on for assessment (a minimal sketch follows this list).
- Template Network: Here, parameters are shared and optimized jointly via a stochastic strategy that samples architectures uniformly (as in the `MixedEdge` sketch above), giving every candidate an equal chance of being refined during optimization and reducing bias in training.
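A hedged sketch of such an evaluator, assuming the architecture encoding is an independent softmax over candidate operations per edge; this parameterization is an assumption for illustration, not taken from the paper:

```python
import torch
import torch.nn as nn

class Evaluator(nn.Module):
    """Learnable architecture encoding: one logit per (edge, candidate-op) pair."""
    def __init__(self, num_edges: int, num_ops: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_edges, num_ops))

    def distribution(self) -> torch.Tensor:
        # Per-edge probabilities; training should concentrate mass on
        # operations that tend to lower the validation loss.
        return torch.softmax(self.logits, dim=-1)

    def sample(self) -> list:
        # Draw one operation index per edge from the learned distribution.
        return [torch.multinomial(p, 1).item() for p in self.distribution()]
```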
The training process alternates between optimizing the template network's shared parameters on training data and refining the evaluator's architecture-encoding parameters against validation loss. This dual optimization ultimately yields architectures with potentially better generalization capabilities, as sketched below.
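A minimal sketch of one alternating search step, assuming `supernet` is built from `MixedEdge` modules and takes a list of per-edge operation indices; the loop structure, and in particular the REINFORCE-style gradient estimator for the evaluator, are plausible assumptions rather than the authors' exact procedure:

```python
import random
import torch

def search_step(supernet, evaluator, w_opt, a_opt, train_batch, val_batch, criterion):
    num_edges, num_ops = evaluator.logits.shape

    # (1) Update the shared weights with a uniformly sampled architecture,
    # so every candidate has an equal chance of being trained.
    x, y = train_batch
    arch = [random.randrange(num_ops) for _ in range(num_edges)]
    w_opt.zero_grad()
    criterion(supernet(x, arch), y).backward()
    w_opt.step()

    # (2) Update the evaluator's encoding on validation data. Descending on
    # loss * log-prob is an unbiased (if high-variance) estimator of the
    # gradient of the expected validation loss.
    x, y = val_batch
    arch = evaluator.sample()
    with torch.no_grad():
        val_loss = criterion(supernet(x, arch), y)
    log_prob = sum(torch.log_softmax(evaluator.logits[i], dim=-1)[op]
                   for i, op in enumerate(arch))
    a_opt.zero_grad()
    (val_loss * log_prob).backward()
    a_opt.step()
```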
Experimental Evaluation
SETN is benchmarked against existing state-of-the-art NAS methodologies on standard datasets such as CIFAR-10, CIFAR-100, and ImageNet. Notably, the architecture discovered by SETN on CIFAR-10 exhibits state-of-the-art performance with a test error of 2.69% and achieves comparable results on CIFAR-100 and ImageNet, all within a significantly reduced computation time of just 1.8 GPU days.
The paper also examines SETN's scalability by enlarging the search space, confirming that the approach remains efficient as complexity grows. Evaluations indicate that the evaluator's selective sampling yields notably better candidate architectures than random sampling; the selection phase this enables is sketched below.
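A sketch of that selection phase, building on the components above: candidates are drawn from the evaluator's distribution rather than uniformly, then ranked with the inherited shared weights. The candidate count and the accuracy metric here are illustrative assumptions:

```python
import torch

def select_architecture(supernet, evaluator, val_loader, num_candidates=10):
    best_arch, best_acc = None, -1.0
    for _ in range(num_candidates):
        arch = evaluator.sample()            # evaluator-guided, not uniform
        correct, total = 0, 0
        with torch.no_grad():
            for x, y in val_loader:
                pred = supernet(x, arch).argmax(dim=1)
                correct += (pred == y).sum().item()
                total += y.numel()
        acc = correct / total
        if acc > best_acc:
            best_arch, best_acc = arch, acc
    return best_arch   # the winning candidate is then retrained from scratch
```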
Implications and Future Directions
The SETN framework offers a compelling solution to the efficiency challenges of neural architecture search, demonstrating that it is feasible to identify high-performance neural networks with markedly reduced computational expenditure. This development has practical implications, particularly in environments with limited computational resources, and positions SETN as a valuable tool for future research in efficient model discovery.
Looking forward, the authors suggest exploring more sophisticated training strategies to further refine the evaluator's accuracy and enhance the overall robustness of the search process. Additionally, expanding the methodology to accommodate even larger and more complex search spaces could yield architectures with broader applicability and enhanced performance.
In conclusion, SETN represents a significant step toward reducing the computational toll of NAS while maintaining high standards of performance, offering a balanced approach to architecture discovery in contemporary AI research.