DSNAS: Direct Neural Architecture Search without Parameter Retraining (2002.09128v2)

Published 21 Feb 2020 in cs.LG and stat.ML

Abstract: If NAS methods are solutions, what is the problem? Most existing NAS methods require two-stage parameter optimization. However, performance of the same architecture in the two stages correlates poorly. In this work, we propose a new problem definition for NAS, task-specific end-to-end, based on this observation. We argue that given a computer vision task for which a NAS method is expected, this definition can reduce the vaguely-defined NAS evaluation to i) accuracy of this task and ii) the total computation consumed to finally obtain a model with satisfying accuracy. Seeing that most existing methods do not solve this problem directly, we propose DSNAS, an efficient differentiable NAS framework that simultaneously optimizes architecture and parameters with a low-biased Monte Carlo estimate. Child networks derived from DSNAS can be deployed directly without parameter retraining. Comparing with two-stage methods, DSNAS successfully discovers networks with comparable accuracy (74.4%) on ImageNet in 420 GPU hours, reducing the total time by more than 34%. Our implementation is available at https://github.com/SNAS-Series/SNAS-Series.

Citations (125)

Summary

  • The paper introduces DSNAS, an end-to-end framework for Neural Architecture Search that eliminates the usual parameter retraining stage through a differentiable formulation and a novel search gradient.
  • DSNAS reaches comparable ImageNet top-1 accuracy (74.4%) while cutting total time by more than 34% relative to two-stage NAS methods, since only the sampled subnetwork is instantiated during search.
  • This single-stage approach makes Neural Architecture Search substantially more computationally feasible and opens avenues for integrating topology search or extending to domains beyond computer vision.

Direct Neural Architecture Search without Parameter Retraining: A Critical Analysis

The authors present DSNAS, a framework for Neural Architecture Search (NAS) that aims to streamline the resource-intensive process of determining suitable network architectures for a given computer vision task. Unlike most existing NAS methods, which rely on a two-stage approach with separate search and retraining phases, DSNAS follows an end-to-end strategy that removes the need for parameter retraining, thereby improving computational efficiency.

Theoretical and Practical Implications

The paper tackles a critical issue in NAS: in conventional two-stage methods, an architecture's performance in the search phase correlates poorly with its performance after retraining. The authors therefore redefine the NAS problem as task-specific end-to-end architecture optimization, in which a ready-to-deploy network is obtained in a single stage. This reframing replaces vaguely defined NAS evaluation with two concrete criteria: accuracy on the target task and the total computation consumed to obtain a model of satisfactory accuracy.

DSNAS itself is a differentiable NAS framework that combines the flexibility of stochastic architecture sampling with the efficiency of differentiable optimization. Its search gradient, computed with a low-biased Monte Carlo estimate, lets architecture parameters and network weights be optimized simultaneously, which is what removes the need to retrain the discovered architecture after the search.
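
To make the single-path idea concrete, the sketch below shows a generic edge module that samples one candidate operation per forward pass and routes a gradient back to the architecture logits via a straight-through-style surrogate. It is a minimal illustration under assumed details (the module name, the straight-through surrogate, and the use of PyTorch are ours), not the paper's exact search-gradient estimator.

```python
# Minimal sketch of single-path architecture sampling with a straight-through-style
# gradient. Names and the surrogate are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class SinglePathEdge(nn.Module):
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)                      # candidate operations
        self.alpha = nn.Parameter(torch.zeros(len(ops)))   # architecture logits

    def forward(self, x):
        probs = torch.softmax(self.alpha, dim=-1)
        idx = torch.multinomial(probs, 1).item()           # sample exactly one op
        one_hot = torch.zeros_like(probs)
        one_hot[idx] = 1.0
        # Forward uses the hard one-hot weight (numerically 1 for the sampled op);
        # backward routes the gradient into probs, and hence into alpha.
        weight = one_hot - probs.detach() + probs
        # Only the sampled operation enters the compute graph (single path).
        return weight[idx] * self.ops[idx](x)

# Usage: a single loss.backward() updates both the sampled op's weights and alpha.
edge = SinglePathEdge([nn.Conv2d(8, 8, 3, padding=1), nn.Identity()])
out = edge(torch.randn(2, 8, 16, 16))
out.mean().backward()
```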

Methodology and Results

The framework's efficacy is demonstrated on ImageNet classification. DSNAS reaches a top-1 accuracy of 74.4%, comparable to two-stage methods, in roughly 420 GPU hours, reducing the total time to obtain a deployable model by more than 34%. This efficiency comes without compromising the task-specific accuracy of the derived architectures. The single-path implementation of DSNAS is central to this result, delivering solid performance while cutting much of the computational overhead typically associated with NAS frameworks.

A breakdown of the framework's computational requirements shows substantial improvements over previous differentiable NAS techniques. Memory use and compute are both reduced because only the sampled subnetwork is instantiated during optimization, whereas earlier differentiable methods must instantiate the entire parent network.
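
As a rough, illustrative comparison (the operation and edge counts below are assumed for the example, not figures from the paper), counting candidate-operation forward passes per training step shows why instantiating only the sampled subnetwork matters:

```python
# Illustrative arithmetic only; K and N are assumed values, not taken from the paper.
K = 8    # candidate operations per searchable edge
N = 20   # searchable edges in the parent network

full_parent = K * N   # supernet-style NAS executes every candidate op on every edge
single_path = 1 * N   # single-path sampling executes one sampled op per edge

print(f"full parent network: {full_parent} op evaluations per step")
print(f"single sampled path: {single_path} op evaluations per step")
```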

Future Directions

The formulation of task-specific end-to-end NAS and the development of DSNAS open up numerous avenues for future research. These include integrating DSNAS with other techniques, such as randomly wired networks, to jointly search over topology, operations, and parameters. Extending the framework to domains beyond computer vision could further establish its utility and robustness across diverse machine learning applications.

Moving forward, the research community could benefit from extending DSNAS to incorporate network scale adjustments, akin to the compound scaling explored in EfficientNet, to push accuracy further without a large increase in complexity. Exploring synergies between DSNAS and hyperparameter optimization frameworks could likewise offer more holistic solutions to model training and deployment.

In summary, DSNAS represents a considerable advancement in making NAS more computationally feasible and efficient, particularly for practitioners aiming for task-specific neural network deployment. The insights and methodologies introduced in this paper lay the groundwork for more efficient and precise NAS applications, potentially accelerating the development and application of machine learning models across various industries.
