- The paper presents a stochastic NAS method that performs gradient-based optimization over a joint distribution of architecture decisions, updating operation parameters and architecture distribution parameters in the same round of back-propagation.
- Experimental results on CIFAR-10 show a test error of 2.85% with only 2.8M parameters, outperforming first-order DARTS and RL-based ENAS.
- The approach reduces computational cost by completing the search in 32 hours on a single GPU, with architectures that reliably transfer to ImageNet.
SNAS: Stochastic Neural Architecture Search
The paper "SNAS: stochastic neural architecture search" introduces a new method for Neural Architecture Search (NAS) that aims to achieve high efficiency and performance by leveraging gradient-based optimization within a fully differentiable framework. This method, termed Stochastic Neural Architecture Search (SNAS), reformulates NAS as an optimization problem over the parameters of a joint distribution for the search space in a neural cell.
SNAS addresses the computational inefficiency of evolution-based NAS methods such as NEAT and the delayed, less efficient credit assignment of reinforcement-learning-based approaches such as ENAS. It also targets a bias in attention-based differentiable methods such as DARTS, whose deterministic relaxation breaks the consistency between the parent network optimized during search and the child network derived from it, so the derived architecture typically requires parameter retraining to reach acceptable performance. SNAS instead uses a stochastic model that updates neural operation parameters and architecture distribution parameters in the same round of back-propagation.
Methodology
SNAS models NAS as an optimization problem over a fully factorizable joint distribution of operation choices in a cell, and makes this objective differentiable by employing the concrete (Gumbel-softmax) distribution together with the reparameterization trick. This allows gradient-based optimization of the architecture parameters:
- Search Space Representation: The search space, represented as a Directed Acyclic Graph (DAG) in each cell, is parameterized by a fully factorizable joint distribution with operations selected via one-hot random variables.
- Optimization Objective: The training loss is treated as the reward, so the search optimizes the expectation of the loss over the distribution of possible architectures (written out after this list).
- Gradient Calculation: Gradients are computed in a single back-propagation pass for both the operation parameters and the architecture parameters, allowing them to be optimized jointly.
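Concretely, the objective is the expected training loss under architectures sampled from the factorized distribution. A hedged reconstruction in the paper's style of notation, where Z collects the one-hot operation-selection variables on the DAG edges, θ denotes the operation parameters, and α the architecture distribution parameters:

```latex
\min_{\theta, \alpha} \; \mathbb{E}_{Z \sim p_{\alpha}(Z)}\big[ L_{\theta}(Z) \big],
\qquad
p_{\alpha}(Z) = \prod_{(i,j)} p_{\alpha_{i,j}}\!\big(Z_{i,j}\big)
```

Because the one-hot variables are relaxed with the concrete (Gumbel-softmax) distribution and sampled via the reparameterization trick, this expectation is differentiable in both θ and α.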
SNAS introduces a novel type of gradient, termed the 'search gradient,' which exploits the gradient information in the generic differentiable loss used for architecture search. Although it optimizes the same objective as reinforcement-learning-based NAS, the search gradient assigns credit to structural decisions more efficiently than methods such as ENAS; a minimal sketch of the resulting joint update follows.
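The PyTorch sketch below illustrates this joint update under stated assumptions; it is not the authors' code, and the names `MixedEdge`, `tau`, `theta_opt`, and `alpha_opt` are illustrative. Each edge samples a relaxed one-hot vector from the concrete (Gumbel-softmax) distribution, so a single backward pass yields gradients for both the operation weights and the architecture logits.

```python
# Minimal sketch of an SNAS-style relaxation (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedEdge(nn.Module):
    """One DAG edge: candidate ops combined with a sampled relaxed one-hot vector."""
    def __init__(self, ops, tau=1.0):
        super().__init__()
        self.ops = nn.ModuleList(ops)                      # candidate operations (convs, pooling, identity, ...)
        self.alpha = nn.Parameter(torch.zeros(len(ops)))   # architecture logits for this edge
        self.tau = tau                                     # temperature of the concrete distribution

    def forward(self, x):
        # Reparameterized sample: a soft one-hot vector that is differentiable w.r.t. self.alpha.
        z = F.gumbel_softmax(self.alpha, tau=self.tau, hard=False)
        return sum(z_i * op(x) for z_i, op in zip(z, self.ops))

def search_step(model, batch, theta_opt, alpha_opt):
    """One joint update: the training loss acts as the reward; both parameter sets step together."""
    x, y = batch
    loss = F.cross_entropy(model(x), y)   # forward pass samples Z inside every MixedEdge
    theta_opt.zero_grad()
    alpha_opt.zero_grad()
    loss.backward()                        # gradients for operation weights and architecture logits in one pass
    theta_opt.step()
    alpha_opt.step()
    return loss.item()
```

In the paper, the temperature of the concrete distribution is annealed toward zero during the search so that the sampled vectors approach discrete one-hot selections.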
Experiments and Results
Experiments were conducted on CIFAR-10 to validate the proposed method, with the discovered cells additionally transferred to ImageNet:
- Architecture Discovery on CIFAR-10: SNAS achieved a test error rate of 2.85% with only 2.8 million parameters, outperforming first-order DARTS and ENAS. The child networks derived from the search also maintained the validation accuracy reached during the search, avoiding the parameter retraining required by DARTS.
- Efficiency: The search process was significantly faster than traditional methods, taking only 32 hours on a single GPU as opposed to the thousands of GPU days required by evolutionary methods.
- Transferability: The cells discovered on CIFAR-10, when transferred and evaluated on ImageNet, achieved performance competitive with state-of-the-art methods, further demonstrating the robustness and efficiency of SNAS.
Implications and Future Work
SNAS provides a highly efficient and less-biased NAS framework. The key practical benefit is the substantial reduction in computational resources and time required to discover high-performing neural architectures, making SNAS an attractive approach for large-scale NAS applications. Theoretically, the use of a stochastic approach combined with gradient-based optimization advances the understanding of efficient NAS design, potentially opening new pathways for further optimization and improvement.
Future research may focus on extending SNAS to more complex tasks beyond image classification, such as object detection and segmentation on large datasets. Additionally, exploring more sophisticated factorizations of the architecture distribution and further reducing the computational overhead of the search process could yield even greater efficiency and performance improvements in NAS.