- The paper presents a stochastic NAS method that performs gradient-based optimization over a joint distribution of architecture decisions, updating operation parameters and architecture distribution parameters in the same round of back-propagation.
- Experimental results on CIFAR-10 show a test error of 2.85% with only 2.8M parameters, outperforming first-order DARTS and RL-based ENAS.
- The approach reduces computational cost by completing the search in 32 hours on a single GPU, with architectures that reliably transfer to ImageNet.
SNAS: Stochastic Neural Architecture Search
The paper "SNAS: stochastic neural architecture search" introduces a new method for Neural Architecture Search (NAS) that aims to achieve high efficiency and performance by leveraging gradient-based optimization within a fully differentiable framework. This method, termed Stochastic Neural Architecture Search (SNAS), reformulates NAS as an optimization problem over the parameters of a joint distribution for the search space in a neural cell.
SNAS addresses the computational inefficiency of evolution-based NAS methods such as NEAT and the delayed, less efficient credit assignment of reinforcement-learning-based approaches such as ENAS. It also targets a bias in attention-based differentiable methods such as DARTS, whose deterministic relaxation breaks the consistency between the parent network optimized during search and the child network derived from it, so the derived architecture typically requires parameter retraining to reach acceptable performance. SNAS instead uses a stochastic model that updates neural operation parameters and architecture distribution parameters in the same round of back-propagation.
Methodology
SNAS models NAS as an optimization problem over a fully factorizable joint distribution of operation choices in a cell, and makes this objective differentiable by employing the concrete (Gumbel-softmax) distribution together with the reparameterization trick. This allows gradient-based optimization of the architecture parameters:
- Search Space Representation: The search space, represented as a Directed Acyclic Graph (DAG) in each cell, is parameterized by a fully factorizable joint distribution with operations selected via one-hot random variables.
- Optimization Objective: The training loss is treated as the reward, so the search optimizes the expectation of the loss over the distribution of possible architectures (written out after this list).
- Gradient Calculation: Gradients are computed in a single back-propagation pass for both the operation parameters and the architecture parameters, allowing them to be optimized jointly.
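Concretely, the objective is the expected training loss under architectures sampled from the factorized distribution. A hedged reconstruction in the paper's style of notation, where Z collects the one-hot operation-selection variables on the DAG edges, θ denotes the operation parameters, and α the architecture distribution parameters:

```latex
\min_{\theta, \alpha} \; \mathbb{E}_{Z \sim p_{\alpha}(Z)}\big[ L_{\theta}(Z) \big],
\qquad
p_{\alpha}(Z) = \prod_{(i,j)} p_{\alpha_{i,j}}\!\big(Z_{i,j}\big)
```

Because the one-hot variables are relaxed with the concrete (Gumbel-softmax) distribution and sampled via the reparameterization trick, this expectation is differentiable in both θ and α.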
SNAS introduces a novel type of gradient, termed the 'search gradient,' which exploits the gradient information in the generic differentiable loss used for architecture search. Although it optimizes the same objective as reinforcement-learning-based NAS, the search gradient assigns credit to structural decisions more efficiently than methods such as ENAS; a minimal sketch of the resulting joint update follows.
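The PyTorch sketch below illustrates this joint update under stated assumptions; it is not the authors' code, and the names `MixedEdge`, `tau`, `theta_opt`, and `alpha_opt` are illustrative. Each edge samples a relaxed one-hot vector from the concrete (Gumbel-softmax) distribution, so a single backward pass yields gradients for both the operation weights and the architecture logits.

```python
# Minimal sketch of an SNAS-style relaxation (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedEdge(nn.Module):
    """One DAG edge: candidate ops combined with a sampled relaxed one-hot vector."""
    def __init__(self, ops, tau=1.0):
        super().__init__()
        self.ops = nn.ModuleList(ops)                      # candidate operations (convs, pooling, identity, ...)
        self.alpha = nn.Parameter(torch.zeros(len(ops)))   # architecture logits for this edge
        self.tau = tau                                     # temperature of the concrete distribution

    def forward(self, x):
        # Reparameterized sample: a soft one-hot vector that is differentiable w.r.t. self.alpha.
        z = F.gumbel_softmax(self.alpha, tau=self.tau, hard=False)
        return sum(z_i * op(x) for z_i, op in zip(z, self.ops))

def search_step(model, batch, theta_opt, alpha_opt):
    """One joint update: the training loss acts as the reward; both parameter sets step together."""
    x, y = batch
    loss = F.cross_entropy(model(x), y)   # forward pass samples Z inside every MixedEdge
    theta_opt.zero_grad()
    alpha_opt.zero_grad()
    loss.backward()                        # gradients for operation weights and architecture logits in one pass
    theta_opt.step()
    alpha_opt.step()
    return loss.item()
```

In the paper, the temperature of the concrete distribution is annealed toward zero during the search so that the sampled vectors approach discrete one-hot selections.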
Experiments and Results
Experiments were conducted on CIFAR-10 to validate the proposed method, with the discovered cells additionally transferred to ImageNet:
- Architecture Discovery on CIFAR-10: SNAS achieved a test error rate of 2.85% with only 2.8 million parameters, outperforming first-order DARTS and ENAS. The child networks derived from the search also maintained the validation accuracy reached during the search, avoiding the parameter retraining required by DARTS.
- Efficiency: The search process was significantly faster than traditional methods, taking only 32 hours on a single GPU as opposed to the thousands of GPU days required by evolutionary methods.
- Transferability: The cells discovered on CIFAR-10, when transferred and evaluated on ImageNet, achieved performance competitive with state-of-the-art methods, further demonstrating the robustness and efficiency of SNAS.
Implications and Future Work
SNAS provides a highly efficient and less-biased NAS framework. The key practical benefit is the substantial reduction in computational resources and time required to discover high-performing neural architectures, making SNAS an attractive approach for large-scale NAS applications. Theoretically, the use of a stochastic approach combined with gradient-based optimization advances the understanding of efficient NAS design, potentially opening new pathways for further optimization and improvement.
Future research may focus on extending SNAS to more complex tasks beyond image classification, such as object detection and segmentation on large datasets. Additionally, exploring more sophisticated factorizations of the architecture distribution and further reducing the computational overhead of the search process could yield even greater efficiency and performance improvements in NAS.