Simple And Efficient Architecture Search for Convolutional Neural Networks (1711.04528v1)

Published 13 Nov 2017 in stat.ML, cs.AI, and cs.LG

Abstract: Neural networks have recently had a lot of success for many tasks. However, neural network architectures that perform well are still typically designed manually by experts in a cumbersome trial-and-error process. We propose a new method to automatically search for well-performing CNN architectures based on a simple hill climbing procedure whose operators apply network morphisms, followed by short optimization runs by cosine annealing. Surprisingly, this simple method yields competitive results, despite only requiring resources in the same order of magnitude as training a single network. E.g., on CIFAR-10, our method designs and trains networks with an error rate below 6% in only 12 hours on a single GPU; training for one day reduces this error further, to almost 5%.

Citations (221)

Summary

  • The paper presents Neural Architecture Search by Hill-climbing (NASH), which uses network morphisms to iteratively evolve CNN architectures.
  • It employs a hill-climbing strategy in which candidate architectures are trained briefly with cosine annealing (SGDR), yielding competitive error rates on standard benchmarks.
  • The approach significantly reduces computational resources, democratizing neural architecture design and inspiring new research in efficient search techniques.

An Overview of "Simple and Efficient Architecture Search for Convolutional Neural Networks"

The paper "Simple and Efficient Architecture Search for Convolutional Neural Networks" presents a method for automating the search for effective convolutional neural network (CNN) architectures, addressing a core challenge in the design and optimization of neural networks. Unlike conventional, manually-intensive design processes, this method utilizes a straightforward hill climbing algorithm, combined with network morphisms and cosine annealing for efficient exploration and optimization of network architectures.

Key Contributions and Methodology

The authors propose a method called Neural Architecture Search by Hill-climbing (NASH), which leverages network morphisms to create a search space for neural architectures that can be efficiently explored. The process begins with a simple network and involves iterative application of network morphisms to generate new network configurations. The key contribution of the paper is the demonstration that this approach can yield competitive results with significantly lower computational requirements compared to prior methods. The novel aspects of the method are:

  1. Network Morphisms: The paper builds on network morphisms, transformations that alter a neural network's architecture while preserving the function it computes. These operators widen layers, deepen the network, or add skip connections, so each child network inherits its parent's performance rather than training from scratch (a minimal widening sketch follows this list).
  2. Hill Climbing Algorithm: NASH employs a hill-climbing strategy in which several child networks produced by morphisms are trained briefly, and the search continues from the child with the best validation performance (see the search-loop sketch after this list).
  3. Cosine Annealing with SGDR: The optimization runs use SGDR (Stochastic Gradient Descent with Warm Restarts), whose cosine learning-rate schedule gives good anytime performance and allows candidate architectures to be evaluated quickly during the search.
  4. Efficiency and Results: On CIFAR-10, NASH designs and trains networks with an error rate below 6% in roughly 12 hours on a single GPU, and training for about a day reduces the error to almost 5%; comparable results are reported on CIFAR-100.
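
To make the morphism idea concrete, here is a minimal NumPy sketch (not the authors' code) of one function-preserving "widen" operation in the Net2WiderNet style: hidden units are duplicated and their outgoing weights rescaled so the widened network computes exactly the same function as its parent. The function name widen and the layer sizes are illustrative assumptions.

import numpy as np

def widen(W1, b1, W2, new_width, rng=np.random.default_rng(0)):
    """Widen the hidden layer of y = W2 @ relu(W1 @ x + b1), preserving the function."""
    old_width = W1.shape[0]
    # Keep every original unit, then duplicate randomly chosen units to reach new_width.
    idx = np.concatenate([np.arange(old_width),
                          rng.integers(0, old_width, new_width - old_width)])
    counts = np.bincount(idx, minlength=old_width)   # how many copies of each unit exist
    W1_new, b1_new = W1[idx], b1[idx]                # copy incoming weights and biases
    W2_new = W2[:, idx] / counts[idx]                # rescale outgoing weights per copy
    return W1_new, b1_new, W2_new

# Check that parent and widened child compute the same output on a random input.
rng = np.random.default_rng(1)
W1, b1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=(2, 4))
x = rng.normal(size=3)
parent_out = W2 @ np.maximum(W1 @ x + b1, 0.0)
W1w, b1w, W2w = widen(W1, b1, W2, new_width=7)
child_out = W2w @ np.maximum(W1w @ x + b1w, 0.0)
print(np.allclose(parent_out, child_out))            # True

A "deepen" morphism can be implemented analogously by inserting a new layer initialized to the identity mapping, again leaving the parent's function unchanged.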
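
The hill-climbing loop itself can be summarized in a short sketch. The helpers apply_random_morphism and train_briefly below are hypothetical stand-ins rather than the paper's implementation, and the step counts and learning rates are illustrative; only the control flow follows the description above: sample children via morphisms, train each briefly with a cosine-annealed learning rate, and continue from the best child.

import math
import random

def cosine_lr(step, total_steps, lr_max=0.05, lr_min=0.0):
    """Cosine-annealed learning rate for one short SGDR-style run."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))

def apply_random_morphism(arch):
    """Stand-in morphism: record a widen/deepen/skip edit on a toy architecture."""
    return arch + [random.choice(["widen", "deepen", "skip"])]

def train_briefly(arch, steps=20):
    """Stand-in for a short cosine-annealed SGD run; returns a mock validation error."""
    for step in range(steps):
        lr = cosine_lr(step, steps)   # anneal the learning rate to ~0 within the run
        _ = lr                        # a real implementation would take an SGD step here
    return 1.0 / (1 + len(arch)) + 0.05 * random.random()   # mock: larger nets do better

def nash_search(n_rounds=5, n_children=8):
    parent = ["conv"]                 # start from a small seed network
    parent_err = train_briefly(parent)
    for _ in range(n_rounds):
        children = [apply_random_morphism(parent) for _ in range(n_children)]
        errors = [train_briefly(child) for child in children]
        best = min(range(n_children), key=errors.__getitem__)
        parent, parent_err = children[best], errors[best]   # keep the most promising child
    return parent, parent_err

print(nash_search())

Because every child starts from a function-preserving copy of its parent, a few epochs of cosine-annealed training per child suffice to rank candidates, which is what keeps the total cost close to that of training a single network.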

Performance and Implications

The experimental results indicate that NASH produces architectures whose error rates are competitive with state-of-the-art manually engineered architectures and with other automated architecture search methods, while requiring considerably fewer computational resources. This is a significant practical advantage that broadens access to architecture search and its applications across domains.

Future Directions and Theoretical Implications

The paper opens several avenues for future research. More sophisticated evolutionary strategies combined with network morphisms could enhance the search, and better performance-prediction mechanisms could further reduce evaluation overhead. Future work could also incorporate hypernetworks for weight prediction or approaches like Hyperband for resource allocation.

Theoretically, the simplicity and efficiency demonstrated by NASH could motivate further study of the convergence properties and optimality conditions of search algorithms in discrete and conditional spaces. The principle of modifying architectures via morphisms could also extend to network types beyond CNNs.

Conclusion

This paper makes significant strides toward making architecture search for CNNs more accessible and cost-effective without compromising performance. It challenges the prevailing paradigm of resource-intensive search methods and sets a new standard for efficiency in neural architecture search. As a foundation for future work, NASH is a promising approach for democratizing architecture design and optimizing neural network performance across varied tasks and datasets.