Overview of Multinomial Distribution Learning for Effective Neural Architecture Search
The paper presents a novel approach to Neural Architecture Search (NAS) by introducing a technique termed Multinomial Distribution Learning (MDL). NAS aims to discover high-performance neural network architectures for a given dataset, typically through a computationally intensive search process. The authors address this challenge by reformulating NAS as learning a joint multinomial distribution over candidate operations, so that the optimal architecture is obtained by selecting the most probable operation at each position in the network.
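To make this formulation concrete, the sketch below illustrates how a search space can be represented as one multinomial distribution per edge of a cell and how architectures are sampled from, or read off, that distribution. The operation names, edge count, and helper functions are assumptions for illustration (a DARTS-style operation set), not the paper's exact implementation.

```python
import numpy as np

# Assumed, DARTS-style candidate operations and cell size (illustrative only).
CANDIDATE_OPS = ["sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3",
                 "max_pool_3x3", "avg_pool_3x3", "skip_connect", "none"]
NUM_EDGES = 14  # number of searchable edges in one cell

# Each edge holds a multinomial distribution over the candidate operations,
# initialized uniformly so every operation is equally likely at the start.
probs = np.full((NUM_EDGES, len(CANDIDATE_OPS)), 1.0 / len(CANDIDATE_OPS))

def sample_architecture(probs, rng=None):
    """Draw one operation index per edge from the current distributions."""
    rng = rng or np.random.default_rng()
    return np.array([rng.choice(len(CANDIDATE_OPS), p=p) for p in probs])

def most_probable_architecture(probs):
    """The final architecture keeps the most probable operation on each edge."""
    return probs.argmax(axis=1)
```

During the search, candidate architectures are drawn with `sample_architecture`; once the distribution has been learned, `most_probable_architecture` yields the network that is actually trained and evaluated.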
Key Contributions
- Multinomial Distribution Framework: The authors propose treating the operations in a neural architecture search space as samples from a joint multinomial distribution. Instead of relying on exhaustive search or reinforcement learning, which often incur substantial computational overhead, the method learns this distribution directly and steers it toward better-performing architectures.
- Performance Ranking Hypothesis: A central hypothesis of the paper is that the relative performance ranking of candidate architectures remains stable across training epochs: a candidate that performs better early in training tends to also perform better at convergence. Leveraging this hypothesis, the authors rank architectures after only a few epochs of training, removing the need to train each candidate to convergence before evaluation (a simplified sketch of the resulting distribution update follows this list).
- Efficiency and Effectiveness: The proposed method demonstrates significant improvements in both computational efficiency and model performance compared to existing NAS approaches. Notably, the algorithm achieves a CIFAR-10 test error rate of 2.55% using only 4 GPU hours of search, a small fraction of the search cost reported for previous state-of-the-art NAS methods.
- Network Transferability: The research further explores the transferability of the discovered architectures to other datasets like ImageNet. The discovered model achieves competitive accuracy on ImageNet (75.2% top-1) under mobile settings, highlighting its robustness across tasks and conditions.
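The two ideas above combine into a simple search loop: sample candidate architectures, train them only briefly, compare their early accuracies under the performance ranking hypothesis, and shift probability mass toward the operations of the better candidate. The pairwise update below is a deliberately simplified sketch of that idea, not the paper's exact differential update (which records training epochs and accuracies per operation); it reuses the `probs` array and helpers assumed in the earlier snippet.

```python
def update_distribution(probs, arch_a, acc_a, arch_b, acc_b, step=0.01):
    """Simplified pairwise update: move probability toward the operations used
    by the architecture with the higher early-training accuracy."""
    winner, loser = (arch_a, arch_b) if acc_a >= acc_b else (arch_b, arch_a)
    for edge, (op_w, op_l) in enumerate(zip(winner, loser)):
        if op_w == op_l:
            continue  # both candidates chose the same operation on this edge
        probs[edge, op_w] += step
        probs[edge, op_l] -= step
        probs[edge] = probs[edge].clip(min=1e-6)
        probs[edge] /= probs[edge].sum()  # keep each row a valid distribution
    return probs

# One hypothetical search iteration (train_briefly is assumed, e.g. one epoch):
#   arch_a, arch_b = sample_architecture(probs), sample_architecture(probs)
#   acc_a, acc_b = train_briefly(arch_a), train_briefly(arch_b)
#   probs = update_distribution(probs, arch_a, acc_a, arch_b, acc_b)
```

Because candidates are compared after only brief training, the cost of each iteration stays low, which is what makes the reported 4-GPU-hour search budget plausible.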
Experimental Validation
The proposed MDL approach is validated through extensive experimentation on CIFAR-10. The results show that the multinomial distribution formulation not only reduces search time and computational cost but also achieves performance competitive with, or superior to, manually designed and previously searched architectures. Further experiments on ImageNet show that the architectures discovered on CIFAR-10 generalize well to larger and more complex datasets, illustrating the generality of the learned architecture.
Implications and Future Directions
The contribution of this research is significant in the context of automated machine learning, where reducing computational burden without compromising performance is a critical goal. By framing NAS as a distribution optimization problem, this work opens up new avenues for developing efficient and scalable NAS algorithms.
Future developments might focus on refining the distribution learning model to further improve its efficiency and accuracy. The current approach, while effective, could be extended to tasks and network types beyond convolutional architectures. Additionally, integrating this methodology with other optimization frameworks, such as those tailored to specific domains or architectural constraints, could yield even more efficient solutions.
In conclusion, this paper offers a promising and efficient approach to neural architecture search by reimagining how architectures are evaluated and optimized within the search process. The successful validation on benchmark datasets further underlines the potential impact of this method for real-world AI applications, advocating for broader adoption and adaptation across diverse machine learning challenges.