BayesNAS: A Bayesian Approach for Neural Architecture Search (1905.04919v2)

Published 13 May 2019 in cs.LG and stat.ML

Abstract: One-Shot Neural Architecture Search (NAS) is a promising method to significantly reduce search time without any separate training. It can be treated as a Network Compression problem on the architecture parameters from an over-parameterized network. However, there are two issues associated with most one-shot NAS methods. First, dependencies between a node and its predecessors and successors are often disregarded, which results in improper treatment of zero operations. Second, pruning architecture parameters based on their magnitude is questionable. In this paper, we employ the classic Bayesian learning approach to alleviate these two issues by modeling architecture parameters using hierarchical automatic relevance determination (HARD) priors. Unlike other NAS methods, we train the over-parameterized network for only one epoch and then update the architecture. Impressively, this enabled us to find the architecture on CIFAR-10 within only 0.2 GPU days using a single GPU. Competitive performance can also be achieved by transferring to ImageNet. As a byproduct, our approach can be applied directly to compress convolutional neural networks by enforcing structural sparsity, which achieves extremely sparse networks without accuracy deterioration.

Authors (4)
  1. Hongpeng Zhou (6 papers)
  2. Minghao Yang (12 papers)
  3. Jun Wang (991 papers)
  4. Wei Pan (149 papers)
Citations (188)

Summary

Bayesian Approach for Neural Architecture Search: A Comprehensive Analysis of BayesNAS

The paper "BayesNAS: A Bayesian Approach for Neural Architecture Search" addresses key challenges within the domain of Neural Architecture Search (NAS) by introducing a novel Bayesian learning approach. The researchers aim to optimize neural architecture efficiently, mitigating common issues identified in existing one-shot NAS methodologies. More specifically, they focus on problems such as the oversight of dependencies between nodes and the questionable practice of pruning architecture parameters based solely on magnitude.

Core Methodology

The central innovation of the paper is the application of Bayesian learning, fundamentally characterized by the introduction of hierarchical automatic relevance determination (HARD) priors to model architecture parameters. This approach allows for a principled method of handling zero operations and dependency issues, which are prevalent pitfalls in conventional one-shot NAS approaches. The researchers propose training an over-parameterized network for a single epoch before updating the architecture, a strategy that significantly reduces computational demands to just 0.2 GPU days when tested on the CIFAR-10 dataset using a single GPU.
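To make that schedule concrete, here is a minimal sketch of a single search round in this style: one epoch of weight training on the over-parameterized network, followed by an architecture update. This is not the authors' implementation; the Bayesian re-estimation itself is abstracted behind a caller-supplied `bayesian_update`, and all names (`supernet`, `arch_params`, etc.) are illustrative.

```python
# A minimal sketch of the search schedule described above, not the authors'
# implementation. All names (supernet, arch_params, bayesian_update) are
# illustrative stand-ins.
import torch
import torch.nn.functional as F

def one_search_round(supernet, arch_params, train_loader, optimizer,
                     bayesian_update, device="cpu"):
    """Train the over-parameterized network for one epoch, then update the
    architecture parameters via the supplied Bayesian re-estimation step."""
    supernet.train()
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        logits = supernet(inputs, arch_params)   # supernet mixes candidate ops
        loss = F.cross_entropy(logits, targets)
        loss.backward()
        optimizer.step()

    # In BayesNAS this step re-estimates the architecture parameters under
    # hierarchical ARD priors; here it is abstracted as a callable.
    with torch.no_grad():
        bayesian_update(arch_params)
    return arch_params
```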

Technical Contributions and Results

The contributions of BayesNAS are noteworthy in several aspects:

  1. Bayesian Framework: BayesNAS represents the first Bayesian methodology applied to one-shot NAS. The hierarchical sparse priors not only assist in addressing parameter dependencies but also encourage model sparsity. This enhancement ensures that derived networks remain connected and valid post-pruning.
  2. Performance and Efficiency: The proposed Bayesian algorithm is formulated as an iteratively re-weighted ℓ1 optimization, which keeps the procedure flexible and computationally tractable (a generic sketch of this re-weighting pattern follows the list). Combined with a fast Hessian computation, this enables efficient approximation and optimization for large networks, setting a new benchmark for computational efficiency in NAS.
  3. Empirical Success: The empirical results on CIFAR-10 and subsequent transferability to ImageNet showcase BayesNAS’ capability to derive sparse architectures with competitive performance metrics. The method achieves a test error rate on CIFAR-10 that is close to state-of-the-art techniques, using architectures with fewer parameters than traditional approaches.
  4. Network Compression: As a corollary, the framework can directly compress convolutional neural networks by enforcing structural sparsity (see the channel-pruning sketch after this list). This yields highly sparse networks without compromising accuracy, extending the applicability of BayesNAS beyond NAS to neural network efficiency on resource-constrained devices.
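The iteratively re-weighted ℓ1 pattern named in item 2 can be illustrated on a toy sparse regression problem. The sketch below is generic rather than the BayesNAS objective: BayesNAS derives its per-parameter weights from hierarchical ARD posteriors using fast Hessian estimates, whereas this example uses the standard 1/(|w| + ε) re-weighting with a proximal-gradient inner solver.

```python
# Generic iteratively re-weighted l1 on a toy least-squares problem.
# Not the BayesNAS objective; it only illustrates the optimization pattern.
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def reweighted_l1(A, y, lam=0.1, outer_iters=10, inner_iters=200, eps=1e-3):
    n = A.shape[1]
    w = np.zeros(n)
    alpha = np.ones(n)                       # per-parameter l1 weights
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # ISTA step size (1 / Lipschitz const.)
    for _ in range(outer_iters):
        for _ in range(inner_iters):         # proximal gradient on weighted l1
            grad = A.T @ (A @ w - y)
            w = soft_threshold(w - step * grad, step * lam * alpha)
        alpha = 1.0 / (np.abs(w) + eps)      # re-weight: small params penalized harder
    return w

# Example: recover a sparse vector from noisy linear measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))
w_true = np.zeros(50)
w_true[[3, 17, 42]] = [1.5, -2.0, 0.8]
y = A @ w_true + 0.01 * rng.standard_normal(100)
print(np.nonzero(np.abs(reweighted_l1(A, y)) > 1e-2)[0])
```

The outer loop progressively raises the penalty on parameters that stay small, which mirrors, qualitatively, how an ARD-style prior concentrates irrelevant parameters toward zero.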
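Item 4's structural sparsity refers to removing whole structures, such as output channels of a convolution, rather than individual weights. The framework-agnostic sketch below shows channel-level pruning with an assumed norm threshold; in BayesNAS the sparsity pattern comes from the Bayesian posteriors rather than a hand-set threshold.

```python
# Channel-level (structural) pruning sketch with an assumed threshold.
# BayesNAS determines which structures to drop from its posteriors instead.
import numpy as np

rng = np.random.default_rng(2)
conv_weight = rng.normal(0, 0.1, size=(16, 3, 3, 3))   # (out_ch, in_ch, kH, kW)

# Group norm per output channel; channels below the threshold are dropped whole.
channel_norms = np.linalg.norm(conv_weight.reshape(16, -1), axis=1)
keep = channel_norms > 0.5                               # assumed threshold
pruned_weight = conv_weight[keep]
print(f"kept {keep.sum()} of 16 output channels")
```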

Implications and Future Directions

The theoretical and empirical findings in BayesNAS carry significant implications for both NAS and broader network optimization. By adopting a Bayesian approach, the method not only accounts for architectural dependency constraints but also introduces a pruning criterion grounded in uncertainty estimation, avoiding the shortcomings of purely magnitude-based metrics.
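As a generic illustration of why an uncertainty-aware criterion can rank parameters differently from raw magnitude (this is not the exact BayesNAS score), consider ordering parameters by the signal-to-noise ratio of a hypothetical posterior rather than by the mean alone:

```python
# Illustrative only: magnitude ranking vs. a posterior signal-to-noise ranking.
# A small-mean, low-variance parameter may outrank a large but uncertain one.
import numpy as np

rng = np.random.default_rng(1)
post_mean = rng.normal(0.0, 1.0, size=8)    # hypothetical posterior means
post_std = rng.uniform(0.05, 2.0, size=8)   # hypothetical posterior std. deviations

magnitude_rank = np.argsort(-np.abs(post_mean))           # keep largest |mean| first
snr_rank = np.argsort(-(np.abs(post_mean) / post_std))    # keep highest |mean|/std first

print("keep-first by magnitude:      ", magnitude_rank)
print("keep-first by signal-to-noise:", snr_rank)
```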

Looking forward, BayesNAS opens multiple avenues for further exploration. Expanding the search space, optimizing Hessian computation methods for scale, and exploring the adaptation of Bayesian principles in other NAS frameworks could yield considerable advancements. Additionally, the integration of BayesNAS into broader AI systems, such as automated machine learning pipelines, is a prospect worth investigating to harness its full potential.

In conclusion, BayesNAS marks a pivotal step towards more efficient, less resource-intensive, and architecturally sound neural network design, emphasizing the role of Bayesian inference in advancing one-shot NAS methodologies.