DNA Family: A New Perspective for Enhancing Weight-Sharing NAS with Block-Wise Supervision
Introduction
Neural Architecture Search (NAS) has been pivotal in automating the design of neural network architectures, a core step towards machine-led model generation. Among the many NAS approaches, weight-sharing NAS has emerged as a promising direction because of its efficiency: by training a single supernet whose weights are shared across all candidate architectures, it drastically reduces computational requirements and makes architecture search feasible on modest hardware. This efficiency comes at a cost, however: the search is often ineffective because the architecture ratings produced by the supernet are unreliable, a direct consequence of the overwhelmingly large search space. To tackle this, the DNA family, comprising Distilling Neural Architecture (DNA), DNA+, and DNA++, enhances the weight-sharing NAS framework through block-wise supervision. Through a generalization boundedness analysis, this research shows that modularizing the search space into smaller blocks improves both the efficiency and the reliability of the search.
The Drawback of Weight-Sharing NAS
The core challenge in weight-sharing NAS is inaccurate architecture rating within a vast search space. A generalization boundedness analysis shows that as the search space grows, the supernet's ability to generalize to individual candidate architectures diminishes, so its assessments of those candidates become unreliable. In other words, the root cause of ineffective search in weight-sharing NAS is the oversized search space itself. The paper therefore proposes modularizing the search space into blocks as the key remedy.
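The intuition can be made concrete with a standard uniform-convergence argument. The bound below is an illustrative sketch under simplifying assumptions (a finite candidate set and roughly equal-sized blocks), not the paper's exact theorem:

```latex
% Illustrative uniform-convergence sketch (not the paper's exact bound).
% \mathcal{A} is the set of candidate architectures, n the number of training
% samples, and L / \hat{L} the expected and empirical losses under shared weights W.
\[
\sup_{\alpha \in \mathcal{A}}
\bigl| L(\alpha; W) - \hat{L}(\alpha; W) \bigr|
\;\lesssim\;
\sqrt{\frac{\log |\mathcal{A}|}{n}} .
\]
% Splitting the search space into B independent blocks factorizes it,
% |\mathcal{A}| = \prod_{b=1}^{B} |\mathcal{A}_b|, so each block-wise supernet
% only faces roughly a 1/B share of the complexity term:
\[
\log |\mathcal{A}| \;=\; \sum_{b=1}^{B} \log |\mathcal{A}_b|
\quad\Longrightarrow\quad
\log |\mathcal{A}_b| \;\approx\; \tfrac{1}{B}\,\log |\mathcal{A}| .
\]
```

Under this reading, a monolithic supernet must generalize over the full product space, while each block-wise supernet only has to generalize over its own much smaller factor, which is why its ratings are more trustworthy.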
The DNA Family: Addressing NAS Challenges
The proposed solution divides the supernet into blocks and uses knowledge distillation from a teacher network to supervise each block independently, which dramatically shrinks the effective search space each supernet has to represent. The DNA family consists of three variants, each targeting a different training regime:
- DNA: Uses supervised knowledge distillation from a fixed teacher to train the block-wise student supernets efficiently in parallel (a minimal training sketch follows this list).
- DNA+: Adopts a progressive learning scheme that iteratively updates the teacher network, improving scalability by letting the supervision keep pace with the growing capacity of the searched models.
- DNA++: Employs self-supervised learning to optimize the teacher network and the student supernets jointly, making the framework compatible with a wider range of architectural designs.
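The mechanism underlying DNA is block-wise feature distillation: each student supernet block is trained to reproduce the feature maps of the corresponding teacher block, taking the teacher's previous-stage output as its input. The following is a minimal PyTorch-style sketch under assumed names (`teacher_blocks`, `student_supernets`, and a `sample_subnet` helper); it illustrates the idea and is not the authors' released code.

```python
import torch
import torch.nn.functional as F

def train_block_wise(teacher_blocks, student_supernets, loader, optimizers, device="cuda"):
    """Block-wise distillation: each student supernet block mimics its teacher block.

    teacher_blocks    -- list of frozen teacher stages (e.g., slices of a large CNN)
    student_supernets -- list of weight-sharing supernets, one per block
    optimizers        -- one optimizer per student supernet
    """
    for x, _ in loader:                          # labels are unused; supervision comes from the teacher
        x = x.to(device)
        feat = x
        with torch.no_grad():
            teacher_feats = []
            for t_block in teacher_blocks:       # cache the teacher's feature maps for every stage
                feat = t_block(feat)
                teacher_feats.append(feat)

        for b, (supernet, opt) in enumerate(zip(student_supernets, optimizers)):
            # Input to block b is the teacher's output of block b-1 (the raw image for b == 0),
            # so every block can be trained independently and even in parallel.
            inp = teacher_feats[b - 1] if b > 0 else x
            target = teacher_feats[b]
            subnet = supernet.sample_subnet()    # assumed API: draw one random candidate path per step
            pred = subnet(inp)
            loss = F.mse_loss(pred, target)      # feature-map distillation loss
            opt.zero_grad()
            loss.backward()
            opt.step()
```

Because each block only needs the teacher's intermediate features as input, the blocks do not depend on one another during training, which is what makes the parallel, block-wise supervision possible.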
Because the search space is modularized, each block's supernet generalizes well enough that all candidate architectures can be rated, overcoming the limitation of earlier methods that could only explore sub-search spaces; a minimal rating sketch follows.
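Once the block-wise supernets are trained, a candidate architecture can be rated by summing its per-block distillation losses on held-out data, and the whole modularized space can then be ranked by traversal rather than by sampling. The sketch below assumes a hypothetical `block_loss` callable and hashable per-block candidate descriptions; a practical system would additionally prune by FLOPs or latency constraints during the traversal.

```python
from itertools import product

def rate_candidates(block_candidates, block_loss):
    """Rank every architecture in the modularized search space.

    block_candidates -- list of lists: block_candidates[b] holds the candidate
                        sub-architectures (hashable descriptors) for block b
    block_loss       -- assumed callable: block_loss(b, cand) returns the held-out
                        distillation loss of candidate `cand` inside block b
    """
    # Each block is rated independently, so the evaluation cost scales with the
    # *sum* of the per-block sizes, while the ranking still covers their
    # *product* (the entire search space).
    per_block_scores = [
        {cand: block_loss(b, cand) for cand in cands}
        for b, cands in enumerate(block_candidates)
    ]

    # An architecture is a tuple of per-block choices; its rating is the sum of
    # its block-wise losses (lower is better).
    ranked = sorted(
        product(*block_candidates),
        key=lambda arch: sum(per_block_scores[b][c] for b, c in enumerate(arch)),
    )
    return ranked
```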
Empirical Analysis and Insights
The DNA family has been evaluated extensively against state-of-the-art baselines, showing strong performance for both mobile convolutional networks and small vision transformers. Beyond the benchmarks, the work offers a thorough analysis of architecture rating itself, pinpointing why conventional weight-sharing NAS approaches are often ineffective.
Future Directions in AI and NAS
The introduction of the DNA family not only marks a significant stride in NAS research but also opens avenues for future developments in AI. By addressing the scalability and effectiveness of the architecture search process, this method paves the way for more capable and efficient machine learning models across a variety of domains. Combining block-wise supervision with knowledge distillation provides a robust framework for improving model discovery without compromising computational efficiency.
In essence, the DNA family embodies a strategic breakthrough in the continual quest for automated machine learning, promising a new era of innovation and efficiency in model design.