- The paper demonstrates that weight sharing in TuNAS significantly improves search efficiency and model accuracy compared to random search in large, complex search spaces.
- The methodology combines reinforcement learning with op and filter warmup techniques, proving versatile across tasks such as image classification and object detection.
- The study finds that searching over output filter sizes improves accuracy, and that an absolute-value reward function tightens latency control while reducing the need for hyper-parameter tuning.
Analyzing Weight Sharing in Neural Architecture Search: Insights from TuNAS
The paper "Can weight sharing outperform random architecture search? An investigation with TuNAS" by Gabriel Bender and colleagues addresses a critical question in the field of Neural Architecture Search (NAS): Can efficient NAS methods based on weight sharing truly outperform random search techniques in generating high-quality architectures? The study provides a comprehensive empirical analysis using TuNAS—an implementation of NAS with weight sharing—and demonstrates its efficacy over random search, particularly in large and complex search spaces.
Overview of the Research
Neural Architecture Search aims to automate the design of neural networks, optimizing for factors such as accuracy and inference latency. Traditional NAS methods were computationally expensive, necessitating the development of cost-effective alternatives like weight-sharing approaches. These methods, by sharing weights across different candidate networks, offer a significant reduction in search cost. However, their superiority over simpler random search methods has been uncertain, motivating the authors to rigorously investigate their relative effectiveness.
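As a toy illustration of the weight-sharing idea (not TuNAS's actual implementation), the sketch below builds a hypothetical "supernet" in which every candidate architecture reuses the same pool of trained weights, so candidates can be scored without training each one from scratch; all sizes and names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-shot supernet: each layer holds several candidate ops,
# and every candidate architecture reuses these same shared weights.
NUM_LAYERS, NUM_OPS, DIM = 3, 4, 8
shared_weights = rng.normal(size=(NUM_LAYERS, NUM_OPS, DIM, DIM))

def run_candidate(x, choices):
    """Evaluate one candidate architecture (one op index per layer)
    against the shared weight pool -- no per-candidate training."""
    for layer, op in enumerate(choices):
        x = np.tanh(x @ shared_weights[layer, op])
    return x

x = rng.normal(size=DIM)
# Two different candidates are scored against the same weight pool.
out_a = run_candidate(x, choices=[0, 2, 1])
out_b = run_candidate(x, choices=[3, 0, 2])
```

This is what makes weight sharing cheap: the cost of training is paid once for the shared pool, and evaluating an additional candidate is just a forward pass.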
Methodology
The authors deploy TuNAS, a NAS method that leverages reinforcement learning to navigate the architecture search space efficiently. They evaluate TuNAS across three progressively larger search spaces, varying dimensions such as kernel sizes and filter counts, with extensive experiments on ImageNet image classification and COCO object detection. The research highlights the intrinsic complexity of these search spaces and provides quantitative benchmarks for the efficacy of weight-sharing NAS strategies.
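A reinforcement-learning controller of this kind can be sketched with vanilla REINFORCE over per-decision categorical logits. Everything below (the toy reward, decision counts, learning rate, baseline decay) is illustrative and not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy search problem: pick one option per decision. The reward function
# stands in for "accuracy of the sampled candidate under shared weights".
NUM_DECISIONS, NUM_CHOICES = 5, 4
logits = np.zeros((NUM_DECISIONS, NUM_CHOICES))  # controller parameters

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def reward(sample):
    # Illustrative stand-in: fraction of decisions matching a "best" choice.
    return float(np.mean(sample == 2))

baseline, lr = 0.0, 0.5
for step in range(300):
    probs = softmax(logits)
    sample = np.array([rng.choice(NUM_CHOICES, p=p) for p in probs])
    r = reward(sample)
    baseline = 0.9 * baseline + 0.1 * r          # moving-average baseline
    # REINFORCE: nudge logits along grad log p(sample), scaled by advantage.
    onehot = np.eye(NUM_CHOICES)[sample]
    logits += lr * (r - baseline) * (onehot - probs)
```

The moving-average baseline reduces gradient variance, which matters because each sampled architecture yields only one noisy reward estimate.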
Key Findings
- Superiority in Large Search Spaces: TuNAS consistently outperformed random search in large search spaces. Notably, TuNAS demonstrated significant accuracy improvements in complex settings, such as the MobileNetV3-Like space, indicating that weight sharing can be particularly beneficial for larger, realistic search spaces.
- Robustness across Domains: The study extends TuNAS beyond image classification, exploring its applicability in object detection tasks on the COCO dataset. The method's adaptability indicates its potential utility across various domains within computer vision.
- Importance of Output Filter Sizes: The paper shows that searching over output filter sizes significantly enhances model accuracy, highlighting an often-overlooked aspect in the literature that traditionally focuses on other architectural parameters.
- Efficient Learning: The introduction of op and filter warmups in TuNAS is shown to improve weight training, subsequently leading to higher quality models. These techniques ensure that all components of a candidate architecture receive adequate gradient updates, reducing bias in the search process.
- Latency Control with Absolute Value Rewards: An absolute-value reward function effectively controls model latency while maximizing accuracy, and reduces the need for extensive hyper-parameter tuning, a notable practical advantage over existing reward functions.
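The op-warmup idea from the list above can be sketched as an annealed schedule: early in the search, all candidate ops are sometimes trained together so every op's shared weights receive gradient updates. The linear decay and parameter names here are assumptions for illustration, not the paper's exact schedule:

```python
import random

def pick_ops_for_step(step, warmup_steps, candidate_ops, sampled_op):
    """Hypothetical op-warmup schedule: with a probability that anneals
    from 1 to 0 over warmup_steps, train ALL candidate ops so each one's
    shared weights get updates; otherwise train only the sampled op."""
    warmup_prob = max(0.0, 1.0 - step / warmup_steps)
    if random.random() < warmup_prob:
        return list(candidate_ops)   # warm up every candidate's weights
    return [sampled_op]              # normal step: only the sampled op
```

Without some warmup of this kind, ops that the controller rarely samples early on receive few gradient updates, which biases the search against them.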
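The absolute reward described above takes the form accuracy + β·|latency/target − 1| with β < 0, penalizing deviation from the latency target in either direction; the β magnitude below is illustrative, not the paper's tuned value:

```python
def absolute_reward(accuracy, latency_ms, target_ms, beta=-0.1):
    """Absolute-value reward: accuracy plus a penalty proportional to how
    far latency strays from the target, in either direction (beta < 0).
    The default beta is illustrative, not a value from the paper."""
    return accuracy + beta * abs(latency_ms / target_ms - 1.0)

on_target = absolute_reward(0.75, latency_ms=84, target_ms=84)
too_slow  = absolute_reward(0.75, latency_ms=100, target_ms=84)
too_fast  = absolute_reward(0.75, latency_ms=60, target_ms=84)
```

Because faster-than-target models are also penalized, the search is driven to spend the full latency budget on accuracy rather than undershooting the target.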
Implications and Future Directions
The clear advantage of weight-sharing NAS demonstrated by TuNAS could steer future research toward increasingly complex and diverse search spaces. Its adaptability across hardware platforms and inference-time constraints also makes TuNAS promising for applications that demand real-time processing. The study further lays a solid foundation for applying these techniques to AI systems beyond computer vision.
Future research could deepen understanding by analyzing the interaction of various hyper-parameters specifically tailored to other application domains. Additionally, extending the study to explore enhancements in gradient-based optimization techniques in NAS could yield further improvements in efficiency and model performance.
In conclusion, the investigation conducted with TuNAS substantially advances the understanding of weight-sharing strategies in NAS, firmly establishing their potential to surpass random search methods and achieve state-of-the-art results in neural network design.