- The paper introduces NAAS, a data-driven framework that holistically searches neural network architecture, accelerator architecture, and compiler mapping in a single optimization loop.
- NAAS leverages an importance-based encoding method to numerically represent design parameters and employs an evolution strategy to optimize for the Energy-Delay Product.
- Experiments demonstrate that NAAS achieves significant speed and energy efficiency improvements across various hardware platforms, showing strong adaptability for specialized hardware co-design.
NAAS: Neural Accelerator Architecture Search
The quest for high-performance, energy-efficient neural network execution has motivated the exploration of neural accelerator architecture design. The paper "NAAS: Neural Accelerator Architecture Search" introduces a data-driven approach that automatically explores the design space of neural accelerator architectures, jointly tackling neural network design, accelerator design, and compiler mapping. This joint exploration is essential for achieving both specialization and acceleration.
Overview
The paper presents NAAS (Neural Accelerator Architecture Search), a framework addressing the complexities inherent in co-designing neural architectures and hardware accelerators. Unlike previous frameworks, which focus primarily on sizing numerical architectural hyper-parameters, NAAS holistically searches the neural network architecture, accelerator architecture, and compiler mapping within a single optimization loop. By matching network architectures to hardware and finding efficient mapping strategies, this approach aims to significantly improve computational efficiency and performance.
Methodology
Design Space and Encoding:
The accelerator design space encompasses architectural sizing parameters, such as the number of processing elements (PEs) and memory buffer sizes, alongside connectivity parameters, such as the array shape and PE interconnections. By incorporating connectivity parameters, NAAS expands the design space and enables exploration beyond purely numerical attributes.
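To make the searched parameters concrete, a sketch of one candidate accelerator design is shown below as a Python dataclass. The field names and value ranges are hypothetical; the paper's exact parameter set may differ.

```python
from dataclasses import dataclass

@dataclass
class AcceleratorDesign:
    """One candidate point in the accelerator design space (illustrative)."""
    num_pes: int          # total number of processing elements
    array_shape: tuple    # PE array dimensions, e.g. (rows, cols)
    l1_buffer_kb: int     # per-PE (local) buffer size
    l2_buffer_kb: int     # shared (global) buffer size
    parallel_dims: tuple  # loop dimensions mapped across the PE array

# Example candidate: a 16x16 PE array with small local buffers.
design = AcceleratorDesign(
    num_pes=256, array_shape=(16, 16),
    l1_buffer_kb=1, l2_buffer_kb=128,
    parallel_dims=("K", "C"),
)
```

Connectivity parameters such as `array_shape` and `parallel_dims` are what distinguish this search space from frameworks that only tune numerical sizes.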
The paper introduces an encoding method that converts non-numerical parameters (e.g., loop order and PE parallelism choices) into a numerical format suitable for optimization. This importance-based encoding assigns a generated importance value to each dimension and sorts dimensions accordingly, which determines both the parallelism choices and the loop execution order.
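A minimal sketch of how such an importance-based decoding might look in Python, with illustrative dimension names and values; the function name and tie-breaking rule are assumptions, not taken from the paper:

```python
def decode_loop_order(importance):
    """Map a vector of importance values to a loop ordering.

    importance: dict of loop-dimension name -> real value proposed by
    the optimizer. A higher importance places the dimension at an
    outer loop level (or makes it a preferred parallelism candidate).
    """
    # Sort dimensions by descending importance; break ties by name
    # for determinism (an illustrative choice, not from the paper).
    return sorted(importance, key=lambda d: (-importance[d], d))

# Example: six convolution loop dimensions with sampled importances.
order = decode_loop_order({
    "N": 0.1, "C": 0.9, "K": 0.7, "H": 0.3, "W": 0.2, "R": 0.5,
})
# "C" (highest importance) becomes the outermost loop.
```

Because the optimizer only manipulates real-valued importances, discrete choices like loop order become continuous quantities that an evolution strategy can perturb smoothly.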
Evolutionary Search:
NAAS utilizes an evolution strategy to optimize designs based on the Energy-Delay Product (EDP), balancing latency and energy efficiency. The optimization process involves sampling candidate solutions, evaluating them against predefined benchmarks, and iteratively refining the solution pool based on performance metrics.
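The loop above can be sketched as a simple (mu, lambda) evolution strategy. The real NAAS framework evaluates candidates with a hardware cost model; here `edp` is a stand-in quadratic surrogate with a made-up optimum, so the code is illustrative only.

```python
import random

def edp(params):
    """Hypothetical EDP surrogate: pretend the optimum is at (64, 256)."""
    pe, buf = params
    return (pe - 64) ** 2 + (buf - 256) ** 2

def evolve(edp_fn, mean, sigma=8.0, mu=5, lam=20, iters=50, seed=0):
    """Sample lam candidates around the mean, keep the mu best by EDP,
    and recenter the mean on the elites each iteration."""
    rng = random.Random(seed)
    for _ in range(iters):
        pop = [tuple(m + rng.gauss(0, sigma) for m in mean)
               for _ in range(lam)]
        elites = sorted(pop, key=edp_fn)[:mu]
        mean = tuple(sum(p[i] for p in elites) / mu for i in range(2))
    return mean

best = evolve(edp, mean=(16.0, 64.0))
```

In the actual framework, the candidate vector also carries the importance-encoded connectivity and mapping parameters, and the fitness is the measured energy-delay product on the target workload.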
Compiler Mapping Optimization:
Compiler mapping optimization, treated as a separate search task for each layer, focuses on execution order and tiling sizes. It employs a similar importance-based encoding for loop dimension orderings, promoting efficient data locality management during mapping.
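A per-layer mapping decode might look like the sketch below: loop-order importances are sorted as before, and proposed tiling sizes are legalized against the layer's dimensions. The rounding-to-nearest-divisor rule is an illustrative legalization choice, not necessarily the paper's.

```python
def legalize_tile(dim_size, proposed):
    """Round a proposed tile size to the nearest divisor of dim_size."""
    divisors = [d for d in range(1, dim_size + 1) if dim_size % d == 0]
    return min(divisors, key=lambda d: abs(d - proposed))

def decode_mapping(layer_dims, importances, proposed_tiles):
    """Decode one layer's mapping: loop order plus legal tile sizes."""
    # Outer loops get higher importance, as in the architecture search.
    order = sorted(layer_dims, key=lambda d: -importances[d])
    tiles = {d: legalize_tile(layer_dims[d], proposed_tiles[d])
             for d in layer_dims}
    return order, tiles

# Example: three dimensions of a hypothetical convolution layer.
order, tiles = decode_mapping(
    {"C": 64, "K": 128, "H": 56},      # layer dimension sizes
    {"C": 0.8, "K": 0.4, "H": 0.6},    # sampled loop-order importances
    {"C": 30, "K": 50, "H": 7},        # proposed (possibly illegal) tiles
)
```

Searching the mapping separately per layer keeps each sub-problem small while still letting the outer loop judge the accelerator on whole-network EDP.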
Results and Implications
Experiments conducted within the paper demonstrate substantial improvements in speed and energy savings across various hardware platforms, including EdgeTPU, Eyeriss, and NVDLA configurations. NAAS achieves notable performance gains with specific benchmarks, offering architectural designs tailored to diverse neural network models and hardware constraints. The integration with Once-For-All NAS further enhances model accuracy while reducing energy-delay products compared to traditional designs.
The framework's ability to seamlessly integrate NAS with hardware design exploration signifies a pivotal advancement, potentially catalyzing future developments in AI that require specialized hardware solutions for efficient neural network execution.
Conclusion and Future Directions
The paper establishes NAAS as a potent tool for comprehensive neural accelerator co-design, markedly enhancing computational utilization and optimization efficiency. Its key strengths are its low search cost and its robust adaptability across hardware platforms and neural architectures. Future research could build on NAAS by investigating additional neural architecture variants, extending the framework's applicability to emerging AI models and workloads.
Overall, NAAS offers a promising methodological approach for researchers and practitioners aiming to push the boundaries of neural network acceleration and specialized hardware design.