- The paper introduces NAAS, a data-driven framework that holistically searches neural network architecture, accelerator architecture, and compiler mapping in a single optimization loop.
- NAAS leverages an importance-based encoding method to numerically represent design parameters and employs an evolution strategy to optimize for the Energy-Delay Product.
- Experiments demonstrate that NAAS achieves significant speed and energy efficiency improvements across various hardware platforms, showing strong adaptability for specialized hardware co-design.
NAAS: Neural Accelerator Architecture Search
The quest for high-performance, energy-efficient neural network execution has motivated the exploration of neural accelerator architecture design. The paper "NAAS: Neural Accelerator Architecture Search" introduces a data-driven approach that automatically explores the design space of neural accelerator architectures, jointly tackling neural network design, accelerator design, and compiler mapping. This joint exploration is essential for achieving both specialization and acceleration.
Overview
The paper presents NAAS (Neural Accelerator Architecture Search), a framework addressing the complexities inherent in co-designing neural architectures and hardware accelerators. Unlike previous frameworks, which focus primarily on sizing numerical architectural hyper-parameters, NAAS holistically searches the neural network architecture, accelerator architecture, and compiler mapping within a single optimization loop. By matching network architectures to hardware and finding efficient mapping strategies, this approach aims to significantly improve computational efficiency and performance.
Methodology
Design Space and Encoding:
The accelerator design space encompasses architectural sizing parameters, such as the number of processing elements (PEs) and memory buffer sizes, alongside connectivity parameters, such as the array shape and PE interconnections. By incorporating connectivity parameters, NAAS expands the design space and enables exploration beyond purely numerical attributes.
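To make the searched parameters concrete, a sketch of one candidate accelerator design is shown below as a Python dataclass. The field names and value ranges are hypothetical; the paper's exact parameter set may differ.

```python
from dataclasses import dataclass

@dataclass
class AcceleratorDesign:
    """One candidate point in the accelerator design space (illustrative)."""
    num_pes: int          # total number of processing elements
    array_shape: tuple    # PE array dimensions, e.g. (rows, cols)
    l1_buffer_kb: int     # per-PE (local) buffer size
    l2_buffer_kb: int     # shared (global) buffer size
    parallel_dims: tuple  # loop dimensions mapped across the PE array

# Example candidate: a 16x16 PE array with small local buffers.
design = AcceleratorDesign(
    num_pes=256, array_shape=(16, 16),
    l1_buffer_kb=1, l2_buffer_kb=128,
    parallel_dims=("K", "C"),
)
```

Connectivity parameters such as `array_shape` and `parallel_dims` are what distinguish this search space from frameworks that only tune numerical sizes.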
The paper introduces an encoding method that converts non-numerical parameters (e.g., loop order and PE parallelism choices) into a numerical format suitable for optimization. This importance-based encoding assigns a generated importance value to each dimension and sorts dimensions accordingly, which determines both the parallelism choices and the loop execution order.
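A minimal sketch of how such an importance-based decoding might look in Python, with illustrative dimension names and values; the function name and tie-breaking rule are assumptions, not taken from the paper:

```python
def decode_loop_order(importance):
    """Map a vector of importance values to a loop ordering.

    importance: dict of loop-dimension name -> real value proposed by
    the optimizer. A higher importance places the dimension at an
    outer loop level (or makes it a preferred parallelism candidate).
    """
    # Sort dimensions by descending importance; break ties by name
    # for determinism (an illustrative choice, not from the paper).
    return sorted(importance, key=lambda d: (-importance[d], d))

# Example: six convolution loop dimensions with sampled importances.
order = decode_loop_order({
    "N": 0.1, "C": 0.9, "K": 0.7, "H": 0.3, "W": 0.2, "R": 0.5,
})
# "C" (highest importance) becomes the outermost loop.
```

Because the optimizer only manipulates real-valued importances, discrete choices like loop order become continuous quantities that an evolution strategy can perturb smoothly.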
Evolutionary Search:
NAAS utilizes an evolution strategy to optimize designs based on the Energy-Delay Product (EDP), balancing latency and energy efficiency. The optimization process involves sampling candidate solutions, evaluating them against predefined benchmarks, and iteratively refining the solution pool based on performance metrics.
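The loop above can be sketched as a simple (mu, lambda) evolution strategy. The real NAAS framework evaluates candidates with a hardware cost model; here `edp` is a stand-in quadratic surrogate with a made-up optimum, so the code is illustrative only.

```python
import random

def edp(params):
    """Hypothetical EDP surrogate: pretend the optimum is at (64, 256)."""
    pe, buf = params
    return (pe - 64) ** 2 + (buf - 256) ** 2

def evolve(edp_fn, mean, sigma=8.0, mu=5, lam=20, iters=50, seed=0):
    """Sample lam candidates around the mean, keep the mu best by EDP,
    and recenter the mean on the elites each iteration."""
    rng = random.Random(seed)
    for _ in range(iters):
        pop = [tuple(m + rng.gauss(0, sigma) for m in mean)
               for _ in range(lam)]
        elites = sorted(pop, key=edp_fn)[:mu]
        mean = tuple(sum(p[i] for p in elites) / mu for i in range(2))
    return mean

best = evolve(edp, mean=(16.0, 64.0))
```

In the actual framework, the candidate vector also carries the importance-encoded connectivity and mapping parameters, and the fitness is the measured energy-delay product on the target workload.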
Compiler Mapping Optimization:
Compiler mapping optimization, treated as a separate search task for each layer, focuses on execution order and tiling sizes. It employs a similar importance-based encoding for loop dimension orderings, promoting efficient data locality management during mapping.
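A per-layer mapping decode might look like the sketch below: loop-order importances are sorted as before, and proposed tiling sizes are legalized against the layer's dimensions. The rounding-to-nearest-divisor rule is an illustrative legalization choice, not necessarily the paper's.

```python
def legalize_tile(dim_size, proposed):
    """Round a proposed tile size to the nearest divisor of dim_size."""
    divisors = [d for d in range(1, dim_size + 1) if dim_size % d == 0]
    return min(divisors, key=lambda d: abs(d - proposed))

def decode_mapping(layer_dims, importances, proposed_tiles):
    """Decode one layer's mapping: loop order plus legal tile sizes."""
    # Outer loops get higher importance, as in the architecture search.
    order = sorted(layer_dims, key=lambda d: -importances[d])
    tiles = {d: legalize_tile(layer_dims[d], proposed_tiles[d])
             for d in layer_dims}
    return order, tiles

# Example: three dimensions of a hypothetical convolution layer.
order, tiles = decode_mapping(
    {"C": 64, "K": 128, "H": 56},      # layer dimension sizes
    {"C": 0.8, "K": 0.4, "H": 0.6},    # sampled loop-order importances
    {"C": 30, "K": 50, "H": 7},        # proposed (possibly illegal) tiles
)
```

Searching the mapping separately per layer keeps each sub-problem small while still letting the outer loop judge the accelerator on whole-network EDP.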
Results and Implications
Experiments conducted within the paper demonstrate substantial improvements in speed and energy savings across various hardware platforms, including EdgeTPU, Eyeriss, and NVDLA configurations. NAAS achieves notable performance gains with specific benchmarks, offering architectural designs tailored to diverse neural network models and hardware constraints. The integration with Once-For-All NAS further enhances model accuracy while reducing energy-delay products compared to traditional designs.
The framework's ability to seamlessly integrate NAS with hardware design exploration signifies a pivotal advancement, potentially catalyzing future developments in AI that require specialized hardware solutions for efficient neural network execution.
Conclusion and Future Directions
The paper establishes NAAS as a potent tool for comprehensive neural accelerator co-design, markedly enhancing computational utilization and optimization efficiency. Its key strengths are its low search cost and its robust adaptability across hardware platforms and neural architectures. Future research could build on NAAS by investigating additional neural architecture variants, extending the framework's applicability to emerging AI models and workloads.
Overall, NAAS offers a promising methodological approach for researchers and practitioners aiming to push the boundaries of neural network acceleration and specialized hardware design.