- The paper introduces the Covenant compiler that bridges traditional compilers with specialized deep learning accelerators using the Architecture Covenant Graph.
- It employs Codelets to abstract DNN operations, enabling automated code generation that rivals manually tuned schedules on Qualcomm HVX and other accelerators.
- The framework offers a unified approach to compiling for diverse DL hardware, potentially reducing development costs and inspiring future AI hardware-software co-design.
Summary of "Restoring the Broken Covenant Between Compilers and Deep Learning Accelerators"
The paper titled "Restoring the Broken Covenant Between Compilers and Deep Learning Accelerators" presents a novel approach to the challenges of compiling for deep learning accelerators. It introduces a compilation framework, the Covenant compiler, which integrates an abstraction called the Architecture Covenant Graph (ACG) into its workflow. The ACG is designed to bridge the gap between traditional compilers and the demands of modern, highly specialized neural network accelerators.
Key Contributions and Methodology
The authors identify four primary challenges in compiling for deep learning accelerators, whose architectures deviate significantly from the von Neumann model assumed by compilers for conventional general-purpose processors. Their solution hinges on the ACG, an abstraction that captures key architectural features as a graph of compute units, memories, and interconnections. This representation exposes architectural details, otherwise opaque to the compiler, that are crucial for efficient scheduling and code generation for neural network workloads.
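To make the graph-of-compute-units-and-memories idea concrete, here is a minimal sketch of what an ACG-like structure could look like. The class names, fields, and the example topology (a vector unit fed by a scratchpad filled from DRAM) are illustrative assumptions, not the paper's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class ACGNode:
    """A node in the (hypothetical) architecture graph."""
    name: str
    kind: str              # "compute" or "memory"
    capacity_kib: int = 0  # for memory nodes: on-chip buffer size

@dataclass
class ACG:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (src, dst) interconnections

    def add_node(self, node: ACGNode) -> None:
        self.nodes[node.name] = node

    def connect(self, src: str, dst: str) -> None:
        self.edges.append((src, dst))

    def memories(self) -> list:
        """Memory hierarchy levels the scheduler can tile data across."""
        return [n for n in self.nodes.values() if n.kind == "memory"]

# Example topology: DRAM feeds a 256 KiB scratchpad feeding a vector unit.
acg = ACG()
acg.add_node(ACGNode("DRAM", "memory", capacity_kib=1 << 20))
acg.add_node(ACGNode("VMEM", "memory", capacity_kib=256))
acg.add_node(ACGNode("VectorUnit", "compute"))
acg.connect("DRAM", "VMEM")
acg.connect("VMEM", "VectorUnit")
```

A compiler walking such a graph can discover, for instance, that data must be staged through `VMEM` before the vector unit can consume it, and size its tiles to the 256 KiB capacity.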
In addition to the ACG, the paper introduces Codelets, an abstraction for DNN operations used to compile neural network models into executable code. Codelets are architecture-independent and are transformed into hardware-specific schedules during compilation.
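The Codelet-to-schedule transformation can be sketched as follows: an architecture-independent operation description is bound to target-specific parameters (here, tile sizes) during lowering. The `Codelet` fields, the `lower` function, and the per-target tile sizes are all hypothetical stand-ins for the paper's actual abstractions.

```python
from dataclasses import dataclass

@dataclass
class Codelet:
    """Architecture-independent description of a DNN operation (sketch)."""
    op: str
    dims: dict  # symbolic loop bounds, e.g. {"m": 64, "n": 64, "k": 128}

def lower(codelet: Codelet, target: str) -> dict:
    """Bind an abstract Codelet to an illustrative target-specific schedule."""
    # Illustrative tile sizes per target; a real compiler would derive these
    # from the architecture description rather than a lookup table.
    tile = {"hvx": 32, "open_accel": 16}.get(target, 8)
    return {
        "op": codelet.op,
        "tiles": {d: (bound, tile) for d, bound in codelet.dims.items()},
        "target": target,
    }

# The same matmul Codelet lowers to different schedules per target.
matmul = Codelet("matmul", {"m": 64, "n": 64, "k": 128})
hvx_sched = lower(matmul, "hvx")
accel_sched = lower(matmul, "open_accel")
```

The design point this illustrates is the separation of concerns: the Codelet is written once per operation, while the lowering step consults the target description, so adding a new accelerator does not require rewriting the operation library.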
Evaluation and Results
The Covenant compiler is evaluated on two distinct architectures: Qualcomm's Hexagon Vector eXtensions (HVX) and an open-source DNN accelerator. The authors demonstrate the compiler's robustness by compiling 14 DNN layers drawn from various models, including transformers, neural recommender systems, and convolutional networks. The automated approach achieves 93.8% of the performance of manually tuned TVM schedules on HVX and outperforms Qualcomm's nnlib by 31.3% on certain layers. On the open-source accelerator, it delivers a 182x speedup over a CPU baseline.
Implications and Speculation on Future Development
The introduction of the ACG provides a path towards unified compilation strategies that cater to the diverse landscape of emerging DNN accelerator architectures. By integrating architectural insights directly into the compilation process, the framework potentially reduces the prohibitive costs associated with developing custom backends for each new accelerator. Furthermore, the flexibility of Covenant could foster innovations in AI model deployment by simplifying the process of optimizing models for novel hardware platforms.
This research may also inspire future developments in AI hardware and software co-design, as the modularity and adaptability of this compiler framework align well with the rapid evolution of accelerator technologies. As machine learning models continue to grow in complexity and size, the efficiency of the compilation process will become increasingly critical, and adaptive frameworks like Covenant could serve as a foundational tool.
Overall, this paper makes a significant contribution to the field by addressing critical challenges at the intersection of compiler technology and specialized neural network hardware, setting the stage for future exploration and refinement of compilation techniques in this domain.