
The Deep Learning Compiler: A Comprehensive Survey (2002.03794v4)

Published 6 Feb 2020 in cs.DC, cs.LG, and cs.PF

Abstract: The difficulty of deploying various deep learning (DL) models on diverse DL hardware has boosted the research and development of DL compilers in the community. Several DL compilers have been proposed by both industry and academia, such as TensorFlow XLA and TVM. Similar to traditional compilers, DL compilers take DL models described in different DL frameworks as input and generate optimized code for diverse DL hardware as output. However, no existing survey has comprehensively analyzed the unique design architecture of DL compilers. In this paper, we perform a comprehensive survey of existing DL compilers by dissecting their commonly adopted design in detail, with emphasis on the DL-oriented multi-level IRs and frontend/backend optimizations. Specifically, we provide a comprehensive comparison among existing DL compilers from various aspects. In addition, we present a detailed analysis of the design of multi-level IRs and illustrate the commonly adopted optimization techniques. Finally, several insights are highlighted as potential research directions for DL compilers. This is the first survey paper focusing on the design architecture of DL compilers, which we hope can pave the road for future research towards DL compilers.

Authors (10)
  1. Mingzhen Li (11 papers)
  2. Yi Liu (543 papers)
  3. Xiaoyan Liu (22 papers)
  4. Qingxiao Sun (4 papers)
  5. Xin You (12 papers)
  6. Hailong Yang (27 papers)
  7. Zhongzhi Luan (21 papers)
  8. Lin Gan (30 papers)
  9. Guangwen Yang (40 papers)
  10. Depei Qian (17 papers)
Citations (156)

Summary

Overview of "The Deep Learning Compiler: A Comprehensive Survey"

The paper "The Deep Learning Compiler: A Comprehensive Survey" comprehensively examines the landscape of deep learning (DL) compilers, which are pivotal in bridging the gap between diverse DL models and heterogeneous DL hardware. By dissecting several existing DL compilers such as Tensorflow XLA, TVM, Glow, and nGraph, the paper provides valuable insights into their design architectures, highlighting multi-level intermediate representations (IRs) and optimization strategies.

Design Components

The paper delineates the DL compiler into several core components: the frontend, the backend, and multi-level IRs. The frontend focuses on transforming and optimizing the computation graph, which is a representation of the DL model; this includes node-level, block-level, and dataflow-level optimizations that reduce redundancy and improve efficiency. Backend optimizations generate efficient hardware-specific code from the lower-level IR, often drawing on well-established libraries and techniques for performance.
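
To make this split concrete, the sketch below wires a tiny model through graph-level passes using TVM's classic Relay API. The pass names are real TVM passes, but the pipeline itself is a minimal illustration rather than code from the paper, and it assumes a pre-Unity TVM release where Relay is available:

```python
import tvm
from tvm import relay

# Build a tiny model as a high-level (graph) IR: dense -> bias add -> relu.
x = relay.var("x", shape=(1, 64), dtype="float32")
w = relay.var("w", shape=(64, 64), dtype="float32")
y = relay.nn.relu(relay.nn.dense(x, w) + relay.const(1.0))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))

# Frontend (graph-level) passes: fold constants, then fuse producers
# into consumers so the backend sees larger kernels.
with tvm.transform.PassContext(opt_level=3):
    mod = tvm.transform.Sequential([
        relay.transform.FoldConstant(),
        relay.transform.FuseOps(fuse_opt_level=2),
    ])(mod)

print(mod)  # the fused graph handed off to backend code generation
```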

Intermediate Representations

A critical contribution of the paper is its detailed discussion of multi-level IRs, which serve as the backbone of the compilation process. The high-level IR, also known as the graph IR, represents computation and control flow abstracted from hardware intricacies, whereas the low-level IR is tailored for hardware-specific optimization and code generation. The paper points out that Halide-based IRs and polyhedral models are the prevalent ways of representing the low-level IR in DL compilers, enabling hardware-specific optimizations.
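
The compute/schedule separation characteristic of Halide-based IRs is visible in TVM's tensor-expression API. Below is a minimal sketch, assuming a TVM release that still ships the classic te/create_schedule interface:

```python
import tvm
from tvm import te

# Compute: *what* to calculate -- a hardware-agnostic description.
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

# Schedule: *how* to calculate it -- hardware-specific decisions
# (splitting and parallelization here), kept separate from the compute.
s = te.create_schedule(C.op)
outer, inner = s[C].split(C.op.axis[0], factor=32)
s[C].parallel(outer)

# Lower to the low-level IR that backend code generation consumes.
print(tvm.lower(s, [A, B, C], simple_mode=True))
```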

Compiler Optimizations

The survey identifies diverse optimizations employed at both the frontend and backend stages:

  • Frontend optimizations: operator fusion, algebraic simplification, and layout transformation. These simplify and optimize the computation graph before hardware-specific processing (a toy sketch of such rewrites follows this list).
  • Backend optimizations: generating efficient code tailored to specific hardware characteristics. Techniques such as memory allocation and parallelization are discussed as essential strategies in this context.
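
As a concrete, if toy, illustration of the frontend rewrites above: the pass below performs algebraic simplification and constant folding over a made-up expression IR. The tuple representation and function name are hypothetical, chosen for brevity rather than taken from any surveyed compiler:

```python
# Toy expression IR: a float literal, the input "x", or ("add"|"mul", lhs, rhs).
def simplify(e):
    """Bottom-up constant folding plus algebraic simplification."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if isinstance(a, float) and isinstance(b, float):   # constant folding
        return a + b if op == "add" else a * b
    if op == "mul" and 1.0 in (a, b):                   # x * 1 -> x
        return b if a == 1.0 else a
    if op == "add" and 0.0 in (a, b):                   # x + 0 -> x
        return b if a == 0.0 else a
    return (op, a, b)

expr = ("add", ("mul", "x", ("mul", 0.5, 2.0)), 0.0)    # (x * (0.5 * 2)) + 0
print(simplify(expr))                                   # -> x
```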

Quantitative Analysis

The paper provides performance evaluations showing how DL compilers optimize CNN models across diverse hardware configurations. It highlights that TVM shows superior performance on GPUs, especially after auto-tuning, while nGraph performs efficiently on CPUs through DNNL optimizations. It also notes that frontend and backend optimizations are hard to decouple, which complicates consistent comparison across compilers.
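
At its core, auto-tuning is empirical search over schedule parameters. The toy sketch below searches a single tile-size knob for a blocked matrix multiply in plain NumPy; real auto-tuners such as AutoTVM explore far larger schedule spaces, often guided by learned cost models:

```python
import time
import numpy as np

def matmul_tiled(A, B, tile):
    """Blocked matmul; the tile size is the schedule knob being tuned."""
    n = A.shape[0]
    C = np.zeros_like(A)
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            for k in range(0, n, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

def measure(fn, repeat=3):
    """Best-of-N wall-clock time, as a tuner would score each candidate."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

n = 256
A = np.random.rand(n, n)
B = np.random.rand(n, n)
timings = {t: measure(lambda: matmul_tiled(A, B, t)) for t in (16, 32, 64, 128)}
print("best tile size:", min(timings, key=timings.get))
```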

Future Directions

The paper concludes with a discussion on future research directions, emphasizing the need for enhancements in dynamic shape support, advanced auto-tuning, and privacy protection. It suggests that future DL compilers need to integrate more closely with the polyhedral model to manage sparse tensors and to expand optimizations to cover pre- and post-processing operations in DL workloads.

Implications

The implications of this research span both practical and theoretical domains. Practically, the survey provides explicit guidance on the selection of DL compilers according to specific application needs and hardware targets. Theoretically, the exploration of IR design and optimization strategies provides a foundation for further advancements in compiler techniques, potentially influencing the future development of both DL frameworks and hardware architectures.

In conclusion, the survey paper by Li et al. serves as a crucial reference for the ongoing development and optimization of DL compilers, helping bridge the gaps between evolving DL models and increasingly heterogeneous hardware landscapes. Such work not only aids current system designs but also sets the stage for future innovations in the growing field of artificial intelligence.
