- The paper introduces a two-level intermediate representation that enables both high-level graph transformations and low-level, hardware-specific code generation.
- The paper reports up to 2.7x faster inference than TensorFlow-1.7 and up to 1.3x faster than TVM, attributed to node lowering and graph- and instruction-level optimizations.
- The compiler supports profile-guided quantization and strategic runtime partitioning to efficiently utilize diverse hardware accelerators.
Overview of Glow: Graph Lowering Compiler Techniques for Neural Networks
The paper presents Glow, an open-source machine learning compiler designed for heterogeneous hardware. Glow's architecture aims to facilitate optimized code generation for multiple hardware targets through a pragmatic approach to compilation. At its core, Glow utilizes a two-phase intermediate representation (IR) strategy to optimize and lower neural network dataflow graphs into efficient machine code.
Key Concepts and Techniques
Intermediate Representation (IR): Glow introduces a two-tier IR system, comprising a high-level node-based IR and a low-level instruction-based IR. This bifurcation allows for both domain-specific optimizations and hardware-specific code generation. The high-level IR supports general graph transformations and optimizations specific to neural networks, while the low-level IR is tailored for machine-specific enhancements such as instruction scheduling and memory allocation.
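To make the division of labor concrete, the following is a minimal sketch, in Python pseudocode rather than Glow's actual C++ classes, of what the two IR levels might carry: the node-based IR tracks typed tensors and dataflow, while the instruction-based IR operates on explicitly allocated buffers that scheduling and memory allocation can reason about. All class and field names here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

# High-level, node-based IR: typed tensors and dataflow, no memory details.
@dataclass
class Node:
    kind: str                 # e.g. "Convolution", "Relu", "FullyConnected"
    inputs: List["Node"]      # dataflow edges to producer nodes
    shape: tuple              # result tensor shape, visible to graph optimizations

# Low-level, instruction-based IR: operates on explicitly allocated buffers.
@dataclass
class Buffer:
    name: str
    size_bytes: int

@dataclass
class Instruction:
    opcode: str               # e.g. "matmul", "elementwise_max"
    outs: List[Buffer]        # destination buffers
    ins: List[Buffer]         # source buffers

# Graph-level passes (e.g. operator fusion) rewrite Node objects; instruction
# scheduling and static memory allocation work on lists of Instruction objects.
```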
Node Lowering: The pivotal "node lowering" phase in Glow's compilation process translates high-level operations into simpler linear algebra primitives. This helps reduce the complexity associated with supporting a wide range of operators across various hardware backends. The lowering phase not only simplifies backend implementation but also allows for effective graph-level optimizations before target-specific adaptations.
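As a hedged illustration of the idea (the function names and dictionary encoding are hypothetical, not Glow's API), the sketch below rewrites a FullyConnected node into a matrix multiply followed by a broadcasted bias add, the kind of decomposition the lowering phase performs so that backends only need to implement a small set of linear algebra primitives.

```python
# Hypothetical sketch of node lowering: one high-level operation becomes a
# short sequence of simpler primitives that every backend already supports.

def lower_node(node):
    """Rewrite one high-level node into a list of primitive nodes."""
    if node["kind"] == "FullyConnected":
        acts, weights, bias = node["inputs"]
        matmul = {"kind": "MatMul", "inputs": [acts, weights]}
        addbias = {"kind": "BatchedAdd", "inputs": [matmul, bias]}
        return [matmul, addbias]
    return [node]  # already a primitive; nothing to lower

fc = {"kind": "FullyConnected", "inputs": ["activations", "weights", "bias"]}
print([n["kind"] for n in lower_node(fc)])   # ['MatMul', 'BatchedAdd']
```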
Quantization: Glow supports profile-guided quantization, which converts floating-point arithmetic into small-integer arithmetic that many hardware targets execute more efficiently. A profiling run records the numeric ranges observed in each part of the network, and these ranges determine how values are converted, preserving accuracy while reducing compute and memory cost.
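A minimal sketch of the underlying arithmetic, assuming standard affine int8 quantization with a scale and offset derived from a profiled (min, max) range; the function names and the hard-coded range are illustrative assumptions, not Glow's implementation.

```python
import numpy as np

def choose_params(observed_min, observed_max, qmin=-128, qmax=127):
    """Map a profiled float range onto the int8 range via scale and offset."""
    scale = (observed_max - observed_min) / (qmax - qmin)
    offset = round(qmin - observed_min / scale)
    return scale, offset

def quantize(x, scale, offset, qmin=-128, qmax=127):
    return np.clip(np.round(x / scale) + offset, qmin, qmax).astype(np.int8)

def dequantize(q, scale, offset):
    return (q.astype(np.float32) - offset) * scale

# Range as it might be observed after a ReLU-like activation during profiling.
scale, offset = choose_params(-1.0, 6.0)
x = np.array([-0.5, 0.0, 2.5, 5.9], dtype=np.float32)
q = quantize(x, scale, offset)
print(q, dequantize(q, scale, offset))   # values round-trip approximately
```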
Runtime and Device Management: The runtime system of Glow is designed to handle model partitioning, device management, and parallel execution across multiple hardware accelerators. A comprehensive partitioning strategy is employed to split networks efficiently based on memory and computational constraints, facilitating optimal device utilization.
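The sketch below illustrates only the memory-constraint side of such a strategy, using a hypothetical greedy splitter over a linearized graph; the actual runtime partitioner also considers device saturation and inter-partition communication, which this toy version ignores.

```python
# Hypothetical greedy partitioning: walk operators in topological order and
# start a new partition whenever adding the next one would exceed a device's
# memory budget.

def partition(ops, device_memory_bytes):
    """ops: list of (name, resident_bytes) pairs in topological order."""
    partitions, current, used = [], [], 0
    for name, size in ops:
        if current and used + size > device_memory_bytes:
            partitions.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        partitions.append(current)
    return partitions

ops = [("conv1", 30_000_000), ("conv2", 45_000_000),
       ("fc1", 70_000_000), ("fc2", 20_000_000)]
print(partition(ops, device_memory_bytes=100_000_000))
# -> [['conv1', 'conv2'], ['fc1', 'fc2']]
```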
Empirical Evaluation
The performance evaluation compares Glow against TensorFlow-1.7 and TVM on convolutional networks such as ResNet-50 and VGG19, where Glow achieves up to 2.7x and 1.3x faster inference, respectively. The authors attribute these gains to Glow's direct convolution implementation and shape-aware code generation, which avoid overheads common in other frameworks.
Implications and Future Directions
The architecture and methodology of Glow have several implications for the future of AI model compilation. By reducing the complexity of supporting diverse hardware, Glow provides a scalable solution that could facilitate rapid adoption across new platforms. This architectural flexibility may enable more efficient utilization of emerging domain-specific architectures (DSAs), furthering the energy efficiency and performance gains of machine learning workloads.
The potential integration with widely used frameworks such as PyTorch, along with future extensions such as training-based quantization and improved partitioning algorithms, could expand Glow's applicability to cover both inference and training efficiently.
In summary, the paper delineates Glow as a compelling advancement in machine learning compilers, supporting heterogeneous execution environments through a methodical and scalable approach to graph compilation and optimization.