Compiling ONNX Neural Network Models Using MLIR

Published 19 Aug 2020 in cs.PL and cs.LG | arXiv:2008.08272v2

Abstract: Deep neural network models are becoming increasingly popular and have been used in various tasks such as computer vision, speech recognition, and natural language processing. Machine learning models are commonly trained in a resource-rich environment and then deployed in a distinct environment such as high availability machines or edge devices. To assist the portability of models, the open-source community has proposed the Open Neural Network Exchange (ONNX) standard. In this paper, we present a high-level, preliminary report on our onnx-mlir compiler, which generates code for the inference of deep neural network models described in the ONNX format. Onnx-mlir is an open-source compiler implemented using the Multi-Level Intermediate Representation (MLIR) infrastructure recently integrated in the LLVM project. Onnx-mlir relies on the MLIR concept of dialects to implement its functionality. We propose here two new dialects: (1) an ONNX specific dialect that encodes the ONNX standard semantics, and (2) a loop-based dialect to provide for a common lowering point for all ONNX dialect operations. Each intermediate representation facilitates its own characteristic set of graph-level and loop-based optimizations respectively. We illustrate our approach by following several models through the proposed representations and we include some early optimization work and performance results.

Citations (48)

Summary

  • The paper introduces the onnx-mlir compiler that transforms ONNX models into MLIR representations for efficient native code generation.
  • It employs ONNX and KRNL dialects to enable graph-level and loop-based optimizations such as shape inference and constant propagation.
  • Preliminary experiments on MNIST and ResNet50 demonstrate rapid execution of the simpler model and correct handling of the more complex architecture, with results reported on IBM Power Systems hardware.

Compiling ONNX Neural Network Models Using MLIR

The paper "Compiling ONNX Neural Network Models Using MLIR" (2008.08272) discusses the implementation and potential of the onnx-mlir compiler. This compiler is designed to translate ONNX models into native code using the Multi-Level Intermediate Representation (MLIR) infrastructure, which is part of the LLVM project. The compiler aims to facilitate the deployment of deep neural network models on various hardware platforms by optimizing models for different execution environments while maintaining compatibility with the ONNX standard.

MLIR and ONNX Dialects

The onnx-mlir compiler uses MLIR to define and implement operations through dialects. Two primary dialects are introduced in the paper:

  1. ONNX Dialect: This dialect encodes the operations defined in the ONNX standard directly in MLIR. It preserves the semantics of each ONNX operation while exposing the model to graph-level optimizations (a small sketch of a model in this dialect follows the list).
  2. KRNL Dialect: This dialect expresses loop-based computation and its optimizations. It acts as an intermediate representation in which operations can be further optimized and then lowered into affine loops and ultimately LLVM IR for efficient execution on the target hardware.
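
As a concrete illustration, the sketch below shows what a tiny fully connected layer might look like once imported into the ONNX dialect. The value names (%x, %w, %b), the tensor shapes, and the use of MLIR's generic quoted operation form are illustrative assumptions; the exact output of the onnx-mlir importer may differ in attributes and formatting.

```mlir
// Hypothetical single-layer model in the ONNX dialect (shapes and names are
// illustrative; syntax follows MLIR conventions from the paper's era).
func @main_graph(%x: tensor<1x16xf32>, %w: tensor<16x8xf32>, %b: tensor<8xf32>) -> tensor<1x8xf32> {
  %0 = "onnx.MatMul"(%x, %w) : (tensor<1x16xf32>, tensor<16x8xf32>) -> tensor<1x8xf32>
  %1 = "onnx.Add"(%0, %b) : (tensor<1x8xf32>, tensor<8xf32>) -> tensor<1x8xf32>
  %2 = "onnx.Relu"(%1) : (tensor<1x8xf32>) -> tensor<1x8xf32>
  return %2 : tensor<1x8xf32>
}
```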

Compiler Architecture

The architecture of onnx-mlir is structured into abstraction levels, each responsible for specific optimizations and transformations:

  • High-Level Representation: Initially, ONNX models are represented in high-level MLIR using the ONNX dialect.
  • Intermediate Representation (KRNL): High-level operations are lowered into the KRNL dialect, where loop-based optimizations such as tiling can be applied and the code becomes amenable to polyhedral-style analyses and transformations (see the sketch after this list).
  • Affine/Standard Dialect: Further lowering is performed to represent operations using the affine and standard dialects, enabling the use of MLIR's built-in optimizations.
  • LLVM Targeting: The final level involves conversion into LLVM IR, ready for native code generation.
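
The sketch below follows a single element-wise addition through these levels. It is assembled from my reading of the paper, so it should be taken as an approximation: the three fragments are independent snapshots rather than one valid module, the exact KRNL syntax (including the loop-optimization constructs that accompany krnl.define_loops) may differ, and the standard-dialect operation names (alloc, load, store, addf) reflect the MLIR version of that time.

```mlir
// ONNX level: the addition is a single tensor-to-tensor operation.
%r = "onnx.Add"(%a, %b) : (tensor<10x10xf32>, tensor<10x10xf32>) -> tensor<10x10xf32>

// KRNL level (sketch): tensors become memrefs and the computation becomes
// explicit loops declared with krnl.define_loops and run with krnl.iterate.
%0 = alloc() : memref<10x10xf32>
%l:2 = krnl.define_loops 2
krnl.iterate(%l#0, %l#1) with (%l#0 -> %i = 0 to 10, %l#1 -> %j = 0 to 10) {
  %1 = load %a[%i, %j] : memref<10x10xf32>
  %2 = load %b[%i, %j] : memref<10x10xf32>
  %3 = addf %1, %2 : f32
  store %3, %0[%i, %j] : memref<10x10xf32>
}

// Affine/standard level (sketch): KRNL loops become affine.for nests that
// MLIR's built-in passes can analyze before the final lowering to LLVM IR.
affine.for %i = 0 to 10 {
  affine.for %j = 0 to 10 {
    %1 = affine.load %a[%i, %j] : memref<10x10xf32>
    %2 = affine.load %b[%i, %j] : memref<10x10xf32>
    %3 = addf %1, %2 : f32
    affine.store %3, %0[%i, %j] : memref<10x10xf32>
  }
}
```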

Optimization Techniques

Several optimization techniques are discussed:

  • Operation Decomposition: Certain operations can be decomposed into simpler operations that MLIR can optimize more effectively.
  • Shape Inference: Inferring tensor shapes at compile-time to enable more aggressive optimizations downstream.
  • Graph Rewriting: Utilizing MLIR's Declarative Rewrite Rules (DRR) to optimize operation graphs, for example fusing a matrix multiplication and the subsequent addition into a single optimized GEMM operation (illustrated in the sketch after this list).
  • Constant Propagation: Statically evaluating operations with constant inputs to eliminate unnecessary computations at runtime.
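
A minimal before/after sketch of the MatMul-plus-Add fusion mentioned above is shown below. The value names and shapes are hypothetical, the fused onnx.Gemm is shown with its attributes (alpha, beta, transA, transB) left at their defaults, and the applicability checks that the actual onnx-mlir rewrite pattern performs are omitted.

```mlir
// Before the rewrite: a MatMul followed by a bias Add.
%0 = "onnx.MatMul"(%x, %w) : (tensor<64x128xf32>, tensor<128x32xf32>) -> tensor<64x32xf32>
%1 = "onnx.Add"(%0, %bias) : (tensor<64x32xf32>, tensor<32xf32>) -> tensor<64x32xf32>

// After the rewrite: a single fused Gemm computing x*w + bias.
%1 = "onnx.Gemm"(%x, %w, %bias) : (tensor<64x128xf32>, tensor<128x32xf32>, tensor<32xf32>) -> tensor<64x32xf32>
```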

Experimental Results

The paper presents preliminary results, showing successful compilation and execution of models such as MNIST and ResNet50 on IBM Power Systems. Reported compilation and inference times demonstrate the feasibility of onnx-mlir in practical scenarios: MNIST exhibits rapid execution, while ResNet50 shows that the compiler can handle more complex architectures, albeit with longer inference times because advanced optimizations such as loop fusion and SIMD vectorization are not yet applied.

Future Work and Implications

The onnx-mlir project has significant implications for making ONNX models more portable and efficient across different hardware platforms. Future work includes incorporating optimizations such as polyhedral transformations, enabling better utilization of hardware accelerators, and expanding support across different computing architectures.

In conclusion, onnx-mlir opens up new avenues for efficient deployment of neural network models by leveraging the flexibility and optimization infrastructure of MLIR. It is a step toward more adaptable and high-performance AI systems, promising further gains in efficiency and portability as development continues.
