Towards a high-performance AI compiler with upstream MLIR (2404.15204v1)

Published 15 Apr 2024 in cs.PL, cs.AI, cs.AR, cs.DC, and cs.LG

Abstract: This work proposes a compilation flow using open-source compiler passes to build a framework to achieve ninja performance from a generic linear algebra high-level abstraction. We demonstrate this flow with a proof-of-concept MLIR project that uses input IR in Linalg-on-Tensor from TensorFlow and PyTorch, performs cache-level optimizations and lowering to micro-kernels for efficient vectorization, achieving over 90% of the performance of ninja-written equivalent programs. The contributions of this work include: (1) Packing primitives on the tensor dialect and passes for cache-aware distribution of tensors (single and multi-core) and type-aware instructions (VNNI, BFDOT, BFMMLA), including propagation of shapes across the entire function; (2) A linear algebra pipeline, including tile, fuse and bufferization strategies to get model-level IR into hardware friendly tile calls; (3) A mechanism for micro-kernel lowering to an open source library that supports various CPUs.
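
To make the abstraction concrete, the following is a minimal, illustrative sketch (not taken from the paper) of the kind of Linalg-on-Tensor input IR such a flow consumes; the function name and shapes are hypothetical. The tile, fuse, and bufferization pipeline described in the abstract would rewrite a matmul like this into cache-friendly tiled loops and, ultimately, micro-kernel calls.

```mlir
// Hypothetical Linalg-on-Tensor input, as might be produced by a
// TensorFlow or PyTorch front end (names and shapes are illustrative).
func.func @mlp_layer(%A: tensor<128x256xf32>, %B: tensor<256x512xf32>,
                     %C: tensor<128x512xf32>) -> tensor<128x512xf32> {
  // High-level linear algebra op on tensors; no loops or memory yet.
  %0 = linalg.matmul ins(%A, %B : tensor<128x256xf32>, tensor<256x512xf32>)
                     outs(%C : tensor<128x512xf32>) -> tensor<128x512xf32>
  return %0 : tensor<128x512xf32>
}
```

Starting from IR at this level, the passes described above would pack the operands into tiled layouts, distribute tiles across cores, bufferize tensors into memrefs, and lower the inner tiles to library micro-kernels.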

Authors (8)
  1. Renato Golin (1 paper)
  2. Lorenzo Chelini (6 papers)
  3. Adam Siemieniuk (1 paper)
  4. Kavitha Madhu (1 paper)
  5. Niranjan Hasabnis (21 papers)
  6. Hans Pabst (10 papers)
  7. Evangelos Georganas (18 papers)
  8. Alexander Heinecke (21 papers)
Citations (1)
