Towards a high-performance AI compiler with upstream MLIR (2404.15204v1)

Published 15 Apr 2024 in cs.PL, cs.AI, cs.AR, cs.DC, and cs.LG

Abstract: This work proposes a compilation flow using open-source compiler passes to build a framework to achieve ninja performance from a generic linear algebra high-level abstraction. We demonstrate this flow with a proof-of-concept MLIR project that uses input IR in Linalg-on-Tensor from TensorFlow and PyTorch, performs cache-level optimizations and lowering to micro-kernels for efficient vectorization, achieving over 90% of the performance of ninja-written equivalent programs. The contributions of this work include: (1) Packing primitives on the tensor dialect and passes for cache-aware distribution of tensors (single and multi-core) and type-aware instructions (VNNI, BFDOT, BFMMLA), including propagation of shapes across the entire function; (2) A linear algebra pipeline, including tile, fuse and bufferization strategies to get model-level IR into hardware friendly tile calls; (3) A mechanism for micro-kernel lowering to an open source library that supports various CPUs.
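
To make the abstraction concrete, the following is a minimal, illustrative sketch (not taken from the paper) of the kind of Linalg-on-Tensor input IR such a flow consumes; the function name and shapes are hypothetical. The tile, fuse, and bufferization pipeline described in the abstract would rewrite a matmul like this into cache-friendly tiled loops and, ultimately, micro-kernel calls.

```mlir
// Hypothetical Linalg-on-Tensor input, as might be produced by a
// TensorFlow or PyTorch front end (names and shapes are illustrative).
func.func @mlp_layer(%A: tensor<128x256xf32>, %B: tensor<256x512xf32>,
                     %C: tensor<128x512xf32>) -> tensor<128x512xf32> {
  // High-level linear algebra op on tensors; no loops or memory yet.
  %0 = linalg.matmul ins(%A, %B : tensor<128x256xf32>, tensor<256x512xf32>)
                     outs(%C : tensor<128x512xf32>) -> tensor<128x512xf32>
  return %0 : tensor<128x512xf32>
}
```

Starting from IR at this level, the passes described above would pack the operands into tiled layouts, distribute tiles across cores, bufferize tensors into memrefs, and lower the inner tiles to library micro-kernels.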

Authors (8)
  1. Renato Golin (1 paper)
  2. Lorenzo Chelini (6 papers)
  3. Adam Siemieniuk (1 paper)
  4. Kavitha Madhu (1 paper)
  5. Niranjan Hasabnis (21 papers)
  6. Hans Pabst (10 papers)
  7. Evangelos Georganas (18 papers)
  8. Alexander Heinecke (21 papers)
Citations (1)
