
Tackling the Matrix Multiplication Micro-kernel Generation with Exo (2310.17408v2)

Published 26 Oct 2023 in cs.MS, cs.CL, and cs.PF

Abstract: Optimizing matrix multiplication (GEMM) has been a priority for decades. GEMM is considered the flagship operation of current linear algebra libraries such as BLIS, OpenBLAS, and Intel oneAPI because of its widespread use in a large variety of scientific applications. GEMM is usually implemented following the GotoBLAS philosophy, which tiles the GEMM operands and uses a series of nested loops to improve performance. These implementations extract the maximum computational power of the architecture through small pieces of hardware-oriented, high-performance code called micro-kernels. However, this approach forces developers to write, with non-negligible effort, a dedicated micro-kernel for each new hardware target. In this work, we present a step-by-step procedure for generating micro-kernels with the Exo compiler; the generated micro-kernels perform close to (or even better than) manually developed micro-kernels written with intrinsic functions or assembly language. Our solution also improves the portability of the generated code, since a hardware target is fully specified by a concise library-based description of its instructions.
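The GotoBLAS structure mentioned in the abstract (tiled operands, nested loops, and an innermost micro-kernel) can be sketched roughly as follows. This is a minimal C illustration, not code from the paper: the block sizes `MC`, `KC`, `NC`, `MR`, and `NR` are hypothetical values, the packing of A and B into contiguous buffers that real libraries perform is omitted, edge cases are not handled, and the scalar `microkernel` merely stands in for the hardware-specific micro-kernel that the paper generates with Exo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block sizes; real libraries tune these per architecture. */
#define MC 8
#define KC 8
#define NC 8
#define MR 4
#define NR 4

/* Column-major indexing: element (i, j) of a matrix with leading dimension ld. */
#define IDX(i, j, ld) ((i) + (size_t)(j) * (ld))

/* Reference micro-kernel: C[0:MR, 0:NR] += A[0:MR, 0:kc] * B[0:kc, 0:NR].
   This scalar loop is a placeholder for the vectorized, hardware-specific
   routine that libraries hand-write (or that Exo would generate). */
static void microkernel(size_t kc, const double *A, size_t lda,
                        const double *B, size_t ldb,
                        double *C, size_t ldc) {
    for (size_t p = 0; p < kc; ++p)
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                C[IDX(i, j, ldc)] += A[IDX(i, p, lda)] * B[IDX(p, j, ldb)];
}

/* GotoBLAS-style loop nest without packing: C += A * B, where A is m x k,
   B is k x n, and C is m x n, all stored column-major. */
void gemm_blocked(size_t m, size_t n, size_t k,
                  const double *A, size_t lda,
                  const double *B, size_t ldb,
                  double *C, size_t ldc) {
    /* The sketch skips edge handling, so every dimension must divide evenly. */
    assert(m % MC == 0 && n % NC == 0 && k % KC == 0);
    for (size_t jc = 0; jc < n; jc += NC)                     /* loop 5: NC-wide panels of B and C */
        for (size_t pc = 0; pc < k; pc += KC)                 /* loop 4: KC-deep slices of k */
            for (size_t ic = 0; ic < m; ic += MC)             /* loop 3: MC-tall blocks of A and C */
                for (size_t jr = jc; jr < jc + NC; jr += NR)  /* loop 2: NR-wide micro-panels */
                    for (size_t ir = ic; ir < ic + MC; ir += MR) /* loop 1: MR-tall micro-panels */
                        microkernel(KC,
                                    &A[IDX(ir, pc, lda)], lda,
                                    &B[IDX(pc, jr, ldb)], ldb,
                                    &C[IDX(ir, jr, ldc)], ldc);
}
```

In this structure the five outer loops are portable C, while only `microkernel` depends on the target's vector instructions; that single routine is the piece the abstract describes as costly to rewrite for each new architecture.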

Authors (5)
  1. Adrián Castelló
  2. Julian Bellavita
  3. Grace Dinh
  4. Yuka Ikarashi
  5. Héctor Martínez
Citations (1)
