Integration of a systolic array based hardware accelerator into a DNN operator auto-tuning framework (2212.03034v1)

Published 6 Dec 2022 in cs.LG, cs.PF, and cs.PL

Abstract: The deployment of neural networks on heterogeneous SoCs coupled with custom accelerators is challenging because end-to-end software tools for these systems are lacking. Moreover, the low-level schedules and mapping strategies that accelerator developers provide for typical tensor operations are not necessarily optimal for every use case. This is why frameworks that automatically evaluate the performance of generated code on a specific hardware configuration are of special interest. In this work, the integration between the code generation framework TVM and the systolic array-based accelerator Gemmini is presented. A generic schedule to offload the GEneral Matrix Multiply (GEMM) tensor operation onto Gemmini is detailed, and its suitability is tested by executing the AutoTVM tuning process on it. Our generated code achieves a peak throughput of 46 giga-operations per second (GOPs) under a 100 MHz clock on a Xilinx ZCU102 FPGA, outperforming previous work. Furthermore, the code generated by this integration was able to surpass the default hand-tuned schedules provided by the Gemmini developers in real-world workloads.
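The paper's actual Gemmini schedule is not reproduced on this page, but the following minimal sketch illustrates the general pattern the abstract describes: declaring a GEMM computation in TVM's tensor expression language and exposing loop tilings as tunable knobs so AutoTVM can search for a good mapping. The template name "gemm_example", the knob names, and the int8/int32 datatypes are illustrative assumptions, not the schedule from the paper.

```python
# Minimal AutoTVM template sketch (illustrative; not the paper's Gemmini schedule).
import tvm
from tvm import te, autotvm


@autotvm.template("gemm_example")  # hypothetical template name
def gemm_template(N, K, M, dtype="int8"):
    # Declare the GEMM computation C = A x B, accumulating in int32
    # (a common convention for int8 systolic arrays such as Gemmini).
    A = te.placeholder((N, K), name="A", dtype=dtype)
    B = te.placeholder((K, M), name="B", dtype=dtype)
    k = te.reduce_axis((0, K), name="k")
    C = te.compute(
        (N, M),
        lambda i, j: te.sum(
            A[i, k].astype("int32") * B[k, j].astype("int32"), axis=k
        ),
        name="C",
    )

    s = te.create_schedule(C.op)
    cfg = autotvm.get_config()

    # Expose the outer-loop tilings as tunable knobs; AutoTVM then measures
    # candidate tile sizes (e.g., those matching the systolic array dimensions)
    # on the target hardware and keeps the fastest configuration.
    i, j = s[C].op.axis
    cfg.define_split("tile_i", i, num_outputs=2)
    cfg.define_split("tile_j", j, num_outputs=2)
    io, ii = cfg["tile_i"].apply(s, C, i)
    jo, ji = cfg["tile_j"].apply(s, C, j)
    s[C].reorder(io, jo, ii, ji)

    return s, [A, B, C]
```

In a tuning run, AutoTVM instantiates such a template for each workload shape, compiles the candidates, and measures them on the device; the paper applies this process to a schedule that offloads the tiled GEMM onto Gemmini rather than generating plain CPU loops.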

Citations (5)