Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash 102 tok/s
Gemini 2.5 Pro 51 tok/s Pro
GPT-5 Medium 30 tok/s
GPT-5 High 27 tok/s Pro
GPT-4o 110 tok/s
GPT OSS 120B 475 tok/s Pro
Kimi K2 203 tok/s Pro
2000 character limit reached

Revealing Untapped DSP Optimization Potentials for FPGA-Based Systolic Matrix Engines (2409.03508v1)

Published 5 Sep 2024 in cs.AR

Abstract: Systolic architectures are widely embraced by neural network accelerators for their superior performance in highly parallelized computation. The DSP48E2s serve as dedicated arithmetic blocks in Xilinx Ultrascale series FPGAs and constitute a fundamental component in FPGA-based systolic matrix engines. Harnessing the full potential of DSP48E2s in architectural design can result in significant performance enhancements for systolic architectures on Ultrascale series FPGAs. This paper unveils several previously untapped DSP optimization techniques capable of further enhancing FPGA-based systolic matrix engines. We apply these techniques to two well-known systolic architectures: Google TPUv1 and Xilinx Vitis AI DPU. With the proposed techniques, our design achieves substantial resource and power reduction compared to the open-source TPUv1 FPGA implementation and the Vitis AI DPU implementation in the same parallelism setting. We also demonstrate the applicability of our techniques to neuromorphic hardware for supporting spiking neural network acceleration.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Ai Generate Text Spark Streamline Icon: https://streamlinehq.com

Paper Prompts

Sign up for free to create and run prompts on this paper using GPT-5.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.