Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
162 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

An Irredundant and Compressed Data Layout to Optimize Bandwidth Utilization of FPGA Accelerators (2401.12071v1)

Published 22 Jan 2024 in cs.AR

Abstract: Memory bandwidth is known to be a performance bottleneck for FPGA accelerators, especially when they deal with large multi-dimensional data-sets. A large body of work focuses on reducing of off-chip transfers, but few authors try to improve the efficiency of transfers. This paper addresses the later issue by proposing (i) a compiler-based approach to accelerator's data layout to maximize contiguous access to off-chip memory, and (ii) data packing and runtime compression techniques that take advantage of this layout to further improve memory performance. We show that our approach can decrease the I/O cycles up to $7\times$ compared to un-optimized memory accesses.

Summary

We haven't generated a summary for this paper yet.