Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
80 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators (2305.18332v1)

Published 24 May 2023 in cs.DC, cs.AR, cs.LG, cs.SY, and eess.SY

Abstract: We propose a distributed system based on lowpower embedded FPGAs designed for edge computing applications focused on exploring distributing scheduling optimizations for Deep Learning (DL) workloads to obtain the best performance regarding latency and power efficiency. Our cluster was modular throughout the experiment, and we have implementations that consist of up to 12 Zynq-7020 chip-based boards as well as 5 UltraScale+ MPSoC FPGA boards connected through an ethernet switch, and the cluster will evaluate configurable Deep Learning Accelerator (DLA) Versatile Tensor Accelerator (VTA). This adaptable distributed architecture is distinguished by its capacity to evaluate and manage neural network workloads in numerous configurations which enables users to conduct multiple experiments tailored to their specific application needs. The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the computation graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Hans Johnson (6 papers)
  2. Tianyang Fang (1 paper)
  3. Alejandro Perez-Vicente (1 paper)
  4. Jafar Saniie (4 papers)