A Holistic Approach for Optimizing DSP Block Utilization of a CNN implementation on FPGA (1703.09779v1)

Published 21 Mar 2017 in cs.CV

Abstract: Deep Neural Networks are becoming the de facto standard models for image understanding, and more generally for computer vision tasks. As they involve highly parallelizable computations, CNNs are well suited to current fine-grained programmable logic devices. Thus, multiple CNN accelerators have been successfully implemented on FPGAs. Unfortunately, FPGA resources such as logic elements or DSP units remain limited. This work presents a holistic method relying on approximate computing and design space exploration to optimize the DSP block utilization of a CNN implementation on an FPGA. The method was tested by implementing a reconfigurable OCR convolutional neural network on an Altera Stratix V device while varying both the data representation and the CNN topology, in order to find the best combination in terms of DSP block utilization and classification accuracy. This exploration generated dataflow architectures of 76 CNN topologies with 5 different fixed-point representations. The most efficient implementation performs 883 classifications/sec at 256 x 256 resolution using 8% of the available DSP blocks.
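
To make the idea of varying the fixed-point data representation concrete, here is a minimal Python sketch of signed fixed-point quantization and of how the worst-case rounding error changes with word length. It is an illustration under stated assumptions, not the authors' tool flow: the function names, the weight values, and the choice of two integer bits are all hypothetical and do not come from the paper.

```python
def to_fixed_point(value, total_bits, frac_bits):
    """Quantize a real value to signed fixed point; return the value it represents."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))        # most negative representable code
    hi = (1 << (total_bits - 1)) - 1     # most positive representable code
    code = int(round(value * scale))     # round to the nearest code
    code = max(lo, min(hi, code))        # saturate instead of wrapping around
    return code / scale


def max_quantization_error(values, total_bits, frac_bits):
    """Largest absolute error introduced by quantizing each value."""
    return max(abs(v - to_fixed_point(v, total_bits, frac_bits)) for v in values)


# Hypothetical weights, chosen only for illustration.
weights = [0.5, -0.25, 0.123, -0.9]

# Sweep a few word lengths, keeping two integer bits (an assumption).
for total_bits in (4, 6, 8):
    frac_bits = total_bits - 2
    err = max_quantization_error(weights, total_bits, frac_bits)
    print(f"{total_bits}-bit word, {frac_bits} fractional bits: max error {err:.5f}")
```

A design space exploration of the kind the abstract describes would pair a sweep like this with a resource estimate and an accuracy measurement for each candidate topology; this sketch covers only the quantization step.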

Authors (6)
  1. Kamel Abdelouahab (5 papers)
  2. Maxime Pelcat (16 papers)
  3. François Berry (6 papers)
  4. Jean-Charles Quinton (3 papers)
  5. Jocelyn Serot (6 papers)
  6. Cedric Bourrasset (2 papers)
Citations (21)
