FPGA-Based Neural Network Inference Accelerators: A Survey
This paper surveys FPGA-based neural network inference accelerators, examining the techniques proposed to address the computational complexity and storage requirements inherent in neural network models. The authors map the landscape of FPGA-based designs and argue that such accelerators can potentially surpass GPU-based solutions in both speed and energy efficiency.
Overview of Neural Network Computational Challenges
Neural networks, particularly convolutional (CNN) and recurrent (RNN) models, have been shown to perform significantly better than traditional algorithms in machine learning domains such as image, speech, and video recognition. However, these improvements come with high computational and storage complexity, which poses substantial challenges for inference tasks. For instance, state-of-the-art CNN models require up to 39 billion FLOPs to classify a single 224×224 image. The large model size and computational demands make CPUs inadequate for both high-performance cloud applications and low-power mobile applications.
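As a rough, back-of-the-envelope illustration of where such operation counts come from, the sketch below estimates the FLOPs of a single convolutional or fully connected layer; the layer shape in the example is illustrative and not taken from the paper.

```python
def conv_flops(h_out, w_out, c_in, c_out, k):
    """FLOPs for one convolutional layer (multiplies and adds counted separately)."""
    macs = h_out * w_out * c_out * c_in * k * k   # multiply-accumulate operations
    return 2 * macs                               # 1 MAC = 1 multiply + 1 add

def fc_flops(n_in, n_out):
    """FLOPs for one fully connected layer."""
    return 2 * n_in * n_out

# Illustrative VGG-style layer: 3x3 conv, 64 -> 64 channels, 224x224 output map.
print(conv_flops(224, 224, 64, 64, 3) / 1e9, "GFLOPs")   # ~3.7 GFLOPs for one layer
```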
FPGA Architecture as a Solution
The paper presents FPGAs as a viable alternative to GPUs for neural network acceleration: their reconfigurable fabric allows tailored hardware designs that exploit the inherent parallelism of neural networks, potentially achieving superior energy efficiency. Despite operating at lower clock frequencies than CPUs and GPUs, FPGAs offer ample opportunity for high-speed, flexible designs optimized for specific neural network models.
Techniques for FPGA-Based NN Acceleration
The paper outlines several strategies and techniques used in the design of FPGA-based neural network accelerators:
- Model Compression: Quantization and pruning significantly reduce the number of operations and the data size without major accuracy loss, thereby improving hardware efficiency. Linear quantization is shown to maintain accuracy with a bit-width reduction to 8 bits (see the quantization sketch after this list).
- Hardware-level Optimizations: Low bit-width computation units and fast convolution algorithms such as Winograd and FFT enhance computational speed and reduce energy consumption. Specific implementations show a reduction in the number of multiplications required, improving efficiency (see the Winograd sketch below).
- Loop Unrolling Strategies: Parallelization through loop unrolling helps maximize FPGA resource utilization across neural network layers, especially convolutional and fully connected layers (see the tiled-and-unrolled loop nest below).
- System-level Approaches: Roofline models and optimized memory access patterns are crucial for raising the achievable performance ceiling of FPGA designs. Techniques such as loop tiling and cross-layer scheduling improve data reuse and reduce memory bandwidth demands (a roofline sketch follows below).
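To make the model-compression point concrete, here is a minimal sketch of symmetric linear (uniform) quantization in numpy; the function names and the per-tensor scaling scheme are illustrative assumptions, not necessarily the exact scheme evaluated in the paper.

```python
import numpy as np

def linear_quantize(x, num_bits=8):
    """Symmetric linear quantization of a tensor to signed num_bits integers.
    Returns the integer tensor and the scale needed to dequantize."""
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 127 for 8-bit
    scale = np.max(np.abs(x)) / qmax         # map the largest magnitude to qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Example: quantize an illustrative weight tensor and measure reconstruction error.
w = np.random.randn(64, 64, 3, 3).astype(np.float32)
w_q, s = linear_quantize(w, num_bits=8)
print("max abs error:", np.max(np.abs(w - dequantize(w_q, s))))
```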
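The Winograd trick can be shown in its smallest form, F(2,3), which produces two outputs of a 1-D convolution with a 3-tap filter using 4 multiplications instead of 6. The sketch below is a minimal numpy illustration under that assumption, not the 2-D tiled variant used in production accelerators.

```python
import numpy as np

def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 1-D convolution of a length-4 input
    tile d with a length-3 filter g, using 4 multiplications instead of 6."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

# Check against the direct (6-multiplication) computation.
d = np.random.randn(4)
g = np.random.randn(3)
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
print(np.allclose(winograd_f23(d, g), direct))   # True
```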
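The loop nest below sketches how tiling and unrolling interact in a convolution layer: the channel loops are tiled, and on an FPGA the innermost tile loops would be fully unrolled into a grid of multiply-accumulate units. Tile sizes and the layer layout are illustrative assumptions, and the Python loops only model the schedule, not the hardware.

```python
import numpy as np

def tiled_conv(ifm, weights, Tm=4, Tn=4):
    """Loop-tiled stride-1 convolution sketch: the output (M) and input (N) channel
    loops are tiled by Tm and Tn; on an FPGA the two innermost tile loops would be
    fully unrolled into a Tm x Tn array of multiply-accumulate units."""
    N, H, W = ifm.shape            # input channels, height, width
    M, _, K, _ = weights.shape     # output channels, input channels, kernel size
    ofm = np.zeros((M, H - K + 1, W - K + 1))
    for mo in range(0, M, Tm):                          # tile loop over output channels
        for no in range(0, N, Tn):                      # tile loop over input channels
            for r in range(ofm.shape[1]):
                for c in range(ofm.shape[2]):
                    for m in range(mo, min(mo + Tm, M)):      # unrolled in hardware
                        for n in range(no, min(no + Tn, N)):  # unrolled in hardware
                            ofm[m, r, c] += np.sum(ifm[n, r:r+K, c:c+K] * weights[m, n])
    return ofm
```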
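A roofline model bounds attainable throughput by either peak compute or memory bandwidth multiplied by the design's computation-to-communication (CTC) ratio. The sketch below uses illustrative numbers that are not taken from the paper.

```python
def attainable_performance(peak_gops, peak_bandwidth_gbs, ops_per_byte):
    """Roofline model: throughput is limited by either compute or memory.
    ops_per_byte is the computation-to-communication ratio of the design."""
    return min(peak_gops, peak_bandwidth_gbs * ops_per_byte)

# Illustrative board: 500 GOP/s peak compute, 10 GB/s DRAM bandwidth.
for ctc in (5, 20, 80):
    perf = attainable_performance(500, 10, ctc)
    print(f"CTC ratio {ctc:3d} ops/byte -> {perf:5.0f} GOP/s")
```

Raising the CTC ratio through tiling and data reuse moves a design from the bandwidth-bound region toward the compute-bound ceiling, which is why the system-level techniques above matter.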
Evaluation and Implications
Comparative evaluations against state-of-the-art designs reveal that FPGA accelerators, especially those leveraging quantization and fast convolution, can achieve significantly better energy efficiency than GPU counterparts. Certain designs demonstrate energy efficiency up to 10× that of GPUs, although scalability to larger network models remains a challenge.
Future Perspectives and Design Automation
The authors discuss the importance of design automation for FPGA-based accelerators, covering both hardware and software automation approaches. These approaches seek to streamline the path from network models to efficient hardware implementations, allowing flexible and efficient execution in diverse scenarios. They emphasize two directions for future work: algorithmic research that pushes bit-widths lower without compromising model accuracy, and methods for scaling up FPGA systems to handle larger and more complex neural networks.
Overall, this paper serves as a comprehensive guide to understanding the current state and future directions of FPGA-based neural network inference accelerators, setting the stage for continued advancements in AI hardware solutions.