- The paper introduces EdgeDRNN, an FPGA accelerator that uses a DeltaGRU algorithm exploiting temporal sparsity to reduce RNN memory access for edge inference.
- Implemented on an FPGA, EdgeDRNN achieves low latency at only 2.42 W of power consumption, delivering over 4x higher power efficiency than comparable systems.
- EdgeDRNN provides a flexible framework for deploying complex RNN models in low-power, low-latency edge environments, crucial for real-time applications like robotics and IoT.
Insight into EdgeDRNN: Low-latency RNN Edge Inference
The paper "EdgeDRNN: Enabling Low-latency Recurrent Neural Network Edge Inference" introduces EdgeDRNN, an FPGA-based accelerator that enhances the computational efficiency of RNNs, specifically Gated Recurrent Unit (GRU) networks, for edge computing applications. EdgeDRNN is designed to exploit temporal sparsity through a spiking neural network-inspired delta network algorithm, drastically reducing memory access without significant accuracy loss. This advancement addresses the latency and power constraints typical of edge devices, making EdgeDRNN a compelling option for real-time applications in environments with limited computational resources.
Key Contributions
- Algorithmic Efficiency Through Temporal Sparsity:
- The DeltaGRU algorithm leverages temporal sparsity by updating neuron states only when their input or activation change exceeds a configurable delta threshold. Because whole weight columns can be skipped for sub-threshold changes, the matrix-vector operations and, more importantly, the weight memory accesses are reduced by roughly an order of magnitude compared to a standard GRU implementation.
- FPGA-based Hardware Implementation:
- Implemented on the MiniZed development board, the EdgeDRNN accelerator uses a vector processing element array optimized to work with the memory bandwidth constraints of edge devices. This design choice allows for efficient RNN inference at low latencies suitable for real-time edge computing applications.
- Power and Latency Efficiency:
- EdgeDRNN operates at a total power consumption of 2.42 W while achieving latency comparable to a desktop-class NVIDIA GTX 1080 GPU. Its power efficiency surpasses comparable FPGA and GPU platforms, exceeding similar systems by more than 4x.
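The delta-network trick behind these savings can be illustrated with a minimal NumPy sketch. This is not the paper's fixed-point DeltaGRU hardware implementation; the function name `delta_matvec` and the threshold parameter `theta` are chosen here for illustration. The key identity is that `W @ x_t` can be maintained incrementally as `s_t = s_{t-1} + W @ dx_t`, where `dx_t` keeps only input changes whose magnitude exceeds the threshold, so zero deltas let the accelerator skip entire weight columns (and their memory fetches):

```python
import numpy as np

def delta_matvec(W, x, x_mem, s_prev, theta):
    """One delta-network step (sketch): maintain s ~= W @ x incrementally.

    x_mem holds the last *transmitted* value of each input; changes smaller
    than theta are suppressed, so their weight columns are never touched.
    """
    dx = x - x_mem
    changed = np.abs(dx) >= theta            # which inputs changed "enough"
    x_mem = np.where(changed, x, x_mem)      # update memory for changes only
    # Only columns of W for changed inputs are read -> fewer memory accesses
    s = s_prev + W[:, changed] @ dx[changed]
    return s, x_mem

# Toy usage: with theta = 0 this reproduces the exact product W @ x.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
x0 = rng.standard_normal(8)
s, x_mem = delta_matvec(W, x0, np.zeros(8), np.zeros(4), theta=0.1)
```

With a nonzero `theta` the result is an approximation, which is the source of the small accuracy loss the paper reports; the sparser the input changes over time, the more columns are skipped.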
Experimental Results
The research demonstrates EdgeDRNN's superior performance through various tests using the TIDIGITS dataset for automatic speech recognition tasks. Notably, a 2-layer GRU with 768 hidden units (2L-768H) achieves a mean effective throughput of 20.2 GOp/s with an acceptable word error rate increase of only 0.53%. This balance between power efficiency and accuracy suggests that EdgeDRNN can feasibly replace more power-hungry and less efficient edge solutions.
Implications and Future Directions
EdgeDRNN provides a robust framework for deploying complex RNN-based models in low-power, low-latency edge environments. Its ability to trade accuracy for latency through configurable delta thresholds lets developers tailor performance to specific application needs. This flexibility is crucial for robotics, human-computer interaction, and internet-of-things (IoT) applications where real-time processing is imperative.
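The accuracy-latency trade-off can be made concrete with a small sketch of how a delta threshold controls temporal sparsity. This is an illustrative toy, not the paper's benchmark: `temporal_sparsity` and the slowly varying synthetic signal are assumptions made here to show the mechanism, with the fraction of skipped per-element updates standing in as a proxy for saved compute and memory traffic:

```python
import numpy as np

def temporal_sparsity(xs, theta):
    """Fraction of per-element updates skipped over a sequence of input
    vectors xs when sub-threshold changes are suppressed (delta encoding)."""
    x_mem = np.zeros_like(xs[0])
    skipped = total = 0
    for x in xs:
        changed = np.abs(x - x_mem) >= theta
        x_mem = np.where(changed, x, x_mem)   # remember transmitted values
        skipped += np.count_nonzero(~changed)
        total += x.size
    return skipped / total

# Slowly varying toy signal, loosely mimicking frame-to-frame speech features
rng = np.random.default_rng(1)
xs = np.cumsum(0.05 * rng.standard_normal((100, 64)), axis=0)

for theta in (0.0, 0.1, 0.5):
    print(f"theta={theta}: sparsity={temporal_sparsity(xs, theta):.2f}")
```

Raising `theta` increases the skipped fraction (fewer updates, lower latency and energy) at the cost of a coarser approximation of the signal, which is exactly the dial the paper exposes to developers.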
Looking ahead, EdgeDRNN's approach can inspire further exploration of asymmetric quantization techniques, advanced sparsity models, and adaptive algorithms that further improve the balance between computational load and accuracy. Similar strategies could also be applied to other neural network models, expanding their usability across diverse edge computing scenarios.
In conclusion, EdgeDRNN exemplifies the potential of integrating hardware-efficient RNN accelerator designs into edge computing solutions, propelling further research and deployment in real-time machine learning applications. The impact of this work is poised to extend well beyond its initial implementation, influencing future edge computing architectures and neural network processing methodologies.