A 71.2-$\mu$W Speech Recognition Accelerator with Recurrent Spiking Neural Network (2503.21337v1)

Published 27 Mar 2025 in cs.AR, cs.AI, and eess.AS

Abstract: This paper introduces a 71.2-$\mu$W speech recognition accelerator designed for edge devices' real-time applications, emphasizing an ultra low power design. Achieved through algorithm and hardware co-optimizations, we propose a compact recurrent spiking neural network with two recurrent layers, one fully connected layer, and a low time step (1 or 2). The 2.79-MB model undergoes pruning and 4-bit fixed-point quantization, shrinking it by 96.42\% to 0.1 MB. On the hardware front, we take advantage of \textit{mixed-level pruning}, \textit{zero-skipping} and \textit{merged spike} techniques, reducing complexity by 90.49\% to 13.86 MMAC/S. The \textit{parallel time-step execution} addresses inter-time-step data dependencies and enables weight buffer power savings through weight sharing. Capitalizing on the sparse spike activity, an input broadcasting scheme eliminates zero computations, further saving power. Implemented on the TSMC 28-nm process, the design operates in real time at 100 kHz, consuming 71.2 $\mu$W, surpassing state-of-the-art designs. At 500 MHz, it has 28.41 TOPS/W and 1903.11 GOPS/mm$^2$ in energy and area efficiency, respectively.

Summary

A 71.2-$\mu$W Speech Recognition Accelerator with Recurrent Spiking Neural Network

The paper presents an ultra-low power speech recognition accelerator tailored for real-time applications on edge devices. The design leverages a recurrent spiking neural network (RSNN) optimized at both the algorithmic and hardware levels, achieving higher energy and area efficiency than existing designs. Through aggressive model compression and targeted architectural enhancements, the work addresses the challenges of deploying speech recognition in resource-constrained environments.

Technical Overview

The proposed approach centers on a compact RSNN architecture comprising two recurrent layers, one fully connected layer, and a low time step (1 or 2). Because the RSNN communicates with binary spike signals, synaptic operations reduce to computationally efficient single-bit operations. An essential contribution is the reduction of model size by 96.42% (from 2.79 MB to 0.1 MB) using structured pruning, unstructured pruning, and 4-bit fixed-point quantization. Additional hardware optimizations, including mixed-level pruning, zero-skipping, and merged spike techniques, further reduce computational complexity by 90.49% to 13.86 MMAC/S.
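As a rough illustration of this topology, the sketch below unrolls two recurrent spiking layers and a fully connected readout over two time steps. The layer sizes, leaky integrate-and-fire dynamics, and quantization scale are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def quantize_4bit(w, scale=0.05):
    """Symmetric 4-bit fixed-point quantization: 16 levels in [-8, 7] * scale (assumed scale)."""
    return np.clip(np.round(w / scale), -8, 7) * scale

class RecurrentSpikingLayer:
    def __init__(self, n_in, n_out, threshold=1.0, decay=0.5, rng=None):
        rng = rng or np.random.default_rng(0)
        self.w_in = quantize_4bit(rng.normal(0, 0.1, (n_out, n_in)))
        self.w_rec = quantize_4bit(rng.normal(0, 0.1, (n_out, n_out)))
        self.threshold, self.decay = threshold, decay
        self.v = np.zeros(n_out)        # membrane potential
        self.spikes = np.zeros(n_out)   # previous-step output spikes (0/1)

    def step(self, x_spikes):
        # Inputs are binary spikes, so each MAC reduces to a conditional add:
        # only columns with a nonzero spike contribute (the zero-skipping idea).
        active_in = np.flatnonzero(x_spikes)
        active_rec = np.flatnonzero(self.spikes)
        self.v = (self.decay * self.v
                  + self.w_in[:, active_in].sum(axis=1)
                  + self.w_rec[:, active_rec].sum(axis=1))
        self.spikes = (self.v >= self.threshold).astype(float)
        self.v[self.spikes > 0] = 0.0   # reset membrane after firing
        return self.spikes

# Two recurrent layers + one fully connected readout, unrolled over 2 time steps.
layer1 = RecurrentSpikingLayer(40, 128)
layer2 = RecurrentSpikingLayer(128, 128)
w_fc = quantize_4bit(np.random.default_rng(1).normal(0, 0.1, (10, 128)))

x = (np.random.default_rng(2).random((2, 40)) > 0.8).astype(float)  # sparse input spikes
logits = np.zeros(10)
for t in range(2):                      # low time step (1 or 2), per the paper
    s1 = layer1.step(x[t])
    s2 = layer2.step(s1)
    logits += w_fc @ s2                 # accumulate readout over time steps
print(logits.argmax())
```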

Numerical Results and Claims

The implementation achieves notable energy and area efficiency, reporting 28.41 TOPS/W and 1903.11 GOPS/mm$^2$ at 500 MHz in a 28-nm TSMC process. Operating in real time at 100 kHz, the design consumes only 71.2 $\mu$W, outperforming state-of-the-art designs in both energy consumption and area footprint. Furthermore, the accelerator employs parallel time-step execution and merged spike techniques to enable weight sharing and reduce operational cycles, which proves instrumental for power savings.
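To make the weight-sharing argument concrete, the sketch below shows a software analogue of processing multiple time steps against a single read of each weight column for a feed-forward layer. This is not the paper's hardware scheme; the recurrent layers additionally require the paper's handling of inter-time-step data dependencies, and the sizes and spike rates here are placeholders.

```python
import numpy as np

def fc_parallel_timesteps(weights, spike_frames):
    """weights: (n_out, n_in); spike_frames: (T, n_in) binary. Returns (T, n_out)."""
    T = spike_frames.shape[0]
    acc = np.zeros((T, weights.shape[0]))
    for i in range(weights.shape[1]):
        col = weights[:, i]                         # single weight-buffer read
        for t in np.flatnonzero(spike_frames[:, i]):
            acc[t] += col                           # reuse the same column across time steps
    return acc

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (10, 128))
frames = (rng.random((2, 128)) > 0.9).astype(float)  # 2 time steps, sparse spikes
assert np.allclose(fc_parallel_timesteps(W, frames), frames @ W.T)
# Only active spikes trigger column additions, and each column is fetched once
# for all time steps, which is where the weight-buffer power saving comes from.
```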

Theoretical and Practical Implications

Theoretical implications of this research lie in demonstrating the feasibility and advantages of RSNN models for speech applications, particularly with low time steps and spike outputs. The inherent sparsity and simplified computation dynamics offered by spiking models facilitate ultra-low power designs, pivotal for IoT deployments requiring 24/7 operation. On the practical front, these innovations promise widespread application in mobile devices, enabling more efficient on-device processing conducive to real-time and immersive human-machine interactions.

Future Directions

Future developments might explore the adaptability of RSNN frameworks to varied acoustic environments, further enhancing their robustness and applicability. Moreover, integrating RSNN models with attention mechanisms or transformer architectures may offer opportunities for improved accuracy and processing efficiency. Continued research that leverages advances in both neuromorphic computing and spiking neural architectures could exploit these synergies, potentially unveiling novel pathways for compact, scalable, and energy-efficient AI systems.

By optimizing the balance between complexity, power consumption, and real-time performance, this paper represents a significant stride toward enabling sophisticated speech recognition functionalities on ubiquitous computing platforms.
