A 71.2-μW Speech Recognition Accelerator with Recurrent Spiking Neural Network
This paper presents an ultra-low-power speech recognition accelerator for real-time applications on edge devices. The design is built on a recurrent spiking neural network (RSNN) optimized jointly at the algorithmic and hardware levels, achieving higher energy and area efficiency than existing designs. Through aggressive model compression and targeted architectural enhancements, the work addresses the challenge of deploying speech recognition in resource-constrained environments.
Technical Overview
The proposed approach centers on a compact RSNN comprising two recurrent layers, one fully connected layer, and a low time-step configuration. Because the network communicates with binary spike signals, its arithmetic reduces to computationally efficient single-bit operations. A key contribution is a 96.42% reduction in model size achieved through structured pruning, unstructured pruning, and 4-bit fixed-point quantization. Hardware-oriented optimizations, including mixed-level pruning, zero-skipping, and a merged-spike technique, further reduce computational complexity by 90.49%. A minimal software sketch of the spike-driven computation is given below.
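To make the single-bit-operation claim concrete, the following is a minimal sketch of one time step of a recurrent spiking layer with binary spikes and 4-bit fixed-point weights. It is illustrative only, not the paper's actual model: the function and parameter names (quantize_4bit, rsnn_layer_step, decay, v_th) and the leaky integrate-and-fire dynamics are assumptions chosen to show why spike inputs turn multiply-accumulates into conditional additions.

```python
# Minimal sketch (not the paper's exact model): one recurrent spiking layer
# with binary spike I/O and 4-bit fixed-point weights. Because inputs and
# recurrent states are 0/1 spikes, each "multiply" collapses into a
# conditional add of the corresponding weight column.
import numpy as np

def quantize_4bit(w, scale):
    """Map real-valued weights to signed 4-bit integers in [-8, 7]."""
    return np.clip(np.round(w / scale), -8, 7).astype(np.int8)

def rsnn_layer_step(spikes_in, spikes_rec, w_in_q, w_rec_q, v_mem, scale,
                    v_th=1.0, decay=0.9):
    """One time step of an assumed leaky integrate-and-fire recurrent layer.

    spikes_in  : (n_in,)  binary input spikes for this time step
    spikes_rec : (n_hid,) binary recurrent spikes from the previous step
    w_in_q     : (n_hid, n_in)  4-bit quantized input weights
    w_rec_q    : (n_hid, n_hid) 4-bit quantized recurrent weights
    v_mem      : (n_hid,) membrane potential carried across time steps
    """
    # Spike-driven accumulation: only columns with a '1' spike contribute,
    # so no multiplications are needed and zero spikes are skipped entirely.
    current = (w_in_q[:, spikes_in == 1].sum(axis=1)
               + w_rec_q[:, spikes_rec == 1].sum(axis=1)) * scale
    v_mem = decay * v_mem + current
    spikes_out = (v_mem >= v_th).astype(np.int8)
    v_mem = np.where(spikes_out == 1, 0.0, v_mem)  # reset neurons that fired
    return spikes_out, v_mem
```

In this sketch the output is again a binary spike vector, so the next layer (or the same layer at the next time step) can reuse the same add-only datapath.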
Numerical Results and Claims
The implementation, in a 28-nm TSMC process, achieves an energy efficiency of 28.41 TOPS/W and an area efficiency of 1903.11 GOPS/mm². Operating at 100 kHz, the design consumes only 71.2 μW, surpassing state-of-the-art accelerators in both energy consumption and area footprint. The accelerator also processes multiple time steps in parallel and applies the merged-spike technique, which enables weight sharing and reduces operational cycles; both are instrumental to the power savings, as sketched below.
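The sketch below illustrates, in software, how zero-skipping and weight sharing across parallel time steps can reduce memory traffic. It is an assumed analogue of the dataflow described in the paper, not its actual hardware behavior; the function name accumulate_parallel_timesteps and its bookkeeping are hypothetical.

```python
# Illustrative sketch only: zero-skipping plus one weight fetch shared across
# T parallel time steps. This mimics the intent of the hardware dataflow; it
# is not the accelerator's RTL.
import numpy as np

def accumulate_parallel_timesteps(spike_matrix, weights_q, scale):
    """Accumulate synaptic current for T time steps with one fetch per input.

    spike_matrix : (T, n_in)    binary spikes for T parallel time steps
    weights_q    : (n_out, n_in) quantized weights
    Returns (T, n_out) currents and the number of skipped (all-zero) inputs.
    """
    T, n_in = spike_matrix.shape
    currents = np.zeros((T, weights_q.shape[0]))
    skipped = 0
    for j in range(n_in):
        col = spike_matrix[:, j]
        if not col.any():                 # zero-skipping: no spike in any
            skipped += 1                  # time step, weight never fetched
            continue
        w_col = weights_q[:, j] * scale   # fetched once, shared by all T steps
        for t in np.nonzero(col)[0]:      # accumulate only where a spike fired
            currents[t] += w_col
    return currents, skipped
```

With sparse spike activity, most input columns are skipped and each surviving weight column is read once rather than once per time step, which is where the cycle and power reductions come from in this simplified view.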
Theoretical and Practical Implications
Theoretically, this work demonstrates the feasibility and advantages of RSNN models for speech applications, particularly when combined with low time-step counts and spike outputs. The inherent sparsity and simplified computation of spiking models enable ultra-low-power designs, which are pivotal for IoT deployments requiring always-on (24/7) operation. Practically, these innovations promise broad applicability in mobile devices, enabling more efficient on-device processing for real-time and immersive human-machine interaction.
Future Directions
Future work might explore the adaptability of RSNN frameworks to diverse acoustic environments, improving their robustness and applicability. Integrating RSNN models with attention mechanisms or transformer architectures could also improve accuracy and processing efficiency. Building on advances in neuromorphic computing and spiking neural architectures, continued research into these synergies may open new pathways toward compact, scalable, and energy-efficient AI systems.
By optimizing the balance between complexity, power consumption, and real-time performance, this paper represents a significant stride toward enabling sophisticated speech recognition functionalities on ubiquitous computing platforms.