Towards Ultra-Low-Power Neuromorphic Speech Enhancement with Spiking-FullSubNet (2410.04785v1)

Published 7 Oct 2024 in eess.AS and cs.SD

Abstract: Speech enhancement is critical for improving speech intelligibility and quality in various audio devices. In recent years, deep learning-based methods have significantly improved speech enhancement performance, but they often come with a high computational cost, which is prohibitive for a large number of edge devices, such as headsets and hearing aids. This work proposes an ultra-low-power speech enhancement system based on the brain-inspired spiking neural network (SNN) called Spiking-FullSubNet. Spiking-FullSubNet follows a full-band and sub-band fusioned approach to effectively capture both global and local spectral information. To enhance the efficiency of computationally expensive sub-band modeling, we introduce a frequency partitioning method inspired by the sensitivity profile of the human peripheral auditory system. Furthermore, we introduce a novel spiking neuron model that can dynamically control the input information integration and forgetting, enhancing the multi-scale temporal processing capability of SNN, which is critical for speech denoising. Experiments conducted on the recent Intel Neuromorphic Deep Noise Suppression (N-DNS) Challenge dataset show that the Spiking-FullSubNet surpasses state-of-the-art methods by large margins in terms of both speech quality and energy efficiency metrics. Notably, our system won the championship of the Intel N-DNS Challenge (Algorithmic Track), opening up a myriad of opportunities for ultra-low-power speech enhancement at the edge. Our source code and model checkpoints are publicly available at https://github.com/haoxiangsnr/spiking-fullsubnet.

Summary

  • The paper introduces Spiking-FullSubNet, a novel system that fuses full-band and sub-band processing to capture comprehensive spectral details with minimal energy.
  • The paper presents a gated spiking neuron (GSN) model that dynamically adjusts decay rates for enhanced temporal processing in real-time applications.
  • Experimental results demonstrate that the approach reduces power consumption by nearly three orders of magnitude compared to traditional ANN-based systems.
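
The claimed power saving stems largely from event-driven computation: in an SNN, a synapse performs work only when a spike arrives, and each such operation is a cheap accumulate (AC) rather than a multiply-accumulate (MAC). The back-of-the-envelope sketch below illustrates how modest spiking activity compounds into orders-of-magnitude savings; the spike rate and AC/MAC cost ratio are illustrative assumptions, not the paper's measurements.

```python
def relative_energy(n_synapses, spike_rate, e_ac=1.0, e_mac=5.0):
    """Toy energy proxy (illustrative units, not measured hardware numbers).

    A dense ANN performs one MAC per synapse per time step; an
    event-driven SNN performs one cheaper AC per synapse only when a
    presynaptic spike arrives.
    """
    ann_energy = n_synapses * e_mac               # every synapse, every step
    snn_energy = n_synapses * spike_rate * e_ac   # only on spike events
    return snn_energy / ann_energy

# With ~1% spiking activity and ACs ~5x cheaper than MACs, the per-step
# energy proxy drops by roughly three orders of magnitude.
print(relative_energy(1_000_000, 0.01))  # -> 0.002
```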

Ultra-Low-Power Neuromorphic Speech Enhancement: An Analysis of Spiking-FullSubNet

The application of speech enhancement (SE) techniques is crucial for improving audio quality and intelligibility in various devices such as headsets and hearing aids. However, conventional deep learning approaches often incur substantial computational costs that preclude their usage in resource-constrained edge devices. The paper under review proposes a novel SE system, termed Spiking-FullSubNet, leveraging the efficiency of Spiking Neural Networks (SNNs) to address this gap.

Key Contributions

Spiking-FullSubNet rests on several strategies that jointly target energy efficiency and enhancement quality:

  1. Full-Band and Sub-Band Fusion: The architecture integrates full-band and sub-band models to capture comprehensive spectral information. The full-band model focuses on the entire spectral structure, while the sub-band models target specific frequency bands, capitalizing on local spectral patterns. Such a bifurcated methodology supports effective SE under diverse auditory conditions.
  2. Frequency Partitioning Inspired by Human Auditory Sensitivity: The research introduces a frequency partitioning mechanism, wherein lower frequencies are processed with finer granularity than higher frequencies. This approach is both computationally efficient and aligned with the frequency sensitivity of the human auditory system.
  3. Gated Spiking Neuron (GSN) Model: A novel spiking neuron model, the GSN, is presented to enhance temporal processing. Unlike traditional leaky integrate-and-fire (LIF) neurons with a fixed decay, the GSN dynamically adjusts the decay rate of the membrane potential in response to its inputs, offering superior adaptability to the temporal complexities inherent in audio signals.
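
The auditory-inspired partitioning in point 2 can be illustrated with a small sketch that splits a run of STFT frequency bins into contiguous bands whose width grows toward high frequency, mimicking the finer low-frequency resolution of human hearing. The function and the band widths here are hypothetical; the paper's actual partition boundaries may differ.

```python
def partition_bands(num_bins, widths):
    """Split `num_bins` STFT frequency bins into contiguous (start, end) bands.

    `widths` lists the band width (in bins) for successive regions of the
    spectrum, from low to high frequency; the last width is reused if the
    list runs out. Narrow low-frequency bands give finer resolution where
    human hearing is most sensitive.
    """
    bands = []
    start = 0
    w_idx = 0
    while start < num_bins:
        width = widths[min(w_idx, len(widths) - 1)]
        end = min(start + width, num_bins)
        bands.append((start, end))
        start = end
        w_idx += 1
    return bands

# Example: 16 bins, progressively coarser bands toward high frequency.
print(partition_bands(16, [1, 1, 2, 4, 8]))
# -> [(0, 1), (1, 2), (2, 4), (4, 8), (8, 16)]
```

Each resulting band would then be handled by a sub-band model, while the full-band model sees the whole spectrum at once.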

Experimental Outcomes

The empirical evaluation of Spiking-FullSubNet demonstrates its efficacy in surpassing state-of-the-art SE methods across multiple criteria:

  • Speech Quality and Energy Efficiency: The model won the Algorithmic Track of the Intel Neuromorphic Deep Noise Suppression (N-DNS) Challenge, with substantial gains in energy efficiency: its power consumption was nearly three orders of magnitude lower than that of comparable ANN-based systems.
  • Temporal Processing Excellence: By dynamically controlling information retention and decay, the GSN model facilitates effective processing across varying signal durations, crucial for real-time applications.
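
The input-dependent decay behind this temporal flexibility can be caricatured with a toy neuron whose leak factor is a sigmoid function of the current input: strong transients are integrated quickly, while weak background activity decays away. This is a hedged sketch of the general mechanism, not the paper's exact GSN equations.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_spiking_neuron(inputs, w_gate=1.0, b_gate=0.0, threshold=1.0):
    """Toy spiking neuron with an input-gated membrane decay
    (illustrative; parameters and gating form are assumptions).
    """
    v = 0.0          # membrane potential
    spikes = []
    for x in inputs:
        alpha = sigmoid(w_gate * x + b_gate)  # input-dependent decay rate
        v = alpha * v + x                     # leaky integration
        s = 1 if v >= threshold else 0        # fire on threshold crossing
        spikes.append(s)
        v -= s * threshold                    # soft reset after a spike
    return spikes

# A strong transient fires immediately; weak inputs mostly leak away.
print(gated_spiking_neuron([0.2, 0.2, 0.2, 1.5, 0.1]))  # -> [0, 0, 0, 1, 0]
```

In a standard LIF neuron, `alpha` would be a fixed constant; making it a function of the input is what lets a single neuron operate at multiple time scales.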

Theoretical and Practical Implications

Spiking-FullSubNet's contributions carry implications in several areas:

  1. Enhanced SNN Capabilities: The introduction of the GSN model addresses a significant limitation in existing SNNs, thereby opening avenues for further research into adaptive, input-dependent spiking models.
  2. Cross-Disciplinary Innovation: The fusion of neuromorphic computing principles with deep learning indicates potential breakthroughs in applications requiring low-power, real-time processing.
  3. Scalability in Edge Computing: The reduced energy demands position Spiking-FullSubNet as a feasible SE solution for edge devices, demonstrating the viability of deploying advanced models in constrained environments.

Future Prospects

Looking forward, one potential avenue of exploration is the integration of Spiking-FullSubNet into neuromorphic hardware platforms, such as Intel's Loihi or other neuromorphic architectures, to fully exploit its low-power design. Further, extending the architecture to additional auditory processing tasks, such as automatic speech recognition, could broaden its application scope.

In conclusion, Spiking-FullSubNet introduces a robust framework for ultra-low-power speech enhancement, showcasing significant advancements in spike-based neural computation for audio processing tasks. Its contributions lay a foundation for continued exploration in efficient neuromorphic systems capable of addressing the increasing demands of real-time, on-device processing.