TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids

Published 20 May 2020 in eess.AS, cs.LG, cs.SD, and stat.ML | (2005.11138v1)

Abstract: Modern speech enhancement algorithms achieve remarkable noise suppression by means of large recurrent neural networks (RNNs). However, large RNNs limit practical deployment in hearing aid hardware (HW) form-factors, which are battery powered and run on resource-constrained microcontroller units (MCUs) with limited memory capacity and compute capability. In this work, we use model compression techniques to bridge this gap. We define the constraints imposed on the RNN by the HW and describe a method to satisfy them. Although model compression techniques are an active area of research, we are the first to demonstrate their efficacy for RNN speech enhancement, using pruning and integer quantization of weights/activations. We also demonstrate state update skipping, which reduces the computational load. Finally, we conduct a perceptual evaluation of the compressed models to verify audio quality on human raters. Results show a reduction in model size and operations of 11.9× and 2.9×, respectively, over the baseline for compressed models, without a statistical difference in listening preference and only exhibiting a loss of 0.55 dB SDR. Our model achieves a computational latency of 2.39 ms, well within the 10 ms target and 351× better than previous work.

Summary

  • The paper introduces a method combining structured pruning and 8-bit quantization to significantly compress LSTM-based speech enhancement models for hearing aids.
  • It employs skip RNN cells to dynamically reduce computational load while preserving high-quality audio, meeting strict hardware constraints.
  • Evaluations demonstrate comparable SDR scores and perceptual quality, ensuring low-latency and energy-efficient performance on constrained devices.

Introduction

The paper "TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids" addresses the critical challenge of deploying speech enhancement (SE) algorithms within the constraints of hearing aids (HAs). These devices necessitate models with minimal computational and storage requirements due to their limited hardware capabilities, namely reduced memory capacity and power constraints of microcontroller units (MCUs). Modern SE systems rely on RNNs, especially long short-term memory (LSTM) networks, for effective noise suppression. The research outlines how model compression, specifically pruning and integer quantization, can bridge the gap to create efficient neural networks that are feasible for deployment on HAs. The results highlight a substantial reduction in model size and operations, with negligible loss in perceptual quality, enabling real-world application in HAs.

Model Constraints and Methodology

The HA form factor imposes several hardware constraints: compute must not exceed 1.55 million operations per second to achieve the desired sub-10ms latency; model size must stay under 0.5 MB due to flash memory constraints; and working memory is limited by the device's 320 KB of SRAM. The authors introduce a methodology focused on pruning and quantization to generate compressed RNN SE models that adhere to these requirements.
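To make these budgets concrete, a back-of-the-envelope feasibility check is sketched below. Only the three budget constants come from the constraints above; the architecture dimensions, the 8 ms frame hop, and the two-ops-per-weight cost model are illustrative assumptions, not the paper's configuration.

```python
# Rough feasibility check for an LSTM enhancer on an MCU.
# Budgets follow the constraints stated above; all model sizes here
# are placeholder assumptions, not the paper's architecture.

FLASH_BUDGET_BYTES = int(0.5 * 2**20)   # 0.5 MB flash for weights
SRAM_BUDGET_BYTES = 320 * 2**10         # 320 KB working memory
OPS_PER_SEC_BUDGET = 1.55e6             # compute budget from the text

def lstm_params(input_size, hidden_size):
    """Weights + biases for one LSTM layer (4 gates)."""
    return 4 * hidden_size * (input_size + hidden_size + 1)

def footprint(feature_dim=32, hidden=16, layers=2):
    """Parameter count, int8 weight bytes, and ops per frame."""
    params, in_dim = 0, feature_dim
    for _ in range(layers):
        params += lstm_params(in_dim, hidden)
        in_dim = hidden
    params += (hidden + 1) * feature_dim  # FC output layer
    weight_bytes = params                 # 1 byte/weight after int8 quantization
    ops_per_frame = 2 * params            # ~1 multiply + 1 add per weight
    return params, weight_bytes, ops_per_frame

if __name__ == "__main__":
    hop_s = 0.008                         # assumed 8 ms frame hop
    params, weight_bytes, ops_per_frame = footprint()
    ops_per_sec = ops_per_frame / hop_s
    print(f"params={params}, flash={weight_bytes} B "
          f"(fits: {weight_bytes <= FLASH_BUDGET_BYTES})")
    print(f"ops/s={ops_per_sec:.3g} "
          f"(fits: {ops_per_sec <= OPS_PER_SEC_BUDGET})")
```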

  • Pruning: Structured pruning organizes the weights of the LSTM and fully connected (FC) layers into groups, allowing efficient removal with little performance degradation. A novel aspect of the approach is that the pruning thresholds are learned directly during training, avoiding computationally intensive hyperparameter searches (see the first sketch after this list).
  • Quantization: Weights and activations are quantized to 8 bits with quantization-aware training. This makes the model robust to quantization noise while permitting integer arithmetic, which is less power-intensive than floating-point operations (see the second sketch below).
  • Skip RNNs: Skip RNN cells further decrease computational load by allowing state updates to be skipped dynamically, conditioned on input signal characteristics, thus reducing the operational demand on the MCU (see the third sketch below).
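First, a minimal PyTorch sketch of structured pruning with a learned threshold, shown on a linear layer for brevity. The class name, the choice of weight columns as groups, and the sigmoid temperature are illustrative assumptions; the paper's exact grouping and parameterization may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupPrunedLinear(nn.Module):
    """Linear layer whose weight columns (groups) are zeroed out by a
    learned threshold on their L2 norms. A minimal sketch of
    learned-threshold structured pruning, not the authors' exact method."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.log_threshold = nn.Parameter(torch.tensor(-4.0))  # one per layer

    def forward(self, x):
        w = self.linear.weight            # (out_features, in_features)
        group_norms = w.norm(dim=0)       # one score per input column
        tau = self.log_threshold.exp()
        # Hard 0/1 mask in the forward pass, with a sigmoid surrogate
        # supplying gradients (straight-through estimator), so the
        # threshold itself is trained rather than hand-tuned.
        hard = (group_norms > tau).float()
        soft = torch.sigmoid((group_norms - tau) * 50.0)
        mask = hard + soft - soft.detach()
        return F.linear(x, w * mask, self.linear.bias)
```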
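Second, quantization-aware training is commonly simulated with "fake quantization" in the forward pass. The generic sketch below uses symmetric per-tensor scaling, which is an assumption; the paper's scheme (e.g., scale granularity) may differ.

```python
import torch

def fake_quantize(x, num_bits=8):
    """Simulate symmetric int8 quantization during training so the model
    learns to tolerate rounding noise. A generic quantization-aware
    training sketch, not code from the paper."""
    qmax = 2 ** (num_bits - 1) - 1                      # 127 for int8
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    x_q = q * scale
    # Straight-through estimator: the forward pass sees quantized values,
    # while gradients flow through unchanged.
    return x + (x_q - x).detach()
```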
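Third, the skip mechanism follows the Skip RNN formulation, where a cumulative update probability is binarized into a per-frame gate. The sketch below is simplified (training would back-propagate through the rounding with a straight-through estimator, and the state bookkeeping is approximate), so treat it as an illustration rather than the authors' exact cell.

```python
import torch
import torch.nn as nn

class SkipLSTMCell(nn.Module):
    """Skip RNN wrapper around an LSTM cell: a learned update gate decides,
    per frame, whether to recompute the state or carry the previous one
    over. Initialize u to 1 so the first frame always updates."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.delta_u = nn.Linear(hidden_size, 1)  # update-probability increment

    def forward(self, x_t, state):
        h, c, u = state                    # u: cumulative update prob in [0, 1]
        gate = torch.round(u)              # 1 -> update this frame, 0 -> skip
        new_h, new_c = self.cell(x_t, (h, c))
        # Blend for differentiability during training; on the MCU a skipped
        # frame simply does not execute the cell, which is the savings.
        h = gate * new_h + (1.0 - gate) * h
        c = gate * new_c + (1.0 - gate) * c
        du = torch.sigmoid(self.delta_u(h))
        # Reset the accumulator after an update; otherwise keep accumulating.
        u = gate * du + (1.0 - gate) * torch.clamp(u + du, max=1.0)
        return h, (h, c, u)
```

At inference time the gate would be evaluated first and the LSTM cell executed only when it fires, so skipped frames cost almost nothing.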

Performance Evaluation

The TinyLSTMs were evaluated using both objective measures and subjective perceptual tests. In terms of objective metrics, the models achieved SDR scores comparable to existing large-scale systems while adhering to the HA constraints, particularly excelling in computational latency and energy efficiency (Figure 1).

Figure 1: Model size (MS) vs. SI-SDR. Each point represents a model checkpoint and each line a Pareto front.
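For reference, the scale-invariant SDR plotted in Figure 1 can be computed as follows. This is the standard definition of the metric, not code from the paper:

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to isolate the target part.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10(np.dot(target, target) / (np.dot(noise, noise) + eps))
```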

A subjective evaluation assessed the auditory quality of the enhanced audio relative to the baseline models. Listeners showed a consistent preference for the processed audio, confirming that compression did not compromise perceptual quality (Figure 2).

Figure 2: Preference of perceptual-study participants for enhanced audio vs. unprocessed audio, for both uncompressed and pruned models, across input SNRs.

Implications and Future Work

This paper provides a blueprint for designing efficient SE models deployable on resource-constrained hardware. The investigations into pruning and quantization offer a pathway for future work on low-power, high-efficiency neural networks. The successful optimization of LSTMs for HA devices suggests the approach could extend to other portable devices requiring low-latency speech enhancement, such as mobile phones or IoT devices.

Conclusion

The TinyLSTMs method delineates a robust framework for achieving efficient speech enhancement applicable to hearing aids, balancing resource constraints with computational demands. The significant reduction in model size and operations without perceptual quality loss demonstrates the viability of deploying advanced neural networks on limited hardware. This approach paves the way for further research into optimizing neural architectures for low-power, real-time applications in wearable and portable devices.
