Spiking Structured State Space Model for Monaural Speech Enhancement (2309.03641v2)

Published 7 Sep 2023 in cs.SD, cs.CV, and eess.AS

Abstract: Speech enhancement seeks to extract clean speech from noisy signals. Traditional deep learning methods face two challenges: efficiently using the information in long speech sequences, and high computational cost. To address these, we introduce the Spiking Structured State Space Model (Spiking-S4). This approach merges the energy efficiency of Spiking Neural Networks (SNNs) with the long-range sequence modeling capabilities of Structured State Space Models (S4), offering a compelling solution. Evaluation on the DNS Challenge and VoiceBank+DEMAND datasets confirms that Spiking-S4 rivals existing Artificial Neural Network (ANN) methods while using fewer computational resources, as evidenced by reduced parameter counts and Floating Point Operations (FLOPs).
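
To make the pairing in the abstract concrete, below is a minimal PyTorch sketch of how a structured state-space layer can feed a spiking (leaky integrate-and-fire) activation. It is an illustration under stated assumptions, not the paper's implementation: the diagonal (S4D-style) parameterization, the sigmoid surrogate gradient, and all names and hyperparameters (DiagonalSSM, LIFSpike, d_state, tau, v_th) are choices made here for brevity.

```python
# Minimal sketch (PyTorch) of the core pairing in Spiking-S4: a structured
# state-space layer followed by a spiking (LIF) activation. This is an
# illustration, not the paper's implementation; the diagonal (S4D-style)
# parameterization, the surrogate gradient, and all hyperparameters here
# are assumptions chosen for brevity.
import torch
import torch.nn as nn


class DiagonalSSM(nn.Module):
    """Simplified diagonal state-space layer (hypothetical S4D-style sketch)."""

    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        # Diagonal state matrix A kept stable via A = -exp(.), plus input
        # projection B, readout C, and a per-channel step size dt.
        self.log_neg_A = nn.Parameter(torch.randn(d_model, d_state))
        self.B = nn.Parameter(0.1 * torch.randn(d_model, d_state))
        self.C = nn.Parameter(0.1 * torch.randn(d_model, d_state))
        self.log_dt = nn.Parameter(torch.full((d_model,), -3.0))

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u: (batch, length, d_model). Recurrence written out for clarity;
        # real S4 evaluates the same linear system as a convolution via FFT.
        dt = self.log_dt.exp().unsqueeze(-1)           # (d_model, 1)
        A = -self.log_neg_A.exp()                      # (d_model, d_state)
        Ad = torch.exp(A * dt)                         # discretized A (ZOH)
        Bd = (Ad - 1.0) / A * self.B                   # discretized B (ZOH)
        x = u.new_zeros(u.size(0), u.size(2), Ad.size(-1))
        ys = []
        for t in range(u.size(1)):
            x = Ad * x + Bd * u[:, t].unsqueeze(-1)    # x_t = Ad x_{t-1} + Bd u_t
            ys.append((x * self.C).sum(-1))            # y_t = C x_t
        return torch.stack(ys, dim=1)


class LIFSpike(nn.Module):
    """Leaky integrate-and-fire activation with a sigmoid surrogate gradient."""

    def __init__(self, tau: float = 2.0, v_th: float = 1.0):
        super().__init__()
        self.tau, self.v_th = tau, v_th

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = torch.zeros_like(x[:, 0])                  # membrane potential
        spikes = []
        for t in range(x.size(1)):
            v = v + (x[:, t] - v) / self.tau           # leaky integration
            sg = torch.sigmoid(4.0 * (v - self.v_th))  # smooth surrogate
            s = (v >= self.v_th).float() + sg - sg.detach()
            v = v * (1.0 - s.detach())                 # hard reset after a spike
            spikes.append(s)
        return torch.stack(spikes, dim=1)


class SpikingS4Block(nn.Module):
    """One SSM layer feeding a spiking nonlinearity."""

    def __init__(self, d_model: int):
        super().__init__()
        self.ssm = DiagonalSSM(d_model)
        self.spike = LIFSpike()

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        return self.spike(self.ssm(u))


if __name__ == "__main__":
    feats = torch.randn(2, 100, 32)   # (batch, frames, spectral features)
    out = SpikingS4Block(32)(feats)
    print(out.shape)                  # torch.Size([2, 100, 32]), binary spikes
```

The binary spike outputs are what make SNN inference cheap: downstream multiply-accumulates reduce to additions gated by spikes, which is consistent with the parameter and FLOP savings the abstract highlights. The sequential recurrence above is written out for readability; S4 models typically evaluate the same linear system as a long convolution via FFT, which is where much of their efficiency on long sequences comes from.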
