Spiking Structured State Space Model for Monaural Speech Enhancement (2309.03641v2)
Abstract: Speech enhancement seeks to extract clean speech from noisy signals. Traditional deep learning methods face two challenges: making efficient use of the information in long speech sequences, and high computational cost. To address these challenges, we introduce the Spiking Structured State Space Model (Spiking-S4). This approach merges the energy efficiency of Spiking Neural Networks (SNNs) with the long-range sequence modeling capabilities of Structured State Space Models (S4), offering a compelling solution. Evaluation on the DNS Challenge and VoiceBank+Demand datasets confirms that Spiking-S4 rivals existing Artificial Neural Network (ANN) methods while requiring fewer computational resources, as evidenced by reduced parameter counts and Floating Point Operations (FLOPs).
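To make the hybrid concrete, below is a minimal sketch of the core idea the abstract describes: an S4-style state-space layer whose output drives leaky integrate-and-fire (LIF) spiking neurons trained with a surrogate gradient. This is not the paper's exact architecture; the diagonal state-space simplification, the sequential scan (real S4 uses a structured HiPPO-initialized matrix and convolutional evaluation), and all sizes, time constants, and the surrogate slope are assumptions for illustration.

```python
# Illustrative sketch (assumed architecture, not the paper's): an S4-style
# diagonal state-space layer feeding LIF spiking neurons.
import torch
import torch.nn as nn


class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike forward; sigmoid-derivative surrogate gradient backward."""

    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        sg = torch.sigmoid(4.0 * v)          # surrogate slope 4.0 is an assumption
        return grad_out * 4.0 * sg * (1.0 - sg)


class DiagonalSSM(nn.Module):
    """Simplified diagonal state-space layer: x' = A x + B u, y = Re(C x)."""

    def __init__(self, d_model, d_state=16, dt=1e-2):
        super().__init__()
        # Stable diagonal A: negative real parts, spread imaginary parts.
        self.log_neg_real = nn.Parameter(torch.log(0.5 * torch.ones(d_model, d_state)))
        self.imag = nn.Parameter(torch.arange(d_state).float().repeat(d_model, 1))
        self.B = nn.Parameter(0.1 * torch.randn(d_model, d_state, dtype=torch.cfloat))
        self.C = nn.Parameter(0.1 * torch.randn(d_model, d_state, dtype=torch.cfloat))
        self.dt = dt

    def forward(self, u):                    # u: (batch, time, d_model), real
        A = -torch.exp(self.log_neg_real) + 1j * self.imag
        dA = torch.exp(A * self.dt)          # zero-order-hold discretization
        dB = (dA - 1.0) / A * self.B
        x = torch.zeros(u.shape[0], u.shape[2], A.shape[1],
                        dtype=torch.cfloat, device=u.device)
        ys = []
        for t in range(u.shape[1]):          # sequential scan, for clarity only
            x = dA * x + dB * u[:, t, :].unsqueeze(-1)
            ys.append((x * self.C).sum(-1).real)
        return torch.stack(ys, dim=1)


class SpikingS4Block(nn.Module):
    """S4-style layer whose output drives LIF neurons emitting binary spikes."""

    def __init__(self, d_model, beta=0.9):
        super().__init__()
        self.ssm = DiagonalSSM(d_model)
        self.beta = beta                     # membrane decay (assumed fixed here)
        self.threshold = 1.0

    def forward(self, u):
        y = self.ssm(u)
        v = torch.zeros_like(y[:, 0, :])     # membrane potential
        spikes = []
        for t in range(y.shape[1]):
            v = self.beta * v + y[:, t, :]   # leaky integration of SSM output
            s = SurrogateSpike.apply(v - self.threshold)
            v = v - s * self.threshold       # soft reset on spike
            spikes.append(s)
        return torch.stack(spikes, dim=1)


if __name__ == "__main__":
    block = SpikingS4Block(d_model=8)
    noisy_features = torch.randn(2, 100, 8)  # (batch, frames, feature bins)
    out = block(noisy_features)
    print(out.shape, out.unique())           # binary spike trains, (2, 100, 8)
```

The appeal of this pairing, as the abstract argues, is that the SSM recurrence captures long-range temporal structure in the speech signal while the binary spike outputs keep activations sparse, which is what enables the reduced-FLOP operation on neuromorphic or event-driven hardware.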