Ultra Low Complexity Deep Learning Based Noise Suppression (2312.08132v1)
Published 13 Dec 2023 in eess.AS, cs.LG, and eess.SP
Abstract: This paper introduces an innovative method for reducing the computational complexity of deep neural networks in real-time speech enhancement on resource-constrained devices. The proposed approach uses a two-stage processing framework, employing channel-wise feature reorientation to reduce the computational load of convolutional operations. By combining this with a modified power-law compression technique for enhanced perceptual quality, the approach achieves noise suppression performance comparable to state-of-the-art methods with significantly lower computational requirements. Notably, our algorithm exhibits 3 to 4 times lower computational complexity and memory usage than prior state-of-the-art approaches.
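The abstract does not specify how the power-law compression is "modified", so the sketch below shows only the standard, unmodified form that such schemes typically start from: the magnitude of each complex STFT bin is raised to an exponent alpha < 1 while its phase is kept. The exponent alpha = 0.3 and the helper names `power_law_compress` / `power_law_decompress` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def power_law_compress(spec: np.ndarray, alpha: float = 0.3) -> np.ndarray:
    """Standard power-law compression: |X|**alpha * exp(j * angle(X)).

    alpha = 0.3 is an assumed, commonly used value; the paper's
    modified variant is not reproduced here.
    """
    mag = np.abs(spec)
    phase = np.angle(spec)
    return (mag ** alpha) * np.exp(1j * phase)


def power_law_decompress(spec_c: np.ndarray, alpha: float = 0.3) -> np.ndarray:
    """Invert the compression by raising the magnitude to 1 / alpha."""
    mag = np.abs(spec_c)
    phase = np.angle(spec_c)
    return (mag ** (1.0 / alpha)) * np.exp(1j * phase)


# Toy complex "spectrogram": 2 frames x 3 frequency bins (made-up values).
X = np.array([[1 + 1j, 0.01 + 0j, 100 + 0j],
              [0 - 2j, 0.5 + 0.5j, 10 + 0j]])

Xc = power_law_compress(X)
X_rec = power_law_decompress(Xc)

# The round trip recovers the input up to floating-point error.
print(np.allclose(X, X_rec))

# Magnitude dynamic range shrinks from 100 / 0.01 = 1e4 to (1e4) ** 0.3, about 15.8.
mags = np.abs(Xc)
print(mags.max() / mags.min())
```

Because the phase is left untouched, only the magnitude dynamic range changes. A common motivation for this kind of compression is that low-energy time-frequency bins then carry more weight in a loss computed on the compressed spectrogram; the network output is decompressed before the inverse STFT.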