
Ultra Low Complexity Deep Learning Based Noise Suppression (2312.08132v1)

Published 13 Dec 2023 in eess.AS, cs.LG, and eess.SP

Abstract: This paper introduces an innovative method for reducing the computational complexity of deep neural networks in real-time speech enhancement on resource-constrained devices. The proposed approach utilizes a two-stage processing framework, employing channelwise feature reorientation to reduce the computational load of convolutional operations. By combining this with a modified power-law compression technique for enhanced perceptual quality, the approach achieves noise suppression performance comparable to state-of-the-art methods with significantly lower computational requirements. Notably, our algorithm exhibits 3 to 4 times lower computational complexity and memory usage than prior state-of-the-art approaches.
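
The abstract names two ingredients, channelwise feature reorientation and modified power-law compression, without giving implementation details. The PyTorch sketch below is only a minimal illustration of those two ideas under assumed settings: the exponent `COMPRESS_EXP = 0.3`, the group size of 4, the layer widths, and the class name `SubbandReorient` are placeholders, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

COMPRESS_EXP = 0.3  # assumed exponent; the paper describes a *modified* power-law scheme

def power_law_compress(mag: torch.Tensor, exp: float = COMPRESS_EXP) -> torch.Tensor:
    """Compress spectrogram magnitudes so quiet, perceptually relevant regions
    carry more weight relative to loud ones."""
    return mag.clamp_min(1e-8) ** exp

class SubbandReorient(nn.Module):
    """Fold groups of adjacent frequency bins into the channel axis so that later
    convolutional layers see a g-times shorter frequency axis."""
    def __init__(self, bins_per_group: int):
        super().__init__()
        self.g = bins_per_group

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, freq, time), with freq divisible by bins_per_group
        b, c, f, t = x.shape
        x = x.view(b, c, f // self.g, self.g, t)              # split freq into groups
        return x.permute(0, 1, 3, 2, 4).reshape(b, c * self.g, f // self.g, t)

# Toy forward pass: magnitudes of a 256-bin STFT over 100 frames.
spec = power_law_compress(torch.rand(1, 1, 256, 100))

reorient = SubbandReorient(bins_per_group=4)      # (1, 1, 256, T) -> (1, 4, 64, T)
conv1 = nn.Conv2d(4, 32, kernel_size=3, padding=1)
conv2 = nn.Conv2d(32, 32, kernel_size=3, padding=1)

features = conv2(conv1(reorient(spec)))
print(features.shape)  # torch.Size([1, 32, 64, 100])
# A 2-D convolution's multiply-add count scales with freq * time * C_in * C_out,
# so running conv2 over 64 instead of 256 frequency positions cuts its cost by
# roughly the group factor of 4.
```

Note that the first convolution after the fold gains nothing by itself (its input channel count grows by the same factor that the frequency axis shrinks); the savings accrue in every subsequent layer whose channel widths stay fixed while the frequency dimension remains reduced.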
