Super Denoise Net: Speech Super Resolution with Noise Cancellation in Low Sampling Rate Noisy Environments

Published 9 Oct 2023 in eess.AS and cs.SD | (2310.05629v2)

Abstract: Speech super-resolution (SSR) aims to predict a high-resolution (HR) speech signal from its low-resolution (LR) counterpart. Most neural SSR models assume a noise-free environment: they recover the spectrogram of the high-frequency part of the signal and concatenate it with the original low-frequency part. Although these methods achieve high accuracy, they become less effective in real-world scenarios, where noise is unavoidable. To address this problem, we propose Super Denoise Net (SDNet), a neural network for the joint task of super-resolution and noise reduction from a low-sampling-rate signal. To that end, we design gated convolution blocks to enhance the repair capability and lattice convolution blocks to capture information along the time and frequency axes. Experiments show that our method outperforms baseline speech denoising and SSR models on the DNS 2020 no-reverb test set, with higher objective and subjective scores.
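The abstract names gated convolution as one of SDNet's building blocks but does not give its layout. As a rough illustration only, the sketch below shows the generic gating idea in one dimension: one convolution produces features, a second convolution over the same input produces a sigmoid gate in (0, 1), and the two are multiplied elementwise. The function names, kernels, the `gate_bias` parameter, and the identity default activation are all assumptions made for this example and are not taken from the paper.

```python
import math


def conv1d_same(x, kernel):
    """Zero-padded 'same' 1D cross-correlation; kernel length must be odd."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_conv1d(x, feature_kernel, gate_kernel, gate_bias=0.0,
                 activation=lambda v: v):
    """Generic gated convolution: activation(features) * sigmoid(gates).

    Hypothetical 1D sketch; SDNet's actual block (dimensions, activation,
    normalization) is not described in the abstract.
    """
    features = conv1d_same(x, feature_kernel)
    gates = [sigmoid(g + gate_bias) for g in conv1d_same(x, gate_kernel)]
    return [activation(f) * g for f, g in zip(features, gates)]


# Toy input: an identity feature kernel and an all-zero gate kernel,
# so every gate equals sigmoid(0) = 0.5 and the output is half the input.
signal = [1.0, 2.0, 3.0]
half_open = gated_conv1d(signal, [0.0, 1.0, 0.0], [0.0, 0.0, 0.0])
```

In a trained network both kernels are learned, so the gate can pass some time-frequency positions and suppress others, which is one plausible reading of why such a block would help with noisy or missing content.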

Authors (4)
