Super Denoise Net: Speech Super Resolution with Noise Cancellation in Low Sampling Rate Noisy Environments
Abstract: Speech super-resolution (SSR) aims to predict a high-resolution (HR) speech signal from its low-resolution (LR) counterpart. Most neural SSR models assume a noise-free environment: they recover the spectrogram of the high-frequency part of the signal and concatenate it with the original low-frequency part. Although these methods achieve high accuracy, they become less effective in real-world scenarios, where noise is unavoidable. To address this problem, we propose Super Denoise Net (SDNet), a neural network for the joint task of super-resolution and noise reduction from a low-sampling-rate signal. To that end, we design gated convolution blocks to enhance the repair capability and lattice convolution blocks to capture information along the time-frequency axes. Experiments show that our method outperforms baseline speech denoising and SSR models on the DNS 2020 no-reverb test set, with higher objective and subjective scores.
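The abstract names gated convolution as one building block but does not give its form. As a rough illustration only, the sketch below follows the general gated-convolution idea from the image-inpainting literature: one convolution produces features, a second produces a sigmoid gate in (0, 1), and the two are multiplied elementwise. The 1-D layout, the tanh feature activation, the helper names (`conv1d_same`, `gated_conv1d`), and all shapes are assumptions made for this example, not the SDNet architecture.

```python
import numpy as np

def conv1d_same(x, w, b):
    """Zero-padded 'same' 1-D cross-correlation.
    x: (C_in, T), w: (C_out, C_in, K), b: (C_out,) -> (C_out, T)."""
    c_out, c_in, k = w.shape
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    xp = np.pad(x, ((0, 0), (pad_left, pad_right)))
    t = x.shape[1]
    out = np.empty((c_out, t))
    for i in range(t):
        window = xp[:, i:i + k]  # (C_in, K) slice centred on frame i
        out[:, i] = np.tensordot(w, window, axes=([1, 2], [0, 1])) + b
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conv1d(x, w_feat, b_feat, w_gate, b_gate):
    """Gated convolution: activated features scaled by a learned soft mask.
    The tanh feature activation is an assumption for this sketch."""
    feat = np.tanh(conv1d_same(x, w_feat, b_feat))
    gate = sigmoid(conv1d_same(x, w_gate, b_gate))
    return feat * gate

# Hypothetical shapes: 2 input channels, 16 frames, 4 output channels, kernel 3.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 16))
w_feat = 0.1 * rng.standard_normal((4, 2, 3))
w_gate = 0.1 * rng.standard_normal((4, 2, 3))
b_feat = np.zeros(4)
b_gate = np.zeros(4)

y = gated_conv1d(x, w_feat, b_feat, w_gate, b_gate)

# With a strongly negative gate bias the gate closes and the output vanishes.
y_closed = gated_conv1d(x, w_feat, b_feat, np.zeros_like(w_gate), np.full(4, -50.0))
```

The gate lets the layer suppress unreliable positions (for example, noise-dominated bins) per channel and per frame, which is one plausible reading of the "repair capability" mentioned in the abstract.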