
Unrestricted Global Phase Bias-Aware Single-channel Speech Enhancement with Conformer-based Metric GAN (2402.08252v2)

Published 13 Feb 2024 in eess.AS and cs.SD

Abstract: With the rapid development of neural networks in recent years, magnitude-spectrum enhancement of noisy speech in the single-channel speech enhancement domain has become remarkably effective. Enhancing the phase spectrum with neural networks, however, is often ineffective and remains a challenging problem. In this paper, we find that the human ear cannot sensitively perceive the difference between a precise phase spectrum and a biased phase (BP) spectrum. We therefore propose an optimization method for phase reconstruction that allows freedom in the global phase bias instead of requiring reconstruction of the precise phase spectrum. We apply it to a Conformer-based Metric Generative Adversarial Network (CMGAN) baseline model, relaxing the existing constraint of precise phase and giving the neural network a broader learning space. Results show that this method achieves new state-of-the-art performance without incurring additional computational overhead.
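The central idea, that the estimated phase may differ from the clean phase by one global offset per utterance, can be illustrated with a loss term that removes the best-fitting offset before penalizing the phase error. The sketch below is a minimal illustration under that assumption; the function name `bias_aware_phase_loss` and the circular-mean bias estimate are hypothetical choices for illustration, not the paper's exact CMGAN training objective.

```python
import torch

def bias_aware_phase_loss(est_phase: torch.Tensor, ref_phase: torch.Tensor) -> torch.Tensor:
    """Phase loss that tolerates a single global phase offset (hypothetical sketch)."""
    # Wrapped phase error at every time-frequency bin.
    diff = est_phase - ref_phase
    # Circular mean of the error serves as an estimate of the global phase bias;
    # atan2 of averaged sin/cos keeps the computation real-valued and differentiable.
    # (For a batch of utterances, the bias would be estimated per utterance.)
    bias = torch.atan2(torch.mean(torch.sin(diff)), torch.mean(torch.cos(diff)))
    # Penalize only the residual deviation after the bias is removed;
    # 1 - cos(.) is smooth and handles the 2*pi wrap-around of phase.
    return torch.mean(1.0 - torch.cos(diff - bias))

# Example: a constant phase shift alone incurs (almost) no penalty.
ref = torch.rand(1, 257, 100) * 2 * torch.pi - torch.pi   # reference phase spectrum
est = ref + 0.7                                           # globally shifted estimate
print(bias_aware_phase_loss(est, ref))                    # ~0
```

Because the bias is re-estimated from the prediction itself, such a loss rewards the network for getting the relative phase structure right rather than the absolute phase values, which is the relaxation the abstract describes.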
