
Diffusion Posterior Sampling for Informed Single-Channel Dereverberation (2306.12286v1)

Published 21 Jun 2023 in eess.AS, cs.LG, and cs.SD

Abstract: In this paper we present an informed single-channel dereverberation method based on conditional generation with diffusion models. Given knowledge of the room impulse response, the anechoic utterance is generated via reverse diffusion using a measurement-consistency criterion coupled with a neural network that represents the clean-speech prior. The proposed approach is substantially more robust to measurement noise than a state-of-the-art informed single-channel dereverberation method, especially for non-stationary noise. Furthermore, we compare against other blind dereverberation methods using diffusion models and show the superiority of the proposed approach for large reverberation times. We motivate the presented algorithm by introducing an extension for blind dereverberation that allows joint estimation of the room impulse response and the anechoic speech. Audio samples and code can be found online (https://uhh.de/inf-sp-derev-dps).
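The mechanism the abstract describes — reverse diffusion under a clean-speech prior, steered at each step by a measurement-consistency gradient through the known room impulse response — can be sketched as follows. This is a minimal illustration of generic diffusion posterior sampling, not the paper's implementation: the VE-style noise schedule, the step rule, the `score` interface, and the guidance weight `zeta` are all assumptions, and the toy score model in the usage below stands in for a trained clean-speech network.

```python
import math
import torch
import torch.nn.functional as F

def reverberate(x, h):
    # Forward model: linear convolution of dry speech x with the room
    # impulse response h. conv1d computes cross-correlation, so flip h.
    out_len = x.numel() + h.numel() - 1
    return F.conv1d(x.view(1, 1, -1), h.flip(-1).view(1, 1, -1),
                    padding=h.numel() - 1).view(-1)[:out_len]

def dps_dereverb(y, h, score, n_steps=50, zeta=0.3, sig_max=1.0, sig_min=0.01):
    """Illustrative DPS loop (hypothetical VE-SDE flavour): at each step,
    query the prior score, form the Tweedie denoised estimate x0_hat, and
    nudge the iterate down the gradient of the measurement error
    ||y - h * x0_hat||, enforcing consistency with the reverberant observation."""
    sigmas = torch.exp(torch.linspace(math.log(sig_max), math.log(sig_min), n_steps))
    x = sig_max * torch.randn(y.numel() - h.numel() + 1)  # start from prior noise
    for i in range(n_steps - 1):
        x = x.detach().requires_grad_(True)
        s = score(x, sigmas[i])                 # clean-speech prior score
        x0_hat = x + sigmas[i] ** 2 * s         # Tweedie estimate of the clean signal
        err = torch.linalg.vector_norm(y - reverberate(x0_hat, h))
        grad = torch.autograd.grad(err, x)[0]   # measurement-consistency gradient
        with torch.no_grad():
            # Deterministic denoising step plus guidance toward the measurement.
            x = x + (sigmas[i] ** 2 - sigmas[i + 1] ** 2) * s - zeta * grad
    return x.detach()
```

As a smoke test, an analytic unit-Gaussian prior gives the exact score `-x / (1 + sigma**2)`, so the loop can be run end-to-end without any trained network:

```python
torch.manual_seed(0)
h = torch.zeros(8); h[0] = 1.0          # toy RIR (a pure delay-free delta)
y = reverberate(torch.randn(64), h)     # synthetic reverberant observation
x_est = dps_dereverb(y, h, lambda x, sigma: -x / (1.0 + sigma ** 2))
```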
