Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model
Abstract: In this paper, we present a method for single-channel wind noise reduction using our previously proposed diffusion-based stochastic regeneration model, which combines predictive and generative modelling. We introduce a non-additive speech-in-noise model to account for the non-linear deformation of the microphone membrane caused by the wind flow, as well as possible clipping. We show that our stochastic regeneration model outperforms other neural-network-based wind noise reduction methods, as well as purely predictive and purely generative models, on a dataset using simulated and real-recorded wind noise. We further show that the proposed method generalizes well by testing on an unseen dataset with real-recorded wind noise. Audio samples, data generation scripts and code for the proposed methods can be found online (https://uhh.de/inf-sp-storm-wind).
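To illustrate what a non-additive speech-in-noise model can look like, the following is a minimal sketch and not the authors' data generation script. It assumes that the wind is first scaled to a target signal-to-noise ratio and added to the speech, and that saturation is then approximated by hard clipping. The function name `mix_with_clipping`, the parameters `snr_db` and `clip_level`, and the synthetic stand-in signals are all hypothetical.

```python
import math
import random


def mix_with_clipping(speech, wind, snr_db, clip_level=1.0):
    """Scale wind to a target SNR, add it to speech, then hard-clip the sum.

    Hypothetical illustration only: hard clipping is an assumed stand-in for
    microphone saturation, not the model used in the paper.
    """
    speech_power = sum(s * s for s in speech) / len(speech)
    wind_power = sum(w * w for w in wind) / len(wind)
    # Gain that brings the wind to the requested speech-to-noise ratio.
    gain = math.sqrt(speech_power / (wind_power * 10.0 ** (snr_db / 10.0)))
    # After clipping, the mixture is no longer speech plus a noise term
    # that is independent of the speech.
    return [
        max(-clip_level, min(clip_level, s + gain * w))
        for s, w in zip(speech, wind)
    ]


random.seed(0)
fs = 16000  # assumed sampling rate in Hz
# Synthetic stand-ins: a sine for speech, Gaussian noise for wind.
speech = [0.5 * math.sin(2 * math.pi * 220.0 * n / fs) for n in range(fs)]
wind = [random.gauss(0.0, 1.0) for _ in range(fs)]
noisy = mix_with_clipping(speech, wind, snr_db=-5.0, clip_level=0.8)
```

With a very large `clip_level`, the function reduces to the usual additive mixture at the requested SNR, which makes the effect of the clipping stage easy to isolate.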