Noise-robust Speech Separation with Fast Generative Correction (2406.07461v1)
Abstract: Speech separation, the task of isolating multiple speech sources from a mixed audio signal, remains challenging in noisy environments. In this paper, we propose a generative correction method to enhance the output of a discriminative separator. By leveraging a generative corrector based on a diffusion model, we refine the separation of single-channel mixture speech, removing noise and perceptually unnatural distortions. Furthermore, we optimize the generative model using a predictive loss, streamlining the diffusion model's reverse process into a single step and rectifying the errors introduced by this shortened reverse process. Our method achieves state-of-the-art performance on the in-domain Libri2Mix noisy dataset and on out-of-domain WSJ mixtures with a variety of noise types, improving SI-SNR by 22-35% relative to SepFormer and demonstrating robustness and strong generalization capabilities.
- Helin Wang (35 papers)
- Jesus Villalba (47 papers)
- Jiarui Hai (10 papers)
- Thomas Thebaud (15 papers)
- Najim Dehak (71 papers)
- Laureano Moro-Velazquez (28 papers)
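The abstract describes a two-stage pipeline: a discriminative separator produces rough source estimates, and a diffusion-based corrector, collapsed to a single reverse step, refines them conditioned on the mixture. Below is a minimal PyTorch sketch of that data flow under stated assumptions; the `Separator` and `Corrector` modules are hypothetical stand-ins (simple convolutions), not the authors' SepFormer or diffusion architectures, and only the overall structure (separate, then correct each estimate in one pass) reflects the paper.

```python
# Sketch of the separate-then-correct pipeline from the abstract.
# All module internals are placeholders, not the authors' implementation.
import torch
import torch.nn as nn


class Separator(nn.Module):
    """Stand-in for a discriminative separator (e.g., a SepFormer-like model)."""
    def __init__(self, n_src: int = 2):
        super().__init__()
        self.n_src = n_src
        self.net = nn.Conv1d(1, n_src, kernel_size=16, padding=8)

    def forward(self, mixture: torch.Tensor) -> torch.Tensor:
        # mixture: (batch, 1, time) -> rough source estimates: (batch, n_src, time)
        return self.net(mixture)[..., : mixture.shape[-1]]


class Corrector(nn.Module):
    """Stand-in for the generative corrector with its reverse process
    distilled into a single forward pass."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv1d(2, 1, kernel_size=16, padding=8)

    def forward(self, rough: torch.Tensor, mixture: torch.Tensor) -> torch.Tensor:
        # Condition on both the rough estimate and the original mixture,
        # and predict the refined source in one step.
        x = torch.cat([rough, mixture], dim=1)
        return self.net(x)[..., : rough.shape[-1]]


def separate(mixture: torch.Tensor, separator: Separator, corrector: Corrector) -> torch.Tensor:
    rough = separator(mixture)  # stage 1: discriminative separation
    refined = [corrector(rough[:, i : i + 1], mixture)  # stage 2: generative correction per source
               for i in range(rough.shape[1])]
    return torch.cat(refined, dim=1)


if __name__ == "__main__":
    mix = torch.randn(1, 1, 16000)              # 1 s of single-channel 16 kHz audio
    est = separate(mix, Separator(), Corrector())
    print(est.shape)                            # torch.Size([1, 2, 16000])
```

In the paper's setting, the corrector would be trained as a diffusion model and then optimized with a predictive loss so that this single call replaces the full iterative reverse process; the sketch only shows where that one-step correction sits in the pipeline.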