
Noise-robust Speech Separation with Fast Generative Correction (2406.07461v1)

Published 11 Jun 2024 in eess.AS

Abstract: Speech separation, the task of isolating multiple speech sources from a mixed audio signal, remains challenging in noisy environments. In this paper, we propose a generative correction method to enhance the output of a discriminative separator. By leveraging a generative corrector based on a diffusion model, we refine the separation output for single-channel speech mixtures by removing noise and perceptually unnatural distortions. Furthermore, we optimize the generative model with a predictive loss, which streamlines the diffusion model's reverse process into a single step and rectifies errors introduced by that reverse process. Our method achieves state-of-the-art performance on the in-domain Libri2Mix noisy dataset and on out-of-domain WSJ with a variety of noise types, improving SI-SNR by 22-35% relative to SepFormer and demonstrating robustness and strong generalization.
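
The abstract reports its gains in SI-SNR (scale-invariant signal-to-noise ratio). For readers unfamiliar with the metric, the following is a minimal sketch of its standard definition, not the authors' evaluation code. The function name `si_snr` and the `eps` stabilizer are illustrative assumptions.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences.

    Illustrative helper (hypothetical name); follows the standard definition:
    project the zero-mean estimate onto the zero-mean target, then compare
    the energy of the projection with the energy of the residual.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)

    # Remove the mean of each signal.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]

    # Project the estimate onto the target.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    projection = [scale * t for t in tgt]

    # Everything not explained by the target counts as noise.
    residual = [e - p for e, p in zip(est, projection)]
    proj_energy = sum(p * p for p in projection)
    resid_energy = sum(r * r for r in residual)

    return 10.0 * math.log10((proj_energy + eps) / (resid_energy + eps))


# Synthetic example: a sinusoidal "speech" target plus an interfering tone.
target = [math.sin(0.1 * i) for i in range(200)]
interference = [math.cos(0.37 * i) for i in range(200)]
clean_estimate = [t + 0.1 * v for t, v in zip(target, interference)]
noisy_estimate = [t + 0.5 * v for t, v in zip(target, interference)]
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is preferred over plain SNR when a separator's output gain is arbitrary.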

Authors (6)
  1. Helin Wang (35 papers)
  2. Jesus Villalba (47 papers)
  3. Jiarui Hai (10 papers)
  4. Thomas Thebaud (15 papers)
  5. Najim Dehak (71 papers)
  6. Laureano Moro-Velazquez (28 papers)
Citations (1)
