Speech separation with large-scale self-supervised learning (2211.05172v2)

Published 9 Nov 2022 in eess.AS, cs.CL, and cs.SD

Abstract: Self-supervised learning (SSL) methods such as WavLM have shown promising speech separation (SS) results in small-scale simulation-based experiments. In this work, we extend the exploration of SSL-based SS by massively scaling up both the pre-training data (more than 300K hours) and the fine-tuning data (10K hours). We also investigate various techniques to efficiently integrate the pre-trained model with the SS network under a limited computation budget, including a low frame rate SSL model training setup and a fine-tuning scheme that uses only part of the pre-trained model. Compared with a supervised baseline and a WavLM-based SS model using feature embeddings from the previously released 94K-hour-trained WavLM, our proposed model obtains relative word error rate (WER) reductions of 15.9% and 11.2%, respectively, on a simulated far-field speech mixture test set. For conversation transcription on real meeting recordings using continuous speech separation, the proposed model achieves relative WER reductions of 6.8% and 10.6% over the purely supervised baseline on the AMI and ICSI evaluation sets, respectively, while reducing the computational cost by 38%.
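The two efficiency ideas highlighted in the abstract, a low-frame-rate SSL front-end and fine-tuning only part of the pre-trained model, can be sketched as follows. This is a minimal, hypothetical PyTorch illustration, not the paper's actual architecture: the encoder dimensions, the mask-based separation head, and the `n_used_layers` argument are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class SSLEncoderStub(nn.Module):
    """Stand-in for a WavLM-style SSL encoder: a conv feature extractor
    followed by a Transformer stack (hypothetical sizes)."""

    def __init__(self, dim=768, n_layers=12, frame_stride=320):
        super().__init__()
        # Conv front-end downsampling raw audio to frame-level features.
        # A larger stride here would emulate the low-frame-rate setup.
        self.feature_extractor = nn.Conv1d(1, dim, kernel_size=400, stride=frame_stride)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
            for _ in range(n_layers)
        )

    def forward(self, wav, n_used_layers=None):
        # wav: (batch, samples) raw single-channel mixture audio
        x = self.feature_extractor(wav.unsqueeze(1)).transpose(1, 2)  # (B, T, dim)
        # Run only the lower part of the Transformer stack if requested.
        for layer in self.layers[: n_used_layers or len(self.layers)]:
            x = layer(x)
        return x


class SeparationHead(nn.Module):
    """Small separation network predicting per-speaker masks over SSL features."""

    def __init__(self, dim=768, n_speakers=2):
        super().__init__()
        self.rnn = nn.LSTM(dim, dim, num_layers=2, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * dim, n_speakers * dim)
        self.n_speakers = n_speakers

    def forward(self, feats):
        h, _ = self.rnn(feats)
        masks = torch.sigmoid(self.proj(h))                       # (B, T, S*dim)
        masks = masks.view(feats.size(0), feats.size(1), self.n_speakers, -1)
        # One masked feature stream per speaker.
        return masks * feats.unsqueeze(2)                         # (B, T, S, dim)


if __name__ == "__main__":
    encoder = SSLEncoderStub()
    head = SeparationHead()
    mixture = torch.randn(2, 32000)        # two 2-second mixtures at 16 kHz
    # Use only the lower 6 of 12 Transformer layers, akin to keeping just
    # part of the pre-trained model to stay within a computation budget.
    feats = encoder(mixture, n_used_layers=6)
    separated = head(feats)
    print(separated.shape)                 # (2, T, 2, 768)
```

The sketch shows why the partial-model scheme saves compute: the separation head consumes features from an intermediate layer, so the upper Transformer layers never need to run during fine-tuning or inference.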

Authors (9)
  1. Zhuo Chen (319 papers)
  2. Naoyuki Kanda (61 papers)
  3. Jian Wu (314 papers)
  4. Yu Wu (196 papers)
  5. Xiaofei Wang (138 papers)
  6. Takuya Yoshioka (77 papers)
  7. Jinyu Li (164 papers)
  8. Sunit Sivasankaran (11 papers)
  9. Sefik Emre Eskimez (28 papers)
Citations (11)