Speaker-Conditional Chain Model for Speech Separation and Extraction (2006.14149v1)

Published 25 Jun 2020 in eess.AS and cs.SD

Abstract: Speech separation has been extensively explored to tackle the cocktail party problem. However, these studies are still far from having sufficient generalization ability for real scenarios. In this work, we propose a general strategy named the Speaker-Conditional Chain Model to process complex speech recordings. In the proposed method, our model first infers the identities of a variable number of speakers from the observation using a sequence-to-sequence model. Then, it takes the information of the inferred speakers as conditions to extract their speech sources. With the speaker information predicted from the whole observation, our model helps address both conventional speech separation and speaker extraction for multi-round long recordings. Experiments on standard fully overlapped speech separation benchmarks show results comparable to prior studies, while the proposed model adapts better to multi-round long recordings.

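To make the two-stage idea in the abstract concrete, below is a minimal PyTorch sketch of a speaker-conditional chain: a sequence-to-sequence module autoregressively emits speaker embeddings from the mixture until a stop signal, and an extraction network conditioned on each inferred embedding produces that speaker's mask. All module choices, dimensions, and the stopping rule here are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class SpeakerConditionalChain(nn.Module):
    """Illustrative sketch of a speaker-conditional chain model:
    (1) infer a variable number of speaker embeddings from the mixture with a
        sequence-to-sequence (autoregressive) decoder, then
    (2) extract each speaker's source conditioned on their embedding.
    Dimensions and the stop criterion are assumptions for illustration."""

    def __init__(self, feat_dim=256, spk_dim=128, max_speakers=5):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, 256, num_layers=2,
                               batch_first=True, bidirectional=True)
        # Autoregressive speaker decoder: emits one speaker embedding per step.
        self.spk_decoder = nn.LSTMCell(spk_dim, 512)
        self.spk_proj = nn.Linear(512, spk_dim)
        self.stop_head = nn.Linear(512, 1)   # predicts "no more speakers"
        # Extraction network conditioned on a speaker embedding.
        self.extractor = nn.LSTM(512 + spk_dim, 256, num_layers=2,
                                 batch_first=True, bidirectional=True)
        self.mask_head = nn.Linear(512, feat_dim)
        self.max_speakers = max_speakers
        self.spk_dim = spk_dim

    def forward(self, mixture_feats):
        # mixture_feats: (batch, time, feat_dim), e.g. log-magnitude spectra.
        enc, _ = self.encoder(mixture_feats)          # (B, T, 512)
        summary = enc.mean(dim=1)                     # global mixture summary
        h, c = summary, torch.zeros_like(summary)
        prev = mixture_feats.new_zeros(enc.size(0), self.spk_dim)

        masks, stops = [], []
        for _ in range(self.max_speakers):
            h, c = self.spk_decoder(prev, (h, c))
            spk_emb = self.spk_proj(h)                        # inferred speaker
            stops.append(torch.sigmoid(self.stop_head(h)))    # stop probability
            # Condition extraction on the inferred speaker embedding.
            cond = spk_emb.unsqueeze(1).expand(-1, enc.size(1), -1)
            ext, _ = self.extractor(torch.cat([enc, cond], dim=-1))
            masks.append(torch.sigmoid(self.mask_head(ext)))   # (B, T, feat_dim)
            prev = spk_emb
        # masks[i] * mixture_feats approximates the i-th speaker's source;
        # at inference, stop emitting speakers once the stop probability is high.
        return torch.stack(masks, dim=1), torch.stack(stops, dim=1)
```

Because the speaker embeddings are inferred from the entire observation before extraction, the same chain can, in principle, handle both fully overlapped mixtures and multi-round long recordings where speakers come and go.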
Authors (5)
  1. Jing Shi (123 papers)
  2. Jiaming Xu (86 papers)
  3. Yusuke Fujita (37 papers)
  4. Shinji Watanabe (416 papers)
  5. Bo Xu (212 papers)
Citations (20)
