Continuous Speech Separation with Recurrent Selective Attention Network (2110.14838v1)

Published 28 Oct 2021 in eess.AS and cs.SD

Abstract: While permutation invariant training (PIT) based continuous speech separation (CSS) significantly improves the conversation transcription accuracy, it often suffers from speech leakages and failures in separation at "hot spot" regions because it has a fixed number of output channels. In this paper, we propose to apply recurrent selective attention network (RSAN) to CSS, which generates a variable number of output channels based on active speaker counting. In addition, we propose a novel block-wise dependency extension of RSAN by introducing dependencies between adjacent processing blocks in the CSS framework. It enables the network to utilize the separation results from the previous blocks to facilitate the current block processing. Experimental results on the LibriCSS dataset show that the RSAN-based CSS (RSAN-CSS) network consistently improves the speech recognition accuracy over PIT-based models. The proposed block-wise dependency modeling further boosts the performance of RSAN-CSS.
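The abstract describes two ideas: RSAN-style recursive separation, which peels off one speaker at a time and stops based on active-speaker counting, and block-wise dependency, which carries the previous block's separation results into the current block. The following is a minimal sketch of that control flow only; the mask estimator is a placeholder, and the function names, threshold-based speaker counting, and mask-passing interface are illustrative assumptions, not the paper's actual network.

```python
import numpy as np

def separate_one(block, residual, prev_masks):
    # Placeholder mask estimator: claims half of the remaining residual.
    # In RSAN this would be a neural network conditioned on the residual
    # mask and (with block-wise dependency) the previous block's results.
    return 0.5 * residual

def rsan_css_sketch(mixture_spec, block_len, hop,
                    energy_thresh=0.05, max_speakers=4):
    """Sketch of block-wise, RSAN-style CSS on a (T, F) magnitude spectrogram.

    Each block recursively extracts one source mask at a time and stops when
    the residual energy falls below a threshold (a stand-in for active
    speaker counting). The previous block's masks are handed to the current
    block to mimic the proposed block-wise dependency.
    """
    T, _ = mixture_spec.shape
    outputs = []
    prev_masks = None  # block-wise dependency: carried across blocks
    for start in range(0, T, hop):
        block = mixture_spec[start:start + block_len]
        residual = np.ones_like(block)  # residual mask, starts as all-ones
        block_masks = []
        for _ in range(max_speakers):  # variable number of output channels
            mask = separate_one(block, residual, prev_masks)
            block_masks.append(mask)
            residual = np.clip(residual - mask, 0.0, 1.0)
            if (residual * block).sum() < energy_thresh * block.sum():
                break  # speaker counting: residual energy exhausted
        prev_masks = block_masks
        outputs.append([m * block for m in block_masks])
    return outputs
```

The key structural difference from PIT-based CSS is visible in the inner loop: the number of extracted sources per block is decided by a stopping criterion rather than fixed by the network's output dimension.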

Authors (7)
  1. Yixuan Zhang (94 papers)
  2. Zhuo Chen (319 papers)
  3. Jian Wu (314 papers)
  4. Takuya Yoshioka (77 papers)
  5. Peidong Wang (33 papers)
  6. Zhong Meng (53 papers)
  7. Jinyu Li (164 papers)
Citations (7)