Dual-Path Modeling for Long Recording Speech Separation in Meetings (2102.11634v1)

Published 23 Feb 2021 in eess.AS and cs.SD

Abstract: Continuous speech separation (CSS) is the task of separating the speech sources in a long, partially overlapped recording that involves a varying number of speakers. A straightforward extension of conventional utterance-level speech separation to the CSS task is to segment the long recording with a fixed-size window and process each window separately. Though effective, this extension fails to model long-range dependencies in speech and thus leads to suboptimal performance. The recently proposed dual-path modeling could remedy this problem, thanks to its ability to jointly model cross-window dependencies and local-window processing. In this work, we further extend the dual-path modeling framework to the CSS task. A transformer-based dual-path system is proposed, which integrates transformer layers for global modeling. The proposed models are applied to LibriCSS, a real-recorded multi-talker dataset, and a consistent WER reduction is observed in the ASR evaluation of the separated speech. In addition, a dual-path transformer equipped with convolutional layers is proposed; it reduces the amount of computation by 30% while achieving a better WER. Furthermore, online-processing dual-path models are investigated, and they show a 10% relative WER reduction compared to the baseline.
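
The windowing idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's model: the function names (`segment`, `overlap_add`, `dual_path_pass`), the window and hop sizes, and the placeholder `local_fn` / `global_fn` callables are all assumptions made for illustration, standing in for the learned layers a real dual-path system would use. The sketch shows only the data flow: cut a long recording into fixed-size windows, apply one operation inside each window (local path) and another across windows at each position (global path), then stitch the windows back together.

```python
import numpy as np

def segment(signal, window, hop):
    """Split a 1-D signal into overlapping fixed-size windows (zero-padded at the end)."""
    n = len(signal)
    num_windows = max(1, int(np.ceil(max(n - window, 0) / hop)) + 1)
    padded = np.zeros((num_windows - 1) * hop + window, dtype=float)
    padded[:n] = signal
    return np.stack([padded[i * hop : i * hop + window] for i in range(num_windows)])

def overlap_add(windows, hop, length):
    """Stitch windows back together, averaging samples covered by several windows.

    Assumes hop <= window so that every output sample is covered at least once.
    """
    num_windows, window = windows.shape
    out = np.zeros((num_windows - 1) * hop + window)
    weight = np.zeros_like(out)
    for i, w in enumerate(windows):
        out[i * hop : i * hop + window] += w
        weight[i * hop : i * hop + window] += 1.0
    return (out / weight)[:length]

def dual_path_pass(windows, local_fn, global_fn):
    """One dual-path step on a (num_windows, window) array.

    local_fn  : hypothetical stand-in for intra-window processing (length-preserving).
    global_fn : hypothetical stand-in for cross-window processing (length-preserving).
    """
    local = np.stack([local_fn(w) for w in windows])        # within each window
    return np.stack([global_fn(c) for c in local.T]).T      # across windows, per position
```

Processing each window separately corresponds to calling only `local_fn`; the `global_fn` step is what lets information flow between windows, which is the cross-window dependency the abstract refers to.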

Authors (9)
  1. Chenda Li (23 papers)
  2. Zhuo Chen (319 papers)
  3. Yi Luo (153 papers)
  4. Cong Han (27 papers)
  5. Tianyan Zhou (11 papers)
  6. Keisuke Kinoshita (44 papers)
  7. Marc Delcroix (94 papers)
  8. Shinji Watanabe (419 papers)
  9. Yanmin Qian (99 papers)
Citations (9)
