
Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis (2011.02014v1)

Published 3 Nov 2020 in eess.AS and cs.SD

Abstract: Multi-speaker speech recognition of unsegmented recordings has diverse applications such as meeting transcription and automatic subtitle generation. With technical advances in systems dealing with speech separation, speaker diarization, and automatic speech recognition (ASR) in the last decade, it has become possible to build pipelines that achieve reasonable error rates on this task. In this paper, we propose an end-to-end modular system for the LibriCSS meeting data, which combines independently trained separation, diarization, and recognition components, in that order. We study the effect of different state-of-the-art methods at each stage of the pipeline, and report results using task-specific metrics like SDR and DER, as well as downstream WER. Experiments indicate that the problem of overlapping speech for diarization and ASR can be effectively mitigated with the presence of a well-trained separation module. Our best system achieves a speaker-attributed WER of 12.7%, which is close to that of a non-overlapping ASR.
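As a rough illustration of the downstream metric named in the abstract, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a minimal plain-WER computation under that standard definition; the function name `word_error_rate` and whitespace tokenization are assumptions made for illustration, and it does not reproduce the paper's speaker-attributed scoring, which also takes speaker labels into account.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Plain WER: word-level Levenshtein distance / number of reference words.

    Hypothetical helper for illustration; not the paper's scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.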

Authors (14)
  1. Desh Raj (32 papers)
  2. Pavel Denisov (19 papers)
  3. Zhuo Chen (319 papers)
  4. Hakan Erdogan (32 papers)
  5. Zili Huang (18 papers)
  6. Maokui He (8 papers)
  7. Shinji Watanabe (416 papers)
  8. Jun Du (130 papers)
  9. Takuya Yoshioka (77 papers)
  10. Yi Luo (153 papers)
  11. Naoyuki Kanda (61 papers)
  12. Jinyu Li (164 papers)
  13. Scott Wisdom (33 papers)
  14. John R. Hershey (40 papers)
Citations (80)
