
Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR (2006.02786v3)

Published 4 Jun 2020 in eess.AS, cs.CL, and cs.SD

Abstract: Most approaches to multi-talker overlapped speech separation and recognition assume that the number of simultaneously active speakers is given, but in realistic situations, it is typically unknown. To cope with this, we extend an iterative speech extraction system with mechanisms to count the number of sources and combine it with a single-talker speech recognizer to form the first end-to-end multi-talker automatic speech recognition system for an unknown number of active speakers. Our experiments show very promising performance in counting accuracy, source separation and speech recognition on simulated clean mixtures from WSJ0-2mix and WSJ0-3mix. Among other results, we set a new state-of-the-art word error rate on the WSJ0-2mix database. Furthermore, our system generalizes well to a larger number of speakers than it ever saw during training, as shown in experiments with the WSJ0-4mix database.
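The abstract describes extracting speakers one at a time from a mixture and counting them by deciding when to stop. The Python sketch below illustrates only that control flow, under loudly stated assumptions: `extract_one` is a hypothetical callable standing in for the paper's neural extraction network, the stop rule is a simple residual-power threshold (`stop_db`) that may differ from the counting mechanism the authors actually use, and `make_oracle_extractor` is a toy that cheats by using the true sources. It is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Signal = List[float]
# Hypothetical interface (assumption): takes the current residual and returns
# (estimate of one source, residual with that source removed).
Extractor = Callable[[Signal], Tuple[Signal, Signal]]

SILENCE_DB = -100.0  # assumed floor below which the input counts as silent


def power_db(x: Sequence[float]) -> float:
    """Mean power of a signal in dB, with a tiny floor to avoid log(0)."""
    power = sum(v * v for v in x) / max(len(x), 1)
    return 10.0 * math.log10(power + 1e-12)


def iterative_extraction(
    mixture: Sequence[float],
    extract_one: Extractor,
    stop_db: float = -30.0,
    max_sources: int = 10,
) -> List[Signal]:
    """Extract sources one at a time; len(result) is the estimated speaker count.

    Assumed stop rule: halt once the residual is more than `stop_db` dB
    below the mixture, or after `max_sources` iterations as a safety cap.
    """
    ref_db = power_db(mixture)
    if ref_db < SILENCE_DB:
        return []  # nothing to extract from a silent input
    residual: Signal = list(mixture)
    estimates: List[Signal] = []
    while len(estimates) < max_sources and power_db(residual) - ref_db > stop_db:
        estimate, residual = extract_one(residual)
        estimates.append(estimate)
    return estimates


def make_oracle_extractor(sources: Sequence[Sequence[float]]) -> Extractor:
    """Toy stand-in for a neural extractor: removes the loudest remaining true source."""
    remaining = [list(s) for s in sources]

    def extract_one(residual: Signal) -> Tuple[Signal, Signal]:
        idx = max(range(len(remaining)), key=lambda i: power_db(remaining[i]))
        estimate = remaining.pop(idx)
        return estimate, [r - e for r, e in zip(residual, estimate)]

    return extract_one


if __name__ == "__main__":
    fs = 8000
    # Three sinusoidal "speakers" at different levels, one second each.
    sources = [
        [amp * math.sin(2 * math.pi * freq * n / fs) for n in range(fs)]
        for amp, freq in [(1.0, 220), (0.5, 330), (0.25, 440)]
    ]
    mixture = [sum(samples) for samples in zip(*sources)]
    estimates = iterative_extraction(mixture, make_oracle_extractor(sources))
    print("estimated number of sources:", len(estimates))
```

Because the loop has no fixed number of outputs, the same code handles two, three or four sources without change, which loosely mirrors the abstract's point about generalizing to more speakers than seen in training. In the full system, each extracted estimate would then be passed to a single-talker recognizer.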

Authors (7)
  1. Thilo von Neumann (16 papers)
  2. Christoph Boeddeker (36 papers)
  3. Lukas Drude (13 papers)
  4. Keisuke Kinoshita (44 papers)
  5. Marc Delcroix (94 papers)
  6. Tomohiro Nakatani (50 papers)
  7. Reinhold Haeb-Umbach (60 papers)
Citations (39)