BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR (2305.13716v3)

Published 23 May 2023 in cs.SD, cs.CL, and eess.AS

Abstract: The recently proposed serialized output training (SOT) simplifies multi-talker automatic speech recognition (ASR) by generating speaker transcriptions separated by a special token. However, frequent speaker changes can make speaker change prediction difficult. To address this, we propose boundary-aware serialized output training (BA-SOT), which explicitly incorporates boundary knowledge into the decoder via a speaker change detection task and boundary constraint loss. We also introduce a two-stage connectionist temporal classification (CTC) strategy that incorporates token-level SOT CTC to restore temporal context information. Besides typical character error rate (CER), we introduce utterance-dependent character error rate (UD-CER) to further measure the precision of speaker change prediction. Compared to original SOT, BA-SOT reduces CER/UD-CER by 5.1%/14.0%, and leveraging a pre-trained ASR model for BA-SOT model initialization further reduces CER/UD-CER by 8.4%/19.9%.
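As a rough illustration of the serialized output format and the CER metric mentioned in the abstract, the following minimal sketch builds an SOT-style target sequence and computes a plain character error rate. It is not the authors' implementation. The token name `<sc>`, the ordering of utterances by start time, and the helper names `serialize_sot`, `edit_distance`, and `cer` are assumptions made for this example. UD-CER is not sketched because the abstract does not define it.

```python
from typing import List, Sequence, Tuple

SC_TOKEN = "<sc>"  # hypothetical name for the special speaker-change token


def serialize_sot(utterances: List[Tuple[float, str]]) -> List[str]:
    """Build one SOT-style target from (start_time, text) pairs.

    Assumption: utterances are ordered by start time, and the text is
    split into character-level tokens.
    """
    tokens: List[str] = []
    ordered = sorted(utterances, key=lambda u: u[0])
    for i, (_, text) in enumerate(ordered):
        if i > 0:
            tokens.append(SC_TOKEN)  # mark the boundary between utterances
        tokens.extend(text)  # one token per character
    return tokens


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `serialize_sot([(1.5, "cd"), (0.0, "ab")])` yields the sequence `a b <sc> c d`, with the earlier utterance first and a single boundary token between the two speakers' transcriptions.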

Authors (7)
  1. Yuhao Liang (10 papers)
  2. Fan Yu (63 papers)
  3. Yangze Li (11 papers)
  4. Pengcheng Guo (55 papers)
  5. Shiliang Zhang (132 papers)
  6. Qian Chen (264 papers)
  7. Lei Xie (337 papers)
Citations (7)