Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR (2011.02921v1)

Published 3 Nov 2020 in eess.AS, cs.CL, and cs.SD

Abstract: Recently, an end-to-end speaker-attributed automatic speech recognition (E2E SA-ASR) model was proposed as a joint model of speaker counting, speech recognition, and speaker identification for monaural overlapped speech. In the previous study, the model parameters were trained based on the speaker-attributed maximum mutual information (SA-MMI) criterion, with which the joint posterior probability for multi-talker transcription and speaker identification is maximized over the training data. Although SA-MMI training showed promising results for overlapped speech consisting of various numbers of speakers, the training criterion was not directly linked to the final evaluation metric, i.e., the speaker-attributed word error rate (SA-WER). In this paper, we propose a speaker-attributed minimum Bayes risk (SA-MBR) training method in which the parameters are trained to directly minimize the expected SA-WER over the training data. Experiments using the LibriSpeech corpus show that the proposed SA-MBR training reduces the SA-WER by 9.0% relative compared with the SA-MMI-trained model.
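
The abstract does not spell out how the expected SA-WER is computed, but standard minimum Bayes risk training over an N-best list conveys the idea: weight each hypothesis's risk (here, its SA-WER against the reference) by the model's renormalized posterior, and minimize the resulting expectation. Below is a minimal PyTorch sketch under that assumption; the function and variable names (`sa_mbr_loss`, `log_probs`, `sa_wers`) are illustrative and not taken from the paper, which may differ in N-best generation, normalization, and any interpolation with the SA-MMI objective.

```python
import torch

def sa_mbr_loss(log_probs: torch.Tensor, sa_wers: torch.Tensor) -> torch.Tensor:
    """Expected SA-WER over an N-best list (a generic MBR objective sketch).

    log_probs: (N,) joint log-scores of N hypotheses from the E2E SA-ASR
               model (multi-talker transcription + speaker identification).
    sa_wers:   (N,) speaker-attributed WER of each hypothesis against the
               reference; treated as a constant risk (no gradient flows
               through it).
    """
    posterior = torch.softmax(log_probs, dim=0)      # renormalize over the N-best list
    return torch.sum(posterior * sa_wers.detach())   # expected risk to minimize

# Example usage: 4-best hypotheses scored by the model
log_probs = torch.tensor([-1.2, -2.0, -1.7, -3.1], requires_grad=True)
sa_wers = torch.tensor([0.12, 0.30, 0.25, 0.41])
loss = sa_mbr_loss(log_probs, sa_wers)
loss.backward()  # shifts probability mass toward low-SA-WER hypotheses
```

Minimizing this expectation directly targets the evaluation metric, in contrast to SA-MMI, which maximizes the posterior of the reference without regard to how errors are scored.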

Authors (7)
  1. Naoyuki Kanda (61 papers)
  2. Zhong Meng (53 papers)
  3. Liang Lu (42 papers)
  4. Yashesh Gaur (43 papers)
  5. Xiaofei Wang (138 papers)
  6. Zhuo Chen (319 papers)
  7. Takuya Yoshioka (77 papers)
Citations (16)
