On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers (2309.14130v2)

Published 25 Sep 2023 in cs.SD, cs.CL, cs.LG, and eess.AS

Abstract: Internal language model (ILM) subtraction has been widely applied to improve the performance of the RNN-Transducer with external language model (LM) fusion for speech recognition. In this work, we show that sequence discriminative training has a strong correlation with ILM subtraction from both theoretical and empirical points of view. Theoretically, we derive that the global optimum of maximum mutual information (MMI) training shares a similar formula as ILM subtraction. Empirically, we show that ILM subtraction and sequence discriminative training achieve similar effects across a wide range of experiments on LibriSpeech, including both MMI and minimum Bayes risk (MBR) criteria, as well as neural transducers and LMs of both full and limited context. The benefit of ILM subtraction also becomes much smaller after sequence discriminative training. We further provide an in-depth study showing that sequence discriminative training has a minimal effect on the commonly used zero-encoder ILM estimation, but a joint effect on both the encoder and the prediction + joint network, reshaping the posterior probability through both ILM and blank suppression.
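For context, the ILM-subtraction decoding rule that the abstract refers to typically combines the transducer score with an external LM score and subtracts an estimated ILM score. The sketch below is illustrative only; the scaling factors $\lambda_{1}, \lambda_{2}$ and the notation are assumptions, not values or symbols taken from the paper.

$$
\hat{y} \;=\; \arg\max_{y} \Big[ \log p_{\text{RNNT}}(y \mid x) \;+\; \lambda_{1} \log p_{\text{ELM}}(y) \;-\; \lambda_{2} \log p_{\text{ILM}}(y) \Big]
$$

The paper's theoretical claim is that the global optimum of MMI-based sequence discriminative training yields a model whose decision rule takes a similar subtractive form, which explains why the two approaches behave alike empirically.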

Authors (4)
  1. Zijian Yang (20 papers)
  2. Wei Zhou (311 papers)
  3. Ralf Schlüter (73 papers)
  4. Hermann Ney (104 papers)
