Early Stage LM Integration Using Local and Global Log-Linear Combination (2005.10049v1)

Published 20 May 2020 in eess.AS, cs.CL, cs.LG, cs.SD, and stat.ML

Abstract: Sequence-to-sequence models with an implicit alignment mechanism (e.g. attention) are closing the performance gap towards traditional hybrid hidden Markov models (HMM) for the task of automatic speech recognition. One important factor to improve word error rate in both cases is the use of an external language model (LM) trained on large text-only corpora. Language model integration is straightforward with the clear separation of acoustic model and language model in classical HMM-based modeling. In contrast, multiple integration schemes have been proposed for attention models. In this work, we present a novel method for language model integration into implicit-alignment based sequence-to-sequence models. Log-linear model combination of acoustic and language model is performed with a per-token renormalization. This allows us to compute the full normalization term efficiently both in training and in testing. This is compared to a global renormalization scheme which is equivalent to applying shallow fusion in training. The proposed methods show good improvements over standard model combination (shallow fusion) on our state-of-the-art Librispeech system. Furthermore, the improvements are persistent even if the LM is exchanged for a more powerful one after training.
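
As a rough sketch of the local (per-token) combination described in the abstract, with notation that is ours rather than taken verbatim from the paper: let p_{AM} denote the attention-based acoustic model, p_{LM} the external language model, \lambda_1 and \lambda_2 log-linear scaling exponents, and V the output vocabulary. The per-token renormalized posterior for label a_n given label history a_1^{n-1} and acoustic input x can then be written as

    p(a_n \mid a_1^{n-1}, x) = \frac{p_{AM}(a_n \mid a_1^{n-1}, x)^{\lambda_1} \, p_{LM}(a_n \mid a_1^{n-1})^{\lambda_2}}{\sum_{a \in V} p_{AM}(a \mid a_1^{n-1}, x)^{\lambda_1} \, p_{LM}(a \mid a_1^{n-1})^{\lambda_2}}.

Because the normalization term sums only over the vocabulary at each step, it can be computed exactly during both training and decoding. The global scheme mentioned in the abstract instead renormalizes over complete label sequences, which, when applied as a training criterion, corresponds to shallow fusion.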

Authors (3)
  1. Wilfried Michel (12 papers)
  2. Ralf Schlüter (73 papers)
  3. Hermann Ney (104 papers)
Citations (11)
