
Global Normalization for Streaming Speech Recognition in a Modular Framework (2205.13674v1)

Published 26 May 2022 in cs.LG, cs.AI, and cs.CL

Abstract: We introduce the Globally Normalized Autoregressive Transducer (GNAT) for addressing the label bias problem in streaming speech recognition. Our solution admits a tractable exact computation of the denominator for the sequence-level normalization. Through theoretical and empirical results, we demonstrate that by switching to a globally normalized model, the word error rate gap between streaming and non-streaming speech-recognition models can be greatly reduced (by more than 50% on the Librispeech dataset). This model is developed in a modular framework which encompasses all the common neural speech recognition models. The modularity of this framework enables controlled comparison of modelling choices and creation of new models.
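
As a rough illustration of the distinction the abstract draws, the contrast between step-level and sequence-level normalization can be written generically as follows. This is a textbook-style sketch, not the paper's exact formulation: the symbols $x$ (input), $y$ (label sequence), $s_\theta$ (per-step score), and $Z_\theta$ (normalizer) are notation assumed here for illustration.

```latex
% Locally normalized: a softmax over the vocabulary at every step.
P_{\mathrm{local}}(y \mid x)
  = \prod_{i=1}^{|y|}
    \frac{\exp s_\theta(y_i \mid y_{<i}, x)}
         {\sum_{v} \exp s_\theta(v \mid y_{<i}, x)}

% Globally normalized: one normalizer over whole sequences.
P_{\mathrm{global}}(y \mid x)
  = \frac{\exp \sum_{i=1}^{|y|} s_\theta(y_i \mid y_{<i}, x)}{Z_\theta(x)},
\qquad
Z_\theta(x)
  = \sum_{y'} \exp \sum_{i=1}^{|y'|} s_\theta(y'_i \mid y'_{<i}, x)
```

The sum defining $Z_\theta(x)$ ranges over all candidate sequences $y'$, which is what makes the denominator expensive in general. The abstract's claim is that GNAT admits a tractable exact computation of this quantity.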

Authors (6)
  1. Ehsan Variani (13 papers)
  2. Ke Wu (85 papers)
  3. Michael Riley (16 papers)
  4. David Rybach (19 papers)
  5. Matt Shannon (10 papers)
  6. Cyril Allauzen (13 papers)
Citations (8)
