
RNN-T For Latency Controlled ASR With Improved Beam Search (1911.01629v2)

Published 5 Nov 2019 in cs.CL, cs.LG, and eess.AS

Abstract: Neural transducer-based systems such as RNN Transducers (RNN-T) for automatic speech recognition (ASR) blend the individual components of a traditional hybrid ASR system (acoustic model, language model, punctuation model, inverse text normalization) into a single model. This greatly simplifies training and inference and hence makes RNN-T a desirable choice for ASR systems. In this work, we investigate the use of RNN-T in applications that require a tunable latency budget at inference time. We also improve the decoding speed of the originally proposed RNN-T beam search algorithm. We evaluate our proposed system on an English video ASR dataset and show that neural RNN-T models can achieve comparable WER and better computational efficiency compared to a well-tuned hybrid ASR baseline.

Authors (8)
  1. Mahaveer Jain (6 papers)
  2. Kjell Schubert (5 papers)
  3. Jay Mahadeokar (36 papers)
  4. Ching-Feng Yeh (22 papers)
  5. Kaustubh Kalgaonkar (6 papers)
  6. Anuroop Sriram (32 papers)
  7. Christian Fuegen (36 papers)
  8. Michael L. Seltzer (34 papers)
Citations (43)