Online Automatic Speech Recognition with Listen, Attend and Spell Model (2008.05514v2)

Published 12 Aug 2020 in eess.AS, cs.CL, and cs.SD

Abstract: The Listen, Attend and Spell (LAS) model and other attention-based automatic speech recognition (ASR) models have known limitations when operated in a fully online mode. In this paper, we analyze the online operation of LAS models to demonstrate that these limitations stem from the handling of silence regions and the reliability of the online attention mechanism at the edge of input buffers. We propose a novel and simple technique that can achieve fully online recognition while meeting accuracy and latency targets. For the Mandarin dictation task, our proposed approach can achieve a character error rate in online operation that is within 4% relative to an offline LAS model. The proposed online LAS model operates at 12% lower latency relative to a conventional neural network hidden Markov model hybrid of comparable accuracy. We have validated the proposed method through a production-scale deployment, which, to the best of our knowledge, is the first such deployment of a fully online LAS model.

Authors (5)
  1. Roger Hsiao (10 papers)
  2. Dogan Can (4 papers)
  3. Tim Ng (9 papers)
  4. Ruchir Travadi (6 papers)
  5. Arnab Ghoshal (5 papers)
Citations (17)
