Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition (2011.07120v1)

Published 3 Nov 2020 in cs.CL

Abstract: Attention-based models have been gaining popularity recently for their strong performance demonstrated in fields such as machine translation and automatic speech recognition. One major challenge of attention-based models is the need for access to the full sequence and the computational cost that grows quadratically with the sequence length. These characteristics pose challenges, especially for low-latency scenarios, where the system is often required to be streaming. In this paper, we build a compact and streaming speech recognition system on top of the end-to-end neural transducer architecture with attention-based modules augmented with convolution. The proposed system equips the end-to-end models with streaming capability and reduces the large footprint of the streaming attention-based model using augmented memory. On the LibriSpeech dataset, our proposed system achieves word error rates of 2.7% on test-clean and 5.8% on test-other, which are, to the best of our knowledge, the lowest among streaming approaches reported so far.
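
The central idea in the abstract, attending over fixed-size chunks plus a compact memory bank rather than the full sequence, can be illustrated with a minimal sketch. This is not the paper's implementation: the module name AugmentedMemoryAttention, the mean-pooled summary vector, and all dimensions are illustrative assumptions built on standard PyTorch components.

```python
# Minimal sketch (not the authors' code) of chunk-wise self-attention with an
# augmented memory bank, as described at a high level in the abstract.
import torch
import torch.nn as nn


class AugmentedMemoryAttention(nn.Module):
    """Processes the input in fixed-size chunks; each chunk attends to itself
    plus a bank of memory vectors summarizing previously seen chunks, so the
    attention context stays bounded and the model can run in streaming mode."""

    def __init__(self, d_model: int, n_heads: int, chunk_size: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.chunk_size = chunk_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        outputs, memory_bank = [], []
        for start in range(0, x.size(1), self.chunk_size):
            chunk = x[:, start:start + self.chunk_size]
            # Summarize the chunk into one memory vector (mean pooling here;
            # the paper's summarization choice may differ).
            summary = chunk.mean(dim=1, keepdim=True)
            # Keys/values = memory bank so far + current chunk, instead of the
            # full sequence, avoiding quadratic growth with sequence length.
            context = torch.cat(memory_bank + [chunk], dim=1)
            out, _ = self.attn(query=chunk, key=context, value=context)
            outputs.append(out)
            memory_bank.append(summary)
        return torch.cat(outputs, dim=1)


if __name__ == "__main__":
    layer = AugmentedMemoryAttention(d_model=256, n_heads=4, chunk_size=16)
    feats = torch.randn(2, 64, 256)   # (batch, frames, features)
    print(layer(feats).shape)         # torch.Size([2, 64, 256])
```

In practice a streaming system would also cap the size of the memory bank (keeping only the most recent summaries) to keep latency and footprint fixed; the sketch above lets the bank grow for simplicity.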

Authors (7)
  1. Ching-Feng Yeh (22 papers)
  2. Yongqiang Wang (92 papers)
  3. Yangyang Shi (53 papers)
  4. Chunyang Wu (24 papers)
  5. Frank Zhang (22 papers)
  6. Julian Chan (11 papers)
  7. Michael L. Seltzer (34 papers)
Citations (8)