Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory (2005.08042v1)

Published 16 May 2020 in eess.AS and cs.CL

Abstract: Transformer-based acoustic modeling has achieved great success for both hybrid and sequence-to-sequence speech recognition. However, it requires access to the full sequence, and the computational cost grows quadratically with respect to the input sequence length. These factors limit its adoption for streaming applications. In this work, we propose a novel augmented-memory self-attention, which attends over a short segment of the input sequence and a bank of memories. The memory bank stores the embedding information for all the processed segments. On the LibriSpeech benchmark, our proposed method outperforms all existing streamable transformer methods by a large margin and achieves over 15% relative error reduction compared with the widely used LC-BLSTM baseline. Our findings are also confirmed on some large internal datasets.
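The core idea is that each attention step only sees the current short segment plus a fixed-size bank of memory embeddings, so per-step cost no longer grows quadratically with utterance length. The sketch below illustrates that mechanism under simplifying assumptions: single-head attention, no left/right context frames, and mean pooling as the segment summary (the paper uses a learned summarization query). All names here (`AugmentedMemoryAttention`, `stream`, `seg_len`) are illustrative, not from the paper's code.

```python
# Minimal sketch of augmented-memory self-attention (assumptions noted above).
import torch
import torch.nn.functional as F
from torch import nn


class AugmentedMemoryAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, segment: torch.Tensor, memory_bank: torch.Tensor) -> torch.Tensor:
        # segment:     (seg_len, d_model) -- current short segment of the input
        # memory_bank: (n_mem, d_model)   -- one embedding per processed segment
        # Keys/values span the memory bank plus the current segment.
        context = torch.cat([memory_bank, segment], dim=0)
        q, k, v = self.q(segment), self.k(context), self.v(context)
        attn = F.softmax(q @ k.t() * self.scale, dim=-1)  # (seg_len, n_mem + seg_len)
        return attn @ v


def stream(model: AugmentedMemoryAttention, x: torch.Tensor, seg_len: int) -> torch.Tensor:
    # Process a long utterance segment by segment. Per-step cost is
    # O(seg_len * (n_mem + seg_len)) rather than quadratic in the full length.
    memory = torch.empty(0, x.size(-1))
    outputs = []
    for start in range(0, x.size(0), seg_len):
        out = model(x[start:start + seg_len], memory)
        outputs.append(out)
        # Summarize the processed segment into one memory slot. Mean pooling
        # is an assumption here; the paper derives the slot with attention
        # from a summarization query.
        memory = torch.cat([memory, out.mean(dim=0, keepdim=True)], dim=0)
    return torch.cat(outputs, dim=0)
```

In this sketch the memory bank grows by one vector per segment; a practical streaming recognizer would cap it at a fixed number of slots so that latency and memory stay bounded.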

Authors (5)
  1. Chunyang Wu (24 papers)
  2. Yongqiang Wang (92 papers)
  3. Yangyang Shi (53 papers)
  4. Ching-Feng Yeh (22 papers)
  5. Frank Zhang (22 papers)
Citations (58)