SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition (2006.01713v1)

Published 21 May 2020 in cs.SD and eess.AS

Abstract: End-to-end speech recognition has become popular in recent years, since it can integrate the acoustic, pronunciation and language models into a single neural network. Among end-to-end approaches, attention-based methods have emerged as being superior. One example is the Transformer, which adopts an encoder-decoder architecture. The key improvement introduced by the Transformer is the use of self-attention instead of recurrent mechanisms, enabling both the encoder and decoder to capture long-range dependencies with lower computational complexity. In this work, we propose boosting the self-attention ability with a DFSMN memory block, forming the proposed memory-equipped self-attention (SAN-M) mechanism. Theoretical and empirical comparisons have been made to demonstrate the relevance and complementarity between self-attention and the DFSMN memory block. Furthermore, the proposed SAN-M provides an efficient mechanism to integrate these two modules. We have evaluated our approach on the public AISHELL-1 benchmark and an industrial-level 20,000-hour Mandarin speech recognition task. On both tasks, SAN-M systems achieved much better performance than the self-attention-based Transformer baseline system. Specifically, SAN-M achieves a CER of 6.46% on the AISHELL-1 task even without using any external language model, comfortably outperforming other state-of-the-art systems.
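
The abstract describes SAN-M as standard self-attention augmented with a DFSMN memory block. As a rough illustration of that idea only, the sketch below pairs PyTorch multi-head self-attention with a depthwise 1-D convolution standing in for the memory block; the layer sizes, kernel width, and the summation of the two branches are assumptions for illustration, not the authors' exact architecture.

```python
# Minimal SAN-M-style layer sketch, based only on the abstract's description:
# a self-attention branch (global context) plus a DFSMN-like memory branch
# (local context), combined by summation. All hyperparameters are assumed.
import torch
import torch.nn as nn


class SANMLayer(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4, memory_kernel: int = 11):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Memory block modelled as a depthwise 1-D convolution that mixes a
        # fixed window of neighbouring frames for every channel.
        self.memory = nn.Conv1d(
            d_model, d_model,
            kernel_size=memory_kernel,
            padding=memory_kernel // 2,
            groups=d_model,
            bias=False,
        )
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        # Memory branch runs over value projections; Conv1d expects (batch, d_model, time).
        v = self.v_proj(x).transpose(1, 2)
        mem_out = self.memory(v).transpose(1, 2)
        # Sum the global (attention) and local (memory) views.
        return attn_out + mem_out


if __name__ == "__main__":
    layer = SANMLayer()
    frames = torch.randn(2, 100, 256)   # 2 utterances, 100 frames, 256-dim features
    print(layer(frames).shape)          # torch.Size([2, 100, 256])
```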

Authors (4)
  1. Zhifu Gao (28 papers)
  2. Shiliang Zhang (132 papers)
  3. Ming Lei (52 papers)
  4. Ian McLoughlin (30 papers)
Citations (34)
