
Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition (2107.01275v2)

Published 2 Jul 2021 in eess.AS, cs.CL, cs.LG, and cs.SD

Abstract: Recently, attention-based encoder-decoder (AED) models have shown high performance for end-to-end automatic speech recognition (ASR) across several tasks. Addressing overconfidence in such models, in this paper we introduce the concept of relaxed attention, which is a simple gradual injection of a uniform distribution into the encoder-decoder attention weights during training that is easily implemented with two lines of code. We investigate the effect of relaxed attention across different AED model architectures and two prominent ASR tasks, Wall Street Journal (WSJ) and Librispeech. We found that transformers trained with relaxed attention consistently outperform the standard baseline models during decoding with external language models. On WSJ, we set a new benchmark for transformer-based end-to-end speech recognition with a word error rate of 3.65%, outperforming the state of the art (4.20%) by 13.1% relative, while introducing only a single hyperparameter.
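The relaxation described in the abstract admits a very compact implementation. Below is a minimal PyTorch sketch of the idea, assuming the relaxation coefficient is called `gamma` (standing in for the paper's single hyperparameter); the two lines inside the `if training:` branch correspond to the "two lines of code" blending the encoder-decoder attention weights with a uniform distribution during training only:

```python
import torch

def relaxed_attention(scores: torch.Tensor, gamma: float = 0.1,
                      training: bool = True) -> torch.Tensor:
    """Blend softmax attention weights with a uniform distribution.

    scores: raw encoder-decoder attention logits, shape (..., q_len, k_len)
    gamma:  relaxation coefficient (hypothetical name for the paper's
            single hyperparameter)
    """
    weights = torch.softmax(scores, dim=-1)
    if training:
        # Uniform distribution over the key dimension, then convex blend.
        uniform = torch.full_like(weights, 1.0 / weights.size(-1))
        weights = (1.0 - gamma) * weights + gamma * uniform
    return weights
```

At inference time the weights are left untouched, so the method changes only the training signal, counteracting overconfident attention distributions without adding any decoding cost.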

Authors (4)
  1. Timo Lohrenz
  2. Patrick Schwarz
  3. Zhengyang Li
  4. Tim Fingscheidt