
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition (2110.05571v1)

Published 11 Oct 2021 in eess.AS and cs.CL

Abstract: The Transformer architecture has been widely adopted as the dominant architecture in most sequence transduction tasks, including automatic speech recognition (ASR), since its attention mechanism excels at capturing long-range dependencies. While models built solely upon attention can be better parallelized than regular RNNs, a novel network architecture, SRU++, was recently proposed. By combining fast recurrence with an attention mechanism, SRU++ exhibits strong sequence modeling capability and achieves near-state-of-the-art results on various language modeling and machine translation tasks with improved compute efficiency. In this work, we present the advantages of applying SRU++ to ASR tasks by comparing it with Conformer across multiple ASR benchmarks, and we study how the benefits generalize to long-form speech inputs. On the popular LibriSpeech benchmark, our SRU++ model achieves 2.0% / 4.7% WER on test-clean / test-other, showing competitive performance compared with the state-of-the-art Conformer encoder under the same setup. In particular, our analysis shows that SRU++ can surpass Conformer on long-form speech input by a large margin.
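
The core idea the abstract describes is a layer in which a lightweight self-attention block produces the projections that the SRU's fast, elementwise recurrence then consumes. The sketch below is an illustrative PyTorch rendering of that structure, not the authors' implementation; the class name, the reduced attention dimension `d_attn`, and the plain Python loop over time steps are assumptions made for readability (the actual SRU++ fuses the recurrence in a CUDA kernel for speed).

```python
import torch
import torch.nn as nn

class SRUppLayerSketch(nn.Module):
    """Hedged sketch of an SRU++-style layer: attention output feeds the SRU recurrence.

    Not the authors' code; names, dimensions, and the per-step Python loop are
    illustrative assumptions (the real SRU++ uses a fused CUDA recurrence kernel).
    """
    def __init__(self, d_model: int, d_attn: int = 128, n_heads: int = 4):
        super().__init__()
        # Self-attention runs in a reduced dimension d_attn to keep compute low.
        self.down = nn.Linear(d_model, d_attn, bias=False)
        self.attn = nn.MultiheadAttention(d_attn, n_heads, batch_first=True)
        # Project the attention output to the three SRU streams:
        # candidate values, forget-gate inputs, reset-gate inputs.
        self.up = nn.Linear(d_attn, 3 * d_model, bias=False)
        self.v_f = nn.Parameter(torch.zeros(d_model))  # per-channel recurrent weights
        self.v_r = nn.Parameter(torch.zeros(d_model))
        self.b_f = nn.Parameter(torch.zeros(d_model))
        self.b_r = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        z = self.down(x)
        a, _ = self.attn(z, z, z)                 # attention replaces SRU's plain linear map of x
        w, f_in, r_in = self.up(a).chunk(3, dim=-1)

        c = x.new_zeros(x.size(0), x.size(2))     # cell state c_0 = 0
        outs = []
        for t in range(x.size(1)):                # elementwise recurrence over time
            f = torch.sigmoid(f_in[:, t] + self.v_f * c + self.b_f)  # forget gate
            r = torch.sigmoid(r_in[:, t] + self.v_r * c + self.b_r)  # reset gate
            c = f * c + (1.0 - f) * w[:, t]                           # cell update
            outs.append(r * c + (1.0 - r) * x[:, t])                  # highway output
        return torch.stack(outs, dim=1)
```

In an ASR encoder, several such layers would be stacked in place of Conformer blocks; the WER figures quoted in the abstract come from the paper's full training setup, not from this sketch.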

Authors (5)
  1. Jing Pan (25 papers)
  2. Tao Lei (51 papers)
  3. Kwangyoun Kim (18 papers)
  4. Kyu Han (5 papers)
  5. Shinji Watanabe (416 papers)
Citations (9)
