Papers
Topics
Authors
Recent
Search
2000 character limit reached

ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition

Published 21 May 2020 in eess.AS, cs.CL, and cs.SD | (2005.10469v1)

Abstract: In this paper we present state-of-the-art (SOTA) performance on the LibriSpeech corpus with two novel neural network architectures, a multistream CNN for acoustic modeling and a self-attentive simple recurrent unit (SRU) for language modeling. In the hybrid ASR framework, the multistream CNN acoustic model processes an input of speech frames in multiple parallel pipelines where each stream has a unique dilation rate for diversity. Trained with the SpecAugment data augmentation method, it achieves relative word error rate (WER) improvements of 4% on test-clean and 14% on test-other. We further improve the performance via N-best rescoring using a 24-layer self-attentive SRU LLM, achieving WERs of 1.75% on test-clean and 4.46% on test-other.

Citations (21)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.