
Pruned RNN-T for fast, memory-efficient ASR training (2206.13236v1)

Published 23 Jun 2022 in eess.AS, cs.AI, and cs.LG

Abstract: The RNN-Transducer (RNN-T) framework for speech recognition has been growing in popularity, particularly for deployed real-time ASR systems, because it combines high accuracy with naturally streaming recognition. One of the drawbacks of RNN-T is that its loss function is relatively slow to compute, and can use a lot of memory. Excessive GPU memory usage can make it impractical to use RNN-T loss in cases where the vocabulary size is large: for example, for Chinese character-based ASR. We introduce a method for faster and more memory-efficient RNN-T loss computation. We first obtain pruning bounds for the RNN-T recursion using a simple joiner network that is linear in the encoder and decoder embeddings; we can evaluate this without using much memory. We then use those pruning bounds to evaluate the full, non-linear joiner network.
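The two-pass scheme sketched in the abstract (a cheap linear joiner to obtain pruning bounds, then the full non-linear joiner evaluated only inside those bounds) is exposed in the authors' open-source k2 library. Below is a minimal usage sketch, not the paper's exact recipe: it assumes k2's pruned RNN-T API (rnnt_loss_simple, get_rnnt_prune_ranges, do_rnnt_pruning, rnnt_loss_pruned), and the tensor shapes, toy joiner, prune_range, and loss weighting are illustrative assumptions.

```python
# Hedged sketch of the two-pass pruned RNN-T loss with the k2 library
# (https://github.com/k2-fsa/k2). Shapes and hyperparameters are toy values.
import torch
import k2

B, T, S, D, V = 2, 50, 10, 256, 500   # batch, frames, symbols, dim, vocab
blank_id, prune_range = 0, 5

encoder_out = torch.randn(B, T, D)        # acoustic encoder output
decoder_out = torch.randn(B, S + 1, D)    # prediction network output
y = torch.randint(1, V, (B, S))           # padded label sequences
boundary = torch.zeros(B, 4, dtype=torch.int64)
boundary[:, 2] = S                        # number of symbols per utterance
boundary[:, 3] = T                        # number of frames per utterance

# Linear projections to the vocabulary for the "simple" (trivial) joiner.
am_proj = torch.nn.Linear(D, V)
lm_proj = torch.nn.Linear(D, V)

# Pass 1: simple joiner that is linear in the encoder and decoder
# embeddings; its lattice-arc gradients are used to derive pruning bounds.
simple_loss, (px_grad, py_grad) = k2.rnnt_loss_simple(
    lm=lm_proj(decoder_out),
    am=am_proj(encoder_out),
    symbols=y,
    termination_symbol=blank_id,
    boundary=boundary,
    return_grad=True,
    reduction="sum",
)

# Turn the arc gradients into per-frame pruning ranges: keep only
# prune_range symbol positions for each time frame.
ranges = k2.get_rnnt_prune_ranges(
    px_grad=px_grad, py_grad=py_grad, boundary=boundary, s_range=prune_range
)

# Gather the pruned (B, T, prune_range, D) slices of both embeddings.
am_pruned, lm_pruned = k2.do_rnnt_pruning(
    am=encoder_out, lm=decoder_out, ranges=ranges
)

# Pass 2: evaluate the full, non-linear joiner only inside the bounds.
joiner = torch.nn.Sequential(torch.nn.Tanh(), torch.nn.Linear(D, V))
logits = joiner(am_pruned + lm_pruned)    # (B, T, prune_range, V)

pruned_loss = k2.rnnt_loss_pruned(
    logits=logits,
    symbols=y,
    ranges=ranges,
    termination_symbol=blank_id,
    boundary=boundary,
    reduction="sum",
)

loss = 0.5 * simple_loss + pruned_loss    # relative weighting is a choice
```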

Authors (7)
  1. Fangjun Kuang (13 papers)
  2. Liyong Guo (17 papers)
  3. Wei Kang (81 papers)
  4. Long Lin (14 papers)
  5. Mingshuang Luo (7 papers)
  6. Zengwei Yao (16 papers)
  7. Daniel Povey (45 papers)
Citations (59)