
Pruned RNN-T for fast, memory-efficient ASR training (2206.13236v1)

Published 23 Jun 2022 in eess.AS, cs.AI, and cs.LG

Abstract: The RNN-Transducer (RNN-T) framework for speech recognition has been growing in popularity, particularly for deployed real-time ASR systems, because it combines high accuracy with naturally streaming recognition. One of the drawbacks of RNN-T is that its loss function is relatively slow to compute, and can use a lot of memory. Excessive GPU memory usage can make it impractical to use RNN-T loss in cases where the vocabulary size is large: for example, for Chinese character-based ASR. We introduce a method for faster and more memory-efficient RNN-T loss computation. We first obtain pruning bounds for the RNN-T recursion using a simple joiner network that is linear in the encoder and decoder embeddings; we can evaluate this without using much memory. We then use those pruning bounds to evaluate the full, non-linear joiner network.
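The two-pass scheme sketched in the abstract (a cheap linear joiner to obtain pruning bounds, then the full non-linear joiner evaluated only inside those bounds) is exposed in the authors' open-source k2 library. Below is a minimal usage sketch, not the paper's exact recipe: it assumes k2's pruned RNN-T API (rnnt_loss_simple, get_rnnt_prune_ranges, do_rnnt_pruning, rnnt_loss_pruned), and the tensor shapes, toy joiner, prune_range, and loss weighting are illustrative assumptions.

```python
# Hedged sketch of the two-pass pruned RNN-T loss with the k2 library
# (https://github.com/k2-fsa/k2). Shapes and hyperparameters are toy values.
import torch
import k2

B, T, S, D, V = 2, 50, 10, 256, 500   # batch, frames, symbols, dim, vocab
blank_id, prune_range = 0, 5

encoder_out = torch.randn(B, T, D)        # acoustic encoder output
decoder_out = torch.randn(B, S + 1, D)    # prediction network output
y = torch.randint(1, V, (B, S))           # padded label sequences
boundary = torch.zeros(B, 4, dtype=torch.int64)
boundary[:, 2] = S                        # number of symbols per utterance
boundary[:, 3] = T                        # number of frames per utterance

# Linear projections to the vocabulary for the "simple" (trivial) joiner.
am_proj = torch.nn.Linear(D, V)
lm_proj = torch.nn.Linear(D, V)

# Pass 1: simple joiner that is linear in the encoder and decoder
# embeddings; its lattice-arc gradients are used to derive pruning bounds.
simple_loss, (px_grad, py_grad) = k2.rnnt_loss_simple(
    lm=lm_proj(decoder_out),
    am=am_proj(encoder_out),
    symbols=y,
    termination_symbol=blank_id,
    boundary=boundary,
    return_grad=True,
    reduction="sum",
)

# Turn the arc gradients into per-frame pruning ranges: keep only
# prune_range symbol positions for each time frame.
ranges = k2.get_rnnt_prune_ranges(
    px_grad=px_grad, py_grad=py_grad, boundary=boundary, s_range=prune_range
)

# Gather the pruned (B, T, prune_range, D) slices of both embeddings.
am_pruned, lm_pruned = k2.do_rnnt_pruning(
    am=encoder_out, lm=decoder_out, ranges=ranges
)

# Pass 2: evaluate the full, non-linear joiner only inside the bounds.
joiner = torch.nn.Sequential(torch.nn.Tanh(), torch.nn.Linear(D, V))
logits = joiner(am_pruned + lm_pruned)    # (B, T, prune_range, V)

pruned_loss = k2.rnnt_loss_pruned(
    logits=logits,
    symbols=y,
    ranges=ranges,
    termination_symbol=blank_id,
    boundary=boundary,
    reduction="sum",
)

loss = 0.5 * simple_loss + pruned_loss    # relative weighting is a choice
```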

Authors (7)
  1. Fangjun Kuang (13 papers)
  2. Liyong Guo (17 papers)
  3. Wei Kang (81 papers)
  4. Long Lin (14 papers)
  5. Mingshuang Luo (7 papers)
  6. Zengwei Yao (16 papers)
  7. Daniel Povey (45 papers)
Citations (59)