TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty (2211.00522v2)

Published 1 Nov 2022 in cs.SD, cs.CL, and eess.AS

Abstract: In this paper, we present TrimTail, a simple but effective emission regularization method to improve the latency of streaming ASR models. The core idea of TrimTail is to apply length penalty (i.e., by trimming trailing frames, see Fig. 1-(b)) directly on the spectrogram of input utterances, which does not require any alignment. We demonstrate that TrimTail is computationally cheap and can be applied online and optimized with any training loss or any model architecture on any dataset without any extra effort by applying it on various end-to-end streaming ASR networks either trained with CTC loss [1] or Transducer loss [2]. We achieve 100 $\sim$ 200ms latency reduction with equal or even better accuracy on both Aishell-1 and Librispeech. Moreover, by using TrimTail, we can achieve a 400ms algorithmic improvement of User Sensitive Delay (USD) with an accuracy loss of less than 0.2.
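The core operation described in the abstract, applying a length penalty by trimming trailing frames directly from the input spectrogram, can be sketched as below. This is a minimal illustration, not the paper's exact procedure: the function name, the `max_trim` parameter, and the uniform sampling of the trim length are assumptions for demonstration.

```python
import numpy as np

def trim_tail(spectrogram, max_trim=20, rng=None):
    """Randomly drop trailing frames from a spectrogram as a length penalty.

    spectrogram: array of shape (num_frames, num_mels), time on axis 0.
    max_trim: illustrative upper bound on how many trailing frames to drop
              (an assumption; the paper's actual trimming schedule may differ).
    """
    rng = rng or np.random.default_rng()
    n_trim = int(rng.integers(0, max_trim + 1))  # sample trim length
    if n_trim == 0:
        return spectrogram
    return spectrogram[:-n_trim]  # remove trailing frames only

# Usage: shorten a 100-frame, 80-mel spectrogram before feeding the model.
spec = np.random.randn(100, 80).astype(np.float32)
trimmed = trim_tail(spec, max_trim=20)
```

Because the trimming happens on the raw spectrogram, it needs no alignment and is agnostic to the training loss (CTC or Transducer) and model architecture, which is what makes the method cheap to apply online.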

Authors (9)
  1. Xingchen Song (18 papers)
  2. Di Wu (477 papers)
  3. Zhiyong Wu (171 papers)
  4. Binbin Zhang (46 papers)
  5. Yuekai Zhang (10 papers)
  6. Zhendong Peng (20 papers)
  7. Wenpeng Li (7 papers)
  8. Fuping Pan (11 papers)
  9. Changbao Zhu (6 papers)
Citations (7)
