
Delay-penalized transducer for low-latency streaming ASR (2211.00490v1)

Published 31 Oct 2022 in eess.AS, cs.CL, cs.LG, and cs.SD

Abstract: In streaming automatic speech recognition (ASR), it is desirable to reduce latency as much as possible while having minimum impact on recognition accuracy. Although a few existing methods are able to achieve this goal, they are difficult to implement due to their dependency on external alignments. In this paper, we propose a simple way to penalize symbol delay in the transducer model, so that we can balance the trade-off between symbol delay and accuracy for streaming models without external alignments. Specifically, our method adds a small constant times (T/2 - t), where T is the number of frames and t is the current frame, to all the non-blank log-probabilities (after normalization) that are fed into the two-dimensional transducer recursion. For both streaming Conformer models and unidirectional long short-term memory (LSTM) models, experimental results show that it can significantly reduce the symbol delay with an acceptable performance degradation. Our method achieves a similar delay-accuracy trade-off to the previously published FastEmit, but we believe our method is preferable because it has a better justification: it is equivalent to penalizing the average symbol delay. Our work is open-sourced and publicly available (https://github.com/k2-fsa/k2).
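The core operation described in the abstract, adding a constant times (T/2 - t) to the non-blank log-probabilities before the transducer recursion, can be sketched as follows. This is a minimal illustration in NumPy, not the actual k2 implementation; the function name `apply_delay_penalty` and the parameter names `blank_id` and `lam` are assumptions made for this sketch.

```python
import numpy as np

def apply_delay_penalty(log_probs: np.ndarray, blank_id: int, lam: float) -> np.ndarray:
    """Add lam * (T/2 - t) to the non-blank log-probabilities.

    log_probs: array of shape (T, U, V) of normalized log-probabilities,
    where T is the number of frames, U the label-sequence length, and V
    the vocabulary size (including blank).
    """
    T = log_probs.shape[0]
    penalized = log_probs.copy()
    # The penalty is positive for early frames (t < T/2) and negative for
    # late ones, so paths that emit symbols earlier score higher in the
    # subsequent two-dimensional transducer recursion.
    penalty = lam * (T / 2.0 - np.arange(T)).reshape(T, 1, 1)
    nonblank = np.ones(log_probs.shape[-1], dtype=bool)
    nonblank[blank_id] = False
    penalized[..., nonblank] += penalty  # broadcast over (U, V-1)
    return penalized
```

Because the per-frame penalty sums to zero over the utterance, it only redistributes probability mass toward earlier emissions rather than changing the overall scale, which is what makes it equivalent to penalizing the average symbol delay.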

Authors (8)
  1. Wei Kang (81 papers)
  2. Zengwei Yao (16 papers)
  3. Fangjun Kuang (13 papers)
  4. Liyong Guo (17 papers)
  5. Xiaoyu Yang (85 papers)
  6. Daniel Povey (45 papers)
  7. Long Lin (14 papers)
  8. Piotr Żelasko (36 papers)
Citations (5)
