
ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs (2305.10649v1)

Published 18 May 2023 in cs.SD, cs.CL, and eess.AS

Abstract: In this paper, we present ZeroPrompt (Figure 1-(a)) and the corresponding Prompt-and-Refine strategy (Figure 3), two simple but effective training-free methods to decrease the Token Display Time (TDT) of streaming ASR models without any accuracy loss. The core idea of ZeroPrompt is to append zeroed content to each chunk during inference, which acts like a prompt that encourages the model to predict future tokens before they are spoken. We argue that streaming acoustic encoders naturally have the modeling ability of masked LMs, and our experiments demonstrate that ZeroPrompt is cheap to engineer and can be applied to streaming acoustic encoders on any dataset without accuracy loss. Specifically, compared with our baseline models, we achieve a 350~700 ms reduction in First Token Display Time (TDT-F) and a 100~400 ms reduction in Last Token Display Time (TDT-L), with theoretically and experimentally equal WER on both the Aishell-1 and Librispeech datasets.
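The core idea above, appending zeroed frames to each streaming chunk so the encoder treats them like a masked region and guesses upcoming tokens, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the chunk shapes, the 80-dim filterbank features, and the `zero_prompt_chunk` helper are all assumptions for demonstration.

```python
import numpy as np

def zero_prompt_chunk(chunk: np.ndarray, prompt_frames: int) -> np.ndarray:
    """Append `prompt_frames` zeroed feature frames to a streaming chunk.

    The zeroed tail plays the role of a [MASK] span: a streaming acoustic
    encoder, acting as a zero-shot masked LM, may already emit hypotheses
    for it before the corresponding audio is spoken.
    (Hypothetical helper; not code from the paper.)
    """
    pad = np.zeros((prompt_frames, chunk.shape[1]), dtype=chunk.dtype)
    return np.concatenate([chunk, pad], axis=0)

# Example: a 16-frame chunk of 80-dim features, prompted with 8 zero frames.
chunk = np.random.randn(16, 80).astype(np.float32)
prompted = zero_prompt_chunk(chunk, prompt_frames=8)
print(prompted.shape)  # (24, 80)
```

Under the Prompt-and-Refine strategy, tokens decoded over the zeroed tail are provisional: when the next real chunk arrives, its frames replace the zeros and the refreshed decoding overwrites any earlier guesses, which is why accuracy is unchanged while tokens appear on screen earlier.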

Authors (7)
  1. Xingchen Song (18 papers)
  2. Di Wu (477 papers)
  3. Binbin Zhang (46 papers)
  4. Zhendong Peng (20 papers)
  5. Bo Dang (16 papers)
  6. Fuping Pan (11 papers)
  7. Zhiyong Wu (171 papers)
Citations (18)
