
CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR (2203.16758v2)

Published 31 Mar 2022 in eess.AS and cs.CL

Abstract: History and future contextual information are known to be important for accurate acoustic modeling. However, acquiring future context brings latency for streaming ASR. In this paper, we propose a new framework - Chunking, Simulating Future Context and Decoding (CUSIDE) for streaming speech recognition. A new simulation module is introduced to recursively simulate the future contextual frames, without waiting for future context. The simulation module is jointly trained with the ASR model using a self-supervised loss; the ASR model is optimized with the usual ASR loss, e.g., CTC-CRF as used in our experiments. Experiments show that, compared to using real future frames as right context, using simulated future context can drastically reduce latency while maintaining recognition accuracy. With CUSIDE, we obtain new state-of-the-art streaming ASR results on the AISHELL-1 dataset.
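The core idea in the abstract, splitting an utterance into chunks and letting a small simulation module predict the future-context frames so decoding need not wait for them, can be sketched in a toy form. Everything below is illustrative: the linear "simulator", the chunk/context sizes, and the plain MSE self-supervised loss are stand-ins chosen for brevity, not the paper's actual recurrent simulation module or training setup.

```python
import numpy as np

# Toy sketch of the CUSIDE idea (sizes and model are illustrative):
# the utterance is chunked, and a simulator predicts the next `right`
# frames (the simulated right context) from the current chunk.

rng = np.random.default_rng(0)
T, D = 40, 8          # total frames, feature dimension (toy values)
chunk, right = 10, 4  # chunk length and simulated right-context length

feats = rng.standard_normal((T, D))
# Hypothetical linear simulator; the paper uses a learned neural module.
W = rng.standard_normal((chunk * D, right * D)) * 0.01

def simulate_future(chunk_feats):
    """Predict `right` future frames from one chunk."""
    return (chunk_feats.reshape(-1) @ W).reshape(right, D)

# Self-supervised loss: during training, the simulated future context is
# compared against the real future frames; at inference, only the
# simulated frames are used, avoiding the look-ahead latency.
sim_loss, n = 0.0, 0
for start in range(0, T - chunk - right + 1, chunk):
    cur = feats[start:start + chunk]
    real_future = feats[start + chunk:start + chunk + right]
    sim_loss += float(np.mean((simulate_future(cur) - real_future) ** 2))
    n += 1
sim_loss /= n
```

In the paper this self-supervised simulation loss is trained jointly with the usual ASR loss (CTC-CRF in their experiments); the sketch above shows only the simulation side.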

Authors (6)
  1. Keyu An
  2. Huahuan Zheng
  3. Zhijian Ou
  4. Hongyu Xiang
  5. Ke Ding
  6. Guanglu Wan
Citations (16)
