Cascaded encoders for unifying streaming and non-streaming ASR (2010.14606v1)

Published 27 Oct 2020 in eess.AS, cs.CL, and cs.SD

Abstract: End-to-end (E2E) automatic speech recognition (ASR) models have by now shown competitive performance on several benchmarks. These models are structured to operate in either streaming or non-streaming mode. This work presents cascaded encoders for building a single E2E ASR model that can operate in both of these modes simultaneously. The proposed model consists of streaming and non-streaming encoders. Input features are first processed by the streaming encoder; the non-streaming encoder operates exclusively on the output of the streaming encoder. A single decoder then learns to decode using the output of either the streaming or the non-streaming encoder. Results show that this model achieves word error rates (WER) similar to those of a standalone streaming model when operating in streaming mode, and obtains a 10% to 27% relative improvement when operating in non-streaming mode. Our results also show that the proposed approach outperforms existing E2E two-pass models, especially on long-form speech.
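
The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the moving-average "encoders", the threshold "decoder", and every name and parameter below (`streaming_encoder`, `non_streaming_encoder`, `shared_decoder`, `recognize`, `left_context`, `context`, `threshold`) are hypothetical stand-ins chosen only to show the cascade, in which the non-streaming encoder sees nothing but the streaming encoder's output and one decoder serves both modes.

```python
from typing import List, Sequence


def streaming_encoder(features: Sequence[float], left_context: int = 2) -> List[float]:
    """Toy causal encoder: each output depends only on the current and past frames."""
    out = []
    for t in range(len(features)):
        window = features[max(0, t - left_context): t + 1]
        out.append(sum(window) / len(window))
    return out


def non_streaming_encoder(streamed: Sequence[float], context: int = 2) -> List[float]:
    """Toy full-context encoder: consumes only streaming-encoder outputs
    and is allowed to look at future frames as well as past ones."""
    out = []
    for t in range(len(streamed)):
        window = streamed[max(0, t - context): t + context + 1]
        out.append(sum(window) / len(window))
    return out


def shared_decoder(encoded: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Single decoder used in both modes (here, a trivial per-frame threshold)."""
    return [1 if x > threshold else 0 for x in encoded]


def recognize(features: Sequence[float], mode: str = "streaming") -> List[int]:
    """Run the cascade: input features always pass through the streaming encoder first."""
    streamed = streaming_encoder(features)
    if mode == "streaming":
        return shared_decoder(streamed)
    if mode == "non_streaming":
        # The non-streaming encoder never sees the raw input features.
        return shared_decoder(non_streaming_encoder(streamed))
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    frames = [0.0, 0.0, 1.0, 1.0, 0.0]  # made-up input features
    print(recognize(frames, mode="streaming"))
    print(recognize(frames, mode="non_streaming"))
```

In this sketch the streaming path can emit a result for each frame as it arrives, while the non-streaming path has to wait for later frames before refining the same intermediate representation.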

Authors (8)
  1. Arun Narayanan (34 papers)
  2. Tara N. Sainath (79 papers)
  3. Ruoming Pang (59 papers)
  4. Jiahui Yu (65 papers)
  5. Chung-Cheng Chiu (48 papers)
  6. Rohit Prabhavalkar (59 papers)
  7. Ehsan Variani (13 papers)
  8. Trevor Strohman (38 papers)
Citations (80)