
Streaming Speech-to-Confusion Network Speech Recognition (2306.03778v1)

Published 2 Jun 2023 in eess.AS and cs.CL

Abstract: In interactive automatic speech recognition (ASR) systems, low-latency requirements limit the amount of search space that can be explored during decoding, particularly in end-to-end neural ASR. In this paper, we present a novel streaming ASR architecture that outputs a confusion network while maintaining limited latency, as needed for interactive applications. We show that 1-best results of our model are on par with a comparable RNN-T system, while the richer hypothesis set allows second-pass rescoring to achieve 10-20% lower word error rate on the LibriSpeech task. We also show that our model outperforms a strong RNN-T baseline on a far-field voice assistant task.
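
To make the terms in the abstract concrete, here is a minimal Python sketch of a confusion network and of second-pass rescoring over it. It is an illustration only, not the architecture or rescoring model from the paper: the slot posteriors, the `<eps>` skip token, the bigram table in `toy_lm_logprob`, and the `lm_weight` parameter are all hypothetical values chosen for the example.

```python
import itertools
import math

# A confusion network is a sequence of slots; each slot maps candidate
# words to posterior probabilities. "<eps>" means "emit no word here".
# All numbers below are made up for illustration.
confusion_network = [
    {"i": 0.9, "eye": 0.1},
    {"scream": 0.55, "ice": 0.45},
    {"<eps>": 0.6, "cream": 0.4},
]

def strip_eps(words):
    """Drop skip tokens to obtain the surface word sequence."""
    return [w for w in words if w != "<eps>"]

def one_best(cn):
    """First-pass 1-best: take the highest-posterior word in every slot."""
    return strip_eps([max(slot, key=slot.get) for slot in cn])

# Hypothetical second-pass scorer: a tiny bigram table with a floor
# probability for unseen bigrams. A real system would use a trained LM.
BIGRAMS = {
    ("i", "scream"): 0.05,
    ("i", "ice"): 0.2,
    ("ice", "cream"): 0.6,
    ("cream", "</s>"): 0.5,
    ("scream", "</s>"): 0.3,
}
UNSEEN = 0.01

def toy_lm_logprob(words):
    """Sum of log bigram probabilities, including an end-of-sentence token."""
    padded = list(words) + ["</s>"]
    return sum(math.log(BIGRAMS.get(pair, UNSEEN))
               for pair in zip(padded, padded[1:]))

def rescore(cn, lm_weight=1.0):
    """Enumerate every path through the network and pick the one with the
    best combined first-pass posterior and second-pass LM score."""
    best_words, best_score = None, -math.inf
    for path in itertools.product(*(slot.items() for slot in cn)):
        words = strip_eps([w for w, _ in path])
        first_pass = sum(math.log(p) for _, p in path)
        score = first_pass + lm_weight * toy_lm_logprob(words)
        if score > best_score:
            best_words, best_score = words, score
    return best_words

print(one_best(confusion_network))  # ['i', 'scream']
print(rescore(confusion_network))   # ['i', 'ice', 'cream']
```

In this toy case the 1-best path and the rescored path differ, which is the kind of gain a richer hypothesis set makes possible. Exhaustive enumeration is only practical for tiny networks; it is used here purely to keep the sketch short.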

Authors (5)
  1. Denis Filimonov (12 papers)
  2. Prabhat Pandey (4 papers)
  3. Ariya Rastrow (55 papers)
  4. Ankur Gandhe (30 papers)
  5. Andreas Stolcke (57 papers)