
Streaming Speech-to-Confusion Network Speech Recognition (2306.03778v1)

Published 2 Jun 2023 in eess.AS and cs.CL

Abstract: In interactive automatic speech recognition (ASR) systems, low-latency requirements limit the amount of search space that can be explored during decoding, particularly in end-to-end neural ASR. In this paper, we present a novel streaming ASR architecture that outputs a confusion network while maintaining limited latency, as needed for interactive applications. We show that 1-best results of our model are on par with a comparable RNN-T system, while the richer hypothesis set allows second-pass rescoring to achieve 10-20% lower word error rate on the LibriSpeech task. We also show that our model outperforms a strong RNN-T baseline on a far-field voice assistant task.
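To make the terms in the abstract concrete, here is a minimal sketch of a confusion network as a data structure, with 1-best extraction and second-pass rescoring over its hypotheses. It is not the paper's neural architecture. The toy network, the `<eps>` skip marker, the function names, and the stand-in second-pass scorer are all assumptions made for illustration.

```python
import itertools
import math

EPS = "<eps>"  # hypothetical marker meaning "no word emitted in this slot"

# Toy confusion network: an ordered list of slots, each mapping
# candidate words to (made-up) posterior probabilities.
toy_cn = [
    {"i": 0.9, "a": 0.1},
    {"scream": 0.55, "see": 0.45},
    {"for": 0.6, "four": 0.4},
    {"ice": 0.7, EPS: 0.3},
    {"cream": 1.0},
]


def one_best(cn):
    """Pick the most probable word in each slot, dropping skip markers."""
    words = [max(slot, key=slot.get) for slot in cn]
    return [w for w in words if w != EPS]


def n_best(cn, n):
    """Enumerate all paths (fine for a toy network) and keep the top n.

    Each path is scored by the sum of log posteriors of its slot choices.
    """
    paths = []
    for combo in itertools.product(*(slot.items() for slot in cn)):
        score = sum(math.log(p) for _, p in combo)
        words = [w for w, _ in combo if w != EPS]
        paths.append((score, words))
    paths.sort(key=lambda item: item[0], reverse=True)
    return paths[:n]


def rescore(cn, second_pass_score, n=10, weight=1.0):
    """Re-rank the n-best paths with an external second-pass score."""
    candidates = n_best(cn, n)
    best = max(candidates, key=lambda c: c[0] + weight * second_pass_score(c[1]))
    return best[1]


def toy_second_pass(words):
    """Stand-in for a real second-pass model: rewards one specific bigram."""
    bonus = 0.0
    for a, b in zip(words, words[1:]):
        if (a, b) == ("see", "four"):
            bonus += 2.0
    return bonus


print(one_best(toy_cn))
print(rescore(toy_cn, toy_second_pass))
```

The first-pass 1-best path follows the top word in every slot, while rescoring can promote a lower-ranked path when the second-pass score favors it. That is only possible because the network retains alternatives per slot rather than a single hypothesis.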

Authors (5)
  1. Denis Filimonov (12 papers)
  2. Prabhat Pandey (4 papers)
  3. Ariya Rastrow (55 papers)
  4. Ankur Gandhe (30 papers)
  5. Andreas Stolcke (57 papers)
