A comparison of end-to-end models for long-form speech recognition (1911.02242v1)

Published 6 Nov 2019 in eess.AS, cs.CL, cs.LG, and cs.SD

Abstract: End-to-end automatic speech recognition (ASR) models, including both attention-based models and the recurrent neural network transducer (RNN-T), have shown superior performance compared to conventional systems. However, previous studies have focused primarily on short utterances that typically last for just a few seconds or, at most, a few tens of seconds. Whether such architectures are practical on long utterances that last from minutes to hours remains an open question. In this paper, we both investigate and improve the performance of end-to-end models on long-form transcription. We first present an empirical comparison of different end-to-end models on a real-world long-form task and demonstrate that the RNN-T model is much more robust than attention-based systems in this regime. We next explore two improvements to attention-based systems that significantly improve their performance: restricting the attention to be monotonic, and applying a novel decoding algorithm that breaks long utterances into shorter overlapping segments. Combining these two improvements, we show that attention-based end-to-end models can be very competitive with RNN-T on long-form speech recognition.
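
As a rough illustration of the segment-and-stitch idea mentioned in the abstract, the sketch below splits a long waveform into overlapping windows, decodes each window with a placeholder short-utterance decoder, and trims duplicated words at the overlaps. The segment length, overlap length, `decode_fn` interface, and the suffix/prefix merging heuristic are all assumptions made for illustration; they are not the paper's actual decoding algorithm.

```python
import numpy as np

def segment_waveform(waveform, sample_rate, segment_sec=15.0, overlap_sec=2.0):
    """Split a long waveform into fixed-length segments with overlap.

    Returns a list of (start_sample, segment) tuples. The overlap gives each
    segment some acoustic context so hypotheses can be stitched together.
    Segment/overlap lengths here are arbitrary illustrative values.
    """
    seg_len = int(segment_sec * sample_rate)
    hop = int((segment_sec - overlap_sec) * sample_rate)
    segments = []
    start = 0
    while start < len(waveform):
        segments.append((start, waveform[start:start + seg_len]))
        if start + seg_len >= len(waveform):
            break
        start += hop
    return segments

def transcribe_long_form(waveform, sample_rate, decode_fn):
    """Decode a long utterance by transcribing overlapping segments.

    `decode_fn(segment) -> list[str]` is a hypothetical stand-in for any
    short-utterance end-to-end decoder (e.g. an attention-based model).
    Duplicated words at segment boundaries are removed with a simple
    suffix/prefix match, which is only a rough stand-in for a proper
    hypothesis-stitching procedure.
    """
    words = []
    for _, segment in segment_waveform(waveform, sample_rate):
        hyp = decode_fn(segment)
        if words and hyp:
            # Drop the longest prefix of `hyp` that repeats a suffix of `words`.
            for k in range(min(len(words), len(hyp)), 0, -1):
                if words[-k:] == hyp[:k]:
                    hyp = hyp[k:]
                    break
        words.extend(hyp)
    return " ".join(words)
```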

Authors (14)
  1. Chung-Cheng Chiu (48 papers)
  2. Wei Han (202 papers)
  3. Yu Zhang (1400 papers)
  4. Ruoming Pang (59 papers)
  5. Sergey Kishchenko (1 paper)
  6. Patrick Nguyen (15 papers)
  7. Arun Narayanan (34 papers)
  8. Hank Liao (13 papers)
  9. Shuyuan Zhang (28 papers)
  10. Anjuli Kannan (19 papers)
  11. Rohit Prabhavalkar (59 papers)
  12. Zhifeng Chen (65 papers)
  13. Tara Sainath (19 papers)
  14. Yonghui Wu (115 papers)
Citations (78)
