A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation (2110.05249v1)

Published 11 Oct 2021 in eess.AS, cs.CL, and cs.SD

Abstract: Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence, which significantly reduces inference time at the cost of an accuracy drop compared to autoregressive (AR) baselines. Given their great potential for real-time applications, an increasing number of NAR models have been explored in different fields to mitigate the performance gap against AR models. In this work, we conduct a comparative study of various NAR modeling methods for end-to-end automatic speech recognition (ASR). Experiments are performed in the state-of-the-art setting using ESPnet. The results on various tasks provide interesting findings for developing an understanding of NAR ASR, such as the accuracy-speed trade-off and robustness against long-form utterances. We also show that the techniques can be combined for further improvement and applied to NAR end-to-end speech translation. All the implementations are publicly available to encourage further research in NAR speech processing.
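The NAR-versus-AR contrast in the abstract comes down to decoding: instead of emitting tokens one at a time, an NAR model predicts every output position in a single parallel pass. The sketch below illustrates this with CTC-style greedy decoding, one representative NAR technique among those the study compares; the tensor shapes, vocabulary size, and blank index are illustrative assumptions, not the paper's ESPnet recipes.

```python
# Minimal sketch of parallel NAR decoding via CTC greedy search.
# A CTC-style model outputs a token distribution at every frame in one
# forward pass, so decoding is a single argmax over all frames rather
# than an autoregressive token-by-token loop.
import torch

BLANK = 0  # assumed CTC blank index

def ctc_greedy_decode(log_probs: torch.Tensor) -> list[list[int]]:
    """log_probs: (batch, frames, vocab) frame-level log-probabilities."""
    # One argmax over all frames at once -- no sequential dependency.
    best = log_probs.argmax(dim=-1)  # (batch, frames)
    hyps = []
    for seq in best.tolist():
        out, prev = [], None
        for tok in seq:
            # CTC collapse rule: drop repeated tokens, then drop blanks.
            if tok != prev and tok != BLANK:
                out.append(tok)
            prev = tok
        hyps.append(out)
    return hyps

# Example: 2 utterances, 50 frames, vocabulary of 32 tokens.
logits = torch.randn(2, 50, 32).log_softmax(dim=-1)
print(ctc_greedy_decode(logits))
```

Because the argmax and collapse have no dependence on previously emitted tokens, latency stays roughly constant in output length, which is the source of the accuracy-speed trade-off the paper measures against AR beam search.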

Authors (9)
  1. Yosuke Higuchi (23 papers)
  2. Nanxin Chen (30 papers)
  3. Yuya Fujita (16 papers)
  4. Hirofumi Inaguma (42 papers)
  5. Tatsuya Komatsu (29 papers)
  6. Jaesong Lee (8 papers)
  7. Jumon Nozaki (4 papers)
  8. Tianzi Wang (37 papers)
  9. Shinji Watanabe (416 papers)
Citations (41)