
Pushing the Limits of Non-Autoregressive Speech Recognition (2104.03416v4)

Published 7 Apr 2021 in eess.AS, cs.CL, cs.LG, and cs.SD

Abstract: We combine recent advancements in end-to-end speech recognition to non-autoregressive automatic speech recognition. We push the limits of non-autoregressive state-of-the-art results for multiple datasets: LibriSpeech, Fisher+Switchboard and Wall Street Journal. Key to our recipe, we leverage CTC on giant Conformer neural network architectures with SpecAugment and wav2vec2 pre-training. We achieve 1.8%/3.6% WER on LibriSpeech test/test-other sets, 5.1%/9.8% WER on Switchboard, and 3.4% on the Wall Street Journal, all without a language model.
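
To make the abstract's terms concrete, here is a minimal sketch of greedy CTC decoding (the usual way a CTC model is decoded without a language model) and of the word error rate (WER) metric. It is not the authors' code: the function names, the blank index of 0, and the toy inputs are assumptions chosen for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Pick the best label per frame, merge repeats, then drop blanks.

    Every frame is scored independently of the labels emitted so far,
    which is what makes this style of decoding non-autoregressive.
    """
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    decoded: List[int] = []
    prev = None
    for label in best:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev_row = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        row = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            row.append(min(
                prev_row[j] + 1,                          # deletion
                row[j - 1] + 1,                           # insertion
                prev_row[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev_row = row
    return prev_row[-1] / len(ref)


# Toy per-frame scores over the labels {0: blank, 1, 2}.
toy_frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]
decoded_labels = ctc_greedy_decode(toy_frames)
toy_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

In the toy input, the blank frame between the two runs of label 1 is what lets the same label be emitted twice in a row. A figure such as 1.8% WER corresponds to `word_error_rate` returning about 0.018 when the edit distance and word count are accumulated over a whole test set.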

Authors (4)
  1. Edwin G. Ng (4 papers)
  2. Chung-Cheng Chiu (48 papers)
  3. Yu Zhang (1400 papers)
  4. William Chan (54 papers)
Citations (26)
