An improved hybrid CTC-Attention model for speech recognition (1810.12020v3)

Published 29 Oct 2018 in cs.SD and eess.AS

Abstract: Recently, end-to-end speech recognition with a hybrid model consisting of connectionist temporal classification (CTC) and an attention-based encoder-decoder has achieved state-of-the-art results. In this paper, we propose a novel CTC decoder structure based on the experiments we conducted and explore the relation between decoding performance and the depth of the encoder. We also apply an attention smoothing mechanism to acquire more context information for subword-based decoding. Taken together, these strategies allow us to achieve a word error rate (WER) of 4.43% without a language model and 3.34% with an RNN-LM on the test-clean subset of the LibriSpeech corpus, which are, as of publication, the best reported WERs for end-to-end ASR systems on this dataset.
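The abstract leans on two components that are standard in joint CTC/attention systems: a multi-task training objective that interpolates the CTC and attention losses, and an attention smoothing mechanism that widens the context the decoder attends to. As a rough illustration only (the abstract gives neither the exact decoder structure nor any hyperparameters), here is a minimal PyTorch sketch of one common form of each; the interpolation weight `lam`, the tensor shapes, and the sigmoid-normalization form of smoothing are assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def smoothed_attention(energies, mask=None):
    # One common "attention smoothing" variant: replace the softmax over
    # encoder frames with elementwise sigmoids plus normalization, which
    # flattens the distribution so the decoder sees a wider context window.
    probs = torch.sigmoid(energies)                    # (B, T_enc)
    if mask is not None:
        probs = probs * mask                           # zero out padded frames
    return probs / probs.sum(dim=-1, keepdim=True).clamp(min=1e-8)

def hybrid_loss(ctc_log_probs, att_logits, ctc_targets, att_targets,
                input_lengths, target_lengths, lam=0.5, pad_id=-100):
    # Multi-task objective: L = lam * L_CTC + (1 - lam) * L_attention.
    # lam=0.5 is a placeholder, not the paper's setting.
    # ctc_log_probs: (T, B, V) log-softmax outputs of the CTC head
    ctc = F.ctc_loss(ctc_log_probs, ctc_targets,
                     input_lengths, target_lengths,
                     blank=0, zero_infinity=True)
    # att_logits: (B, U, V) decoder logits; att_targets: (B, U), padded with pad_id
    att = F.cross_entropy(att_logits.transpose(1, 2), att_targets,
                          ignore_index=pad_id)
    return lam * ctc + (1.0 - lam) * att
```

Flattening the attention weights this way is one plausible reading of the "more context information for subword-based decoding" claim: subword units often span several encoder frames, so a less peaky attention distribution can help the decoder cover them.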

Authors (4)
  1. Zhe Yuan (75 papers)
  2. Zhuoran Lyu (1 paper)
  3. Jiwei Li (137 papers)
  4. Xi Zhou (43 papers)
Citations (9)
