
Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition (2407.12817v1)

Published 29 Jun 2024 in cs.CL, cs.SD, and eess.AS

Abstract: The goal of speech error correction is to accurately locate the wrong words in an automatic speech recognition (ASR) hypothesis and to correct them on a well-founded basis. In this paper, we propose a non-autoregressive speech error correction method. A Confidence Module measures the uncertainty of each word in the N-best ASR hypotheses, and this uncertainty serves as a reference for locating wrong words. In addition, acoustic features from the ASR encoder provide a reference for the correct pronunciation. The N-best candidates from ASR are aligned using the edit path so that they can confirm one another and recover some missing characters. Furthermore, a cross-attention mechanism fuses the information from the error correction references with the ASR hypothesis. The experimental results show that both the acoustic and confidence references help with error correction. The proposed system reduces the error rate by 21% compared with the ASR model.
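
The abstract does not spell out how the N-best candidates are aligned, so the following is only a minimal sketch of one common way to obtain an edit path: a Levenshtein dynamic program followed by a backtrace. The function name `edit_path` and the use of `None` as a gap marker are assumptions made for illustration, not the authors' implementation.

```python
def edit_path(ref, hyp):
    """Align two token sequences along a minimum-edit-distance path.

    Returns a list of (ref_token, hyp_token) pairs; None marks a gap,
    i.e. a token present in one sequence but missing in the other.
    Hypothetical helper for illustration only.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j - 1] + cost,  # match or substitution
                dp[i - 1][j] + 1,         # token only in ref
                dp[i][j - 1] + 1,         # token only in hyp
            )

    # Backtrace from the bottom-right corner to recover the path.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        diag_cost = 0 if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1] else 1
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + diag_cost:
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


# Top hypothesis is missing a character that the second candidate contains.
top = list("ac")
second = list("abc")
aligned = edit_path(top, second)
```

In this toy example the gap in the aligned pairs marks the position where the second candidate supplies a character that the top hypothesis lacks, which is the kind of mutual confirmation the abstract describes.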

Authors (6)
  1. Yuchun Shu (2 papers)
  2. Bo Hu (110 papers)
  3. Yifeng He (14 papers)
  4. Hao Shi (116 papers)
  5. Longbiao Wang (46 papers)
  6. Jianwu Dang (41 papers)