HypR: A comprehensive study for ASR hypothesis revising with a reference corpus (2309.09838v3)

Published 18 Sep 2023 in cs.CL, cs.SD, and eess.AS

Abstract: With the development of deep learning, automatic speech recognition (ASR) has made significant progress. To further enhance ASR performance, revising recognition results is a lightweight yet effective approach. Such methods can be roughly classified into N-best reranking modeling and error correction modeling. The former aims to select the hypothesis with the lowest error rate from a set of candidates generated by ASR for a given input speech. The latter focuses on detecting recognition errors in a given hypothesis and correcting them to obtain an enhanced result. However, we observe that these studies are hardly comparable to each other, as they are usually evaluated on different corpora, paired with different ASR models, and even trained on different datasets. Accordingly, this study first concentrates on providing an ASR hypothesis revising (HypR) dataset. HypR covers several commonly used corpora (AISHELL-1, TED-LIUM 2, and LibriSpeech) and provides 50 recognition hypotheses for each speech utterance. The ASR model checkpoints are also released. In addition, we implement and compare several classic and representative methods, charting recent research progress in revising speech recognition results. We hope that the publicly available HypR dataset can serve as a reference benchmark for subsequent research and advance this field of study.
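To make the first method family concrete, below is a minimal sketch of N-best reranking via masked-LM pseudo-log-likelihood scoring, one classic approach to selecting among ASR hypotheses. It assumes the Hugging Face transformers API, a generic bert-base-uncased checkpoint, and a toy 3-best list; none of these are part of the HypR release, and the paper's actual baselines may differ.

```python
# Sketch: rerank an N-best list by masked-LM pseudo-log-likelihood.
# Each token is masked in turn and the log-probability the LM assigns
# to the original token is summed to score the whole hypothesis.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Score a hypothesis with the masked LM's pseudo-log-likelihood."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip the [CLS] (first) and [SEP] (last) special tokens.
    for pos in range(1, ids.size(0) - 1):
        masked = ids.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, pos]
        log_probs = torch.log_softmax(logits, dim=-1)
        total += log_probs[ids[pos]].item()
    return total

# Toy 3-best list standing in for the 50 hypotheses HypR provides
# per utterance.
hypotheses = [
    "the cat sat on the mat",
    "the cat sad on the mat",
    "a cat sat on a mat",
]
best = max(hypotheses, key=pseudo_log_likelihood)
print(best)
```

In practice, the LM score would typically be interpolated with the ASR model's own hypothesis score rather than used alone, and error correction methods would instead edit the chosen hypothesis with a detection-and-correction or sequence-to-sequence model.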
