
Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric (2401.11268v2)

Published 20 Jan 2024 in cs.CL, cs.SD, and eess.AS

Abstract: In the realm of automatic speech recognition (ASR), the quest for models that not only perform with high accuracy but also offer transparency in their decision-making processes is crucial. The potential of quality estimation (QE) metrics is introduced and evaluated as a novel tool to enhance explainable artificial intelligence (XAI) in ASR systems. Through experiments and analyses, the capabilities of the NoRefER (No Reference Error Rate) metric are explored in identifying word-level errors to aid post-editors in refining ASR hypotheses. The investigation also extends to the utility of NoRefER in the corpus-building process, demonstrating its effectiveness in augmenting datasets with insightful annotations. The diagnostic aspects of NoRefER are examined, revealing its ability to provide valuable insights into model behaviors and decision patterns. This has proven beneficial for prioritizing hypotheses in post-editing workflows and fine-tuning ASR models. The findings suggest that NoRefER is not merely a tool for error detection but also a comprehensive framework for enhancing ASR systems' transparency, efficiency, and effectiveness. To ensure the reproducibility of the results, all source codes of this study are made publicly available.
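The abstract describes using the attention patterns of the reference-free NoRefER metric to localize word-level errors in ASR hypotheses. The snippet below is a minimal illustrative sketch of that idea rather than the paper's actual pipeline: it loads a generic pretrained encoder (a MiniLM checkpoint as a stand-in for the NoRefER backbone, which is an assumption), averages its self-attention over layers and heads, and ranks hypothesis words by how much attention they receive, treating low-attention words as candidates for post-editing. The model name, the aggregation scheme, and the low-attention heuristic are all illustrative assumptions, not the method evaluated in the paper.

```python
# Illustrative sketch only: aggregate the self-attention each word in an ASR
# hypothesis receives, using a generic pretrained encoder as a stand-in for
# the NoRefER model, and rank words as candidates for post-editing.
# Model name and the low-attention heuristic are assumptions, not the paper's exact method.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "microsoft/MiniLM-L12-H384-uncased"  # placeholder backbone, not the released NoRefER weights

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def word_level_scores(hypothesis: str):
    """Return (word, score) pairs; lower score marks a more suspicious word."""
    enc = tokenizer(hypothesis, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, output_attentions=True)
    # Stack per-layer attentions -> (layers, heads, seq, seq) after dropping the
    # batch dim, average over layers and heads, then sum the attention each
    # token *receives* from all other tokens.
    att = torch.stack(out.attentions).squeeze(1).mean(dim=(0, 1))  # (seq, seq)
    received = att.sum(dim=0)                                      # (seq,)
    # Map subword scores back to whitespace-separated words via the fast
    # tokenizer's word_ids(); this assumes a punctuation-free hypothesis.
    word_ids = enc.word_ids()
    words = hypothesis.split()
    scores = [[] for _ in words]
    for tok_idx, w_id in enumerate(word_ids):
        if w_id is not None:
            scores[w_id].append(received[tok_idx].item())
    return [(w, sum(s) / len(s)) for w, s in zip(words, scores) if s]

if __name__ == "__main__":
    hyp = "the quick brown focks jumps over the lazy dog"
    ranked = sorted(word_level_scores(hyp), key=lambda ws: ws[1])
    print("words ranked from most to least suspicious:")
    for word, score in ranked:
        print(f"{word:>10s}  {score:.3f}")
```

In a corpus-sampling or post-editing workflow along the lines the abstract sketches, such per-word scores could be thresholded to flag hypotheses for human review or to prioritize utterances for annotation; the threshold and prioritization policy here would again be implementation choices, not values given in the paper.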

Authors (5)
  1. Golara Javadi (5 papers)
  2. Kamer Ali Yuksel (14 papers)
  3. Yunsu Kim (40 papers)
  4. Thiago Castro Ferreira (10 papers)
  5. Mohamed Al-Badrashiny (6 papers)
Citations (2)
