
SeMaScore: a new evaluation metric for automatic speech recognition tasks (2401.07506v2)

Published 15 Jan 2024 in eess.AS, cs.LG, and cs.SD

Abstract: In this study, we present SeMaScore, an evaluation metric for automatic speech recognition tasks generated by a segment-wise mapping and scoring algorithm. SeMaScore leverages both the error rate and a more robust similarity score. We show that our algorithm's score generation improves upon the state-of-the-art BERTScore. Our experimental results show that SeMaScore corresponds well with expert human assessments, signal-to-noise ratio levels, and other natural language metrics. SeMaScore is also 41x faster to compute than BERTScore. Overall, we demonstrate that SeMaScore serves as a more dependable evaluation metric, particularly in real-world situations involving atypical speech patterns.
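
The abstract names the metric's ingredients (a segment-wise mapping, an error rate, and a more robust similarity score) without spelling out the algorithm. The following is a minimal Python sketch of how such a segment-wise blend could look; the equal-width segmentation, the SequenceMatcher similarity stand-in, and the product combination are all illustrative assumptions, not the authors' published method.

```python
from difflib import SequenceMatcher


def word_error_rate(ref_words, hyp_words):
    """Word-level Levenshtein distance normalized by reference length."""
    d = [[0] * (len(hyp_words) + 1) for _ in range(len(ref_words) + 1)]
    for i in range(len(ref_words) + 1):
        d[i][0] = i
    for j in range(len(hyp_words) + 1):
        d[0][j] = j
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / max(len(ref_words), 1)


def split_into_segments(words, n_segments):
    """Assumed segmentation: n roughly equal word chunks. The paper's actual
    segment-wise mapping is not described in the abstract."""
    size = max(1, -(-len(words) // max(n_segments, 1)))  # ceiling division
    return [words[i:i + size] for i in range(0, len(words), size)]


def sema_score_sketch(reference, hypothesis, n_segments=4):
    """Score each segment pair by blending (1 - error rate) with a similarity
    score, then average over segments. The product blend is an assumption."""
    ref_segs = split_into_segments(reference.split(), n_segments)
    hyp_segs = split_into_segments(hypothesis.split(), n_segments)
    scores = []
    # zip() pairs segments positionally; a real implementation would map
    # hypothesis segments to reference segments by alignment, not by index.
    for ref_seg, hyp_seg in zip(ref_segs, hyp_segs):
        er = word_error_rate(ref_seg, hyp_seg)
        # Stand-in character-level similarity in [0, 1]; the paper leverages
        # a more robust similarity score (cf. BERTScore).
        sim = SequenceMatcher(None, " ".join(ref_seg), " ".join(hyp_seg)).ratio()
        scores.append(max(0.0, 1.0 - er) * sim)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    ref = "the quick brown fox jumps over the lazy dog"
    hyp = "the quick brown fox jumped over a lazy dog"
    print(f"SeMaScore sketch: {sema_score_sketch(ref, hyp):.3f}")
```

A faithful implementation would replace the stand-in similarity with an embedding-based score; the sketch is only meant to show the overall shape of combining a per-segment error rate with a similarity term.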

