SeMaScore : a new evaluation metric for automatic speech recognition tasks (2401.07506v2)
Abstract: In this study, we present SeMaScore, an evaluation metric for automatic speech recognition tasks generated by a segment-wise mapping and scoring algorithm. SeMaScore leverages both the error rate and a more robust similarity score. We show that our algorithm's score generation improves upon the state-of-the-art BERTScore. Our experimental results show that SeMaScore corresponds well with expert human assessments, signal-to-noise-ratio levels, and other natural language metrics, and it is 41x faster to compute than BERTScore. Overall, we demonstrate that SeMaScore serves as a more dependable evaluation metric, particularly in real-world situations involving atypical speech patterns.
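The abstract describes combining a segment-level error rate with a similarity score. The paper's exact algorithm is not given in this excerpt, so the following is only a minimal illustrative sketch of that idea: hypothesis words are aligned to reference words segment by segment, and each segment contributes a blend of (1 - error rate) and a similarity score. Here `difflib.SequenceMatcher` stands in for the paper's BERT-based semantic similarity, and the segmentation, error-rate definition, and combination rule are all assumptions, not SeMaScore itself.

```python
from difflib import SequenceMatcher


def segment_scores(reference: str, hypothesis: str):
    """Return per-segment (error_rate, similarity) pairs.

    Hypothetical sketch: segments are the aligned spans produced by a
    word-level SequenceMatcher, and character-level difflib similarity
    stands in for a semantic (BERT-style) similarity.
    """
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    matcher = SequenceMatcher(None, ref_words, hyp_words)
    scores = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        ref_seg = " ".join(ref_words[i1:i2])
        hyp_seg = " ".join(hyp_words[j1:j2])
        # Segment error rate: edited words relative to segment length.
        seg_len = max(i2 - i1, 1)
        err = 0.0 if tag == "equal" else min(1.0, max(i2 - i1, j2 - j1) / seg_len)
        sim = SequenceMatcher(None, ref_seg, hyp_seg).ratio()
        scores.append((err, sim))
    return scores


def sema_like_score(reference: str, hypothesis: str) -> float:
    """Combine per-segment error rate and similarity into one score in [0, 1]."""
    scores = segment_scores(reference, hypothesis)
    if not scores:
        return 1.0
    return sum((1.0 - err) * sim for err, sim in scores) / len(scores)
```

For example, `sema_like_score("the cat sat", "the cat sat")` yields 1.0, while a hypothesis sharing no words with the reference scores 0.0; partial matches fall in between. The real metric's segment mapping and its use of BERT embeddings would change these numbers, but the overall shape (per-segment error weighted by similarity) follows the abstract's description.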