Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward (2411.03866v2)

Published 6 Nov 2024 in cs.CL, cs.AI, cs.SD, and eess.AS

Abstract: Recent research has demonstrated that training a linear connector between speech foundation encoders and LLMs enables this architecture to achieve strong ASR capabilities. Despite the impressive results, it remains unclear whether these simple approaches are robust enough across different scenarios and speech conditions, such as domain shifts and speech perturbations. In this paper, we address these questions by conducting various ablation experiments using a recent and widely adopted approach called SLAM-ASR. We present novel empirical findings that offer insights on how to effectively utilize the SLAM-ASR architecture across a wide range of settings. Our main findings indicate that SLAM-ASR exhibits poor performance in cross-domain evaluation settings. Additionally, speech perturbations on in-domain data, such as changes in speech rate or additive noise, can significantly degrade performance. Our findings offer critical insights for fine-tuning and configuring robust LLM-based ASR models, tailored to different data characteristics and computational resources.
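One of the perturbations named in the abstract, additive noise, can be illustrated with a minimal sketch. The abstract does not state which noise types or signal-to-noise ratios (SNRs) the paper uses, so the white Gaussian noise, the 10 dB target, the synthetic tone, and the function names `add_noise_at_snr` and `measured_snr_db` below are all assumptions made for illustration, not the authors' setup.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Return a copy of `signal` with white Gaussian noise added at `snr_db` dB.

    Hypothetical helper: the noise type and scaling are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # Draw unit-variance noise, then rescale it to hit the requested SNR.
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Measure the SNR (in dB) of `noisy` relative to `clean`."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# A 440 Hz tone sampled at 16 kHz stands in for a speech waveform (assumption).
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_noise_at_snr(tone, snr_db=10)
```

Sweeping `snr_db` downward and re-scoring the recognizer's output at each level is one common way to probe the kind of degradation the abstract describes.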

Authors (10)
