Effective Text Adaptation for LLM-based ASR through Soft Prompt Fine-Tuning (2412.06967v1)
Abstract: The advent of large language models (LLMs) has reshaped Automatic Speech Recognition (ASR). Prompting an LLM with audio embeddings to generate transcriptions has become the new state of the art in ASR. Although LLMs are trained on extensive text corpora, high-quality domain-specific text data can still significantly enhance ASR performance on domain adaptation tasks. While LLM-based ASR can naturally incorporate more text corpora by fine-tuning the LLM decoder, fine-tuning such an ASR system on text-only data without paired prompts may diminish the effectiveness of domain-specific knowledge. To mitigate this issue, we propose a two-step soft prompt fine-tuning strategy that enhances domain-specific text adaptation. Experimental results show that text adaptation with our proposed method achieved up to a 9% relative Word Error Rate (WER) reduction and up to an 18% relative Entity Error Rate (EER) reduction on the target domain compared to the baseline ASR. Combining this with domain-specific Language Model (LM) fusion can further improve the EER by a relative 2-5%.
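The core mechanism, soft prompt fine-tuning on text-only data, can be illustrated with a short sketch. The PyTorch code below is a minimal, generic example rather than the paper's exact two-step recipe: a small block of learnable embeddings stands in for the missing audio prompt, is prepended to the token embeddings of domain-specific text, and is the only parameter set updated while the pretrained LLM decoder stays frozen. The wrapper interface (`inputs_embeds`, `.logits`) assumes a Hugging Face-style causal LM; all class names, dimensions, and the prompt length are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of soft prompt fine-tuning on
# text-only data for an LLM-based ASR decoder. Only the soft prompt is
# trained; the LLM parameters are frozen.
import torch
import torch.nn as nn

class SoftPromptLM(nn.Module):
    def __init__(self, llm: nn.Module, embed: nn.Embedding,
                 num_prompt_tokens: int = 16):
        super().__init__()
        self.llm = llm      # frozen pretrained LLM decoder
        self.embed = embed  # the LLM's token-embedding table
        d_model = embed.embedding_dim
        # Learnable soft prompt, initialized near the embedding scale.
        self.soft_prompt = nn.Parameter(
            torch.randn(num_prompt_tokens, d_model) * 0.02)
        for p in self.llm.parameters():
            p.requires_grad = False  # adapt only the soft prompt

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # input_ids: (batch, seq_len) token ids of domain-specific text
        tok = self.embed(input_ids)                          # (B, T, D)
        prompt = self.soft_prompt.unsqueeze(0).expand(
            tok.size(0), -1, -1)                             # (B, P, D)
        inputs = torch.cat([prompt, tok], dim=1)             # (B, P+T, D)
        # Assumes the wrapped LLM accepts inputs_embeds, as Hugging Face
        # causal-LM decoders do.
        return self.llm(inputs_embeds=inputs).logits         # (B, P+T, V)

def text_adaptation_loss(model: SoftPromptLM,
                         input_ids: torch.Tensor) -> torch.Tensor:
    """Next-token cross-entropy over the text span only, so the gradient
    flows into the soft prompt while prompt positions carry no loss."""
    logits = model(input_ids)
    P = model.soft_prompt.size(0)
    # Logit at combined position P+j-1 predicts text token j, so drop the
    # prompt span and align logits with the shifted labels.
    shift_logits = logits[:, P:-1, :]
    shift_labels = input_ids[:, 1:]
    return nn.functional.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1))
```

At adaptation time one would simply optimize `model.soft_prompt` with this loss over domain text batches; because the LLM weights never move, the domain knowledge is absorbed into the prompt and the base model's general transcription ability is preserved.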