Estimating Confusions in the ASR Channel for Improved Topic-based Language Model Adaptation (1303.5148v1)

Published 21 Mar 2013 in cs.CL and cs.LG

Abstract: Human language is a combination of elemental languages/domains/styles that change across and sometimes within discourses. Language models, which play a crucial role in speech recognizers and machine translation systems, are particularly sensitive to such changes, unless some form of adaptation takes place. One approach to speech language model adaptation is self-training, in which a language model's parameters are tuned based on automatically transcribed audio. However, transcription errors can misguide self-training, particularly in challenging settings such as conversational speech. In this work, we propose a model that considers the confusions (errors) of the ASR channel. By modeling the likely confusions in the ASR output instead of using just the 1-best, we improve self-training efficacy by obtaining a more reliable reference transcription estimate. We demonstrate improved topic-based language modeling adaptation results over both 1-best and lattice self-training using our ASR channel confusion estimates on telephone conversations.
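The core contrast the abstract draws, counting every plausible ASR hypothesis by its posterior probability rather than trusting only the 1-best word, can be illustrated with a minimal sketch. This is not the paper's implementation (which additionally estimates ASR channel confusion probabilities); the confusion-network representation and function names below are hypothetical, assuming slot-wise (word, posterior) pairs as input.

```python
from collections import defaultdict

def expected_unigram_counts(confusion_network):
    """Accumulate fractional word counts from an ASR confusion network.

    `confusion_network` is a list of slots; each slot is a list of
    (word, posterior) pairs whose posteriors sum to ~1. Every
    hypothesis contributes its posterior mass, yielding a softer
    reference estimate for self-training than hard 1-best counts.
    """
    counts = defaultdict(float)
    for slot in confusion_network:
        for word, posterior in slot:
            counts[word] += posterior
    return counts

def one_best_counts(confusion_network):
    """Baseline: count only the highest-posterior word in each slot."""
    counts = defaultdict(float)
    for slot in confusion_network:
        best_word, _ = max(slot, key=lambda wp: wp[1])
        counts[best_word] += 1.0
    return counts

# Toy example: the recognizer is unsure between "topic" and "topics".
cn = [
    [("the", 1.0)],
    [("topic", 0.6), ("topics", 0.4)],
    [("model", 0.9), ("module", 0.1)],
]
print(expected_unigram_counts(cn))  # fractional (expected) counts
print(one_best_counts(cn))          # hard 1-best counts
```

Under the expected-count view, low-confidence errors in the 1-best path (the main hazard the abstract identifies for conversational speech) contribute only their posterior mass to the adaptation counts instead of a full count, which is the sense in which the reference transcription estimate becomes more reliable.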
