CTC-Assisted LLM-Based Contextual ASR (2411.06437v1)

Published 10 Nov 2024 in eess.AS, cs.AI, and cs.CL

Abstract: Contextual ASR, or hotword customization, holds substantial practical value. Despite the impressive performance of current end-to-end (E2E) automatic speech recognition (ASR) systems, they often struggle to recognize rare words accurately. Typical E2E contextual ASR models feature complex architectures and decoding mechanisms, are limited in performance, and are susceptible to interference from distractor words. With LLM-based ASR models emerging as the new mainstream, we propose a CTC-Assisted LLM-Based Contextual ASR model with an efficient filtering algorithm. By using coarse CTC decoding results to filter potentially relevant hotwords and incorporating them into the LLM prompt input, our model attains WER/B-WER of 1.27%/3.67% and 2.72%/8.02% on the LibriSpeech test-clean and test-other sets, respectively, when targeting rare long-tail words. This is a significant improvement over the baseline LLM-based ASR model and substantially surpasses other related work. More remarkably, with the help of the LLM and the proposed filtering algorithm, our contextual ASR model still performs well with 2000 biasing words.
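The filtering idea in the abstract (use a coarse CTC transcript to narrow a large biasing list, then place the survivors in the LLM prompt) can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not specify. The function names `filter_hotwords` and `build_prompt`, the character-level similarity from Python's `difflib`, the 0.8 threshold, and the prompt wording are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def filter_hotwords(ctc_hypothesis, biasing_list, threshold=0.8):
    """Keep hotwords that approximately match a span of the coarse CTC output.

    Assumption: similarity is difflib's character-level ratio, compared
    against word n-grams of the same word count as the hotword.
    """
    hyp_words = ctc_hypothesis.lower().split()
    kept = []
    for hotword in biasing_list:
        target = hotword.lower()
        n = len(target.split())
        # Candidate spans: every run of n consecutive hypothesis words.
        candidates = [
            " ".join(hyp_words[i:i + n])
            for i in range(len(hyp_words) - n + 1)
        ]
        best = max(
            (SequenceMatcher(None, target, c).ratio() for c in candidates),
            default=0.0,
        )
        if best >= threshold:
            kept.append(hotword)
    return kept


def build_prompt(hotwords):
    """Assemble a text prompt; the wording here is hypothetical."""
    base = "Transcribe the speech to text."
    if not hotwords:
        return base
    return base + " Some hotwords might help: " + " ".join(hotwords) + "."


# The coarse CTC pass misspells a rare word; filtering keeps only the
# biasing entry that resembles it and drops the distractors.
coarse = "the patient was given acetominophen for the fever"
biasing = ["acetaminophen", "ibuprofen", "zurich"]
selected = filter_hotwords(coarse, biasing)
prompt = build_prompt(selected)
```

Filtering before prompting keeps the prompt short even when the biasing list is large, which is one plausible reading of why the abstract reports good performance with 2000 biasing words. The threshold trades recall of true hotwords against the number of distractors passed to the LLM.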