An Empirical Study of Language Model Integration for Transducer based Speech Recognition (2203.16776v4)

Published 31 Mar 2022 in eess.AS, cs.CL, and cs.LG

Abstract: Utilizing text-only data with an external language model (ELM) in end-to-end RNN-Transducer (RNN-T) speech recognition is challenging. Recently, a class of methods such as density ratio (DR) and internal language model estimation (ILME) have been developed, outperforming the classic shallow fusion (SF) method. The basic idea behind these methods is that the RNN-T posterior should first subtract the implicitly learned internal language model (ILM) prior in order to integrate the ELM. While recent studies suggest that RNN-T only learns some low-order language model information, the DR method uses a well-trained neural language model with full context, which may be inappropriate for estimating the ILM and may deteriorate the integration performance. Building on the DR method, we propose a low-order density ratio method (LODR) that replaces this estimate with a low-order weak language model. Extensive empirical experiments are conducted in both in-domain and cross-domain scenarios on the English LibriSpeech & Tedlium-2 and Chinese WenetSpeech & AISHELL-1 datasets. It is shown that LODR consistently outperforms SF in all tasks, while performing generally close to ILME and better than DR in most tests.
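As a rough sketch of the contrast drawn in the abstract (notation and interpolation weights below are illustrative, not values from the paper), the per-hypothesis decoding scores of the compared fusion methods can be written as follows; LODR keeps the DR form but swaps the full-context source-domain language model for a low-order one, such as a bi-gram trained on the transducer's transcripts.

```latex
% Illustrative beam-search scores for a hypothesis y given acoustics x.
% Shallow fusion (SF): add the external LM log-probability.
\mathrm{score}_{\mathrm{SF}}  = \log P_{\mathrm{RNN\text{-}T}}(y \mid x) + \lambda_{\mathrm{ELM}} \log P_{\mathrm{ELM}}(y)

% Density ratio (DR): additionally subtract a full-context source-domain LM
% as a proxy for the internal LM prior.
\mathrm{score}_{\mathrm{DR}}  = \log P_{\mathrm{RNN\text{-}T}}(y \mid x) - \lambda_{\mathrm{ILM}} \log P_{\mathrm{srcLM}}(y) + \lambda_{\mathrm{ELM}} \log P_{\mathrm{ELM}}(y)

% LODR (proposed): same form as DR, but P_{srcLM} is replaced by a weak,
% low-order LM (e.g., a bi-gram), matching the low-order ILM that RNN-T
% is argued to learn implicitly.
\mathrm{score}_{\mathrm{LODR}} = \log P_{\mathrm{RNN\text{-}T}}(y \mid x) - \lambda_{\mathrm{ILM}} \log P_{\mathrm{low\text{-}order}}(y) + \lambda_{\mathrm{ELM}} \log P_{\mathrm{ELM}}(y)
```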

Authors (6)
  1. Huahuan Zheng (6 papers)
  2. Keyu An (18 papers)
  3. Zhijian Ou (58 papers)
  4. Chen Huang (88 papers)
  5. Ke Ding (30 papers)
  6. Guanglu Wan (24 papers)
Citations (5)
