
Deep Shallow Fusion for RNN-T Personalization (2011.07754v1)

Published 16 Nov 2020 in cs.CL and eess.AS

Abstract: End-to-end models in general, and the Recurrent Neural Network Transducer (RNN-T) in particular, have gained significant traction in the automatic speech recognition community in the last few years due to their simplicity, compactness, and excellent performance on generic transcription tasks. However, these models are more challenging to personalize than traditional hybrid systems due to the lack of external language models and difficulties in recognizing rare long-tail words, specifically entity names. In this work, we present novel techniques to improve RNN-T's ability to model rare WordPieces, infuse extra information into the encoder, enable the use of alternative graphemic pronunciations, and perform deep fusion with personalized language models for more robust biasing. We show that these combined techniques result in 15.4%-34.5% relative Word Error Rate improvement compared to a strong RNN-T baseline which uses shallow fusion and text-to-speech augmentation. Our work helps push the boundary of RNN-T personalization and close the gap with hybrid systems on use cases where biasing and entity recognition are crucial.
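
For context on the "shallow fusion" baseline mentioned in the abstract: shallow fusion biases decoding by log-linearly interpolating the end-to-end model's token scores with an external (here, personalized) language model. The Python sketch below illustrates only that scoring step under simplifying assumptions; the function name, example tokens, and interpolation weight are hypothetical and are not taken from the paper.

# Illustrative sketch only: log-linear shallow fusion of RNN-T token scores
# with an external personalized LM. Names and values are assumptions for
# demonstration, not the authors' implementation.
from typing import Dict

def shallow_fusion_scores(
    rnnt_log_probs: Dict[str, float],
    lm_log_probs: Dict[str, float],
    lm_weight: float = 0.3,
) -> Dict[str, float]:
    """Combine per-token RNN-T and external-LM log-probabilities."""
    fused = {}
    for token, rnnt_score in rnnt_log_probs.items():
        # Tokens not covered by the personalized LM get no boost in this sketch.
        lm_score = lm_log_probs.get(token, 0.0)
        # Log-linear interpolation: fused = log P_RNNT + lambda * log P_LM
        fused[token] = rnnt_score + lm_weight * lm_score
    return fused

# Example: bias the hypothesis toward a contact name from the user's phonebook.
rnnt = {"▁call": -0.2, "▁kaylee": -3.1, "▁kelly": -1.4}
personal_lm = {"▁kaylee": -0.5}
print(shallow_fusion_scores(rnnt, personal_lm))

Because the personalized LM assigns probability mass to the rare entity name, its fused score rises relative to the acoustically similar competitor, which is the basic biasing effect the paper's deep-fusion techniques aim to make more robust.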

Authors (6)
  1. Duc Le (46 papers)
  2. Gil Keren (22 papers)
  3. Julian Chan (11 papers)
  4. Jay Mahadeokar (36 papers)
  5. Christian Fuegen (36 papers)
  6. Michael L. Seltzer (34 papers)
Citations (76)
