VQ-T: RNN Transducers using Vector-Quantized Prediction Network States (2208.01818v1)

Published 3 Aug 2022 in cs.SD, cs.CL, and eess.AS

Abstract: Beam search, which is the dominant ASR decoding algorithm for end-to-end models, generates tree-structured hypotheses. However, recent studies have shown that decoding with hypothesis merging can achieve a more efficient search with comparable or better performance. Yet the full context carried by recurrent networks is not compatible with hypothesis merging. We propose to use vector-quantized long short-term memory units (VQ-LSTM) in the prediction network of RNN transducers. By training the discrete representation jointly with the ASR network, hypotheses can be actively merged for lattice generation. Our experiments on the Switchboard corpus show that the proposed VQ RNN transducers improve ASR performance over transducers with regular prediction networks while also producing denser lattices with a very low oracle word error rate (WER) for the same beam size. Additional language model rescoring experiments also demonstrate the effectiveness of the proposed lattice generation scheme.
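The core idea can be illustrated in a minimal sketch: quantize each prediction-network hidden state to its nearest codebook vector, then merge beam hypotheses whose quantized states collide, keeping only the best-scoring one. This is an assumption-laden illustration, not the paper's implementation; the codebook below is random, whereas in VQ-T it is trained jointly with the ASR network, and a real decoder would also track lattice arcs rather than a single score.

```python
import numpy as np

# Hypothetical stand-in codebook: 16 codes over 8-dim hidden states.
# In the paper the codebook is learned jointly with the transducer.
rng = np.random.default_rng(0)
CODEBOOK = rng.normal(size=(16, 8))

def quantize(h):
    """Map a hidden state to the index of its nearest codebook vector."""
    dists = np.linalg.norm(CODEBOOK - h, axis=1)
    return int(np.argmin(dists))

def merge_hypotheses(hyps):
    """Merge hypotheses whose quantized prediction-network states collide.

    hyps: list of (hidden_state, log_prob) pairs.
    Returns a dict mapping code index -> best log_prob, i.e. at most one
    surviving hypothesis per discrete state, enabling lattice generation.
    """
    merged = {}
    for h, logp in hyps:
        code = quantize(h)
        if code not in merged or logp > merged[code]:
            merged[code] = logp
    return merged
```

Because the discrete code replaces the unbounded recurrent context as the merging key, two hypotheses with different histories but the same code become interchangeable for future expansion, which is what lets the beam produce a dense lattice instead of a tree.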

Authors (5)
  1. Jiatong Shi (82 papers)
  2. George Saon (39 papers)
  3. David Haws (16 papers)
  4. Shinji Watanabe (416 papers)
  5. Brian Kingsbury (54 papers)
Citations (2)
