
HMM-Free Encoder Pre-Training for Streaming RNN Transducer (2104.10764v2)

Published 2 Apr 2021 in eess.AS, cs.CL, and cs.SD

Abstract: This work describes an encoder pre-training procedure that uses frame-wise labels to improve the training of a streaming recurrent neural network transducer (RNN-T) model. A streaming RNN-T trained from scratch usually performs worse than a non-streaming RNN-T. Although it is common to address this issue by pre-training components of the RNN-T with other criteria or with frame-wise alignment guidance, such alignments are not easily available in an end-to-end setting. In this work, the frame-wise alignment used to pre-train the streaming RNN-T's encoder is generated without an HMM-based system, yielding an all-neural framework equipped with HMM-free encoder pre-training. This is achieved by expanding the spikes of a CTC model into their left/right blank frames, and two expanding strategies are proposed. To the best of our knowledge, this is the first work to simulate HMM-based frame-wise labels with a CTC model for pre-training. Experiments conducted on the LibriSpeech and MLS English tasks show that, compared with random initialization, the proposed pre-training procedure reduces the WER by a relative 5%-11% and the emission latency by 60 ms. Moreover, the method is lexicon-free, so it is friendly to new languages without a manually designed lexicon.
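The abstract's central mechanism is turning the peaky, spike-like outputs of a trained CTC model into dense frame-wise labels by spreading each non-blank spike into the neighbouring blank frames. The sketch below illustrates that idea in Python; the function name `expand_ctc_spikes`, the `BLANK` id, and the two strategies shown (`"midpoint"` and `"nearest"`) are illustrative assumptions, not the paper's exact expansion rules.

```python
import numpy as np

BLANK = 0  # assumed CTC blank id (illustrative)

def expand_ctc_spikes(frame_ids, strategy="midpoint"):
    """Expand non-blank CTC spikes into the surrounding blank frames.

    frame_ids: per-frame argmax ids from a trained CTC model.
    Returns a dense frame-wise label sequence usable as cross-entropy
    targets when pre-training a streaming RNN-T encoder.
    """
    frame_ids = np.asarray(frame_ids)
    spikes = np.flatnonzero(frame_ids != BLANK)
    if spikes.size == 0:  # all-blank utterance: nothing to expand
        return frame_ids.copy()

    labels = np.empty_like(frame_ids)
    if strategy == "midpoint":
        # Fill blank frames up to the midpoint between consecutive spikes.
        mids = (spikes[:-1] + spikes[1:] + 1) // 2
        bounds = np.concatenate(([0], mids, [len(frame_ids)]))
        for k, pos in enumerate(spikes):
            labels[bounds[k]:bounds[k + 1]] = frame_ids[pos]
    elif strategy == "nearest":
        # Give every frame the label of its nearest spike.
        dists = np.abs(np.arange(len(frame_ids))[:, None] - spikes[None, :])
        labels = frame_ids[spikes[dists.argmin(axis=1)]]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return labels
```

For example, with blank id 0, the per-frame argmax sequence `[0, 0, 5, 0, 0, 7, 0]` becomes `[5, 5, 5, 5, 7, 7, 7]` under the midpoint strategy, giving a dense target for frame-wise pre-training of the streaming encoder.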

Authors (7)
  1. Lu Huang (30 papers)
  2. Jingyu Sun (8 papers)
  3. Yufeng Tang (4 papers)
  4. Junfeng Hou (6 papers)
  5. Jinkun Chen (9 papers)
  6. Jun Zhang (1008 papers)
  7. Zejun Ma (78 papers)
Citations (3)
