
HMM-Free Encoder Pre-Training for Streaming RNN Transducer (2104.10764v2)

Published 2 Apr 2021 in eess.AS, cs.CL, and cs.SD

Abstract: This work describes an encoder pre-training procedure that uses frame-wise labels to improve the training of a streaming recurrent neural network transducer (RNN-T) model. A streaming RNN-T trained from scratch usually performs worse than a non-streaming RNN-T. Although it is common to address this issue by pre-training components of the RNN-T with other criteria or with frame-wise alignment guidance, such alignments are not easily available in an end-to-end setting. In this work, the frame-wise alignment used to pre-train the streaming RNN-T's encoder is generated without an HMM-based system, yielding an all-neural framework equipped with HMM-free encoder pre-training. This is achieved by expanding the spikes of a CTC model into their left/right blank frames, and two expanding strategies are proposed. To the best of our knowledge, this is the first work to simulate HMM-based frame-wise labels with a CTC model for pre-training. Experiments conducted on the LibriSpeech and MLS English tasks show that, compared with random initialization, the proposed pre-training procedure reduces the WER by a relative 5%-11% and the emission latency by 60 ms. Moreover, the method is lexicon-free, so it is friendly to new languages without a manually designed lexicon.
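The abstract's central mechanism is turning the peaky, spike-like outputs of a trained CTC model into dense frame-wise labels by spreading each non-blank spike into the neighbouring blank frames. The sketch below illustrates that idea in Python; the function name `expand_ctc_spikes`, the `BLANK` id, and the two strategies shown (`"midpoint"` and `"nearest"`) are illustrative assumptions, not the paper's exact expansion rules.

```python
import numpy as np

BLANK = 0  # assumed CTC blank id (illustrative)

def expand_ctc_spikes(frame_ids, strategy="midpoint"):
    """Expand non-blank CTC spikes into the surrounding blank frames.

    frame_ids: per-frame argmax ids from a trained CTC model.
    Returns a dense frame-wise label sequence usable as cross-entropy
    targets when pre-training a streaming RNN-T encoder.
    """
    frame_ids = np.asarray(frame_ids)
    spikes = np.flatnonzero(frame_ids != BLANK)
    if spikes.size == 0:  # all-blank utterance: nothing to expand
        return frame_ids.copy()

    labels = np.empty_like(frame_ids)
    if strategy == "midpoint":
        # Fill blank frames up to the midpoint between consecutive spikes.
        mids = (spikes[:-1] + spikes[1:] + 1) // 2
        bounds = np.concatenate(([0], mids, [len(frame_ids)]))
        for k, pos in enumerate(spikes):
            labels[bounds[k]:bounds[k + 1]] = frame_ids[pos]
    elif strategy == "nearest":
        # Give every frame the label of its nearest spike.
        dists = np.abs(np.arange(len(frame_ids))[:, None] - spikes[None, :])
        labels = frame_ids[spikes[dists.argmin(axis=1)]]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return labels
```

For example, with blank id 0, the per-frame argmax sequence `[0, 0, 5, 0, 0, 7, 0]` becomes `[5, 5, 5, 5, 7, 7, 7]` under the midpoint strategy, giving a dense target for frame-wise pre-training of the streaming encoder.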

Authors (7)
  1. Lu Huang (30 papers)
  2. Jingyu Sun (8 papers)
  3. Yufeng Tang (4 papers)
  4. Junfeng Hou (6 papers)
  5. Jinkun Chen (9 papers)
  6. Jun Zhang (1008 papers)
  7. Zejun Ma (78 papers)
Citations (3)
