L2RS: A Learning-to-Rescore Mechanism for Automatic Speech Recognition (1910.11496v1)
Abstract: Modern Automatic Speech Recognition (ASR) systems rely primarily on scores from an Acoustic Model (AM) and a Language Model (LM) to rescore N-best lists. Despite the abundance of recent advances in natural language processing (NLP), current ASR systems use rather limited information to evaluate the linguistic and semantic legitimacy of N-best hypotheses. In this paper, we propose a novel Learning-to-Rescore (L2RS) mechanism, which is designed to exploit a wide range of textual information from state-of-the-art NLP models and to automatically decide their weights when rescoring N-best lists for ASR systems. Specifically, we train a rescoring model on features including BERT sentence embeddings, topic vectors, and perplexity scores produced by an n-gram LM, a topic modeling LM, a BERT LM, and an RNNLM. We conduct extensive experiments on a public dataset, and the results show that L2RS outperforms not only traditional rescoring methods but also its deep neural network counterparts, with a substantial improvement of 20.67% in NDCG@10. L2RS paves the way for developing more effective rescoring models for ASR.
- Yuanfeng Song
- Di Jiang
- Xuefang Zhao
- Qian Xu
- Raymond Chi-Wing Wong
- Lixin Fan
- Qiang Yang
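To make the rescoring and evaluation setup concrete, here is a minimal sketch, not the authors' implementation. It scores each N-best hypothesis as a weighted sum of its features, re-ranks the list, and measures the ranking with NDCG@10. Several things here are assumptions. The linear scoring function, the feature names (`am`, `neg_ngram_ppl`), the relevance label (1 - WER), and the linear-gain DCG variant are all illustrative choices. The weights are also fixed by hand, whereas L2RS learns them from data and uses a richer feature set (BERT embeddings, topic vectors, several LM perplexities).

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    features: dict[str, float]  # hypothetical feature names, e.g. "am", "neg_ngram_ppl"
    wer: float                  # word error rate against the reference transcript


def rescore(hyps: list[Hypothesis], weights: dict[str, float]) -> list[Hypothesis]:
    """Rank hypotheses by a weighted sum of their features (higher is better)."""
    def score(h: Hypothesis) -> float:
        # Features without a weight contribute nothing to the score.
        return sum(weights.get(name, 0.0) * value for name, value in h.features.items())
    return sorted(hyps, key=score, reverse=True)


def relevance(h: Hypothesis) -> float:
    # Assumed graded relevance label: 1 - WER, clipped at 0.
    return max(0.0, 1.0 - h.wer)


def dcg_at_k(rels: list[float], k: int) -> float:
    # Linear-gain DCG: the item at 0-based position i is discounted by log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))


def ndcg_at_k(rels: list[float], k: int) -> float:
    # Normalize by the DCG of the best possible ordering of the same labels.
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


# Toy 3-best list for the reference "the cat sat on the mat".
# Perplexity enters as a negated value so that higher is better, like the AM score.
hyps = [
    Hypothesis("the cat sat", {"am": -10.0, "neg_ngram_ppl": -50.0}, wer=3 / 6),
    Hypothesis("the cat sat on the mat", {"am": -12.0, "neg_ngram_ppl": -20.0}, wer=0.0),
    Hypothesis("a cat sat on mat", {"am": -11.0, "neg_ngram_ppl": -40.0}, wer=2 / 6),
]

am_only = {"am": 1.0}
combined = {"am": 1.0, "neg_ngram_ppl": 0.5}  # hand-picked here; L2RS learns its weights

for name, w in [("AM only", am_only), ("AM + LM", combined)]:
    ranked = rescore(hyps, w)
    rels = [relevance(h) for h in ranked]
    print(f"{name}: top={ranked[0].text!r}, NDCG@10={ndcg_at_k(rels, 10):.3f}")
```

In this toy list, the AM score alone ranks the truncated hypothesis first. Adding the LM feature moves the correct transcript to the top and yields an ideal ordering (NDCG@10 of 1.0). L2RS targets this effect by learning how much each textual feature should contribute instead of relying on hand-tuned weights.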