Pre-Training for Query Rewriting in A Spoken Language Understanding System (2002.05607v1)

Published 13 Feb 2020 in cs.CL and cs.IR

Abstract: Query rewriting (QR) is an increasingly important technique for reducing customer friction caused by errors in a spoken language understanding pipeline, where the errors originate from various sources such as speech recognition, language understanding, or entity resolution. In this work, we first propose a neural-retrieval-based approach for query rewriting. Then, inspired by the wide success of pre-trained contextual language embeddings, and as a way to compensate for insufficient QR training data, we propose a language-modeling (LM) based approach to pre-train query embeddings on historical user conversation data with a voice assistant. In addition, we propose to use the NLU hypotheses generated by the language understanding system to augment the pre-training. Our experiments show that pre-training provides rich prior information and helps the QR task achieve strong performance. We also show that joint pre-training with NLU hypotheses yields further benefit. Finally, after pre-training, we find that a small set of rewrite pairs is enough to fine-tune the QR model to outperform a strong baseline trained on all of the QR training data.
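The abstract describes the retrieval approach only at a high level: embed an incoming (possibly error-laden) query and retrieve a rewrite from embeddings of historically successful queries. The sketch below illustrates that retrieval step; the `QueryEncoder`, its toy mean-pooled architecture, and the candidate index are hypothetical placeholders, not the paper's actual model, which pre-trains the embedder with an LM objective on conversation data and NLU hypotheses.

```python
# Minimal sketch of neural-retrieval query rewriting, assuming a toy
# mean-pooled token encoder. Everything named here is illustrative;
# the paper pre-trains its query embedder rather than using random weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QueryEncoder(nn.Module):
    """Hypothetical stand-in for the paper's query embedder:
    mean-pools token embeddings and L2-normalizes the result."""
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim, mode="mean")

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, tokens) -> unit-norm embeddings (batch, dim)
        return F.normalize(self.emb(token_ids), dim=-1)

def retrieve_rewrite(encoder, query_ids, cand_ids, cand_texts):
    """Return the candidate whose embedding is nearest (by cosine
    similarity) to the input query's embedding."""
    with torch.no_grad():
        q = encoder(query_ids)        # (1, dim)
        c = encoder(cand_ids)         # (num_candidates, dim)
        scores = q @ c.T              # unit vectors, so dot == cosine
        best = scores.argmax(dim=-1).item()
    return cand_texts[best], scores[0, best].item()

# Toy usage: an ASR-corrupted query retrieves a candidate rewrite.
# With this untrained encoder the choice is essentially arbitrary;
# it becomes meaningful only after pre-training and fine-tuning.
vocab = {"play": 0, "pray": 1, "music": 2, "queen": 3}
encoder = QueryEncoder(vocab_size=len(vocab))
query = torch.tensor([[1, 2]])               # "pray music" (misrecognized)
cands = torch.tensor([[0, 2], [0, 3]])       # "play music", "play queen"
rewrite, score = retrieve_rewrite(encoder, query, cands,
                                  ["play music", "play queen"])
print(rewrite, score)
```

Because the embeddings are unit-normalized, retrieval reduces to a maximum-inner-product search, which at production scale could run against an approximate-nearest-neighbor index over historical queries instead of the brute-force scoring shown here.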

Authors (5)
  1. Zheng Chen (221 papers)
  2. Xing Fan (42 papers)
  3. Yuan Ling (7 papers)
  4. Lambert Mathias (19 papers)
  5. Chenlei Guo (17 papers)
Citations (23)
