Beyond Lexical: A Semantic Retrieval Framework for Textual Search Engine (2008.03917v1)
Published 10 Aug 2020 in cs.IR
Abstract: Search engines have become a fundamental component of many web and mobile applications. Retrieving relevant documents from massive datasets is challenging for a search engine system, especially when faced with verbose or tail queries. In this paper, we explore a vector space search framework for document retrieval. Specifically, we trained a deep semantic matching model so that each query and document can be encoded as a low-dimensional embedding. Our model was built on the BERT architecture. We deployed a fast k-nearest-neighbor index service for online serving. Both offline and online metrics demonstrate that our method improved retrieval performance and search quality considerably, particularly for tail
- Kuan Fang (30 papers)
- Long Zhao (64 papers)
- Zhan Shen (2 papers)
- RiKang Zhour (1 paper)
- LiWen Fan (2 papers)
- Ruixing Wang (7 papers)
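
The retrieval idea described in the abstract (encode each query and document as an embedding, then return the nearest documents) can be illustrated with a minimal sketch. This is not the authors' implementation: the vectors below are hand-written, hypothetical stand-ins for the output of a BERT-based encoder, the function names `normalize` and `top_k` are invented for this example, and the exhaustive scan stands in for the fast k-nearest-neighbor index service, which the abstract does not specify.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return the k (doc_id, cosine_similarity) pairs closest to the query.

    Exhaustive scan over all documents; a production system would use an
    approximate nearest-neighbor index instead (assumption, not from the paper).
    """
    q = normalize(query_vec)
    scored = []
    for doc_id, doc_vec in enumerate(doc_vecs):
        d = normalize(doc_vec)
        score = sum(a * b for a, b in zip(q, d))
        scored.append((doc_id, score))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings; a real encoder would produce
# vectors with far more dimensions.
doc_embeddings = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.3, 0.1],
    [0.0, 0.0, 1.0],
]
query_embedding = [1.0, 0.2, 0.0]

results = top_k(query_embedding, doc_embeddings, k=2)
for doc_id, score in results:
    print(f"doc {doc_id}: cosine similarity {score:.3f}")
```

Because relevance here depends only on distance in the embedding space, a query and a document can match without sharing any terms, which is the property the abstract credits for the gains on verbose and tail queries.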