Pre-trained Language Model for Web-scale Retrieval in Baidu Search

Published 7 Jun 2021 in cs.IR | (2106.03373v4)

Abstract: Retrieval is a crucial stage in web search that identifies a small set of query-relevant candidates from a billion-scale corpus. Discovering more semantically-related candidates in the retrieval stage is very promising to expose more high-quality results to the end users. However, it still remains non-trivial challenges of building and deploying effective retrieval models for semantic matching in real search engine. In this paper, we describe the retrieval system that we developed and deployed in Baidu Search. The system exploits the recent state-of-the-art Chinese pretrained LLM, namely Enhanced Representation through kNowledge IntEgration (ERNIE), which facilitates the system with expressive semantic matching. In particular, we developed an ERNIE-based retrieval model, which is equipped with 1) expressive Transformer-based semantic encoders, and 2) a comprehensive multi-stage training paradigm. More importantly, we present a practical system workflow for deploying the model in web-scale retrieval. Eventually, the system is fully deployed into production, where rigorous offline and online experiments were conducted. The results show that the system can perform high-quality candidate retrieval, especially for those tail queries with uncommon demands. Overall, the new retrieval system facilitated by pretrained LLM (i.e., ERNIE) can largely improve the usability and applicability of our search engine.

Abstract PDF Upgrade to Chat

Citations (69)

View on Semantic Scholar

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

off on

Knowledge Gaps

off on

Practical Applications

off on

Glossary

off on

Conceptual Simplification

off on

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Generate Now

Continue Learning

We haven't generated follow-up questions for this paper yet.

Generate Now

Pre-trained Language Model for Web-scale Retrieval in Baidu Search

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Authors (10)

Collections

Pre-trained Language Model for Web-scale Retrieval in Baidu Search

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Related Papers

Authors (10)

Collections