
Embedding-based Retrieval in Facebook Search (2006.11632v2)

Published 20 Jun 2020 in cs.IR

Abstract: Search in social networks such as Facebook poses different challenges than in classical web search: besides the query text, it is important to take into account the searcher's context to provide relevant results. Their social graph is an integral part of this context and is a unique aspect of Facebook search. While embedding-based retrieval (EBR) has been applied in web search engines for years, Facebook search was still mainly based on a Boolean matching model. In this paper, we discuss the techniques for applying EBR to a Facebook Search system. We introduce the unified embedding framework developed to model semantic embeddings for personalized search, and the system to serve embedding-based retrieval in a typical search system based on an inverted index. We discuss various tricks and experiences on end-to-end optimization of the whole system, including ANN parameter tuning and full-stack optimization. Finally, we present our progress on two selected advanced topics about modeling. We evaluated EBR on verticals for Facebook Search with significant metrics gains observed in online A/B experiments. We believe this paper will provide useful insights and experiences to help people on developing embedding-based retrieval systems in search engines.

Embedding-based Retrieval in Facebook Search: A Formal Overview

The paper "Embedding-based Retrieval in Facebook Search" presents an intricate paper on transforming Facebook Search mechanisms from a traditional Boolean matching model to an embedding-based retrieval (EBR) system. By introducing semantic embeddings, the authors aim to capture the complex dynamics of personalized social search, where query intent extends beyond mere textual representation to incorporate user and context information.

Technical Contributions

The authors introduce a unified embedding framework designed to encapsulate multi-faceted inputs such as the searcher’s context, query text, and social connections. This approach departs from standard text-based embeddings and tackles the inherent complexities associated with social search systems.

Key developments include:

  1. Unified Embedding Model: A novel architecture that encodes text, user, and context into dense vectors, accommodating the multi-faceted nature of Facebook queries (see the model sketch after this list).
  2. System Integration: Implementation of EBR on top of an inverted index structure, utilizing the Faiss library for embedding quantization, thus enabling hybrid retrieval that marries both embedding KNN and Boolean matching approaches.
  3. Hard Mining Techniques: The paper explores hard mining strategies for negative and positive sample selection to improve retrieval model performance, including both offline and online methods that balance random and hard negatives (see the mining sketch after this list).
  4. Ensemble and Cascade Models: Employing ensemble techniques through weighted concatenation or cascade models to optimize retrieval performance by leveraging varied hardness levels of embeddings.
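
Below is a minimal sketch of the two-tower setup behind item 1, assuming a simple bag-of-n-grams text encoder plus location and social-graph features on each side; the feature set, dimensions, and loss margin are illustrative assumptions rather than the production architecture. Each tower produces a unit-norm vector, so dot products are cosine similarities, and a triplet loss pushes positives above negatives by a margin.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedEncoder(nn.Module):
    """One tower: fuses text, location, and social-graph features into a dense vector.
    Feature choices and sizes are illustrative, not the production configuration."""
    def __init__(self, vocab_size, num_locations, embed_dim=128, out_dim=64):
        super().__init__()
        self.text_emb = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")  # bag of n-gram ids
        self.loc_emb = nn.Embedding(num_locations, embed_dim)                # e.g. a coarse location id
        self.social_proj = nn.Linear(embed_dim, embed_dim)                   # precomputed graph embedding
        self.mlp = nn.Sequential(nn.Linear(3 * embed_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))

    def forward(self, text_ids, text_offsets, loc_id, social_vec):
        fused = torch.cat([
            self.text_emb(text_ids, text_offsets),
            self.loc_emb(loc_id),
            self.social_proj(social_vec),
        ], dim=-1)
        return F.normalize(self.mlp(fused), dim=-1)  # unit norm: dot product == cosine similarity

def triplet_loss(query, pos_doc, neg_doc, margin=0.2):
    """Require cosine(query, positive) to exceed cosine(query, negative) by `margin`."""
    pos_sim = (query * pos_doc).sum(-1)
    neg_sim = (query * neg_doc).sum(-1)
    return F.relu(neg_sim - pos_sim + margin).mean()
```

A separate document-side tower of the same shape (with document-side features) produces the vectors stored in the index, so only the query-side tower needs to run at serving time.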

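The following sketch illustrates the online hard-negative mining idea from item 3: within a training batch, every other query's positive document is a candidate negative, and only the most similar (hardest) candidates contribute to the loss. Tensor shapes, the margin, and the number of hard negatives are illustrative; such hard negatives are typically blended with random (easy) negatives rather than used alone.

```python
import torch
import torch.nn.functional as F

def in_batch_hard_negatives(query_emb, doc_emb, num_hard=2):
    """query_emb, doc_emb: (B, D) unit-normalized tensors, doc_emb[i] being the positive
    for query_emb[i]. Other documents in the batch serve as candidate negatives;
    the num_hard most similar ones per query are kept."""
    sim = query_emb @ doc_emb.t()                 # (B, B) cosine similarities
    sim.fill_diagonal_(float("-inf"))             # never pick the query's own positive
    hard_idx = sim.topk(num_hard, dim=1).indices  # hardest candidates per query
    return doc_emb[hard_idx]                      # (B, num_hard, D)

def hard_negative_loss(query_emb, doc_emb, margin=0.2, num_hard=2):
    hard_negs = in_batch_hard_negatives(query_emb, doc_emb, num_hard)
    pos_sim = (query_emb * doc_emb).sum(-1, keepdim=True)       # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", query_emb, hard_negs)  # (B, num_hard)
    return F.relu(neg_sim - pos_sim + margin).mean()
```

The offline variant works similarly but mines hard negatives from the top-ranked results of a previous retrieval run rather than from the current batch.
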
Experimental Outcomes

The paper reports significant metric improvements in retrieval quality, evidenced by offline and A/B experiments across multiple Facebook search verticals. Specific results highlighted include:

  • A recall improvement of up to 18% for events and 16% for groups when transitioning from text to unified embeddings.
  • Incremental feature engineering advances, such as the integration of location and social embedding features, further boosting recall performance.

System Optimization and Evaluation

Practical optimization involved ANN parameter tuning to balance retrieval accuracy against latency and capacity. The authors describe how the coarse and product quantization settings and the number of clusters probed per query were tuned, and how queries and index data were managed so that embedding storage and computation remained efficient while responses stayed fast and accurate.
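
As a concrete illustration of that tuning loop, the sketch below builds a Faiss index in the spirit of the setup described (IVF coarse quantization plus product quantization) and sweeps nprobe, the number of clusters probed per query, which is the main recall-versus-latency knob. The index type and all parameter values here are illustrative assumptions, not the production configuration.

```python
import numpy as np
import faiss

d = 64                                            # embedding dimension (illustrative)
doc_vecs = np.random.rand(100_000, d).astype("float32")
faiss.normalize_L2(doc_vecs)                      # unit-norm vectors: L2 ranking == cosine ranking

# Coarse quantizer (IVF) plus product quantization for compact codes;
# nlist / m / nbits values are illustrative.
nlist, m, nbits = 1024, 16, 8
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
index.train(doc_vecs)
index.add(doc_vecs)

# nprobe trades recall for latency: more probed clusters -> higher recall, slower search.
query = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(query)
for nprobe in (1, 8, 32, 128):
    index.nprobe = nprobe
    distances, ids = index.search(query, 10)      # top-10 approximate neighbors
```

In such a sweep, each setting is scored offline against exact (flat) search on a held-out query set, and the smallest nprobe that meets the recall target is chosen for serving.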

Integrating EBR also required adapting the later ranking layers: because candidates retrieved by embedding similarity differ from those produced by Boolean matching, embedding-derived features were propagated to the ranking stages so that these results could be scored with adequate precision.
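
As one hedged illustration of an embedding-derived ranking feature (the overview above does not enumerate the exact production feature set), the cosine similarity between the query embedding and each retrieved candidate's embedding can be exposed directly to the downstream rankers:

```python
import numpy as np

def embedding_ranking_features(query_emb, candidate_embs):
    """Return one cosine-similarity score per retrieved candidate, to be consumed
    as an input feature by downstream ranking models."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    return c @ q  # shape: (num_candidates,)
```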

Implications and Future Directions

Embedding-based retrieval, as demonstrated in this work, represents a pivotal shift for search technology within social networks. While the current implementation effectively addresses the semantic matching deficit, the paper identifies additional horizons for exploration:

  • Advanced Modeling: Future research may explore stronger models such as BERT, or models specialized to particular tasks, to further improve semantic matching.
  • Universal Embedding Models: Training a single embedding model that serves multiple search use cases, rather than separate task-specific models, is presented as a promising direction.

This paper thus provides a comprehensive approach to embedding-based search in large-scale social networks, highlighting the promising avenues offered by deep learning in achieving personalized, context-aware information retrieval.

Authors (9)
  1. Jui-Ting Huang (2 papers)
  2. Ashish Sharma (27 papers)
  3. Shuying Sun (4 papers)
  4. Li Xia (25 papers)
  5. David Zhang (83 papers)
  6. Philip Pronin (1 paper)
  7. Janani Padmanabhan (1 paper)
  8. Giuseppe Ottaviano (8 papers)
  9. Linjun Yang (16 papers)
Citations (258)