OmniSearchSage: Multi-Task Multi-Entity Embeddings for Pinterest Search (2404.16260v1)

Published 25 Apr 2024 in cs.IR, cs.AI, and cs.LG

Abstract: In this paper, we present OmniSearchSage, a versatile and scalable system for understanding search queries, pins, and products for Pinterest search. We jointly learn a unified query embedding coupled with pin and product embeddings, leading to an improvement of $>8\%$ relevance, $>7\%$ engagement, and $>5\%$ ads CTR in Pinterest's production search system. The main contributors to these gains are improved content understanding, better multi-task learning, and real-time serving. We enrich our entity representations using diverse text derived from image captions from a generative LLM, historical engagement, and user-curated boards. Our multitask learning setup produces a single search query embedding in the same space as pin and product embeddings and compatible with pre-existing pin and product embeddings. We show the value of each feature through ablation studies, and show the effectiveness of a unified model compared to standalone counterparts. Finally, we share how these embeddings have been deployed across the Pinterest search stack, from retrieval to ranking, scaling to serve $300k$ requests per second at low latency. Our implementation of this work is available at https://github.com/pinterest/atg-research/tree/main/omnisearchsage.

Authors (6)
  1. Prabhat Agarwal (9 papers)
  2. Minhazul Islam Sk (1 paper)
  3. Nikil Pancha (6 papers)
  4. Kurchi Subhra Hazra (2 papers)
  5. Jiajing Xu (11 papers)
  6. Chuck Rosenberg (3 papers)
Citations (1)

Summary

Multi-Task Multi-Entity Embeddings Enhance Pinterest Search Performance

Introduction to the Study

The adoption of embeddings in search systems is pivotal for enhancing the user experience, enabling nuanced content understanding and retrieval. In work led by Prabhat Agarwal and colleagues at Pinterest, a new architecture termed OmniSearchSage was developed. The system leverages multi-task learning to jointly optimize query, pin, and product embeddings in a unified framework. The paper reports significant enhancements to Pinterest's search capabilities: more than 8% improvement in relevance, over 7% in user engagement, and an increase of over 5% in ads click-through rate (CTR).

Embedding Techniques and Innovations

  • Embedding Integration: OmniSearchSage integrates pin and product embeddings with query embeddings, effectively placing these entities within the same vector space. This integration facilitates improved retrieval and ranking in Pinterest's search engine.
  • Entity Enrichment: The system enriches pin and product representations with diverse text drawn from image captions generated by a generative LLM, historical engagement data, and user-curated boards (a sketch of this enrichment appears after this list).
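
The summary does not give the exact feature pipeline, but the enrichment idea can be sketched as simple text concatenation. The `Pin` dataclass, its field names, and the `[SEP]` delimiter below are hypothetical, not Pinterest's schema:

```python
from dataclasses import dataclass, field

@dataclass
class Pin:
    """Hypothetical pin record; fields are illustrative, not Pinterest's schema."""
    title: str
    description: str
    llm_caption: str                                            # synthetic caption from a generative LLM
    engaged_queries: list[str] = field(default_factory=list)    # queries with historical engagement
    board_titles: list[str] = field(default_factory=list)       # titles of user-curated boards

def enriched_text(pin: Pin, max_queries: int = 5, max_boards: int = 5) -> str:
    """Concatenate diverse text sources into one document for the pin text encoder."""
    parts = [
        pin.title,
        pin.description,
        pin.llm_caption,
        " ".join(pin.engaged_queries[:max_queries]),
        " ".join(pin.board_titles[:max_boards]),
    ]
    return " [SEP] ".join(p for p in parts if p)

pin = Pin(
    title="Modern farmhouse kitchen",
    description="White cabinets with brass hardware.",
    llm_caption="A bright kitchen with a large island and pendant lights.",
    engaged_queries=["farmhouse kitchen ideas", "white kitchen"],
    board_titles=["Dream Home", "Kitchen Remodel"],
)
print(enriched_text(pin))
```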

Advanced Techniques Deployed

  • Compatibility with Pre-existing Embeddings: The model is trained so that the new query embedding is not only effective on its own but also remains compatible with pre-existing pin and product embeddings, via specifically tuned compatibility encoders.
  • Multi-Task Learning: The model simultaneously learns embeddings for multiple entities (pins, products) and tasks (query-to-pin and query-to-product retrieval), which improves both efficiency and performance; a minimal sketch of this setup follows this list.
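
The architectural details are not in this summary; the PyTorch sketch below only illustrates the shape of the idea, with linear layers standing in for the real text encoders. The dimensions, the in-batch softmax loss, and the single legacy-pin compatibility encoder are all assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 256  # shared embedding dimension (illustrative)

class CompatibilityEncoder(nn.Module):
    """Projects a frozen pre-existing embedding into the shared query space."""
    def __init__(self, in_dim: int, out_dim: int = DIM):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(x), dim=-1)

class OmniSearchSageSketch(nn.Module):
    def __init__(self, text_dim: int, legacy_pin_dim: int):
        super().__init__()
        # In the real system these would be learned text encoders, not single layers.
        self.query_encoder = nn.Linear(text_dim, DIM)
        self.pin_encoder = nn.Linear(text_dim, DIM)
        self.product_encoder = nn.Linear(text_dim, DIM)
        self.legacy_pin_compat = CompatibilityEncoder(legacy_pin_dim)

    def forward(self, query_feats, pin_feats, product_feats, legacy_pin_emb):
        q = F.normalize(self.query_encoder(query_feats), dim=-1)
        p = F.normalize(self.pin_encoder(pin_feats), dim=-1)
        r = F.normalize(self.product_encoder(product_feats), dim=-1)
        legacy = self.legacy_pin_compat(legacy_pin_emb)  # frozen input, trainable projection
        return q, p, r, legacy

def in_batch_softmax_loss(q, e, temperature=0.07):
    """Contrastive loss treating the other in-batch entities as negatives."""
    logits = q @ e.t() / temperature
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

model = OmniSearchSageSketch(text_dim=128, legacy_pin_dim=64)
q, p, r, legacy = model(torch.randn(8, 128), torch.randn(8, 128),
                        torch.randn(8, 128), torch.randn(8, 64))
# Multi-task objective: sum per-task losses so one query embedding serves all tasks.
loss = (in_batch_softmax_loss(q, p)
        + in_batch_softmax_loss(q, r)
        + in_batch_softmax_loss(q, legacy))
loss.backward()
```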

Practical Implementation and Results

The integration of OmniSearchSage within Pinterest's existing infrastructure demonstrates the system's scalability and efficiency: it serves roughly 300k requests per second at notably low latency.
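
The summary does not say which indexing technology backs serving. Purely as an illustration, a query embedding can be matched against a corpus of pin embeddings with an approximate-nearest-neighbor index; the sketch below uses FAISS with an HNSW index, which is an assumption rather than Pinterest's documented stack:

```python
import numpy as np
import faiss  # pip install faiss-cpu

DIM = 256
corpus = np.random.rand(100_000, DIM).astype("float32")
faiss.normalize_L2(corpus)  # cosine similarity via inner product on unit vectors

# HNSW graph index (M=32 neighbors per node) for low-latency approximate search.
index = faiss.IndexHNSWFlat(DIM, 32, faiss.METRIC_INNER_PRODUCT)
index.add(corpus)

query = np.random.rand(1, DIM).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 10)  # top-10 candidate pins for this query
```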

Deployment Across Pinterest’s Search Stack

  • Retrieval and Ranking: The embeddings are used in both the retrieval and ranking phases of the search process, significantly improving the accuracy and relevance of search results.
  • Multi-Stage Ranking Models: As a key feature in multi-stage ranking models, the embeddings help match nuanced user queries with the most relevant content quickly and accurately; a sketch of how such a similarity feature could be derived follows this list.
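
The summary does not specify how the embeddings enter the ranking models. One common pattern, shown here as a hypothetical sketch, is to expose query-candidate similarity as explicit features; the function and feature names below are illustrative, not from the paper:

```python
import numpy as np

def embedding_features(query_emb: np.ndarray, cand_emb: np.ndarray) -> dict[str, float]:
    """Derive ranking-model features from unit-normalized embeddings (names are illustrative)."""
    return {
        "query_cand_cosine": float(query_emb @ cand_emb),               # similarity score
        "query_cand_l2": float(np.linalg.norm(query_emb - cand_emb)),   # distance variant
    }
```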

Evaluation and Metrics

Extensive offline experiments combined with A/B testing on the live system provided a dual validation approach, confirming the superior performance of OmniSearchSage. The system was tested for relevance, engagement, and ads CTR improvements, with each metric showing tangible gains.
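
The summary names the metrics that moved online but not the offline ones. A common offline proxy for retrieval embeddings, used here purely as an illustrative stand-in, is recall@k over engaged (query, item) pairs:

```python
import numpy as np

def recall_at_k(query_embs, pos_embs, corpus_embs, k=10):
    """Fraction of queries whose engaged item ranks in the corpus top-k (unit vectors)."""
    hits = 0
    for q, pos in zip(query_embs, pos_embs):
        scores = corpus_embs @ q                # cosine scores against the whole corpus
        rank = int((scores > pos @ q).sum())    # number of items scoring above the positive
        hits += rank < k
    return hits / len(query_embs)
```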

Key Results

  1. Relevance Improvement: There was a marked improvement in content relevance across the board, which suggests that the embeddings effectively capture and match user intent.
  2. Engagement Uplift: Engagement metrics indicated that users interacted more with the search results, likely due to better-matched content suggestions.
  3. Increased Ads CTR: The improvements in ads CTR suggest that ads also benefit from better targeting and relevance, enhancing overall user experience and advertiser ROI.

Theoretical Implications and Future Directions

This research illuminates the path for future improvements in embedding technologies for search systems, especially in how diverse data sources can be integrated to enhance the model's understanding of queries and content. The successful deployment of OmniSearchSage sets a precedent for future research focused on multi-task and multi-entity embedding systems.

Further explorations could focus on more granular multi-task learning frameworks, deeper integration with machine learning pipelines, and extending the embeddings to more varied content types and richer media. The substantial improvements observed in this paper underscore the potential of advanced embedding techniques to make search systems more intuitive, helpful, and engaging for users.