
Enhancing Relevance of Embedding-based Retrieval at Walmart (2408.04884v2)

Published 9 Aug 2024 in cs.IR

Abstract: Embedding-based neural retrieval (EBR) is an effective search retrieval method in product search for tackling the vocabulary gap between customer search queries and products. The initial launch of our EBR system at Walmart yielded significant gains in relevance and add-to-cart rates [1]. However, despite EBR generally retrieving more relevant products for reranking, we have observed numerous instances of relevance degradation. Enhancing retrieval performance is crucial, as it directly influences product reranking and affects the customer shopping experience. Factors contributing to these degradations include false positives/negatives in the training data and the inability to handle query misspellings. To address these issues, we present several approaches to further strengthen the capabilities of our EBR model in terms of retrieval relevance. We introduce a Relevance Reward Model (RRM) based on human relevance feedback. We utilize RRM to remove noise from the training data and distill it into our EBR model through a multi-objective loss. In addition, we present the techniques to increase the performance of our EBR model, such as typo-aware training, and semi-positive generation. The effectiveness of our EBR is demonstrated through offline relevance evaluation, online AB tests, and successful deployments to live production. [1] Alessandro Magnani, Feng Liu, Suthee Chaidaroon, Sachin Yadav, Praveen Reddy Suram, Ajit Puthenputhussery, Sijie Chen, Min Xie, Anirudh Kashi, Tony Lee, et al. 2022. Semantic retrieval at walmart. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 3495-3503.

Summary

  • The paper presents a relevance reward model integrating human feedback with multi-objective loss to optimize product retrieval relevance.
  • The paper employs typo-aware training and sophisticated negative sampling to robustly handle noisy data and misspelled queries.
  • The paper demonstrates, via offline metrics and online A/B testing, an aggregate improvement of up to 9.58% in EM Recall@20, underscoring its practical impact.

Introduction

This paper presents the enhancements made to Walmart's Embedding-based Retrieval (EBR) system designed to optimize the search functionality on Walmart's e-commerce platform. Traditional keyword-based search, while effective in certain aspects, often struggles to bridge the semantic gap between varied customer queries and product listings. The shift to embedding-based retrieval aims to overcome these limitations by representing queries and products as dense vectors within a dual-encoder architecture, facilitating a more semantic understanding of search queries.
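
The dual-encoder idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_encode` is a hypothetical stand-in that hashes character trigrams instead of using a learned neural tower, and the product titles are invented. The point is only the retrieval shape: queries and products are embedded independently, and candidates are ranked by vector similarity.

```python
import math
import zlib

DIM = 256  # toy embedding size (assumption); production models use learned vectors


def toy_encode(text: str) -> list[float]:
    """Hypothetical stand-in for a learned encoder tower.

    Hashes character trigrams into DIM buckets and L2-normalises the result,
    so that a dot product between two outputs is a cosine similarity.
    """
    vec = [0.0] * DIM
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def retrieve(query: str, product_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Rank products by dot product with the query embedding and keep the top k."""
    q = toy_encode(query)
    scored = sorted(
        product_vecs.items(),
        key=lambda item: -sum(a * b for a, b in zip(q, item[1])),
    )
    return [title for title, _ in scored[:k]]


# Product embeddings are computed once, offline, independently of any query.
products = [
    "wireless bluetooth headphones",
    "stainless steel water bottle",
    "kids bicycle helmet",
]
product_vecs = {title: toy_encode(title) for title in products}

print(retrieve("bluetooth headphone", product_vecs, k=1))
```

Because the two sides are encoded separately, product vectors can be precomputed and indexed, and only the query needs encoding at search time. A real system would replace the exhaustive sort with an approximate nearest-neighbor index.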

However, despite initial gains in relevance through the adoption of EBR, Walmart encountered several instances of relevance degradation, primarily arising from noise within the training data and an inability to manage misspelled queries effectively. This paper explores multiple strategies proposed to resolve these issues and enhance relevance, subsequently improving customer shopping experience.

Methods

Several enhancements were introduced to the EBR system:

  1. Relevance Reward Model (RRM): A cross-encoder model trained on human relevance feedback. It is used to remove noise from the training data and to derive relevance labels, which are distilled into the EBR model through a multi-objective loss that optimizes for relevance and customer engagement simultaneously.
  2. Typo-Aware Training: Techniques to robustly handle misspelled queries were integrated into the training process, incorporating common typos in queries to enhance model resilience.
  3. Enhanced Negative Sampling: Improvements in offline negative sample generation were introduced. This includes a more sophisticated approach for ensuring relevance guardrails and dynamically generating semi-positive samples.
  4. New Labeling Scheme: A revised labeling strategy was proposed for better distinguishing product relevance based on comprehensive customer engagement metrics.
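
One way to picture the multi-objective loss from item 1 is sketched below. This is not the paper's formulation: the choice of softmax cross-entropy for the engagement term, a KL-divergence distillation term for relevance, and the weight `alpha` are all assumptions chosen to show how an engagement objective and a teacher-derived relevance objective can be combined in one scalar loss.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of candidate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def engagement_loss(student_scores: list[float], engaged_index: int) -> float:
    """Softmax cross-entropy: rank the engaged product above the other candidates."""
    return -math.log(softmax(student_scores)[engaged_index])


def relevance_distillation_loss(student_scores: list[float],
                                teacher_scores: list[float]) -> float:
    """KL(teacher || student) over the candidate list.

    teacher_scores play the role of RRM outputs (hypothetical values here).
    """
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def multi_objective_loss(student_scores: list[float], engaged_index: int,
                         teacher_scores: list[float], alpha: float = 0.5) -> float:
    """Weighted sum of the engagement and relevance terms; alpha is an assumed weight."""
    return (engagement_loss(student_scores, engaged_index)
            + alpha * relevance_distillation_loss(student_scores, teacher_scores))


# One query with three candidate products (made-up scores).
student = [2.0, 0.5, -1.0]   # dual-encoder similarity scores
teacher = [0.2, 1.8, -1.5]   # relevance scores from the reward model
print(multi_objective_loss(student, engaged_index=0, teacher_scores=teacher))
```

In this sketch the customer engaged with the first candidate, but the teacher considers the second more relevant, so the relevance term penalizes a student that follows engagement alone. Setting `alpha` to zero recovers pure engagement training.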

Performance Evaluation

The effectiveness of these methodologies was assessed through a series of offline and online evaluations:

  • Offline Evaluation: Exact-match (EM) recall and precision of the retrieved products improved significantly, particularly when the refined multi-objective loss function was used.
  • Online A/B Testing: Deployment within a hybrid retrieval system on Walmart.com showed positive outcomes, with improvements in NDCG metrics and revenue lift corroborating the enhancements observed in relevance metrics.
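
For readers unfamiliar with the offline metrics, the following sketch shows how recall and precision at a cutoff k are commonly computed. It assumes that EM refers to products judged an exact match for the query, and that such judgments are available as a set of product IDs; the IDs below are invented.

```python
def precision_at_k(retrieved: list[str], exact_matches: set[str], k: int) -> float:
    """Fraction of the top-k retrieved products that are exact matches."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in exact_matches)
    return hits / len(top_k)


def recall_at_k(retrieved: list[str], exact_matches: set[str], k: int) -> float:
    """Fraction of all exact-match products that appear in the top k."""
    if not exact_matches:
        return 0.0
    hits = sum(1 for item in retrieved[:k] if item in exact_matches)
    return hits / len(exact_matches)


# Ranked retrieval output for one query, and its judged exact matches (made-up IDs).
retrieved = ["p1", "p7", "p3", "p9"]
exact_matches = {"p1", "p3", "p5"}

print(precision_at_k(retrieved, exact_matches, k=2))
print(recall_at_k(retrieved, exact_matches, k=4))
```

In practice these per-query values are averaged over an evaluation query set, and the sketch assumes each product appears at most once in the ranked list.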

Results

The paper's RRM results table shows that using the RRM to revise labels yielded an EM Recall@20 improvement of 2.57%, while using it in a multi-objective loss improved recall by 7.07%. Combining both approaches resulted in an aggregate improvement of 9.58%. Typo-aware training and enhanced negative sampling provided further relevance lifts, particularly in EM Recall and Precision.

Future Implications

These enhancements have practical implications for scaling semantic search capabilities in large-scale e-commerce platforms. The integration of human-annotated feedback into embedding-based systems represents a significant step towards achieving higher relevance and precision in product retrieval. The methodologies proposed, notably the novel approach to multi-objective optimization, hold potential for adaptation across various industries reliant on precise information retrieval systems.

Moving forward, further research could delve into refining the relevance reward mechanisms and extending the typo tolerance capabilities of the EBR model. Additionally, exploring more intricate methods of synthesizing engagement data with semantic relevance could provide deeper insights into customer behaviors and preferences.

In summary, this paper addresses core challenges in Walmart's EBR implementation and reports substantial gains in relevance and overall search performance on its platform.
