- The paper presents a relevance reward model integrating human feedback with multi-objective loss to optimize product retrieval relevance.
- The paper employs typo-aware training and sophisticated negative sampling to robustly handle noisy data and misspelled queries.
- The paper demonstrates, via offline metrics and online A/B testing, an aggregate improvement of up to 9.58% in recall, underscoring its practical impact.
Enhancing Relevance of Embedding-based Retrieval at Walmart
Introduction
This paper presents the enhancements made to Walmart's Embedding-based Retrieval (EBR) system designed to optimize the search functionality on Walmart's e-commerce platform. Traditional keyword-based search, while effective in certain aspects, often struggles to bridge the semantic gap between varied customer queries and product listings. The shift to embedding-based retrieval aims to overcome these limitations by representing queries and products as dense vectors within a dual-encoder architecture, facilitating a more semantic understanding of search queries.
However, despite initial relevance gains from adopting EBR, Walmart encountered several cases of relevance degradation, arising primarily from noise in the training data and from the model's inability to handle misspelled queries. The paper explores several strategies for resolving these issues and improving relevance and, in turn, the customer shopping experience.
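To make the dual-encoder setup described above concrete, here is a minimal sketch of how queries and products can be scored once both are mapped to dense vectors. The `embed` function is a toy stand-in (hashed character trigrams), not the learned encoders used in the paper, and `DIM`, `score`, and `retrieve` are hypothetical names chosen for illustration.

```python
import math
import zlib

DIM = 64  # assumed embedding size for this toy example


def embed(text: str) -> list[float]:
    """Toy stand-in for a learned encoder tower: hash character
    trigrams into a fixed-size vector and L2-normalize it."""
    vec = [0.0] * DIM
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def score(query_vec: list[float], product_vec: list[float]) -> float:
    """Dot product of two unit vectors, i.e. cosine similarity."""
    return sum(q * p for q, p in zip(query_vec, product_vec))


def retrieve(query: str, product_titles: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every product against the query and return the top k."""
    query_vec = embed(query)
    scored = [(score(query_vec, embed(title)), title) for title in product_titles]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


results = retrieve("whole milk", ["whole milk", "usb cable", "almond milk 1 gallon"])
```

The key property of the architecture is that the query and the product are encoded independently, so product vectors can be computed ahead of time and only the query needs encoding at search time. The brute-force loop in `retrieve` is only for illustration; it recomputes product embeddings on every call.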
Methods
Several enhancements were introduced to the EBR system:
- Relevance Reward Model (RRM): A cross-encoder model trained on human relevance feedback. It is used to refine the training data and derive relevance labels, and it is also used in a multi-objective loss function during training that simultaneously optimizes for relevance and customer engagement.
- Typo-Aware Training: Techniques to robustly handle misspelled queries were integrated into the training process, incorporating common typos in queries to enhance model resilience.
- Enhanced Negative Sampling: Offline negative sample generation was improved. The changes include an approach for enforcing relevance guardrails and for dynamically generating semi-positive samples.
- New Labeling Scheme: A revised labeling strategy was proposed for better distinguishing product relevance based on comprehensive customer engagement metrics.
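As an illustration of the typo-aware training idea in the list above, the following sketch augments (query, product) training pairs with synthetically misspelled copies of the query. The summary does not say how the paper generates typos, so the three edit operations (swap, delete, duplicate) and the names `inject_typo`, `augment_with_typos`, and `typo_rate` are assumptions made for this example.

```python
import random


def inject_typo(query: str, rng: random.Random) -> str:
    """Apply one random character-level edit: swap two adjacent
    characters, delete a character, or duplicate a character."""
    if len(query) < 2:
        return query
    i = rng.randrange(len(query) - 1)
    op = rng.choice(("swap", "delete", "duplicate"))
    if op == "swap":
        return query[:i] + query[i + 1] + query[i] + query[i + 2:]
    if op == "delete":
        return query[:i] + query[i + 1:]
    return query[:i] + query[i] + query[i:]  # duplicate character i


def augment_with_typos(pairs, typo_rate=0.3, seed=0):
    """Keep every original (query, product_id) pair and, with probability
    typo_rate, add a misspelled copy that points to the same product."""
    rng = random.Random(seed)
    augmented = []
    for query, product_id in pairs:
        augmented.append((query, product_id))
        if rng.random() < typo_rate:
            augmented.append((inject_typo(query, rng), product_id))
    return augmented


training_pairs = [("laptop stand", "p1"), ("running shoes", "p2")]
augmented_pairs = augment_with_typos(training_pairs, typo_rate=1.0)
```

Because the misspelled query keeps the label of the clean query, the encoder is pushed to map both spellings close to the same products. A swap of two identical adjacent characters leaves the query unchanged, which is harmless for this purpose.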
Performance Evaluation
The effectiveness of these methodologies was assessed through a series of offline and online evaluations:
- Offline Evaluation: Exact match (EM) metrics, which measure the relevance and precision of the retrieved products, showed significant improvements, particularly with the refined multi-objective loss function.
- Online A/B Testing: Deployment within a hybrid retrieval system on Walmart.com showed positive outcomes, with improvements in NDCG metrics and revenue lift corroborating the enhancements observed in relevance metrics.
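The offline metrics mentioned above can be sketched as follows. This assumes that each query comes with a set of product IDs judged to be exact matches, and that precision and recall are computed over the top k retrieved products; the paper's precise definitions may differ, and the function names are hypothetical.

```python
def em_precision_at_k(retrieved: list[str], exact_ids: set[str], k: int) -> float:
    """Fraction of the top-k retrieved products that are exact matches."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for pid in top_k if pid in exact_ids) / len(top_k)


def em_recall_at_k(retrieved: list[str], exact_ids: set[str], k: int) -> float:
    """Fraction of all exact-match products that appear in the top k."""
    if not exact_ids:
        return 0.0
    return len(set(retrieved[:k]) & exact_ids) / len(exact_ids)


retrieved = ["a", "b", "c", "d"]   # ranked retrieval output for one query
exact_ids = {"a", "c", "e"}        # products labeled as exact matches

precision = em_precision_at_k(retrieved, exact_ids, k=2)  # 1 of 2 -> 0.5
recall = em_recall_at_k(retrieved, exact_ids, k=2)        # 1 of 3
```

In practice these per-query values would be averaged over an evaluation set of queries.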
Results
The paper's RRM results table shows that using the RRM to revise labels yielded an EM Recall@20 improvement of 2.57%, while using it in a multi-objective loss setting improved recall by 7.07%. Combining both approaches resulted in an aggregate improvement of 9.58%. Furthermore, typo-aware training and the enhanced negative sampling strategies provided substantial relevance lifts, particularly in the EM Recall and Precision metrics.
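One way to picture a multi-objective loss of the kind described above is as a weighted sum of two terms computed from the same model scores: one against engagement labels and one against relevance labels such as those an RRM might produce. The exact formulation in the paper is not given in this summary, so the cross-entropy terms, the weight `alpha`, and all function names below are assumptions for illustration only.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over candidate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normalize(labels: list[float]) -> list[float]:
    """Turn non-negative labels into a target distribution."""
    total = sum(labels)
    if total == 0:
        return [1.0 / len(labels)] * len(labels)
    return [x / total for x in labels]


def cross_entropy(target: list[float], predicted: list[float], eps: float = 1e-12) -> float:
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def multi_objective_loss(scores, engagement_labels, relevance_labels, alpha=0.5):
    """Weighted sum of an engagement term and a relevance term.
    alpha = 1.0 trains on engagement only; alpha = 0.0 on relevance only."""
    probs = softmax(scores)
    engagement_term = cross_entropy(normalize(engagement_labels), probs)
    relevance_term = cross_entropy(normalize(relevance_labels), probs)
    return alpha * engagement_term + (1 - alpha) * relevance_term


# Three candidate products for one query (hypothetical values).
engagement = [1.0, 0.0, 0.0]   # e.g. the first product was purchased
relevance = [1.0, 0.5, 0.0]    # e.g. graded relevance labels
aligned = multi_objective_loss([2.0, 0.5, -1.0], engagement, relevance)
reversed_order = multi_objective_loss([-1.0, 0.5, 2.0], engagement, relevance)
```

Scores that rank the engaged and relevant product first (`aligned`) give a lower loss than the reversed ranking. The weight `alpha` controls the trade-off between the two objectives.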
Future Implications
These enhancements have practical implications for scaling semantic search capabilities in large-scale e-commerce platforms. The integration of human-annotated feedback into embedding-based systems represents a significant step towards achieving higher relevance and precision in product retrieval. The methodologies proposed, notably the novel approach to multi-objective optimization, hold potential for adaptation across various industries reliant on precise information retrieval systems.
Moving forward, further research could delve into refining the relevance reward mechanisms and extending the typo tolerance capabilities of the EBR model. Additionally, exploring more intricate methods of synthesizing engagement data with semantic relevance could provide deeper insights into customer behaviors and preferences.
In summary, the paper addresses core challenges in Walmart's EBR implementation, namely noisy training data and misspelled queries, and reports substantial gains in relevance and overall search performance on the platform.