
Semantic Retrieval at Walmart (2412.04637v1)

Published 5 Dec 2024 in cs.IR, cs.AI, and cs.LG

Abstract: In product search, the retrieval of candidate products before re-ranking is more critical and challenging than in other kinds of search, such as web search, especially for tail queries, which have complex and specific search intent. In this paper, we present a hybrid system for e-commerce search deployed at Walmart that combines a traditional inverted index with embedding-based neural retrieval to better answer user tail queries. Our system significantly improved the relevance of the search engine, as measured by both offline and online evaluations. The improvements were achieved through a combination of different approaches. We present a new technique to train the neural model at scale and describe how the system was deployed in production with little impact on response time. We highlight multiple learnings and practical tricks that were used in the deployment of this system.

Citations (16)

Summary

The paper introduces a hybrid retrieval system that fuses traditional inverted indexes with neural retrieval models to improve tail query search.
It develops advanced negative sampling techniques, using in-batch and offline hard negatives to enhance model training.
It shows that transformer-based models such as a 6-layer DistilBERT significantly boost recall over bag-of-words baselines, and that reducing the embedding dimension lowers computational overhead while maintaining performance.
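The in-batch negatives mentioned in the summary can be illustrated with a minimal sketch. It assumes a two-tower setup in which queries and products are already encoded as vectors, scores each pair by dot product, and applies a softmax cross-entropy loss where product i is the positive for query i and every other product in the batch serves as a negative. The function names, the `temperature` parameter, and the exact loss form are illustrative assumptions, not the paper's stated objective.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity between a query and a product embedding."""
    return sum(x * y for x, y in zip(a, b))


def in_batch_softmax_loss(
    queries: List[Vector], products: List[Vector], temperature: float = 1.0
) -> float:
    """Mean softmax cross-entropy with in-batch negatives (illustrative).

    products[i] is the positive for queries[i]; every other product in the
    batch is reused as a negative for that query, so no extra negatives
    have to be sampled or encoded.
    """
    if len(queries) != len(products) or not queries:
        raise ValueError("expected a non-empty batch of aligned query/product pairs")
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, p) / temperature for p in products]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / len(queries)


# Usage with hypothetical 2-d embeddings: the correct pairing yields a lower loss.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_softmax_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_softmax_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

Reusing the batch as negatives is cheap because the product embeddings are already computed for the positives; the trade-off is that these negatives are mostly random and therefore easy, which is why harder negatives are usually mined separately.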

Semantic Retrieval at Walmart: An Expert Overview

The research paper presents Walmart's implementation of a hybrid semantic retrieval system for e-commerce search, focusing on optimizing product search, particularly for tail queries. It combines traditional inverted index approaches with embedding-based neural retrieval to enhance search relevance. This paper details the system's deployment at Walmart and highlights the innovations and learnings gleaned during its implementation.

The system is designed to tackle challenges specific to product search, such as vocabulary mismatch, where a query uses synonyms or hypernyms of the terms that appear in a product listing, so exact text matching misses relevant items. Traditional web search models fall short because product listings are short and offer little text to match against, which motivates moving beyond purely lexical matching. The proposed hybrid architecture keeps conventional text matching and adds neural retrieval to mitigate these limitations. The paper also presents a new technique for training the neural model at scale and describes how the system was deployed with little impact on response time.
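The hybrid idea can be sketched as a union of two candidate sets. This minimal sketch assumes a toy in-memory inverted index with AND semantics over whitespace tokens, query and product embeddings that are already computed, and brute-force cosine search standing in for an approximate nearest-neighbor index. The function names, the catalog, and the merge order are hypothetical; the summary does not specify how the deployed system combines the two result lists.

```python
import math
from typing import Dict, List, Set

Vector = List[float]


def lexical_retrieve(query: str, inverted_index: Dict[str, Set[str]]) -> Set[str]:
    """Inverted-index lookup: ids of products containing every query token."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    postings = [inverted_index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def embedding_retrieve(query_vec: Vector, product_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Brute-force top-k by cosine similarity (a stand-in for an ANN index)."""
    ranked = sorted(product_vecs, key=lambda pid: cosine(query_vec, product_vecs[pid]), reverse=True)
    return ranked[:k]


def hybrid_retrieve(
    query: str,
    query_vec: Vector,
    inverted_index: Dict[str, Set[str]],
    product_vecs: Dict[str, Vector],
    k: int,
) -> List[str]:
    """Union of both candidate sets: lexical matches first, then neural-only additions."""
    lexical = lexical_retrieve(query, inverted_index)
    neural = embedding_retrieve(query_vec, product_vecs, k)
    return sorted(lexical) + [pid for pid in neural if pid not in lexical]


# Hypothetical catalog: p3 is a "sneakers" listing sharing no token with the query.
index = {"running": {"p1"}, "shoes": {"p1", "p2"}, "sneakers": {"p3"}}
vectors = {"p1": [0.9, 0.1], "p2": [0.1, 0.9], "p3": [0.95, 0.05]}
print(hybrid_retrieve("running shoes", [1.0, 0.0], index, vectors, k=2))  # ['p1', 'p3']
```

In this toy run the inverted index alone finds only p1, while the embedding side contributes p3, the kind of synonym match that lexical retrieval misses.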

Key Contributions

Hybrid System Architecture: The paper introduces a hybrid retrieval system that integrates an inverted index with neural retrieval, specifically targeting tail queries. This dual approach retains the inverted index's strength at exactly matching rare tokens while using neural retrieval to bridge vocabulary gaps such as synonyms.
Negative Sampling in Training: The paper describes new techniques for selecting negative examples when training large neural retrieval models, which improve model performance. These include in-batch hard negatives, offline hard negatives, and query-product filters such as PT match and token match for selecting candidate negatives.
Model Complexity vs. Performance: Extensive experiments explored the trade-off between model complexity and retrieval quality. BERT-based models, especially a 6-layer DistilBERT, outperformed simpler bag-of-words models with a significant improvement in recall. The authors also explored reducing the embedding dimension and found that it maintained performance while lowering computational overhead.
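Offline hard-negative mining with a filter can be sketched as follows. The sketch assumes that each query's candidate ranking comes from an earlier model checkpoint and is computed before training, and it implements one plausible token-match rule: a candidate whose title contains every query token is skipped as a likely unlabeled positive. That rule, the function names, and the data are illustrative assumptions; the paper's PT match and token match filters may be defined differently, and PT match is not modeled here.

```python
from typing import Callable, Dict, List, Set


def token_match_filter(query: str, title: str) -> bool:
    """Hypothetical rule: keep a candidate as a negative unless its title
    contains every query token (such items are likely false negatives)."""
    query_tokens = set(query.lower().split())
    title_tokens = set(title.lower().split())
    return not query_tokens <= title_tokens


def mine_hard_negatives(
    query: str,
    ranked_ids: List[str],
    positives: Set[str],
    titles: Dict[str, str],
    num_negatives: int,
    keep: Callable[[str, str], bool] = token_match_filter,
) -> List[str]:
    """Walk an offline ranking top-down, skipping labeled positives and
    candidates rejected by the filter, until enough negatives are collected."""
    if num_negatives <= 0:
        return []
    negatives: List[str] = []
    for pid in ranked_ids:
        if pid in positives or not keep(query, titles[pid]):
            continue
        negatives.append(pid)
        if len(negatives) == num_negatives:
            break
    return negatives


titles = {
    "p1": "red running shoes men",
    "p2": "red running shoes women",  # unlabeled but probably relevant
    "p3": "blue running shoes",
    "p4": "red dress",
}
ranking = ["p1", "p2", "p3", "p4"]  # e.g. top results from a previous checkpoint
print(mine_hard_negatives("red running shoes", ranking, {"p1"}, titles, num_negatives=2))
# ['p3', 'p4']
```

Mining from a saved ranking keeps the nearest-neighbor search out of the training loop, and the filter plausibly trades away a few genuinely hard negatives in exchange for fewer false negatives.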

Implications and Future Work

The implications of this research are both practical and theoretical within the field of e-commerce search systems. Practically, it demonstrates a viable path toward integrating complex semantic understanding in high-traffic environments like Walmart.com. Theoretically, the research contributes valuable insights into neural retrieval and hybrid search systems, especially regarding the efficacy of transformer-based architectures in real-world applications.

Future developments in this area may involve further optimization of neural network architectures, integration of more advanced transformer models, and exploration of unsupervised learning methods to enhance query understanding. Negative sample selection could also be refined further, and dynamic query embeddings could help the system adapt to evolving user search behavior and product catalog changes.

This paper remains a significant contribution to the ongoing advancement of information retrieval systems in e-commerce, offering a robust framework that other large-scale online retailers may consider adopting.

Authors (11)
