Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search (2206.06588v1)

Published 14 Jun 2022 in cs.IR and cs.LG

Abstract: Improving the quality of search results can significantly enhance user experience and engagement with search engines. In spite of several recent advancements in the fields of machine learning and data mining, correctly classifying items for a particular user search query has been a long-standing challenge, which still has a large room for improvement. This paper introduces the "Shopping Queries Dataset", a large dataset of difficult Amazon search queries and results, publicly released with the aim of fostering research in improving the quality of search results. The dataset contains around 130 thousand unique queries and 2.6 million manually labeled (query,product) relevance judgements. The dataset is multilingual with queries in English, Japanese, and Spanish. The Shopping Queries Dataset is being used in one of the KDDCup'22 challenges. In this paper, we describe the dataset and present three evaluation tasks along with baseline results: (i) ranking the results list, (ii) classifying product results into relevance categories, and (iii) identifying substitute products for a given query. We anticipate that this data will become the gold standard for future research in the topic of product search.

An Overview of the "Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search"

This essay examines a dataset introduced to advance research in product search, with a focus on semantic matching between customer queries and product listings. The "Shopping Queries Dataset" comprises approximately 130,000 unique shopping queries and 2.6 million manually annotated (query, product) relevance judgments. Queries are in English, Japanese, and Spanish, which makes the dataset usable for multilingual research.

The dataset is built around a four-class labeling scheme for query-product pairs: Exact (E), Substitute (S), Complement (C), and Irrelevant (I). This fine-grained labeling contrasts with traditional binary relevance models. The granularity reflects the nature of e-commerce search, where a product may satisfy a query to varying degrees; for example, a charger returned for an "iPhone" query does not fulfill the query itself but complements a product that does.
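
The contrast between four-class and binary relevance can be made concrete with a small sketch. The class name `ESCI`, the helper `to_binary_relevance`, and the sample products are hypothetical and chosen only for illustration; treating only Exact matches as relevant is one possible way to collapse the labels, not something the dataset prescribes.

```python
from enum import Enum


class ESCI(Enum):
    """The four relevance classes, keyed by their one-letter codes."""
    EXACT = "E"
    SUBSTITUTE = "S"
    COMPLEMENT = "C"
    IRRELEVANT = "I"


def to_binary_relevance(label: ESCI) -> bool:
    """Collapse ESCI to binary relevance (assumption: only Exact counts)."""
    return label is ESCI.EXACT


# Hypothetical judgments for the query "iphone"; not taken from the dataset.
judgments = [
    ("iphone", "Apple iPhone 13", ESCI.EXACT),
    ("iphone", "Android smartphone", ESCI.SUBSTITUTE),
    ("iphone", "iPhone charger", ESCI.COMPLEMENT),
    ("iphone", "garden hose", ESCI.IRRELEVANT),
]

# Under the binary view, the substitute, the complement, and the
# irrelevant item all become indistinguishable.
binary = [to_binary_relevance(label) for _, _, label in judgments]
```

In this toy example the binary view keeps one relevant item and merges the other three, which is exactly the information the four-class scheme preserves.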

Dataset Characteristics and Evaluation Tasks

The authors highlight several key properties of the dataset that make it a valuable resource for ML research:

  1. Real-world Relevance: The data is derived from actual customer queries, so it reflects genuine user intent and e-commerce behavior.
  2. Broad and Deep: The dataset offers both breadth (many queries) and depth (many judged results per query), whereas existing large collections typically emphasize one over the other.
  3. Multi-valued Labels: Each query-product pair carries an ESCI label, which allows more sophisticated search relevance algorithms to be explored.
  4. Diverse Query Selection: The dataset is curated to focus on challenging queries, promoting the development of more robust search technologies.
  5. Multilingual Data: The inclusion of queries in English, Spanish, and Japanese widens the dataset's applicability.

The dataset supports three primary tasks:

  • Query-Product Ranking: Optimizing the order of product listings to enhance relevance to user queries.
  • Multiclass Product Classification: Categorizing products into the ESCI relevance framework.
  • Product Substitute Identification: Identifying which returned products are substitutes for what a given query asks for.
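
As a rough sketch of how the three tasks relate to a single set of ESCI labels, the snippet below derives a ranking score, a multiclass target, and a binary substitute target from the same label. The gain values in `GAIN`, the choice of nDCG as the ranking metric, and all function names are assumptions made for illustration; this text does not specify the metrics or gains used in the paper.

```python
import math

# Assumed per-label gains for ranking evaluation (illustrative only).
GAIN = {"E": 1.0, "S": 0.1, "C": 0.01, "I": 0.0}


def dcg(labels):
    """Discounted cumulative gain of ESCI labels listed in ranked order."""
    return sum(GAIN[l] / math.log2(rank + 2) for rank, l in enumerate(labels))


def ndcg(labels):
    """DCG normalized by the DCG of the ideal (gain-sorted) ordering."""
    ideal = dcg(sorted(labels, key=GAIN.get, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0


def multiclass_target(label):
    """Task 2 target: the ESCI label itself."""
    return label


def substitute_target(label):
    """Task 3 target: 1 if the product is a substitute, else 0."""
    return int(label == "S")


# A ranking that places an irrelevant product above an exact match
# scores lower than the ideal ordering of the same products.
good = ndcg(["E", "S", "C", "I"])
bad = ndcg(["I", "E"])
```

The same labeled pair therefore serves all three tasks; only the target derived from the label changes.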

Research Implications and Methodology

The dataset has implications for both practical applications and the underlying theory of search relevance. In settings such as mobile and voice search, more directly relevant results could improve the user experience and, in turn, commercial metrics such as click-through rates and sales conversions.

From a theoretical perspective, the benchmark calls for revisiting traditional models of search relevance, moving from binary judgments to a graded scheme that captures nuanced user expectations. Researchers can apply current natural language processing models and refine techniques for semantic similarity, ranking, and text classification.

In the reported experiments, baseline models built on well-known architectures such as BERT and MPNet show what can be achieved with the dataset, while leaving notable room for improvement. The results also vary across language locales, which suggests that future work may benefit from adapting models to language-specific consumer behavior.
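
One simple way to surface this kind of cross-locale variation is to break a classification metric down by locale. The sketch below uses plain accuracy; the function name `accuracy_by_locale`, the locale codes, and the sample records are hypothetical, and the metric is a stand-in rather than the one reported in the paper.

```python
from collections import defaultdict


def accuracy_by_locale(records):
    """Return {locale: accuracy} for (locale, gold_label, predicted_label) rows."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for locale, gold, pred in records:
        total[locale] += 1
        correct[locale] += int(gold == pred)
    return {locale: correct[locale] / total[locale] for locale in total}


# Hypothetical predictions; locale codes are assumed, not read from the data.
records = [
    ("us", "E", "E"),
    ("us", "S", "S"),
    ("jp", "E", "E"),
    ("jp", "C", "I"),
]

per_locale = accuracy_by_locale(records)
```

Reporting the per-locale numbers next to the pooled score makes it easier to see whether a model's gains are concentrated in one language.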

Conclusion: Future Prospects

The Shopping Queries Dataset is positioned to become a standard benchmark for e-commerce search relevance research. By providing a large collection of challenging, multilingual queries, it invites further work on more responsive search systems. Because the dataset is publicly available alongside baseline code, it can support both academic and industry efforts to develop better algorithms for, and a more nuanced understanding of, product search.

Authors (9)
  1. Chandan K. Reddy (64 papers)
  2. Lluís Màrquez (31 papers)
  3. Fran Valero (1 paper)
  4. Nikhil Rao (34 papers)
  5. Hugo Zaragoza (1 paper)
  6. Sambaran Bandyopadhyay (20 papers)
  7. Arnab Biswas (1 paper)
  8. Anlu Xing (1 paper)
  9. Karthik Subbian (28 papers)
Citations (41)