SIGIR 2021 E-Commerce Workshop Data Challenge (2104.09423v4)

Published 19 Apr 2021 in cs.IR

Abstract: The 2021 SIGIR workshop on eCommerce is hosting the Coveo Data Challenge for "In-session prediction for purchase intent and recommendations". The challenge addresses the growing need for reliable predictions within the boundaries of a shopping session, as customer intentions can be different depending on the occasion. The need for efficient procedures for personalization is even clearer if we consider the e-commerce landscape more broadly: outside of giant digital retailers, the constraints of the problem are stricter, due to smaller user bases and the realization that most users are not frequently returning customers. We release a new session-based dataset including more than 30M fine-grained browsing events (product detail, add, purchase), enriched by linguistic behavior (queries made by shoppers, with items clicked and items not clicked after the query) and catalog meta-data (images, text, pricing information). On this dataset, we ask participants to showcase innovative solutions for two open problems: a recommendation task (where a model is shown some events at the start of a session, and it is asked to predict future product interactions); an intent prediction task, where a model is shown a session containing an add-to-cart event, and it is asked to predict whether the item will be bought before the end of the session.

Authors (7)
  1. Jacopo Tagliabue (34 papers)
  2. Ciro Greco (19 papers)
  3. Jean-Francis Roy (5 papers)
  4. Bingqing Yu (11 papers)
  5. Patrick John Chia (9 papers)
  6. Federico Bianchi (47 papers)
  7. Giovanni Cassani (3 papers)
Citations (18)

Summary

An Overview of the SIGIR 2021 E-Commerce Workshop Data Challenge

The SIGIR 2021 E-Commerce Workshop hosted a data challenge, organized by Coveo, that created a competitive research environment for advancing machine learning models in e-commerce. The e-commerce industry, projected to grow to a $4.5 trillion market, offers fertile ground for research in Information Retrieval (IR), NLP, and recommendation systems. The challenge specifically addressed personalization, a critical task because customer journeys are dynamic and a shopper's intentions can differ from one session to the next.

Data Challenge Overview

The workshop presented the Coveo Data Challenge, which gave researchers access to a new session-based dataset of more than 30 million fine-grained browsing events. The dataset reflects the conditions of mid-sized e-commerce platforms, with smaller user bases and mostly non-returning customers, offering a realistic scenario that is representative of the broader industry rather than of giant digital retailers. Participants were tasked with solving two key problems:

  1. Session-Based Recommendation Task: Participants were required to predict future product interactions within a session, leveraging past interactions and search queries.
  2. Cart-Abandonment Prediction Task: Given a session containing an add-to-cart event, participants had to predict whether the added item would be bought before the end of the session, with attention to how prediction quality changes as more user interactions are observed.
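The two tasks differ mainly in how a session is cut into model input and target. The sketch below illustrates one way to build an example for each task from a single session. It assumes a simplified, hypothetical event representation of (product_id, event_type) pairs in time order, with event types named after the abstract's "detail", "add", and "purchase"; the helper names and the choice to label the first added product are illustrative assumptions, not the challenge's official preprocessing.

```python
from typing import List, Tuple

# Hypothetical event representation: (product_id, event_type), already in time order.
# Event types mirror the abstract: "detail", "add" (add-to-cart), "purchase".
Event = Tuple[str, str]


def recommendation_instance(session: List[Event], n_shown: int) -> Tuple[List[Event], str, List[str]]:
    """Split a session into the prefix shown to the model, the next product
    interacted with, and the distinct products interacted with after the prefix."""
    shown = session[:n_shown]
    future = [product for product, _ in session[n_shown:]]
    if not future:
        raise ValueError("no events after the shown prefix")
    # dict.fromkeys removes duplicates while keeping first-occurrence order.
    return shown, future[0], list(dict.fromkeys(future))


def intent_instance(session: List[Event], n_shown: int) -> Tuple[List[Event], bool]:
    """Build a purchase-intent example: the prefix must contain an add-to-cart event,
    and the label says whether that product is purchased later in the session."""
    shown = session[:n_shown]
    add_positions = [i for i, (_, kind) in enumerate(shown) if kind == "add"]
    if not add_positions:
        raise ValueError("prefix contains no add-to-cart event")
    first_add = add_positions[0]  # assumption: label the first product added to the cart
    added_product = session[first_add][0]
    bought = any(
        product == added_product and kind == "purchase"
        for product, kind in session[first_add + 1:]
    )
    return shown, bought


if __name__ == "__main__":
    session = [("p1", "detail"), ("p2", "detail"), ("p2", "add"),
               ("p3", "detail"), ("p2", "purchase"), ("p3", "detail")]
    _, next_item, subsequent = recommendation_instance(session, 2)
    print(next_item, subsequent)
    print(intent_instance(session, 3)[1])
```

Varying n_shown over the same session produces examples at different session stages, which is what lets an evaluation ask how predictions improve as more of the session is observed.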

Dataset and Methodological Nuances

The dataset comprises browsing interactions (product detail views, add-to-cart events, and purchases), search interactions (shopper queries, with the items clicked and not clicked after each query), and catalog metadata (images, text, and pricing information), supporting investigations of user behavior from several angles. Events are recorded at a fine granularity within each session, providing rich temporal data for modeling how shoppers browse and buy. This makes the dataset well suited to rigorous session-based modeling, a step forward from traditional datasets that may not preserve session boundaries or fine-grained context.
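Session-based modeling starts by turning a flat event log into per-session, time-ordered sequences, which is where session integrity and temporal granularity matter. A minimal sketch, assuming a hypothetical schema (the field names session_id, timestamp_ms, product_id, and event_type are illustrative and may not match the released files):

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class BrowsingEvent(NamedTuple):
    # Hypothetical schema for illustration; the released files may use other column names.
    session_id: str
    timestamp_ms: int
    product_id: str
    event_type: str  # e.g. "detail", "add", "purchase"


def group_sessions(events: List[BrowsingEvent]) -> Dict[str, List[BrowsingEvent]]:
    """Group a flat event log by session and order each session chronologically."""
    by_session: Dict[str, List[BrowsingEvent]] = defaultdict(list)
    for event in events:
        by_session[event.session_id].append(event)
    return {
        sid: sorted(session_events, key=lambda e: e.timestamp_ms)
        for sid, session_events in by_session.items()
    }


if __name__ == "__main__":
    log = [
        BrowsingEvent("s1", 300, "p2", "add"),
        BrowsingEvent("s2", 100, "p9", "detail"),
        BrowsingEvent("s1", 100, "p1", "detail"),
        BrowsingEvent("s1", 200, "p2", "detail"),
    ]
    for sid, evts in group_sessions(log).items():
        print(sid, [(e.product_id, e.event_type) for e in evts])
```

Search interactions and catalog metadata could be joined onto these sequences by session and product identifiers in a similar way.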

The competition provided a structured evaluation protocol. For recommendations, evaluation relied on Mean Reciprocal Rank (MRR), which scores how highly the true next interaction is ranked, and F1, which scores overlap with the set of subsequent interactions. For the cart-abandonment task, a weighted micro F1 score was computed across different session stages, weighting earlier predictions more heavily.
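The metrics named above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the official scoring code: the cutoff k=20 is an assumed value, the per-stage weights are a hypothetical mapping from the number of events shown to a weight, and micro F1 is pooled over both classes (for binary labels this reduces to accuracy). The challenge's exact definitions may differ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def mrr_at_k(rankings: List[Sequence[str]], targets: List[str], k: int = 20) -> float:
    """Mean Reciprocal Rank: average of 1/rank of the true next item (0 if not in top k)."""
    total = 0.0
    for ranked, target in zip(rankings, targets):
        top_k = list(ranked[:k])
        if target in top_k:
            total += 1.0 / (top_k.index(target) + 1)
    return total / len(targets)


def f1_at_k(ranked: Sequence[str], relevant: Sequence[str], k: int = 20) -> float:
    """F1 between the top-k predicted products and the set of future products."""
    predicted, actual = set(ranked[:k]), set(relevant)
    hits = len(predicted & actual)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(predicted), hits / len(actual)
    return 2 * precision * recall / (precision + recall)


def micro_f1(y_true: List[bool], y_pred: List[bool]) -> float:
    """Micro F1: pool true/false positives and false negatives over both classes."""
    tp = fp = fn = 0
    for cls in (True, False):
        for t, p in zip(y_true, y_pred):
            tp += p == cls and t == cls
            fp += p == cls and t != cls
            fn += t == cls and p != cls
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0


def weighted_micro_f1(y_true: List[bool], y_pred: List[bool],
                      n_shown: List[int], weights: Dict[int, float]) -> float:
    """Micro F1 per session stage (events shown), combined as a weighted average.
    `weights` is a hypothetical mapping, e.g. larger weights for shorter prefixes."""
    buckets: Dict[int, Tuple[List[bool], List[bool]]] = defaultdict(lambda: ([], []))
    for t, p, n in zip(y_true, y_pred, n_shown):
        buckets[n][0].append(t)
        buckets[n][1].append(p)
    total_weight = sum(weights[n] for n in buckets)
    return sum(weights[n] * micro_f1(t, p) for n, (t, p) in buckets.items()) / total_weight
```

Giving shorter prefixes larger weights rewards models that detect purchase intent early, when there is still time to act on it within the session.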

Implications and Future Directions

The SIGIR 2021 workshop's dataset and challenges carry significant implications. Practically, the dataset gives medium-sized retailers a basis for improving conversion rates through more refined session-based recommendations and cart-abandonment models, since abandoned carts represent a substantial share of lost revenue. Theoretically, it gives researchers a realistic setting for evaluating advanced neural networks in e-commerce, beyond traditional benchmarks and datasets that lack broad applicability or rely on strong assumptions.

The focus on neural session-based models and intent prediction echoes the need for adaptive and fast-learning systems capable of operating under dynamic user interaction scenarios. This stress on speed and adaptive capacity is especially pertinent in non-recurring shopper contexts where repeated interactions across user sessions are limited.

Conclusion

Through its data challenge, the SIGIR 2021 E-Commerce Workshop posed critical questions for personalized e-commerce systems and supplied a dataset and evaluation protocol for studying them. Its emphasis on session-based data encourages models that account for intra-session dynamics and develop a deeper understanding of user intent. Future research motivated by these challenges may focus on improving model efficiency, reducing training costs, and enhancing interpretability in session-based recommendation and intent prediction, which are key concerns for deploying these models in diverse retail settings.
