SIGIR 2021 E-Commerce Workshop Data Challenge

Published 19 Apr 2021 in cs.IR | (2104.09423v4)

Abstract: The 2021 SIGIR workshop on eCommerce is hosting the Coveo Data Challenge for "In-session prediction for purchase intent and recommendations". The challenge addresses the growing need for reliable predictions within the boundaries of a shopping session, as customer intentions can be different depending on the occasion. The need for efficient procedures for personalization is even clearer if we consider the e-commerce landscape more broadly: outside of giant digital retailers, the constraints of the problem are stricter, due to smaller user bases and the realization that most users are not frequently returning customers. We release a new session-based dataset including more than 30M fine-grained browsing events (product detail, add, purchase), enriched by linguistic behavior (queries made by shoppers, with items clicked and items not clicked after the query) and catalog meta-data (images, text, pricing information). On this dataset, we ask participants to showcase innovative solutions for two open problems: a recommendation task (where a model is shown some events at the start of a session, and it is asked to predict future product interactions); an intent prediction task, where a model is shown a session containing an add-to-cart event, and it is asked to predict whether the item will be bought before the end of the session.

Citations (18)

Summary

  • The paper introduces a data challenge on session-based recommendation and cart-abandonment prediction, built on a dataset of more than 30 million fine-grained browsing events.
  • The dataset combines granular session data (browsing events, search queries, and product metadata); evaluation uses MRR and F1 for recommendation and a weighted micro-F1 for cart abandonment.
  • The challenge targets the constraints of medium-sized retailers and is meant to guide future research on adaptive neural models for dynamic e-commerce environments.

An Evaluation of the SIGIR 2021 E-Commerce Workshop Data Challenge

The SIGIR 2021 E-Commerce Workshop hosted a data challenge from Coveo Labs aimed at advancing machine learning for e-commerce. The e-commerce industry, estimated to grow into a $4.5 trillion market, is fertile ground for research in Information Retrieval (IR), NLP, and recommender systems. The challenge focuses on personalization, which is hard because the customer journey is dynamic and a shopper's intent can differ from one session to the next.

Data Challenge Overview

The workshop presented the Coveo Data Challenge, which gave researchers access to a new session-based dataset of more than 30 million browsing events. The dataset is meant to represent mid-sized e-commerce platforms rather than giant digital retailers, so it reflects the constraints most of the industry faces: smaller user bases and few frequently returning customers. Participants were asked to solve two problems:

  1. Session-Based Recommendation Task: Given the first events of a session, participants had to predict the product interactions that follow, drawing on past interactions and search queries.
  2. Cart-Abandonment Prediction Task: Given a session containing an add-to-cart event, participants had to predict whether the added product is purchased before the session ends, with attention to how this prediction changes as more of the session becomes visible.
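The two tasks differ mainly in how a session is cut into what the model sees and what it must predict. The sketch below illustrates that split under stated assumptions: a session is a time-ordered list of `(product_id, event_type)` pairs, the event names `"detail"`, `"add"`, and `"purchase"` are placeholders, and the abandonment instance is cut at the first add-to-cart. The helper names are hypothetical and are not the challenge's official data-preparation code.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a time-ordered list of (product_id, event_type)
# pairs, with event_type assumed to be one of "detail", "add", "purchase".
Session = List[Tuple[str, str]]


def recommendation_instance(session: Session, n_visible: int) -> Tuple[Session, List[str]]:
    """Split a session into the visible prefix and the products the model must predict."""
    visible = session[:n_visible]
    future_products = [product for product, _ in session[n_visible:]]
    return visible, future_products


def abandonment_instance(session: Session) -> Optional[Tuple[Session, int]]:
    """Cut the session at its first add-to-cart (an assumption).

    The label is 1 if the added product is purchased later in the same
    session, and 0 if the session ends without that purchase.
    """
    for i, (product, event_type) in enumerate(session):
        if event_type == "add":
            visible = session[: i + 1]
            purchased = any(
                p == product and t == "purchase" for p, t in session[i + 1 :]
            )
            return visible, int(purchased)
    return None  # no add-to-cart event, so this session yields no instance
```

Revealing a few more events after the add-to-cart, while stopping before any purchase of that product so the label does not leak, would approximate the later session stages that the second task considers.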

Dataset and Methodological Nuances

The dataset comprises browsing events, search interactions, and product metadata, supporting several complementary views of user behavior. Browsing events are fine-grained (product detail views, add-to-cart, and purchase); search interactions record the shopper's queries together with which results were clicked and which were not; and the catalog metadata include images, text, and pricing. Because every event belongs to a session, the data support session-based modeling in a way that many older datasets, which lack session boundaries or fine-grained context, do not.
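One way to hold these records in memory is sketched below. It is a minimal illustration that assumes hypothetical field names (`session_id`, `timestamp_ms`, `shown_product_ids`, and so on) rather than the released file schema: it groups browsing events into time-ordered sessions and recovers the products that were shown but not clicked after a query.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BrowsingEvent:
    # Hypothetical fields; the released files may name and encode them differently.
    session_id: str
    timestamp_ms: int
    product_id: str
    event_type: str  # assumed values: "detail", "add", "purchase"


@dataclass(frozen=True)
class SearchEvent:
    # Hypothetical fields for one query and the results the shopper saw.
    session_id: str
    timestamp_ms: int
    shown_product_ids: Tuple[str, ...]
    clicked_product_ids: Tuple[str, ...]

    def skipped_products(self) -> List[str]:
        """Products shown for the query but not clicked, in display order."""
        clicked = set(self.clicked_product_ids)
        return [p for p in self.shown_product_ids if p not in clicked]


def group_into_sessions(events: List[BrowsingEvent]) -> Dict[str, List[BrowsingEvent]]:
    """Group browsing events by session and sort each session by time."""
    sessions: Dict[str, List[BrowsingEvent]] = defaultdict(list)
    for event in events:
        sessions[event.session_id].append(event)
    for session_events in sessions.values():
        session_events.sort(key=lambda e: e.timestamp_ms)
    return dict(sessions)
```

Keeping clicked and skipped results side by side is what lets a model treat "shown but ignored" as a weak negative signal alongside positive interactions.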

The competition defined a structured evaluation protocol. Recommendations were scored with Mean Reciprocal Rank (MRR) and F1, adapted to sequential prediction. Cart-abandonment predictions were scored with a weighted micro-F1 computed across session stages, with weights chosen to reward accurate predictions made earlier in the session.
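The metrics can be sketched as follows. This is an illustrative implementation under assumptions, not the official scoring script: the cutoff `k=20` is an assumed default, `f1_at_k` compares the top-k predictions with the set of products that actually follow, and the per-stage weights passed to `weighted_micro_f1` are hypothetical placeholders. For single-label predictions, micro-F1 pooled over both classes equals accuracy, which the count-based `micro_f1` reproduces.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def mrr_at_k(ranked: Sequence[Sequence[str]], next_items: Sequence[str], k: int = 20) -> float:
    """Mean reciprocal rank of the true next item in each top-k list (0 if absent)."""
    if not next_items:
        return 0.0
    total = 0.0
    for predictions, target in zip(ranked, next_items):
        top_k = list(predictions[:k])
        if target in top_k:
            total += 1.0 / (top_k.index(target) + 1)
    return total / len(next_items)


def f1_at_k(predictions: Sequence[str], future_items: Set[str], k: int = 20) -> float:
    """Set-based F1 between the top-k predictions and the products that follow."""
    predicted = set(predictions[:k])
    hits = len(predicted & future_items)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(future_items)
    return 2 * precision * recall / (precision + recall)


def micro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Micro-F1: pool true positives, false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1
            elif t == label:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_micro_f1(records: Sequence[Tuple[int, int, int]], stage_weights: Dict[int, float]) -> float:
    """records are (stage, true_label, predicted_label); stage_weights are hypothetical."""
    by_stage: Dict[int, Tuple[List[int], List[int]]] = defaultdict(lambda: ([], []))
    for stage, t, p in records:
        by_stage[stage][0].append(t)
        by_stage[stage][1].append(p)
    weighted = total_weight = 0.0
    for stage, (ts, ps) in by_stage.items():
        w = stage_weights.get(stage, 0.0)  # unlisted stages contribute nothing
        weighted += w * micro_f1(ts, ps)
        total_weight += w
    return weighted / total_weight if total_weight else 0.0
```

Normalizing by the total weight keeps the score in [0, 1] whatever placeholder weights are chosen; giving larger weights to early stages is what makes early, correct predictions count for more.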

Implications and Future Directions

The dataset and tasks have both practical and research value. Practically, models built on this data could help medium-sized retailers improve conversion through better session-based recommendations and earlier detection of cart abandonment, a substantial source of lost revenue. For research, the dataset offers a realistic e-commerce testbed for advanced neural models, beyond traditional benchmarks and datasets that lack broad applicability or rely on strong assumptions.

The focus on neural session-based models and intent prediction reflects the need for adaptive, fast-learning systems that cope with shifting user behavior. Speed and adaptivity matter most when shoppers rarely return, because a model then has little or no cross-session history for a given user and must rely on the current session.
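A very simple example of a model that needs no cross-session user history is a transition-count baseline: it counts how often one product directly follows another within sessions and recommends the most frequent successors of the last product seen. This is a hypothetical, non-neural baseline chosen for brevity, not a method from the paper; the neural session-based models discussed above would replace the count table with a learned sequence encoder.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def fit_transitions(sessions: Sequence[Sequence[str]]) -> Dict[str, Counter]:
    """Count how often product b directly follows product a inside a session."""
    transitions: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            if current != following:  # ignore repeated views of the same product
                transitions[current][following] += 1
    return transitions


def recommend(transitions: Dict[str, Counter], visible_session: Sequence[str], k: int = 20) -> List[str]:
    """Rank candidates by how often they followed the last product in the session."""
    if not visible_session:
        return []
    last = visible_session[-1]
    # .get avoids inserting unseen products into the defaultdict
    return [product for product, _ in transitions.get(last, Counter()).most_common(k)]
```

Because fitting only increments counters, the table can be refreshed as new sessions arrive, which is the kind of cheap adaptivity the non-returning-shopper setting calls for.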

Conclusion

Through its data challenge, the SIGIR 2021 E-Commerce Workshop poses open questions for personalized e-commerce and provides a dataset and evaluation protocol for studying them. Its emphasis on session-based data encourages models that capture intra-session dynamics and support a better understanding of user intent. Future research motivated by the challenge may focus on model efficiency, training cost, and interpretability in session-based recommendation and intent prediction, all of which matter for deploying such models across diverse retail settings.
