MerRec: A Large-scale Multipurpose Mercari Dataset for Consumer-to-Consumer Recommendation Systems (2402.14230v2)
Abstract: In the evolving field of e-commerce, recommendation systems play a crucial role in shaping user experience and engagement. The rise of Consumer-to-Consumer (C2C) recommendation systems, noted for their flexibility and ease of access for customer vendors, marks a significant trend. However, academic research remains focused largely on Business-to-Consumer (B2C) models, and the few available C2C recommendation datasets are limited in item attributes, user diversity, and scale. C2C recommendation is further complicated by the dual roles users play as both sellers and buyers, which produces less uniform and more varied inputs. To address this gap, we introduce MerRec, the first large-scale dataset built specifically for C2C recommendation. It is sourced from the Mercari e-commerce platform and covers millions of users and products over 6 months in 2023. In addition to standard features such as user_id, item_id, and session_id, MerRec includes distinctive elements such as timestamped action types, product taxonomy, and textual product attributes, making it a comprehensive resource for research. We evaluate the dataset extensively on four recommendation tasks and establish it as a new benchmark for developing advanced recommendation algorithms in real-world settings, helping bridge the gap between academia and industry and advancing the study of C2C recommendation. Our experiment code is available at https://github.com/mercari/mercari-ml-merrec-pub-us and the dataset at https://huggingface.co/datasets/mercari-us/merrec.
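To make the session-structured data described above more concrete, the following Python sketch groups interaction events into time-ordered sessions and derives next-item prediction pairs, which is one common way to frame sequential recommendation. It is illustrative only and does not read the released files. Only user_id, item_id, and session_id come from the text; the `Event` class, the `timestamp` and `event_type` field names, the event-type strings, and the helper functions are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # user_id, item_id and session_id are named in the abstract;
    # timestamp and event_type are hypothetical placeholder field names.
    user_id: int
    item_id: int
    session_id: str
    timestamp: float
    event_type: str


def build_session_sequences(events, min_length=2):
    """Group events by session_id and order each session by timestamp.

    Returns {session_id: [item_id, ...]}, keeping only sessions with at
    least `min_length` events (shorter ones yield no next-item target).
    """
    sessions = defaultdict(list)
    for event in events:
        sessions[event.session_id].append(event)

    sequences = {}
    for session_id, session_events in sessions.items():
        session_events.sort(key=lambda e: e.timestamp)  # chronological order
        if len(session_events) >= min_length:
            sequences[session_id] = [e.item_id for e in session_events]
    return sequences


def next_item_pairs(sequence):
    """Turn one item sequence into (prefix, next_item) training pairs."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]
```

For example, given events `Event(1, 10, "s1", 3.0, "like")`, `Event(1, 11, "s1", 1.0, "view")`, `Event(1, 13, "s1", 2.0, "view")`, and `Event(2, 12, "s2", 5.0, "view")`, `build_session_sequences` returns `{"s1": [11, 13, 10]}` (session `s2` has a single event and is dropped), and `next_item_pairs([11, 13, 10])` yields `[([11], 13), ([11, 13], 10)]`. A real pipeline would also exploit the action types, taxonomy, and text attributes, which this sketch ignores.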