RecSys Challenge 2023: From data preparation to prediction, a simple, efficient, robust and scalable solution (2401.06830v1)

Published 12 Jan 2024 in cs.IR, cs.AI, and cs.LG

Abstract: The RecSys Challenge 2023, presented by ShareChat, consists of predicting whether a user will install an application on their smartphone after seeing advertising impressions in the ShareChat & Moj apps. This paper presents the solution of 'Team UMONS' to this challenge, which gives accurate results (our best score is 6.622686) with a relatively small model that can be easily implemented in different production configurations. Our solution scales well as the dataset size increases and can be used with datasets containing missing values.
