RecSys Challenge 2023: From data preparation to prediction, a simple, efficient, robust and scalable solution (2401.06830v1)
Published 12 Jan 2024 in cs.IR, cs.AI, and cs.LG
Abstract: The RecSys Challenge 2023, presented by ShareChat, consists of predicting whether a user will install an application on their smartphone after seeing advertising impressions in the ShareChat and Moj apps. This paper presents the solution of 'Team UMONS' to this challenge. It gives accurate results (our best score is 6.622686) with a relatively small model that can be easily implemented in different production configurations. Our solution scales well as the dataset size increases and can be used with datasets containing missing values.