
Lightweight Boosting Models for User Response Prediction Using Adversarial Validation (2310.03778v1)

Published 5 Oct 2023 in cs.LG and cs.AI

Abstract: The ACM RecSys Challenge 2023, organized by ShareChat, aims to predict the probability of an app being installed. This paper describes a lightweight solution to this challenge. We formulate the task as a user response prediction task. For rapid prototyping, we propose a lightweight solution comprising the following steps: 1) using adversarial validation, we effectively eliminate uninformative features from the dataset; 2) to address noisy continuous features and categorical features with a large number of unique values, we employ feature engineering techniques; 3) we leverage Gradient Boosted Decision Trees (GBDT) for their exceptional performance and scalability. The experiments show that a single LightGBM model, without additional ensembling, performs quite well. Our team achieved ninth place in the challenge with a final leaderboard score of 6.059065. Code for our approach can be found here: https://github.com/choco9966/recsys-challenge-2023.
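The adversarial-validation step mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses scikit-learn's gradient boosting as a stand-in for the LightGBM model used in the paper, synthetic data, and an illustrative AUC threshold of 0.7. The idea is to train a classifier to distinguish train rows from test rows; if it succeeds (AUC well above 0.5), the features it relies on most are the ones whose distributions drift between the two sets, making them candidates for removal.

```python
# Hedged sketch of adversarial validation for feature screening.
# Assumptions: scikit-learn as a stand-in for LightGBM, synthetic data,
# and an illustrative drift threshold of 0.7 AUC.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def adversarial_validation(X_train, X_test, feature_names, auc_threshold=0.7):
    # Label each row by its origin: 0 = train, 1 = test.
    X = np.vstack([X_train, X_test])
    y = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
    clf = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
    auc = cross_val_score(clf, X, y, cv=3, scoring="roc_auc").mean()
    clf.fit(X, y)
    # If train and test are distinguishable (AUC well above 0.5), the
    # features most important for that split are the drifting ones.
    if auc > auc_threshold:
        order = np.argsort(clf.feature_importances_)[::-1]
        return auc, [feature_names[i] for i in order]
    return auc, []

rng = np.random.default_rng(0)
n, d = 500, 4
X_tr = rng.normal(size=(n, d))
X_te = rng.normal(size=(n, d))
X_te[:, 0] += 3.0  # feature "f0" drifts between train and test
names = [f"f{i}" for i in range(d)]
auc, suspects = adversarial_validation(X_tr, X_te, names)
print(round(auc, 2), suspects[:1])
```

In practice one would drop (or reweight) the top-ranked suspects and re-run the check until the adversarial AUC falls back toward 0.5.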

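For the second step, one common feature-engineering technique for categorical features with a large number of unique values is frequency encoding, which replaces each category by how often it occurs in the training data. This is a hedged sketch under that assumption (the paper does not specify its exact encodings); the column values and helper name are illustrative.

```python
# Hedged sketch of frequency encoding for high-cardinality categoricals.
# The column values and the train/test split below are illustrative.
from collections import Counter

def frequency_encode(train_col, test_col):
    """Map each category to its relative frequency in the training column.
    Categories unseen at training time map to 0.0."""
    counts = Counter(train_col)
    total = len(train_col)
    encode = lambda col: [counts.get(v, 0) / total for v in col]
    return encode(train_col), encode(test_col)

train = ["app_a", "app_b", "app_a", "app_c"]
test = ["app_a", "app_d"]
tr_enc, te_enc = frequency_encode(train, test)
print(tr_enc, te_enc)
```

The resulting numeric column is cheap to compute and plays well with tree splits in a GBDT, since categories of similar popularity land near each other on the encoded axis.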
