ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems (2403.12660v3)
Abstract: Deep Recommender Systems (DRS) increasingly depend on large numbers of feature fields to deliver precise recommendations. Effective feature selection methods are therefore critical for further improving accuracy and optimizing storage efficiency to meet deployment demands. This research area, particularly in the context of DRS, is nascent and faces three core challenges. First, varying experimental setups across papers often yield unfair comparisons, obscuring practical insights. Second, the existing literature lacks detailed analyses of selection attributes on large-scale datasets, as well as thorough comparisons across selection techniques and DRS backbones, which restricts the generalizability of findings and impedes deployment in DRS. Third, research often focuses on comparing the peak performance achievable by feature selection methods, an approach that is typically computationally infeasible when searching for optimal hyperparameters and that overlooks the robustness and stability of these methods. To bridge these gaps, this paper presents ERASE, a comprehensive bEnchmaRk for feAture SElection for DRS. ERASE provides a thorough evaluation of eleven feature selection methods, covering both traditional and deep learning approaches, across four public datasets, private industrial datasets, and a real-world commercial platform, achieving significant performance gains. Our code is available online for ease of reproduction.