
FinalMLP: An Enhanced Two-Stream MLP Model for CTR Prediction (2304.00902v4)

Published 3 Apr 2023 in cs.IR

Abstract: Click-through rate (CTR) prediction is one of the fundamental tasks for online advertising and recommendation. While the multi-layer perceptron (MLP) serves as a core component in many deep CTR prediction models, it has been widely recognized that applying a vanilla MLP network alone is inefficient in learning multiplicative feature interactions. As such, many two-stream interaction models (e.g., DeepFM and DCN) have been proposed by integrating an MLP network with another dedicated network for enhanced CTR prediction. As the MLP stream learns feature interactions implicitly, existing research focuses mainly on enhancing explicit feature interactions in the complementary stream. In contrast, our empirical study shows that a well-tuned two-stream MLP model that simply combines two MLPs can achieve surprisingly good performance, which has never been reported before by existing work. Based on this observation, we further propose feature gating and interaction aggregation layers that can be easily plugged in to make an enhanced two-stream MLP model, FinalMLP. In this way, it not only enables differentiated feature inputs but also effectively fuses stream-level interactions across the two streams. Our evaluation results on four open benchmark datasets as well as an online A/B test in our industrial system show that FinalMLP achieves better performance than many sophisticated two-stream CTR models. Our source code will be available at MindSpore/models.
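
The abstract names three ingredients: two parallel MLP streams, a feature gating layer that gives each stream a differentiated input, and an interaction aggregation layer that fuses the two stream outputs. The sketch below illustrates that data flow in plain Python. The abstract does not specify the exact form of either layer, so the choices here are assumptions made for illustration: the gate is an element-wise `2 * sigmoid(linear(x))` mask, and the fusion is a bilinear term plus two linear terms. All function and parameter names (`init_params`, `gate`, `predict`, `w1`, `w2`, `w3`) are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def linear(x, layer):
    """Affine map: one output per weight row."""
    weights, bias = layer
    return [dot(row, x) + b for row, b in zip(weights, bias)]


def mlp(x, layers):
    """Plain MLP stream with ReLU after every layer."""
    for layer in layers:
        x = [max(0.0, v) for v in linear(x, layer)]
    return x


def gate(x, layer):
    """Assumed gating form: element-wise mask in (0, 2) applied to the input."""
    mask = [2.0 * sigmoid(z) for z in linear(x, layer)]
    return [m * xi for m, xi in zip(mask, x)]


def init_layer(rng, n_in, n_out):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def init_params(seed, dim, hidden):
    """Random (untrained) parameters for two gates, two streams, and the fusion."""
    rng = random.Random(seed)
    return {
        "gate1": init_layer(rng, dim, dim),
        "gate2": init_layer(rng, dim, dim),
        "mlp1": [init_layer(rng, dim, hidden), init_layer(rng, hidden, hidden)],
        "mlp2": [init_layer(rng, dim, hidden)],
        "w1": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
        "w3": [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)],
        "b": 0.0,
    }


def predict(x, p):
    """Two gated MLP streams fused by an assumed bilinear aggregation."""
    o1 = mlp(gate(x, p["gate1"]), p["mlp1"])  # stream 1 sees its own gated input
    o2 = mlp(gate(x, p["gate2"]), p["mlp2"])  # stream 2 sees a different gated input
    logit = p["b"] + dot(p["w1"], o1) + dot(p["w2"], o2)
    # Bilinear term: pairwise products between the two stream outputs.
    logit += sum(o1[i] * p["w3"][i][j] * o2[j]
                 for i in range(len(o1)) for j in range(len(o2)))
    return sigmoid(logit)


params = init_params(seed=0, dim=4, hidden=3)
ctr = predict([0.1, -0.2, 0.3, 0.4], params)  # a click probability in (0, 1)
```

Dropping the bilinear `w3` term reduces the fusion to a simple weighted sum of the two streams, which is one way to read the "simply combines two MLPs" baseline mentioned in the abstract.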
