Ads Recommendation in a Collapsed and Entangled World (2403.00793v2)
Abstract: We present Tencent's ads recommendation system and examine the challenges and practices of learning appropriate recommendation representations. Our study begins by showcasing our approaches to preserving prior knowledge when encoding features of diverse types into embedding representations. We specifically address sequence features, numeric features, and pre-trained embedding features. Subsequently, we delve into two crucial challenges related to feature representation: the dimensional collapse of embeddings and the interest entanglement across different tasks or scenarios. We propose several practical approaches to address these challenges that result in robust and disentangled recommendation representations. We then explore several training techniques to facilitate model optimization, reduce bias, and enhance exploration. Additionally, we introduce three analysis tools that enable us to study feature correlation, dimensional collapse, and interest entanglement. This work builds upon the continuous efforts of Tencent's ads recommendation team over the past decade. It summarizes general design principles and presents a series of readily applicable solutions and analysis tools. The reported performance is based on our online advertising platform, which handles hundreds of billions of requests daily and serves millions of ads to billions of users.
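The abstract mentions an analysis tool for studying dimensional collapse. The code below is a minimal sketch of one common diagnostic, not the paper's own tool. It assumes the embeddings are available as a NumPy array of shape (num_ids, dim) and inspects the singular value spectrum of the centered matrix. A collapsed embedding concentrates its energy in a few directions, so most of its normalized singular values fall close to zero. The function names `singular_value_spectrum` and `effective_rank` and the threshold `tol=1e-3` are hypothetical choices made for illustration.

```python
import numpy as np


def singular_value_spectrum(emb: np.ndarray) -> np.ndarray:
    """Return singular values of the centered embedding matrix, normalized by the largest.

    emb: array of shape (num_ids, dim), e.g. the rows of one feature field's
    embedding table (hypothetical input layout, assumed for this sketch).
    """
    # Remove the mean embedding so the spectrum reflects spread, not offset.
    centered = emb - emb.mean(axis=0, keepdims=True)
    # numpy returns singular values in descending order.
    s = np.linalg.svd(centered, compute_uv=False)
    return s / s[0] if s[0] > 0 else s


def effective_rank(emb: np.ndarray, tol: float = 1e-3) -> int:
    """Count dimensions whose normalized singular value exceeds tol (hypothetical threshold)."""
    spec = singular_value_spectrum(emb)
    return int((spec > tol).sum())
```

For example, a random 1000 × 16 Gaussian matrix has an effective rank of 16. The product of a random 1000 × 4 matrix and a random 4 × 16 matrix spans only four directions, so its effective rank is 4 even though each row is 16-dimensional. That gap between nominal and effective dimension is the symptom the spectrum is meant to expose.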
- Junwei Pan
- Wei Xue
- Ximei Wang
- Haibin Yu
- Xun Liu
- Shijie Quan
- Xueming Qiu
- Dapeng Liu
- Lei Xiao
- Jie Jiang