CTRL: Connect Collaborative and Language Model for CTR Prediction (2306.02841v4)
Abstract: Traditional click-through rate (CTR) prediction models convert tabular data into one-hot vectors and leverage the collaborative relations among features to infer a user's preference over items. This modeling paradigm discards essential semantic information. Although works such as P5 and CTR-BERT have explored the potential of Pre-trained Language Models (PLMs) to extract semantic signals for CTR prediction, they are computationally expensive and suffer from low inference efficiency. Moreover, they ignore the beneficial collaborative relations, which hinders recommendation performance. To address these problems, we propose CTRL, a novel framework that is industry-friendly and model-agnostic, with superior inference efficiency. Specifically, the original tabular data is first converted into textual data. The tabular data and the converted textual data are treated as two different modalities and fed separately into the collaborative CTR model and the pre-trained language model. A cross-modal knowledge alignment procedure then aligns and integrates the collaborative and semantic signals at a fine-grained level, and the lightweight collaborative model can be deployed online for efficient serving after being fine-tuned with supervised signals. Experimental results on three public datasets show that CTRL significantly outperforms state-of-the-art (SOTA) CTR models. Moreover, we further verify its effectiveness on a large-scale industrial recommender system.
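To make the pipeline described above concrete, below is a minimal PyTorch sketch of the two mechanisms the abstract names: serializing a tabular sample into textual data for the PLM, and a cross-modal contrastive alignment between the collaborative and textual embeddings (a symmetric InfoNCE objective, in the spirit of the contrastive-learning references below). The prompt template, module shapes, and the choice of InfoNCE as the alignment objective are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def row_to_text(row: dict) -> str:
    # Serialize one tabular sample into a textual prompt for the PLM.
    # The template itself is an assumption; the paper only states that
    # tabular data is converted into textual data.
    return " ".join(f"{field} is {value}." for field, value in row.items())

class CollaborativeEncoder(nn.Module):
    # Stand-in for any collaborative CTR backbone (DeepFM, DCN, AutoInt, ...);
    # CTRL is described as model-agnostic, so the backbone is interchangeable.
    def __init__(self, num_fields: int, vocab_size: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Sequential(
            nn.Linear(num_fields * dim, 128), nn.ReLU(), nn.Linear(128, dim)
        )

    def forward(self, feature_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(feature_ids).flatten(1)    # (batch, num_fields * dim)
        return self.proj(x)                       # (batch, dim)

def alignment_loss(tab_emb: torch.Tensor, txt_emb: torch.Tensor,
                   tau: float = 0.07) -> torch.Tensor:
    # Symmetric InfoNCE over in-batch pairs: the matched (tabular, textual)
    # views of a sample are positives, every other pairing is a negative.
    tab = F.normalize(tab_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = tab @ txt.t() / tau                  # (batch, batch) similarities
    labels = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2

# Toy usage with made-up field names and a random batch of 8 samples.
prompt = row_to_text({"user_id": "u42", "genre": "comedy", "item": "m7"})
encoder = CollaborativeEncoder(num_fields=3, vocab_size=1000)
feature_ids = torch.randint(0, 1000, (8, 3))
tab_emb = encoder(feature_ids)
txt_emb = torch.randn(8, 64)   # placeholder for PLM sentence embeddings
loss = alignment_loss(tab_emb, txt_emb)
```

After alignment, per the abstract, only the lightweight collaborative tower is fine-tuned with supervised click signals and deployed, which is what keeps online serving efficient: the PLM never appears on the inference path.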
- TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation. arXiv preprint arXiv:2305.00447 (2023).
- Language Models are Few-Shot Learners. https://doi.org/10.48550/ARXIV.2005.14165
- Enhancing Explicit and Implicit Feature Interactions via Information Sharing for Parallel Deep CTR Models. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 3757–3766.
- A simple framework for contrastive learning of visual representations. In Proceedings of ICML. PMLR, 1597–1607.
- Zheng Chen. 2023. PALR: Personalization Aware LLMs for Recommendation. arXiv preprint arXiv:2305.07622 (2023).
- Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. 7–10.
- David R Cox. 1958. The regression analysis of binary sequences. Journal of the Royal Statistical Society: Series B (Methodological) 20, 2 (1958), 215–232.
- M6-Rec: Generative Pretrained Language Models are Open-Ended Recommender Systems. https://doi.org/10.48550/ARXIV.2205.08084
- BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
- GLM: General Language Model Pretraining with Autoregressive Blank Infilling. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 320–335.
- Bent Fuglede and Flemming Topsoe. 2004. Jensen-Shannon divergence and Hilbert space embedding. In Proceedings of the International Symposium on Information Theory (ISIT 2004). IEEE, 31.
- Learning piece-wise linear models from large scale data for ad click prediction. arXiv preprint arXiv:1704.05194 (2017).
- SimCSE: Simple Contrastive Learning of Sentence Embeddings. In Proceedings of EMNLP. 6894–6910.
- Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5). In Proceedings of the 16th ACM Conference on Recommender Systems. 299–315.
- Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine. Omnipress.
- An embedding learning framework for numerical features in CTR prediction. In SIGKDD. 2910–2918.
- DeepFM: a factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247 (2017).
- Michael Gutmann and Aapo Hyvärinen. 2010. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proceedings of AISTATS. JMLR Workshop and Conference Proceedings, 297–304.
- Deep residual learning for image recognition. In CVPR. 770–778.
- Practical lessons from predicting clicks on ads at Facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising. 1–9.
- Aspect-based sentiment analysis using BERT. In Proceedings of the 22nd Nordic Conference on Computational Linguistics. 187–196.
- Large Language Models are Zero-Shot Rankers for Recommender Systems. arXiv preprint arXiv:2305.08845 (2023).
- Text style transfer: A review and experimental evaluation. ACM SIGKDD Explorations Newsletter 24, 1 (2022), 14–45.
- FiBiNET: combining feature importance and bilinear feature interaction for click-through rate prediction. In Proceedings of the 13th ACM Conference on Recommender Systems. 169–177.
- Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML. PMLR, 448–456.
- TinyBERT: Distilling BERT for natural language understanding. arXiv preprint arXiv:1909.10351 (2019).
- Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. https://doi.org/10.48550/ARXIV.1412.6980
- Daniel D Lee and H Sebastian Seung. 1999. Learning the parts of objects by non-negative matrix factorization. Nature 401, 6755 (1999), 788–791.
- IntTower: the Next Generation of Two-Tower Model for Pre-Ranking System. In CIKM. 3292–3301.
- Low Resource Style Transfer via Domain Adaptive Meta Learning. arXiv preprint arXiv:2205.12475 (2022).
- Exploring text-transformers in AAAI 2021 shared task: COVID-19 fake news detection in English. In Combating Online Hostile Posts in Regional Languages during Emergency Situation: First International Workshop, CONSTRAINT 2021, Collocated with AAAI 2021. Springer, 106–115.
- xDeepFM: Combining explicit and implicit feature interactions for recommender systems. In SIGKDD. 1754–1763.
- Feature generation by convolutional neural network for click-through rate prediction. In The World Wide Web Conference. 1119–1129.
- PTab: Using the Pre-trained Language Model for Modeling Tabular Data. arXiv preprint arXiv:2209.08060 (2022).
- Is ChatGPT a Good Recommender? A Preliminary Study. arXiv preprint arXiv:2304.10149 (2023).
- RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
- Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017).
- Meta-learning on heterogeneous information networks for cold-start recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1563–1573.
- Ad click prediction: a view from the trenches. In SIGKDD. 1222–1230.
- Marcin Michał Mirończuk and Jarosław Protasiewicz. 2018. A recent overview of the state-of-the-art elements of text classification. Expert Systems with Applications 106 (2018), 36–54.
- CTR-BERT: Cost-effective knowledge distillation for billion-parameter teacher models. In NeurIPS Efficient Natural Language and Speech Processing Workshop.
- Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In EMNLP-IJCNLP. 188–197.
- Training language models to follow instructions with human feedback. arXiv preprint arXiv:2203.02155 (2022).
- Product-based neural networks for user response prediction. In ICDM. IEEE, 1149–1154.
- Learning transferable visual models from natural language supervision. In International conference on machine learning. PMLR, 8748–8763.
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. https://doi.org/10.48550/ARXIV.1910.10683
- Steffen Rendle. 2010. Factorization machines. In 2010 IEEE International conference on data mining. IEEE, 995–1000.
- AutoInt: Automatic feature interaction learning via self-attentive neural networks. In CIKM. 1161–1170.
- Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15, 1 (2014), 1929–1958.
- Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent. arXiv preprint arXiv:2304.09542 (2023).
- Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 11 (2008), 2579–2605.
- SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. arXiv:1905.00537 [cs.CL]
- Deep & cross network for ad click predictions. In ADKDD. 1–7.
- CFM: Convolutional Factorization Machines for Context-Aware Recommendation. In IJCAI, Vol. 19. 3926–3932.
- BERT post-training for review reading comprehension and aspect-based sentiment analysis. arXiv preprint arXiv:1904.02232 (2019).
- FILIP: fine-grained interactive language-image pre-training. arXiv preprint arXiv:2111.07783 (2021).
- A Dual Augmented Two-tower Model for Online Large-scale Recommendation. (2021).
- Adaptive User Modeling with Long and Short-Term Preferences for Personalized Recommendation. In IJCAI. 4213–4219.
- Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation. arXiv preprint arXiv:2305.07609 (2023).
- Recommendation as instruction following: A large language model empowered recommendation approach. arXiv preprint arXiv:2305.07001 (2023).
- Deep learning over multi-field categorical data. In ECIR. Springer, 45–57.
- Optimal real-time bidding for display advertising. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 1077–1086.
- Language models as recommender systems: Evaluations and limitations. (2021).
- Deep interest network for click-through rate prediction. In SIGKDD. 1059–1068.