FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction (2310.19453v4)
Abstract: Click-through rate (CTR) prediction serves as a core functional module in various personalized online services. Traditional ID-based models for CTR prediction take as input one-hot encoded ID features of the tabular modality and capture collaborative signals via feature interaction modeling. However, one-hot encoding discards the semantic information contained in the textual features. Recently, the emergence of Pretrained Language Models (PLMs) has given rise to another paradigm, which takes as input sentences of the textual modality obtained via hard prompt templates and adopts PLMs to extract semantic knowledge. However, PLMs often face challenges in capturing field-wise collaborative signals and distinguishing features with subtle textual differences. In this paper, to leverage the benefits of both paradigms while overcoming their limitations, we propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction. Unlike most methods that solely rely on global views through instance-level contrastive learning, we design a novel joint masked tabular/language modeling task to learn fine-grained alignment between tabular IDs and word tokens. Specifically, the masked data of one modality (IDs or tokens) has to be recovered with the help of the other modality, which establishes feature-level interaction and alignment via sufficient mutual information extraction between the two modalities. Moreover, we propose to jointly finetune the ID-based model and the PLM by adaptively combining the outputs of both models, thus achieving superior performance in downstream CTR prediction tasks. Extensive experiments on three real-world datasets demonstrate that FLIP outperforms SOTA baselines and is highly compatible with various ID-based models and PLMs. The code is available at \url{https://github.com/justarter/FLIP}.
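To make the joint finetuning idea concrete, below is a minimal PyTorch sketch of one way the outputs of an ID-based model and a PLM could be adaptively combined for CTR prediction. The class name `AdaptiveFusionHead`, the sigmoid gating network, and all dimensions are illustrative assumptions for this sketch, not the paper's exact fusion layer.

```python
import torch
import torch.nn as nn


class AdaptiveFusionHead(nn.Module):
    """Hypothetical sketch: fuse the CTR logits of an ID-based model and a PLM
    with an instance-wise learnable weight (not the paper's exact design)."""

    def __init__(self, id_dim: int, plm_dim: int):
        super().__init__()
        # Gate produces alpha in (0, 1): how much weight to put on the ID-based model.
        self.gate = nn.Sequential(
            nn.Linear(id_dim + plm_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, id_repr, plm_repr, id_logit, plm_logit):
        # Compute a per-sample mixing weight from both representations,
        # then blend the two logits and map to a click probability.
        alpha = self.gate(torch.cat([id_repr, plm_repr], dim=-1))
        logit = alpha * id_logit + (1.0 - alpha) * plm_logit
        return torch.sigmoid(logit)


# Example usage with dummy tensors (dimensions are arbitrary assumptions).
fusion = AdaptiveFusionHead(id_dim=64, plm_dim=312)
id_repr, plm_repr = torch.randn(8, 64), torch.randn(8, 312)
id_logit, plm_logit = torch.randn(8, 1), torch.randn(8, 1)
ctr = fusion(id_repr, plm_repr, id_logit, plm_logit)  # shape (8, 1), values in (0, 1)
```

An instance-wise gate of this kind lets the combined model lean on collaborative signals for well-observed ID features and on semantic signals from the PLM for sparse or cold-start features, which is the intuition behind combining the two paradigms.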
Authors: Hangyu Wang, Jianghao Lin, Xiangyang Li, Bo Chen, Chenxu Zhu, Ruiming Tang, Weinan Zhang, Yong Yu