Distillation Matters: Empowering Sequential Recommenders to Match the Performance of Large Language Models (2405.00338v3)
Abstract: Owing to their powerful semantic reasoning capabilities, LLMs have been effectively utilized as recommenders, achieving impressive performance. However, the high inference latency of LLMs significantly restricts their practical deployment. To address this issue, this work investigates knowledge distillation from cumbersome LLM-based recommendation models to lightweight conventional sequential models. Doing so raises three challenges: 1) the teacher's knowledge may not always be reliable; 2) the capacity gap between teacher and student makes it difficult for the student to assimilate the teacher's knowledge; 3) divergence in semantic space poses a challenge to distilling knowledge from embeddings. To tackle these challenges, this work proposes a novel distillation strategy, DLLM2Rec, specifically tailored for knowledge distillation from LLM-based recommendation models to conventional sequential models. DLLM2Rec comprises: 1) importance-aware ranking distillation, which filters reliable and student-friendly knowledge by weighting instances according to teacher confidence and student-teacher consistency; and 2) collaborative embedding distillation, which integrates knowledge from teacher embeddings with collaborative signals mined from the data. Extensive experiments demonstrate the effectiveness of the proposed DLLM2Rec, boosting three typical sequential models by an average of 47.97% and, in some cases, even enabling them to surpass LLM-based recommenders.
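To make the importance-aware weighting concrete, here is a minimal PyTorch sketch of a ranking-distillation loss that weights each distilled (user, item) pair by teacher confidence and teacher-student consistency, as the abstract describes. The function name, tensor shapes, and the exponential-decay weighting are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def importance_aware_ranking_distillation(
    student_scores: torch.Tensor,  # (B, K) student scores for the teacher's top-K items
    teacher_ranks: torch.Tensor,   # (B, K) positions 0..K-1 in the teacher's ranked list
    student_ranks: torch.Tensor,   # (B, K) ranks the student assigns to the same items
    temperature: float = 1.0,
) -> torch.Tensor:
    """Hypothetical sketch of importance-aware ranking distillation:
    (a) teacher confidence  -- items the teacher ranks higher get larger weights;
    (b) consistency         -- pairs where the student also ranks the item highly
                               get larger weights, filtering student-unfriendly knowledge.
    """
    # (a) confidence weight: decays with the teacher's rank position
    w_conf = torch.exp(-teacher_ranks.float() / temperature)

    # (b) consistency weight: decays with the student's rank of the same item
    w_cons = torch.exp(-student_ranks.float() / temperature)

    # combined importance, normalised per user so each user contributes equally
    weights = w_conf * w_cons
    weights = weights / weights.sum(dim=1, keepdim=True)

    # treat the teacher's top-K items as soft positives for the student
    loss = -(weights * F.logsigmoid(student_scores)).sum(dim=1).mean()
    return loss
```

The multiplicative combination means a pair is emphasized only when the teacher is confident *and* the student finds it plausible, which is one simple way to downweight unreliable teacher knowledge while respecting the capacity gap; other monotone weighting functions would serve the same purpose.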
Authors: Yu Cui, Feng Liu, Pengbo Wang, Bohao Wang, Heng Tang, Yi Wan, Jun Wang, Jiawei Chen