SLMRec: Empowering Small Language Models for Sequential Recommendation (2405.17890v2)
Abstract: The Sequential Recommendation (SR) task involves predicting the next item a user is likely to interact with, given their past interactions. SR models examine the sequence of a user's actions to discern more complex behavioral patterns and temporal dynamics. Recent research demonstrates the great impact of LLMs on sequential recommendation systems, either viewing sequential recommendation as language modeling or using an LLM as the backbone for user representation. Although these methods deliver outstanding performance, there is scant evidence that an LLM is necessary, or of how large a model is actually needed, especially in the sequential recommendation setting. Meanwhile, due to the huge size of LLMs, it is inefficient and impractical to deploy an LLM-based model on real-world platforms that often need to process billions of traffic logs daily. In this paper, we explore the influence of LLM depth by conducting extensive experiments on large-scale industry datasets. Surprisingly, our motivational experiments reveal that most intermediate layers of LLMs are redundant, indicating that performance remains strong even after these layers are pruned. Motivated by this insight, we empower small language models for SR with a model named SLMRec, which adopts a simple yet effective knowledge distillation method. Moreover, SLMRec is orthogonal to other post-training efficiency techniques, such as quantization and pruning, so they can be used in combination. Comprehensive experimental results illustrate that the proposed SLMRec model attains the best performance while using only 13% of the parameters of LLM-based recommendation models, and simultaneously achieves up to 6.6x and 8.0x speedups in training and inference time, respectively. In addition, we provide a theoretical justification for why small language models can perform comparably to LLMs in SR.
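The core idea described in the abstract, keeping only a few transformer layers and distilling knowledge from a deep LLM-based teacher into the shallow student, can be illustrated with a minimal PyTorch sketch. Everything below (the toy backbone, the layer-matching scheme, and the loss weight) is an illustrative assumption, not the paper's exact implementation; in practice the teacher would be an LLM backbone and the student would share its embedding space.

```python
# Hedged sketch: layer-pruned student + hidden-state distillation for sequential
# recommendation. Names and the layer-alignment rule are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeqRecBackbone(nn.Module):
    """Toy backbone: item embeddings followed by stacked Transformer layers."""
    def __init__(self, n_items, d_model=64, n_layers=8, n_heads=4):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, d_model, padding_idx=0)
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads,
                                       dim_feedforward=4 * d_model,
                                       batch_first=True)
            for _ in range(n_layers)
        ])

    def forward(self, item_seq):
        h = self.item_emb(item_seq)            # (batch, seq_len, d_model)
        hidden_states = []
        for layer in self.layers:
            h = layer(h)
            hidden_states.append(h)
        return h, hidden_states                # final state + per-layer states

def distillation_loss(student_states, teacher_states):
    """Align each student layer with an evenly spaced teacher layer (assumption)."""
    stride = len(teacher_states) // len(student_states)
    loss = 0.0
    for i, s in enumerate(student_states):
        t = teacher_states[(i + 1) * stride - 1].detach()  # no teacher gradients
        loss = loss + F.mse_loss(s, t)
    return loss / len(student_states)

# Teacher: deep backbone standing in for the LLM-based recommender;
# student: same width but far fewer layers, per the layer-redundancy finding.
n_items = 1000
teacher = SeqRecBackbone(n_items, n_layers=8).eval()
student = SeqRecBackbone(n_items, n_layers=2)

item_seq = torch.randint(1, n_items, (4, 20))   # batch of interaction sequences
targets = torch.randint(1, n_items, (4,))       # next-item labels

with torch.no_grad():
    _, t_states = teacher(item_seq)
s_final, s_states = student(item_seq)

# Next-item prediction: score all items with the last position's representation.
logits = s_final[:, -1, :] @ student.item_emb.weight.T
rec_loss = F.cross_entropy(logits, targets)
total_loss = rec_loss + 1.0 * distillation_loss(s_states, t_states)
total_loss.backward()
```

The distillation term here simply matches intermediate hidden states; the paper's actual objective and layer-selection strategy may differ, but the sketch conveys why the student can stay small: it only has to reproduce the teacher's useful representations, not its full depth.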
Authors: Wujiang Xu, Zujie Liang, Jiaojiao Han, Xuying Ning, Wenfang Lin, Yongfeng Zhang, Qitian Wu, Yunxiao Shi