Exploring Adapter-based Transfer Learning for Recommender Systems: Empirical Studies and Practical Insights (2305.15036v2)
Abstract: Adapters, plug-in neural network modules with a small number of tunable parameters, have emerged as a parameter-efficient transfer learning technique for adapting pre-trained models to downstream tasks, especially in the NLP and computer vision (CV) fields. Meanwhile, learning recommendation models directly from raw item modality features -- e.g., texts in NLP and images in CV -- can enable effective and transferable recommender systems (called TransRec). In view of this, a natural question arises: can adapter-based learning techniques achieve parameter-efficient TransRec with good performance? To answer it, we perform empirical studies around several key sub-questions. First, we ask whether adapter-based TransRec performs comparably to TransRec based on standard full-parameter fine-tuning, and whether this holds for recommendation with different item modalities, e.g., textual RS and visual RS. Second, we benchmark existing adapters, which have been shown to be effective in NLP and CV tasks, on item recommendation tasks. Third, we carefully study several key factors for adapter-based TransRec, namely where and how to insert the adapters. Finally, we examine the effects of adapter-based TransRec when either scaling up its source training data or scaling down its target training data. Our paper provides key insights and practical guidance on unified & transferable recommendation -- a less studied recommendation scenario. We release our code and other materials at: https://github.com/westlake-repl/Adapter4Rec/.
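To make the adapter idea concrete, below is a minimal PyTorch-style sketch of a Houlsby-style bottleneck adapter and a helper that freezes everything except adapter parameters. This is an illustrative sketch under stated assumptions, not the paper's released implementation; names such as `BottleneckAdapter` and `freeze_all_but_adapters` are hypothetical, and the actual code in the linked repository may differ.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Houlsby-style bottleneck adapter: down-project to a small
    hidden size, apply a nonlinearity, project back up, and add a
    residual connection. Only these few parameters are trained."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection preserves the frozen backbone's
        # representation when the adapter contributes little.
        return x + self.up(self.act(self.down(x)))


def freeze_all_but_adapters(model: nn.Module) -> None:
    """Freeze the pre-trained backbone, leaving only adapter weights
    trainable. Assumes adapter submodules are registered under
    attribute names containing 'adapter' (an assumption, not a
    convention from the paper)."""
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name


# Minimal usage: apply an adapter to a frozen encoder's hidden states.
if __name__ == "__main__":
    hidden = torch.randn(8, 32, 768)        # (batch, seq_len, dim)
    adapter = BottleneckAdapter(hidden_dim=768)
    out = adapter(hidden)                   # same shape as the input
    print(out.shape)                        # torch.Size([8, 32, 768])
```

In a TransRec setting along the lines the abstract describes, such adapters would typically be inserted after the attention and feed-forward sub-layers of the item (modality) encoder and the user (sequential) encoder, with only the adapters and a small task head updated on the target domain; exactly where and how to insert them is one of the factors the paper studies.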