PromptMM: Multi-Modal Knowledge Distillation for Recommendation with Prompt-Tuning (2402.17188v3)
Abstract: Multimedia online platforms (e.g., Amazon, TikTok) have benefited greatly from incorporating multimedia content (e.g., visual, textual, and acoustic) into their personalized recommender systems. These modalities provide intuitive semantics that facilitate modality-aware user preference modeling. However, two key challenges in multi-modal recommenders remain unresolved: i) introducing multi-modal encoders with a large number of additional parameters causes overfitting, given the high-dimensional multi-modal features produced by extractors (e.g., ViT, BERT); ii) side information inevitably introduces inaccuracies and redundancies, which skew the modality-interaction dependencies away from reflecting true user preference. To tackle these problems, we propose to simplify and empower recommenders through Multi-modal Knowledge Distillation (PromptMM) with prompt-tuning, which enables adaptive, quality-aware distillation. Specifically, PromptMM performs model compression by distilling user-item edge relationships and multi-modal node content from cumbersome teachers, relieving students of the additional feature-reduction parameters. To bridge the semantic gap between multi-modal context and collaborative signals, and to keep the overfitting-prone teacher adapted to the student's task, soft prompt-tuning is introduced. Additionally, to adjust for the impact of inaccuracies in multimedia data, a disentangled multi-modal list-wise distillation is developed with a modality-aware re-weighting mechanism. Experiments on real-world data demonstrate PromptMM's superiority over existing techniques, ablation tests confirm the effectiveness of its key components, and additional tests show its efficiency.
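The distillation objective sketched in the abstract can be illustrated with a minimal NumPy example. This is a hedged reconstruction for intuition only, not the paper's exact formulation: PromptMM uses a disentangled list-wise distillation, whereas the sketch below uses the classic temperature-scaled soft-label KL distillation (Hinton et al., 2015) and a simple weighted sum to stand in for modality-aware re-weighting; all function names and weight choices here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distill_loss(teacher_logits, student_logits, T=2.0):
    """Temperature-scaled KL(teacher || student), averaged over the batch.

    Softening both distributions with temperature T exposes the teacher's
    'dark knowledge' (relative rankings among non-top items); the T*T factor
    keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits / T)
    q = softmax(student_logits / T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

def reweighted_multimodal_distill(teacher_logits_by_modality, student_logits,
                                  weights, T=2.0):
    """Illustrative modality-aware re-weighting: per-modality teacher signals
    are combined with normalized weights, so a noisier modality (lower weight)
    contributes less to the student's distillation loss."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(sum(wi * distill_loss(t, student_logits, T)
                     for wi, t in zip(w, teacher_logits_by_modality)))
```

For example, with visual and textual teacher logits and weights `[1.0, 3.0]`, the textual modality drives three quarters of the distillation signal; in PromptMM such weights are learned rather than fixed.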