DREAM: A Dual Representation Learning Model for Multimodal Recommendation (2404.11119v2)

Published 17 Apr 2024 in cs.IR and cs.MM

Abstract: Multimodal recommendation aims to exploit both behavioral and multimodal information for the recommendation task. However, most existing models suffer from three issues when fusing information from these two domains: (1) They underuse modal information, extracting it only through direct concatenation, addition, or simple linear layers. (2) They treat modal features as learnable embeddings, which causes the modal embeddings to gradually drift away from the original modal features during training. We refer to this issue as Modal Information Forgetting. (3) They fail to account for the significant difference between the distributions of behavior and modality, which leads to representation misalignment. To address these challenges, this paper proposes DREAM, a novel Dual REpresentAtion learning model for Multimodal Recommendation. For sufficient information extraction, we introduce two separate lines, a Behavior Line and a Modal Line, in which a Modal-specific Encoder is applied to strengthen modal representations. To address Modal Information Forgetting, we introduce the Similarity Supervised Signal to constrain the modal representations. Additionally, we design a Behavior-Modal Alignment module that fuses the dual representations through Intra-Alignment and Inter-Alignment. Extensive experiments on three public datasets demonstrate that DREAM achieves state-of-the-art (SOTA) results. The source code will be available upon acceptance.
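
The abstract does not give the form of the Similarity Supervised Signal, so the following is only an illustrative sketch of one way such a constraint could work, not the authors' formulation. It assumes the signal penalizes the gap between the pairwise cosine similarities of the original modal features and those of the learned modal representations. The function names, the mean-squared-error penalty, and the toy data are all hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x (shape: items x dims)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero-length rows
    return unit @ unit.T


def similarity_supervision_loss(raw_features, learned_repr):
    """Hypothetical penalty: mean squared gap between the item-item similarity
    structure of the raw modal features and that of the learned representations.
    This is an assumed form, not the loss defined in the paper."""
    target = cosine_similarity_matrix(raw_features)
    current = cosine_similarity_matrix(learned_repr)
    return float(np.mean((target - current) ** 2))


# Toy data: 5 items with 8-dimensional raw modal features (made up for illustration).
rng = np.random.default_rng(0)
raw = rng.normal(size=(5, 8))

# A representation that keeps the similarity structure (a rescaled copy of the raw features).
preserved = 3.0 * raw
# A representation that has drifted: unrelated vectors in a different dimension.
drifted = rng.normal(size=(5, 4))

loss_preserved = similarity_supervision_loss(raw, preserved)
loss_drifted = similarity_supervision_loss(raw, drifted)
print(loss_preserved, loss_drifted)
```

Because the penalty compares item-by-item similarity matrices, the learned representations may have a different dimensionality from the raw features. A representation that preserves the original similarity structure incurs essentially no penalty, while one that has drifted away from it is penalized, which is the behavior the abstract attributes to countering Modal Information Forgetting.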

Authors (7)
  1. Kangning Zhang
  2. Yingjie Qin
  3. Ruilong Su
  4. Yifan Liu
  5. Jiarui Jin
  6. Weinan Zhang
  7. Yong Yu
