Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation with Interpretability (2402.16110v1)
Abstract: Multimodal recommender systems integrate multimodal information (e.g., textual descriptions, images) into a collaborative filtering framework to provide more accurate recommendations. Although incorporating multimodal information could enhance the interpretability of these systems, current multimodal models represent users and items with entangled numerical vectors, which are difficult to interpret. To address this, we propose a Disentangled Graph Variational Auto-Encoder (DGVAE) that aims to enhance both model and recommendation interpretability. DGVAE first projects multimodal information into textual content, e.g., converting images to text, using state-of-the-art multimodal pre-training technologies. It then constructs a frozen item-item graph and encodes the contents and interactions into two sets of disentangled representations with a simplified residual graph convolutional network. DGVAE further regularizes these disentangled representations through mutual information maximization, aligning the representations derived from user-item interactions with those learned from textual content. This alignment enables user binary interactions to be interpreted via text. Our empirical analysis on three real-world datasets demonstrates that DGVAE significantly outperforms state-of-the-art baselines by a margin of 10.02%. We also provide a case study from a real-world dataset to illustrate the interpretability of DGVAE. Code is available at: \url{https://github.com/enoche/DGVAE}.
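The pipeline the abstract describes (a frozen item-item graph, parameter-free residual graph convolution, and chunk-wise disentangled Gaussian latents with reparameterization) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the toy graph, feature dimensions, number of chunks, and all function names are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adj(A):
    # Symmetric normalization D^{-1/2} A D^{-1/2}, standard in GCN-style models.
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def residual_gcn(A_hat, X, num_layers=2):
    # Simplified, parameter-free propagation: sum the input with each
    # layer's propagated features (a residual/layer-averaging scheme).
    h, out = X, X.copy()
    for _ in range(num_layers):
        h = A_hat @ h
        out = out + h
    return out / (num_layers + 1)

def reparameterize(mu, logvar, rng):
    # Standard VAE reparameterization trick: z = mu + sigma * epsilon.
    std = np.exp(0.5 * logvar)
    return mu + std * rng.standard_normal(mu.shape)

# Toy frozen item-item graph (4 items), e.g. built from content similarity.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = rng.standard_normal((4, 8))  # item features (e.g., text embeddings)

Z = residual_gcn(normalize_adj(A), X)

# Split the representation into K chunks, one per latent factor, and
# sample each chunk's disentangled latent via reparameterization
# (here with unit variance, i.e. logvar = 0, for simplicity).
K = 2
chunks = np.split(Z, K, axis=1)
latents = [reparameterize(c, np.zeros_like(c), rng) for c in chunks]
print(Z.shape, len(latents), latents[0].shape)
```

In the full model, one such encoder would run over user-item interactions and another over the projected textual content, with a mutual-information objective pulling the two sets of latents into alignment.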