
Mirror Gradient: Towards Robust Multimodal Recommender Systems via Exploring Flat Local Minima

Published 17 Feb 2024 in cs.IR and cs.LG (arXiv:2402.11262v1)

Abstract: Multimodal recommender systems utilize various types of information to model user preferences and item features, helping users discover items aligned with their interests. The integration of multimodal information mitigates the inherent challenges in recommender systems, e.g., the data sparsity problem and cold-start issues. However, it simultaneously magnifies certain risks from multimodal information inputs, such as information adjustment risk and inherent noise risk. These risks pose crucial challenges to the robustness of recommendation models. In this paper, we analyze multimodal recommender systems from the novel perspective of flat local minima and propose a concise yet effective gradient strategy called Mirror Gradient (MG). This strategy can implicitly enhance the model's robustness during the optimization process, mitigating instability risks arising from multimodal information inputs. We also provide strong theoretical evidence and conduct extensive empirical experiments to show the superiority of MG across various multimodal recommendation models and benchmarks. Furthermore, we find that the proposed MG can complement existing robust training methods and be easily extended to diverse advanced recommendation models, making it a promising new and fundamental paradigm for training multimodal recommender systems. The code is released at https://github.com/Qrange-group/Mirror-Gradient.


Summary

  • The paper's main contribution is the introduction of Mirror Gradient (MG), a gradient strategy that steers the training process toward flat local minima to enhance robustness.
  • The methodology adjusts gradient directions to balance between escaping sharp minima and settling into flatter regions, reducing sensitivity to noise.
  • Experimental results demonstrate improved top-5 recommendation performance and resilience against information noise across different models.

Introduction

The paper "Mirror Gradient: Towards Robust Multimodal Recommender Systems via Exploring Flat Local Minima" explores enhancing the robustness of multimodal recommender systems by leveraging the concept of flat local minima. Multimodal recommenders integrate various information forms, such as text and images, to model user preferences and item features, addressing issues like data sparsity and cold-start. However, these systems face challenges from information adjustment and inherent noise risks. This paper proposes a new gradient strategy, Mirror Gradient (MG), to enhance model robustness against these risks.

Multimodal Recommender Systems

Multimodal recommender systems aim to integrate data from different modalities (e.g., text, images) to improve recommendation quality. They are advantageous in addressing the data sparsity problem inherent in traditional recommendation systems, which typically rely on sparse user-item interaction data. Despite these benefits, multimodal systems must handle the complexities introduced by integrating disparate data types, such as dealing with potential noise and frequent changes in item presentation, e.g., visual and textual adjustments (see Figure 1).

Figure 1: An illustrative example of multimodal risks.

Flat Local Minima and Robustness

The paper investigates robustness through the lens of flat local minima. Flat local minima refer to regions in the loss landscape where small changes in the input result in minimal changes in the loss, indicating robustness to input perturbations. In contrast, sharp minima are vulnerable to such perturbations, potentially degrading model performance during inference when the input data slightly deviates from expected distributions (see Figure 2).

Figure 2: Illustration of flat local minima showing robust vs. vulnerable parameter settings.
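This intuition can be made concrete with a toy loss landscape: at a flat minimum a small parameter perturbation barely changes the loss, while at a sharp minimum the same perturbation is costly. The two quadratic losses and the perturbation size below are illustrative assumptions, not taken from the paper.

```python
def loss_flat(theta):
    # A "flat" basin: small curvature around the minimum at 0.
    return 0.1 * theta ** 2

def loss_sharp(theta):
    # A "sharp" basin: large curvature around the same minimum.
    return 10.0 * theta ** 2

eps = 0.1  # a small perturbation of the learned parameter
increase_flat = loss_flat(0.0 + eps) - loss_flat(0.0)
increase_sharp = loss_sharp(0.0 + eps) - loss_sharp(0.0)
# The flat basin's loss rises far less under the same perturbation.
```

The same perturbation costs 100x more loss in the sharp basin, which is exactly the sensitivity that flat-minima training aims to avoid.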

Mirror Gradient (MG) Strategy

MG is designed to guide the learning process toward flatter minima, enhancing robustness. It strategically adjusts gradient directions during training to balance between aggressive parameter updates required to escape sharp minima and the controlled steps needed to settle into flat minima. This approach maintains computational efficiency while improving model robustness across various scenarios. Unlike prior adversarial methods that explicitly model data perturbations, MG offers a theoretically grounded, implicit regularization technique.
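As a rough sketch of this idea (not the paper's exact update rule; the step sizes `lr` and `beta` and the quadratic toy loss are illustrative assumptions), one way to interleave a regular descent step with a smaller reversed "mirror" step looks like:

```python
def grad(theta):
    """Gradient of a toy quadratic loss L(theta) = theta**2."""
    return 2.0 * theta

def mirror_gradient_step(theta, lr=0.1, beta=0.05):
    # 1) Regular gradient descent step toward the minimum.
    theta = theta - lr * grad(theta)
    # 2) "Mirror" step: a smaller update in the opposite (ascent)
    #    direction at the new point, discouraging sharp basins.
    theta = theta + beta * grad(theta)
    return theta

theta = 1.0
for _ in range(200):
    theta = mirror_gradient_step(theta)
# With beta < lr, the combined step still contracts on this toy
# loss, so training converges while the ascent step resists
# settling into sharp regions.
```

On this quadratic, each combined step scales the parameter by (1 - 2·lr)(1 + 2·beta) = 0.88, so convergence is retained as long as the mirror step is smaller than the descent step.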

Theoretical Foundations

The paper provides rigorous theoretical insights supporting MG's effectiveness. Through the lens of loss landscape shaping, it illustrates how MG modifies the effective loss function to penalize sharp minima indirectly. The strategy effectively leads to optimizing an objective that integrates terms favoring flat loss regions, thus promoting robustness naturally.
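One standard way to make an objective favor flat regions, in the spirit described here though not necessarily the paper's exact derivation, is to add a penalty on the gradient norm; the sketch below estimates that penalty with finite differences on a one-dimensional toy loss (the hyperparameters `lam` and `h` are illustrative assumptions):

```python
def penalized_loss(loss_fn, theta, lam=0.01, h=1e-5):
    """Effective objective L(theta) + lam * L'(theta)**2, with the
    derivative estimated by central differences."""
    g = (loss_fn(theta + h) - loss_fn(theta - h)) / (2 * h)
    return loss_fn(theta) + lam * g ** 2

# At the same parameter value, the sharper basin pays the larger
# penalty, so minimizing the penalized objective prefers flat regions.
flat = penalized_loss(lambda t: 0.1 * t ** 2, 1.0)
sharp = penalized_loss(lambda t: 10.0 * t ** 2, 1.0)
```

This mirrors the section's point: the effective loss integrates a term that disfavors high-curvature (sharp) regions without explicitly modeling input perturbations.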

Experimental Results

Performance Evaluation

Extensive experiments conducted on datasets like Baby, Sports, and Clothing reveal MG's efficacy. The strategy consistently improves top-5 recommendation performance across various models, including VBPR, GRCN, and DualGNN (see Figure 3).

Figure 3: Visualization of local minima illustrating the improved stability with MG.

Mitigating Information Noise and Adjustment

MG demonstrates significant resilience to input noise and dynamic information adjustments typically encountered in real-world scenarios. Controlled experiments introduce Gaussian noise into embeddings and simulate textual alterations, consistently showing reduced performance degradation in MG-trained models (see Figure 4).

Figure 4: Convergence of MG on the Baby dataset, highlighting improved loss stabilization.
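The noise-injection protocol can be mimicked in a few lines: perturb an item embedding with Gaussian noise and measure how much a user-item score drifts. The embeddings and noise scale below are illustrative assumptions, not the paper's experimental setup.

```python
import random

random.seed(0)  # deterministic noise for reproducibility

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

user = [0.3, -0.7, 0.5]   # hypothetical user embedding
item = [0.9, 0.1, -0.2]   # hypothetical item embedding

clean_score = dot(user, item)

# Inject Gaussian noise into the item embedding (sigma assumed).
sigma = 0.05
noisy_item = [x + random.gauss(0.0, sigma) for x in item]
noisy_score = dot(user, noisy_item)

# A robust model's ranking scores should drift only slightly
# under such perturbations.
drift = abs(noisy_score - clean_score)
```

Sweeping `sigma` and comparing score drift (or ranking metrics) between MG-trained and baseline models reproduces the shape of this robustness evaluation.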

Compatibility and Versatility

MG is compatible with various optimization algorithms (e.g., SGD, Adam), making it versatile for integration into existing training workflows. It complements other robust training techniques, such as adversarial training, without additional computational overhead beyond regular SGD updates.

Conclusion

Mirror Gradient introduces a practical approach for enhancing the robustness of multimodal recommender systems. By encouraging convergence to flat local minima, the methodology offers a straightforward yet powerful means to mitigate common robustness issues such as information noise and adjustment risks. Future work could explore adaptive mechanisms for MG's hyperparameters or extend its utility to other AI domains where robustness is paramount.
