Masked Graph Transformer for Large-Scale Recommendation (2405.04028v1)

Published 7 May 2024 in cs.IR

Abstract: Graph Transformers have garnered significant attention for learning graph-structured data, thanks to their superb ability to capture long-range dependencies among nodes. However, their quadratic space and time complexity hinders scalability, particularly for large-scale recommendation. Here we propose an efficient Masked Graph Transformer, named MGFormer, capable of capturing all-pair interactions among nodes with linear complexity. To achieve this, we treat all user/item nodes as independent tokens, enhance them with positional embeddings, and feed them into a kernelized attention module. Additionally, we incorporate learnable relative degree information to appropriately reweight the attention scores. Experimental results show the superior performance of our MGFormer, even with a single attention layer.
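
The abstract outlines the core recipe: stack user/item node embeddings as tokens, add positional embeddings, pass them through a kernelized (linear-complexity) attention module, and reweight attention with learnable relative degree information. Below is a minimal PyTorch sketch of that recipe for intuition only; the kernel feature map (elu + 1), the log-degree reweighting of keys, and all class and parameter names are illustrative assumptions, not the authors' implementation.

```python
# Sketch: kernelized (linear) attention over user/item tokens with a
# learnable degree-based reweighting. Assumptions: elu(x)+1 feature map,
# degree term applied to keys; these choices are illustrative only.
import torch
import torch.nn as nn


class KernelizedDegreeAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Learnable scalar controlling how strongly node degree reweights attention.
        self.degree_scale = nn.Parameter(torch.zeros(1))

    @staticmethod
    def feature_map(x: torch.Tensor) -> torch.Tensor:
        # Positive feature map so phi(Q) phi(K)^T approximates a softmax-like kernel.
        return torch.nn.functional.elu(x) + 1.0

    def forward(self, tokens: torch.Tensor, pos_emb: torch.Tensor,
                degrees: torch.Tensor) -> torch.Tensor:
        # tokens:  (N, d) stacked user/item node embeddings
        # pos_emb: (N, d) positional embeddings added to the tokens
        # degrees: (N,)   node degrees from the interaction graph
        x = tokens + pos_emb
        q = self.feature_map(self.q_proj(x))                     # (N, d)
        # Reweight keys by a learnable function of (log) degree.
        w = (1.0 + self.degree_scale * torch.log1p(degrees)).unsqueeze(-1)
        k = self.feature_map(self.k_proj(x)) * w                 # (N, d)
        v = self.v_proj(x)                                       # (N, d)
        # Linear complexity: compute K^T V once (d x d), then apply Q row-wise.
        kv = k.transpose(0, 1) @ v                               # (d, d)
        z = q @ k.sum(dim=0, keepdim=True).transpose(0, 1)       # (N, 1) normalizer
        return (q @ kv) / (z + 1e-6)                             # (N, d)


# Toy usage: 12 user/item tokens with 64-dimensional embeddings.
if __name__ == "__main__":
    n, d = 12, 64
    attn = KernelizedDegreeAttention(d)
    out = attn(torch.randn(n, d), torch.randn(n, d),
               torch.randint(1, 10, (n,)).float())
    print(out.shape)  # torch.Size([12, 64])
```

Because the d x d matrix K^T V is computed once and reused for every query, memory and time scale linearly in the number of tokens rather than quadratically, which is the property the abstract highlights for large-scale recommendation.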

Authors (7)
  1. Huiyuan Chen
  2. Zhe Xu
  3. Chin-Chia Michael Yeh
  4. Vivian Lai
  5. Yan Zheng
  6. Minghua Xu
  7. Hanghang Tong
