Masked Graph Transformer for Large-Scale Recommendation (2405.04028v1)
Abstract: Graph Transformers have garnered significant attention for learning graph-structured data, thanks to their superb ability to capture long-range dependencies among nodes. However, their quadratic space and time complexity hinders scalability, particularly for large-scale recommendation. Here we propose an efficient Masked Graph Transformer, named MGFormer, capable of capturing all-pair interactions among nodes with linear complexity. To achieve this, we treat all user/item nodes as independent tokens, enhance them with positional embeddings, and feed them into a kernelized attention module. Additionally, we incorporate learnable relative degree information to appropriately reweigh the attention scores. Experimental results show the superior performance of MGFormer, even with a single attention layer.
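To make the linear-complexity idea in the abstract concrete, below is a minimal sketch of a kernelized attention layer over user/item tokens with a degree-based reweighting term. It assumes a positive feature map phi(x) = elu(x) + 1 (as used in linear-attention Transformers) and a sigmoid gate over log-degrees; the class name, the gating form, and all tensor shapes are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of linear (kernelized) attention over user/item tokens.
# The feature map, degree gating, and shapes are illustrative assumptions,
# not MGFormer's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearGraphAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Learnable scalar gate over log-degree -- a stand-in for the
        # "learnable relative degree information" mentioned in the abstract.
        self.degree_scale = nn.Linear(1, 1)

    @staticmethod
    def feature_map(x: torch.Tensor) -> torch.Tensor:
        # Positive kernel feature map phi(x) = elu(x) + 1.
        return F.elu(x) + 1.0

    def forward(self, tokens: torch.Tensor, degrees: torch.Tensor) -> torch.Tensor:
        # tokens:  (N, dim) embeddings of all user/item nodes (with positional info added)
        # degrees: (N,) node degrees from the interaction graph
        q = self.feature_map(self.q_proj(tokens))           # (N, dim)
        k = self.feature_map(self.k_proj(tokens))           # (N, dim)
        v = self.v_proj(tokens)                             # (N, dim)

        # Reweigh keys by a learnable function of log-degree (illustrative choice).
        gate = torch.sigmoid(self.degree_scale(torch.log1p(degrees).unsqueeze(-1)))
        k = k * gate                                        # (N, dim)

        # Linear attention: form K^T V (dim x dim) once, then multiply by Q,
        # avoiding the N x N attention matrix entirely.
        kv = k.transpose(0, 1) @ v                          # (dim, dim)
        z = q @ k.sum(dim=0, keepdim=True).transpose(0, 1)  # (N, 1) normalizer
        return (q @ kv) / (z + 1e-6)                        # (N, dim), O(N * dim^2)


if __name__ == "__main__":
    attn = LinearGraphAttention(dim=64)
    tokens = torch.randn(1000, 64)                 # 1000 user/item tokens
    degrees = torch.randint(1, 50, (1000,)).float()
    print(attn(tokens, degrees).shape)             # torch.Size([1000, 64])
```

The key point of the sketch is the order of multiplication: computing K^T V first keeps memory and time linear in the number of tokens, which is what makes all-pair attention feasible on large user/item graphs.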
Authors:
- Huiyuan Chen
- Zhe Xu
- Chin-Chia Michael Yeh
- Vivian Lai
- Yan Zheng
- Minghua Xu
- Hanghang Tong