Gradformer: Graph Transformer with Exponential Decay (2404.15729v1)
Abstract: Graph Transformers (GTs) have demonstrated their advantages across a wide range of tasks. However, the self-attention mechanism in GTs overlooks the graph's inductive biases, particularly structural biases, which are crucial for graph tasks. Although some methods use positional encodings and attention biases to model these inductive biases, their effectiveness remains suboptimal. Therefore, this paper presents Gradformer, a method that integrates GTs with this intrinsic inductive bias by applying an exponential decay mask to the attention matrix. Specifically, the values in the decay mask diminish exponentially as node proximity within the graph structure decreases. This design enables Gradformer to retain its ability to capture information from distant nodes while focusing on the graph's local details. Furthermore, Gradformer introduces a learnable constraint into the decay mask, allowing different attention heads to learn distinct decay masks. Such a design diversifies the attention heads, enabling a more effective assimilation of the graph's diverse structural information. Extensive experiments on various benchmarks demonstrate that Gradformer consistently outperforms Graph Neural Network (GNN) and GT baselines on graph classification and regression tasks. Additionally, Gradformer proves effective for training deep GT models: it maintains or even improves accuracy as the network deepens, in contrast to the significant accuracy drop observed in other GT models. Code is available at https://github.com/LiuChuang0059/Gradformer.
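To make the described mechanism concrete, below is a minimal PyTorch sketch of self-attention with an exponential decay mask, assuming shortest-path distance as the node-proximity measure and a sigmoid-constrained, per-head learnable decay rate. The class name `DecayMaskedAttention` and the post-softmax renormalization are illustrative assumptions, not the authors' reference implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecayMaskedAttention(nn.Module):
    """Self-attention with an exponential decay mask over node distances (sketch)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        # One learnable decay rate per head, kept in (0, 1) via a sigmoid.
        self.decay_logit = nn.Parameter(torch.zeros(num_heads))

    def forward(self, x: torch.Tensor, spd: torch.Tensor) -> torch.Tensor:
        # x:   [N, dim] node features of one graph
        # spd: [N, N]   pairwise shortest-path (hop) distances
        n = x.size(0)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(n, self.num_heads, self.head_dim).transpose(0, 1)   # [H, N, d]
        k = k.view(n, self.num_heads, self.head_dim).transpose(0, 1)
        v = v.view(n, self.num_heads, self.head_dim).transpose(0, 1)

        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5        # [H, N, N]
        gamma = torch.sigmoid(self.decay_logit).view(-1, 1, 1)         # [H, 1, 1]
        decay_mask = gamma ** spd.float().unsqueeze(0)                 # [H, N, N]

        # Scale post-softmax attention by the decay mask and renormalize,
        # so distant nodes contribute less but are never cut off entirely.
        attn = F.softmax(scores, dim=-1) * decay_mask
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-6)

        out = (attn @ v).transpose(0, 1).reshape(n, -1)                # [N, dim]
        return self.out(out)


# Tiny usage example on a 4-node path graph (hypothetical shapes).
if __name__ == "__main__":
    x = torch.randn(4, 16)
    spd = torch.tensor([[0, 1, 2, 3],
                        [1, 0, 1, 2],
                        [2, 1, 0, 1],
                        [3, 2, 1, 0]])
    layer = DecayMaskedAttention(dim=16, num_heads=4)
    print(layer(x, spd).shape)  # torch.Size([4, 16])
```

Masking after the softmax (rather than adding a bias to the logits) keeps distant nodes reachable with small but non-zero weight, which matches the abstract's claim that Gradformer emphasizes local detail without discarding long-range information; each head's decay rate lets some heads attend more globally than others.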
Authors: Chuang Liu, Zelin Yao, Yibing Zhan, Xueqi Ma, Shirui Pan, Wenbin Hu