Towards Principled Graph Transformers (2401.10119v4)
Abstract: Graph learning architectures based on the k-dimensional Weisfeiler-Leman (k-WL) hierarchy offer a theoretically well-understood expressive power. However, such architectures often fail to deliver solid predictive performance on real-world tasks, limiting their practical impact. In contrast, global attention-based models such as graph transformers demonstrate strong performance in practice, but comparing their expressive power with the k-WL hierarchy remains challenging, particularly since these architectures rely on positional or structural encodings for their expressivity and predictive performance. To address this, we show that the recently proposed Edge Transformer, a global attention model operating on node pairs instead of nodes, has at least 3-WL expressive power. Empirically, we demonstrate that the Edge Transformer surpasses other theoretically aligned architectures regarding predictive performance while not relying on positional or structural encodings. Our code is available at https://github.com/luis-mueller/towards-principled-gts
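The Edge Transformer's triangular attention is the key ingredient behind the 3-WL claim: states live on ordered node pairs, and pair (i, j) attends over all pivot nodes l, mixing the states of (i, l) and (l, j), which mirrors the composition step of higher-order Weisfeiler-Leman refinement. Below is a minimal PyTorch sketch of one such attention layer; it is not the authors' reference implementation. The module name `TriangularEdgeAttention`, the head count, and the elementwise combination of the values from (i, l) and (l, j) are illustrative assumptions, and residual connections, feed-forward blocks, and normalization are omitted.

```python
# Minimal sketch (assumptions noted above) of triangular edge attention:
# pair states X[i, j] attend over pivot nodes l, combining pairs (i, l) and (l, j).
import math
import torch
import torch.nn as nn


class TriangularEdgeAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(dim, dim, bias=False)
        self.v1 = nn.Linear(dim, dim, bias=False)  # value taken from pair (i, l)
        self.v2 = nn.Linear(dim, dim, bias=False)  # value taken from pair (l, j)
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [n, n, dim] -- one feature vector per ordered node pair (i, j)
        n, _, dim = x.shape
        h, d = self.num_heads, self.head_dim
        q = self.q(x).view(n, n, h, d)    # query from the target pair (i, j)
        k = self.k(x).view(n, n, h, d)    # key from the pair (i, l)
        v1 = self.v1(x).view(n, n, h, d)  # value contribution of (i, l)
        v2 = self.v2(x).view(n, n, h, d)  # value contribution of (l, j)

        # attention logits over pivot nodes l: logits[i, j, l, h]
        logits = torch.einsum("ijhd,ilhd->ijlh", q, k) / math.sqrt(d)
        attn = logits.softmax(dim=2)

        # combine values from (i, l) and (l, j) elementwise, then aggregate over l
        val = torch.einsum("ilhd,ljhd->iljhd", v1, v2)
        out = torch.einsum("ijlh,iljhd->ijhd", attn, val)
        return self.out(out.reshape(n, n, dim))


# usage: 6 nodes, 16-dimensional pair features
layer = TriangularEdgeAttention(dim=16)
pairs = torch.randn(6, 6, 16)
print(layer(pairs).shape)  # torch.Size([6, 6, 16])
```

Note the cubic cost in the number of nodes: every pair attends over all n pivots, which is the price of matching 3-WL expressivity without positional or structural encodings.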