Comparing Graph Transformers via Positional Encodings (2402.14202v4)

Published 22 Feb 2024 in cs.LG

Abstract: The distinguishing power of graph transformers is closely tied to the choice of positional encoding: features used to augment the base transformer with information about the graph. There are two primary types of positional encoding: absolute positional encodings (APEs) and relative positional encodings (RPEs). APEs assign features to each node and are given as input to the transformer. RPEs instead assign a feature to each pair of nodes, e.g., graph distance, and are used to augment the attention block. A priori, it is unclear which method is better for maximizing the power of the resulting graph transformer. In this paper, we aim to understand the relationship between these different types of positional encodings. Interestingly, we show that graph transformers using APEs and RPEs are equivalent in terms of distinguishing power. In particular, we demonstrate how to interchange APEs and RPEs while maintaining their distinguishing power in terms of graph transformers. Based on our theoretical results, we provide a study on several APEs and RPEs (including the resistance distance and the recently introduced stable and expressive positional encoding (SPE)) and compare their distinguishing power in terms of transformers. We believe our work will help navigate the huge number of choices of positional encoding and will provide guidance on the future design of positional encodings for graph transformers.
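
To make the abstract's APE/RPE distinction concrete, below is a minimal NumPy sketch (not taken from the paper). It assumes Laplacian eigenvectors as an example APE and shortest-path distance as an example RPE; all variable names and these specific encoding choices are illustrative, not the paper's construction.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Toy graph: a 4-cycle, given by its adjacency matrix.
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
n = A.shape[0]
X = np.eye(n)  # placeholder node features

# Absolute positional encoding (APE): one feature vector per node.
# Here: eigenvectors of the two smallest Laplacian eigenvalues,
# concatenated to the node features before the transformer.
L = np.diag(A.sum(axis=1)) - A
_, eigvecs = np.linalg.eigh(L)        # eigh returns ascending eigenvalues
ape = eigvecs[:, :2]
X_in = np.concatenate([X, ape], axis=1)

# Relative positional encoding (RPE): one feature per node *pair*.
# Here: shortest-path (hop) distance, added to the attention logits.
dist = np.full((n, n), np.inf)
np.fill_diagonal(dist, 0.0)
walk = np.eye(n)
for k in range(1, n):
    walk = walk @ A                   # walk[i, j] > 0 iff a length-k walk exists
    dist[(walk > 0) & np.isinf(dist)] = k
rpe_bias = -dist                      # distant pairs get lower attention scores

# One self-attention head: the APE enters through the inputs,
# the RPE enters through an additive bias on the attention logits.
d = X_in.shape[1]
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
logits = (X_in @ Wq) @ (X_in @ Wk).T / np.sqrt(d)
attn = softmax(logits + rpe_bias)     # rows sum to 1 over the nodes
print(attn.round(3))
```

In this toy setup the APE only changes the inputs X_in while the RPE only changes the attention logits, which is the structural difference whose effect on distinguishing power the paper analyzes.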

References (26)
  1. Can graph neural networks count substructures? Advances in neural information processing systems, 33:10383–10395, 2020.
  2. From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked transformers. In Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., and Sabato, S. (eds.), Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pp.  3962–3983. PMLR, 17–23 Jul 2022. URL https://proceedings.mlr.press/v162/choromanski22a.html.
  3. A generalization of transformer networks to graphs. arXiv preprint arXiv:2012.09699, 2020.
  4. Benchmarking graph neural networks. Journal of Machine Learning Research, 24(43):1–48, 2023.
  5. Magnetic eigenmaps for the visualization of directed networks. Applied and Computational Harmonic Analysis, 44(1):189–199, 2018.
  6. Neural message passing for quantum chemistry. In International conference on machine learning, pp. 1263–1272. PMLR, 2017.
  7. Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415, 2016.
  8. A short tutorial on the Weisfeiler-Lehman test and its variants. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.  8533–8537. IEEE, 2021.
  9. On the stability of expressive positional encodings for graph neural networks, 2023.
  10. Rethinking graph transformers with spectral attention. Advances in Neural Information Processing Systems, 34:21618–21629, 2021.
  11. Sign and basis invariant networks for spectral graph representation learning. In The Eleventh International Conference on Learning Representations, 2022.
  12. Graph inductive biases in transformers without message passing. In Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., and Scarlett, J. (eds.), Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pp.  23321–23337. PMLR, 23–29 Jul 2023. URL https://proceedings.mlr.press/v202/ma23c.html.
  13. Provably powerful graph networks. Advances in neural information processing systems, 32, 2019a.
  14. Invariant and equivariant graph networks. In International Conference on Learning Representations, 2019b.
  15. GraphiT: Encoding graph structure in transformers, 2021.
  16. Attending to graph transformers. arXiv preprint arXiv:2302.04181, 2023.
  17. Recipe for a general, powerful, scalable graph transformer. Advances in Neural Information Processing Systems, 35:14501–14515, 2022.
  18. Three iterations of (d−1)-WL test distinguish non isometric clouds of d-dimensional points, 2023.
  19. Self-attention with relative position representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pp.  464–468, 2018.
  20. Attention is all you need. Advances in neural information processing systems, 30, 2017.
  21. The reduction of a graph to canonical form and the algebra which appears therein. NTI, Series 2, 9:12–16, 1968.
  22. How powerful are graph neural networks? In International Conference on Learning Representations, 2018.
  23. Do transformers really perform badly for graph representation? Advances in Neural Information Processing Systems, 34:28877–28888, 2021.
  24. Deep sets. Advances in neural information processing systems, 30, 2017.
  25. Rethinking the expressive power of GNNs via graph biconnectivity. arXiv preprint arXiv:2301.09505, 2023.
  26. On structural expressive power of graph transformers. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’23, pp.  3628–3637, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9798400701030. doi: 10.1145/3580305.3599451. URL https://doi.org/10.1145/3580305.3599451.
Authors (5)
  1. Mitchell Black (26 papers)
  2. Zhengchao Wan (22 papers)
  3. Gal Mishne (37 papers)
  4. Amir Nayyeri (23 papers)
  5. Yusu Wang (85 papers)