Distinguished In Uniform: Self Attention Vs. Virtual Nodes (2405.11951v1)

Published 20 May 2024 in cs.LG

Abstract: Graph Transformers (GTs) such as SAN and GPS are graph processing models that combine Message-Passing GNNs (MPGNNs) with global Self-Attention. They were shown to be universal function approximators, with two reservations: 1. The initial node features must be augmented with certain positional encodings. 2. The approximation is non-uniform: graphs of different sizes may require a different approximating network. We first clarify that this form of universality is not unique to GTs: using the same positional encodings, pure MPGNNs and even 2-layer MLPs are also non-uniform universal approximators. We then consider uniform expressivity: the target function is to be approximated by a single network for graphs of all sizes. There, we compare GTs to the more efficient MPGNN + Virtual Node architecture. The essential difference between the two model definitions is their global computation method: Self-Attention vs. Virtual Node. We prove that neither model is a uniform universal approximator, before proving our main result: neither model's uniform expressivity subsumes the other's. We demonstrate the theory with experiments on synthetic data, and further augment our study with real-world datasets, observing mixed results that indicate no clear ranking in practice either.
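The architectural contrast at the heart of the paper, global Self-Attention versus a Virtual Node, can be made concrete with a short sketch. The PyTorch code below is an illustrative assumption, not the authors' implementations of SAN, GPS, or their MPGNN + Virtual Node baseline: the self-attention layer lets every node exchange information with every other node directly, while the virtual-node update pools all node states into a single global vector and broadcasts it back.

# Minimal sketch (assumed layer choices and dense-batch shapes), contrasting the
# two global computation mechanisms the paper compares.
import torch
import torch.nn as nn

class GlobalSelfAttention(nn.Module):
    """Global update via Self-Attention: every node attends to every node (O(n^2) per graph)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (num_graphs, num_nodes, dim), a dense batch for simplicity
        out, _ = self.attn(h, h, h)
        return h + out  # residual update

class VirtualNodeUpdate(nn.Module):
    """Global update via a Virtual Node: pool all node states, transform, broadcast back (O(n))."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (num_graphs, num_nodes, dim)
        vn = self.mlp(h.sum(dim=1, keepdim=True))  # (num_graphs, 1, dim) global state
        return h + vn  # every node receives the same global message

if __name__ == "__main__":
    h = torch.randn(2, 5, 16)  # 2 toy graphs, 5 nodes each, 16-dim features
    print(GlobalSelfAttention(16)(h).shape)  # torch.Size([2, 5, 16])
    print(VirtualNodeUpdate(16)(h).shape)    # torch.Size([2, 5, 16])

The sketch illustrates why the comparison is non-trivial: attention computes pairwise interactions at quadratic cost, whereas the virtual node squeezes the global state through a single vector at linear cost, and the paper shows that neither mechanism's uniform expressivity subsumes the other's.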

Authors (5)
  1. Eran Rosenbluth (6 papers)
  2. Jan Tönshoff (9 papers)
  3. Martin Ritzert (17 papers)
  4. Berke Kisin (2 papers)
  5. Martin Grohe (92 papers)
Citations (9)
