On the Theoretical Expressive Power and the Design Space of Higher-Order Graph Transformers (2404.03380v1)

Published 4 Apr 2024 in cs.LG, cs.CG, and math.GN

Abstract: Graph transformers have recently received significant attention in graph learning, partly due to their ability to capture more global interactions via self-attention. Nevertheless, while higher-order graph neural networks have been reasonably well studied, the exploration of extending graph transformers to higher-order variants is just starting. Both theoretical understanding and empirical results are limited. In this paper, we provide a systematic study of the theoretical expressive power of order-$k$ graph transformers and sparse variants. We first show that an order-$k$ graph transformer without additional structural information is less expressive than the $k$-Weisfeiler-Lehman ($k$-WL) test despite its high computational cost. We then explore strategies to both sparsify and enhance the higher-order graph transformers, aiming to improve both their efficiency and expressiveness. Indeed, sparsification based on neighborhood information can enhance the expressive power, as it provides additional information about input graph structures. In particular, we show that a natural neighborhood-based sparse order-$k$ transformer model is not only computationally efficient, but also expressive -- as expressive as the $k$-WL test. We further study several other sparse graph attention models that are computationally efficient and provide their expressiveness analysis. Finally, we provide experimental results to show the effectiveness of the different sparsification strategies.
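
The neighborhood-based sparsification described above can be made concrete with a small sketch. The NumPy snippet below is an illustrative assumption of one possible formulation, not the authors' implementation: it enumerates all ordered node pairs (order-2 tuples) of a small graph and applies a single masked attention layer in which each pair attends only to pairs obtained by replacing one of its coordinates with a graph neighbor of that coordinate. The function name sparse_order2_attention, the concatenated tuple features, and the random projection weights are all hypothetical choices made for this example.

```python
# Minimal, hypothetical NumPy sketch of neighborhood-sparse attention over
# order-2 node tuples. Names, feature construction, and random weights are
# illustrative assumptions, not the paper's implementation.
import numpy as np
from itertools import product

def sparse_order2_attention(adj, feats, d_model=16, seed=0):
    """One masked self-attention layer over all ordered node pairs.

    A pair (u, v) attends to (a, b) only if the two tuples differ in exactly
    one coordinate and the changed entry is swapped for one of its graph
    neighbors (a local, k-WL-style tuple neighborhood with k = 2). Assumes
    the graph has no isolated nodes, so every mask row is non-empty and the
    masked softmax is well defined.
    """
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    pairs = list(product(range(n), repeat=2))      # all n^2 ordered 2-tuples
    m = len(pairs)

    # Tuple features: concatenation of the two node feature vectors.
    x = np.stack([np.concatenate([feats[u], feats[v]]) for u, v in pairs])

    # Boolean attention mask between tuples (the sparsification step).
    mask = np.zeros((m, m), dtype=bool)
    for i, (u, v) in enumerate(pairs):
        for j, (a, b) in enumerate(pairs):
            if v == b and a != u and adj[u, a]:    # first coordinate moved to a neighbor of u
                mask[i, j] = True
            elif u == a and b != v and adj[v, b]:  # second coordinate moved to a neighbor of v
                mask[i, j] = True

    # Random projections stand in for learned Q/K/V parameters.
    d_in = x.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((d_in, d_model)) for _ in range(3))
    q, k, v_ = x @ Wq, x @ Wk, x @ Wv

    scores = np.where(mask, (q @ k.T) / np.sqrt(d_model), -np.inf)
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # masked softmax
    return weights @ v_                            # updated tuple embeddings

# Tiny usage example: a 4-cycle with 3-dimensional random node features.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=bool)
feats = np.random.default_rng(1).standard_normal((4, 3))
out = sparse_order2_attention(adj, feats)
print(out.shape)  # (16, 16): one d_model-dim embedding per ordered pair
```

In this sketch the mask shrinks each tuple's attention set from all $n^2$ tuples to at most $2d$ of them (with $d$ the maximum degree), which is the kind of efficiency gain the paper's neighborhood-based sparse variants target, while still injecting graph structure into the attention pattern.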
