Polynormer: Polynomial-Expressive Graph Transformer in Linear Time (2403.01232v3)
Abstract: Graph transformers (GTs) have emerged as a promising architecture that is theoretically more expressive than message-passing graph neural networks (GNNs). However, typical GT models have at least quadratic complexity and thus cannot scale to large graphs. While several linear GTs have recently been proposed, they still lag behind their GNN counterparts on several popular graph datasets, which raises a critical concern about their practical expressivity. To balance the trade-off between expressivity and scalability of GTs, we propose Polynormer, a polynomial-expressive GT model with linear complexity. Polynormer is built upon a novel base model that learns a high-degree polynomial on input features. To make the base model permutation equivariant, we integrate it with graph topology and node features separately, resulting in local and global equivariant attention models. Consequently, Polynormer adopts a linear local-to-global attention scheme to learn high-degree equivariant polynomials whose coefficients are controlled by attention scores. Polynormer has been evaluated on $13$ homophilic and heterophilic datasets, including large graphs with millions of nodes. Our extensive experimental results show that Polynormer outperforms state-of-the-art GNN and GT baselines on most datasets, even without the use of nonlinear activation functions.
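To make the abstract's description more concrete, below is a minimal PyTorch sketch of the general idea: a kernelized (linear-time) global attention whose output acts as multiplicative, attention-controlled coefficients on the node features, so that stacking blocks raises the polynomial degree of the representations. This is an illustrative assumption of how such a scheme could look, not the authors' implementation; the class names, the sigmoid feature map, and the residual design are all hypothetical choices.

```python
# Hypothetical sketch of a linear-attention block with polynomial-style coefficient control.
# Not the official Polynormer code; names and design choices are illustrative only.
import torch
import torch.nn as nn


class LinearGlobalAttentionSketch(nn.Module):
    """Softmax-free attention whose cost is linear in the number of nodes."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [n_nodes, dim]
        # Non-negative feature maps (sigmoid here, as an assumption) stand in for softmax;
        # aggregating keys/values first keeps the cost O(n * d^2) instead of O(n^2 * d).
        q = torch.sigmoid(self.q(x))                        # [n, d]
        k = torch.sigmoid(self.k(x))                        # [n, d]
        v = self.v(x)                                        # [n, d]
        kv = k.t() @ v                                        # [d, d] summary over all nodes
        normalizer = q @ k.sum(dim=0, keepdim=True).t()       # [n, 1] attention normalization
        return (q @ kv) / normalizer.clamp_min(1e-6)          # [n, d]


class PolynomialBlockSketch(nn.Module):
    """One block: the attention output multiplies the input elementwise, so the
    attention scores act as coefficients and each block raises the polynomial degree."""

    def __init__(self, dim: int):
        super().__init__()
        self.attn = LinearGlobalAttentionSketch(dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        coeffs = self.attn(x)              # attention-controlled coefficients
        # Elementwise product of two degree-k terms gives degree 2k; the residual
        # connection retains the lower-degree terms of the polynomial.
        return self.proj(coeffs * x) + x


if __name__ == "__main__":
    x = torch.randn(1000, 64)              # 1000 nodes with 64-dim features
    block = PolynomialBlockSketch(64)
    print(block(x).shape)                  # torch.Size([1000, 64])
```

A local variant of the same block could restrict the coefficient computation to graph neighborhoods (e.g., via sparse attention over edges), which is how a local-to-global ordering of the two attention types would be composed under this sketch.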