Transformers are efficient hierarchical chemical graph learners (2310.01704v1)
Abstract: Transformers, adapted from natural language processing, are emerging as a leading approach for graph representation learning. Contemporary graph transformers often treat nodes or edges as separate tokens. This approach leads to computational challenges for even moderately sized graphs due to the quadratic scaling of self-attention complexity with token count. In this paper, we introduce SubFormer, a graph transformer that operates on subgraphs that aggregate information by a message-passing mechanism. This approach reduces the number of tokens and enhances the learning of long-range interactions. We demonstrate SubFormer on benchmarks for predicting molecular properties from chemical structures and show that it is competitive with state-of-the-art graph transformers at a fraction of the computational cost, with training times on the order of minutes on a consumer-grade graphics card. We interpret the attention weights in terms of chemical structures. We show that SubFormer exhibits limited over-smoothing and avoids over-squashing, which is prevalent in traditional graph neural networks.
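The sketch below illustrates the idea stated in the abstract: message passing first aggregates atom features into subgraph tokens, and a standard transformer then attends over that much shorter token sequence. It is a minimal illustration only; the `SubFormerSketch` class, the mean-aggregation message passing, the hand-picked cluster assignment, and the CLS-style readout are assumptions for demonstration, since the paper's actual subgraph decomposition, message-passing scheme, and positional encodings are not described in the abstract.

```python
# Minimal sketch of the SubFormer idea: message passing over the atom graph,
# pooling into subgraph tokens, then self-attention over those few tokens.
# All architectural details below are illustrative assumptions.
import torch
import torch.nn as nn


class SubFormerSketch(nn.Module):
    def __init__(self, in_dim: int, hidden: int = 64, heads: int = 4,
                 mp_layers: int = 2, tf_layers: int = 2, out_dim: int = 1):
        super().__init__()
        self.embed = nn.Linear(in_dim, hidden)
        # Placeholder message-passing updates on the atom-level graph.
        self.mp = nn.ModuleList([nn.Linear(2 * hidden, hidden) for _ in range(mp_layers)])
        # Standard transformer encoder over the (few) subgraph tokens.
        enc = nn.TransformerEncoderLayer(hidden, heads, dim_feedforward=2 * hidden,
                                         batch_first=True, activation="gelu")
        self.encoder = nn.TransformerEncoder(enc, tf_layers)
        self.cls = nn.Parameter(torch.zeros(1, 1, hidden))  # readout token
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, x, adj, cluster):
        # x: (n_atoms, in_dim); adj: (n_atoms, n_atoms) 0/1; cluster: (n_atoms,) long
        h = self.embed(x)
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        for lin in self.mp:
            msg = adj @ h / deg                          # mean over neighbors
            h = torch.relu(lin(torch.cat([h, msg], dim=-1)))
        # Pool atoms into subgraph tokens by mean over each cluster.
        n_clusters = int(cluster.max()) + 1
        tokens = torch.zeros(n_clusters, h.size(-1)).index_add_(0, cluster, h)
        counts = torch.zeros(n_clusters).index_add_(0, cluster, torch.ones(len(cluster)))
        tokens = tokens / counts.unsqueeze(-1)
        seq = torch.cat([self.cls, tokens.unsqueeze(0)], dim=1)
        out = self.encoder(seq)                          # attention over subgraph tokens
        return self.head(out[:, 0])                      # predict from readout token


# Toy usage: a 6-atom chain split into 2 hypothetical subgraphs.
x = torch.randn(6, 16)
adj = torch.zeros(6, 6)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    adj[i, j] = adj[j, i] = 1.0
cluster = torch.tensor([0, 0, 0, 1, 1, 1])
print(SubFormerSketch(in_dim=16)(x, adj, cluster).shape)  # torch.Size([1, 1])
```

Because self-attention is applied to subgraph tokens rather than individual atoms, its quadratic cost scales with the number of subgraphs, which is the token reduction the abstract describes.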
Authors: Zihan Pengmei, Zimu Li, Chih-chan Tien, Risi Kondor, Aaron R. Dinner