Transformers are efficient hierarchical chemical graph learners (2310.01704v1)

Published 2 Oct 2023 in cs.LG

Abstract: Transformers, adapted from natural language processing, are emerging as a leading approach for graph representation learning. Contemporary graph transformers often treat nodes or edges as separate tokens. This approach leads to computational challenges for even moderately sized graphs due to the quadratic scaling of self-attention complexity with token count. In this paper, we introduce SubFormer, a graph transformer that operates on subgraphs that aggregate information by a message-passing mechanism. This approach reduces the number of tokens and enhances the learning of long-range interactions. We demonstrate SubFormer on benchmarks for predicting molecular properties from chemical structures and show that it is competitive with state-of-the-art graph transformers at a fraction of the computational cost, with training times on the order of minutes on a consumer-grade graphics card. We interpret the attention weights in terms of chemical structures. We show that SubFormer exhibits limited over-smoothing and avoids over-squashing, which is prevalent in traditional graph neural networks.
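To make the idea in the abstract concrete, below is a minimal sketch (not the authors' code) of a SubFormer-style model: a few local message-passing updates pool atom features into one token per subgraph, and a standard Transformer encoder then attends over this much shorter token sequence, so self-attention cost scales with the number of subgraphs rather than the number of atoms. The subgraph assignment `node2sub` is assumed to be given, for example from a junction-tree-style decomposition of the molecule; the dense GCN-like message passing and the mean-pool readout are illustrative stand-ins.

```python
# Hypothetical sketch of a subgraph-token graph transformer (SubFormer-style).
# Assumptions: dense adjacency with self-loops, a precomputed node-to-subgraph
# map, mean pooling into subgraph tokens, and a single regression target.
import torch
import torch.nn as nn


class SubFormerSketch(nn.Module):
    def __init__(self, in_dim, hidden_dim=64, mp_layers=2, tf_layers=4, heads=4):
        super().__init__()
        self.embed = nn.Linear(in_dim, hidden_dim)
        # Simple GCN-like message-passing layers (stand-in for the paper's scheme).
        self.mp = nn.ModuleList(nn.Linear(hidden_dim, hidden_dim) for _ in range(mp_layers))
        enc_layer = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=tf_layers)
        self.readout = nn.Linear(hidden_dim, 1)

    def forward(self, x, adj, node2sub, n_sub):
        # x:        (n_nodes, in_dim) atom features
        # adj:      (n_nodes, n_nodes) adjacency matrix with self-loops
        # node2sub: (n_nodes,) long tensor, subgraph index of each atom
        h = self.embed(x)
        for lin in self.mp:                      # local message passing on the atom graph
            h = torch.relu(lin(adj @ h))
        # Mean-pool atoms into one token per subgraph.
        tokens = torch.zeros(n_sub, h.size(1)).index_add_(0, node2sub, h)
        counts = torch.zeros(n_sub).index_add_(0, node2sub, torch.ones(node2sub.size(0)))
        tokens = tokens / counts.clamp(min=1).unsqueeze(1)
        # Global self-attention over subgraph tokens: quadratic in n_sub, not n_nodes.
        out = self.transformer(tokens.unsqueeze(0)).mean(dim=1)
        return self.readout(out)
```

In this reading, the message-passing stage captures local chemistry within each fragment, while attention over the coarse tokens carries long-range information across the molecule, which is consistent with the abstract's claims about reduced over-squashing and low computational cost.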

Authors (5)
  1. Zihan Pengmei (6 papers)
  2. Zimu Li (14 papers)
  3. Chih-chan Tien (5 papers)
  4. Risi Kondor (38 papers)
  5. Aaron R. Dinner (42 papers)