
On the Scalability of GNNs for Molecular Graphs (2404.11568v4)

Published 17 Apr 2024 in cs.LG

Abstract: Scaling deep learning models has been at the heart of recent revolutions in language modelling and image generation. Practitioners have observed a strong relationship between model size, dataset size, and performance. However, structure-based architectures such as Graph Neural Networks (GNNs) are yet to show the benefits of scale mainly due to the lower efficiency of sparse operations, large data requirements, and lack of clarity about the effectiveness of various architectures. We address this drawback of GNNs by studying their scaling behavior. Specifically, we analyze message-passing networks, graph Transformers, and hybrid architectures on the largest public collection of 2D molecular graphs. For the first time, we observe that GNNs benefit tremendously from the increasing scale of depth, width, number of molecules, number of labels, and the diversity in the pretraining datasets. We further demonstrate strong finetuning scaling behavior on 38 highly competitive downstream tasks, outclassing previous large models. This gives rise to MolGPS, a new graph foundation model that allows navigation of the chemical space, outperforming the previous state of the art on 26 of the 38 downstream tasks. We hope that our work paves the way for an era where foundational GNNs drive pharmaceutical drug discovery.

Examining the Scalability of Graph Neural Networks for Molecular Graphs

Introduction

This work focuses on the scalability of Graph Neural Networks (GNNs) for interpreting and predicting properties of molecular graphs. Despite the broad use and success of GNNs across domains, their ability to scale effectively, especially on molecular data for pharmaceutical applications, has remained relatively unexplored. The paper addresses this gap by analyzing multiple GNN architectures on the largest public collection of 2D molecular graphs.

Methodology

The paper evaluates an array of GNN architectures, including message-passing networks, graph Transformers, and hybrid models, against a vast dataset of roughly five million molecules with extensive labels across various tasks:

  • Architectures: Three principal models were analyzed: MPNN++ (an improved message-passing neural network), a graph Transformer, and a hybrid model integrating features of both; a minimal sketch of such a hybrid layer follows this list.
  • Dataset Preparation: Utilizing the LargeMix dataset, the research splits the molecular data into various tasks and labels, yielding a diverse and comprehensive dataset suited to robust large-scale training.
  • Scaling Parameters: The paper explores scaling across several dimensions—model size (width), complexity (depth), amount of training data (number of molecules), diversity in the training data, and the number of labels.
  • Training and Evaluation: All models were assessed in a supervised pretraining setting followed by finetuning on downstream tasks, including 38 distinct benchmarks for molecular property prediction.
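
To make the hybrid design concrete, the snippet below is a minimal, illustrative PyTorch sketch of a GPS-style layer that combines local message passing with global self-attention over all atoms of a molecule. It assumes dense adjacency matrices and batched inputs for simplicity; it is a sketch of the general technique, not the authors' MPNN++ or MolGPS implementation.

```python
import torch
import torch.nn as nn

class HybridGraphLayer(nn.Module):
    """Illustrative GPS-style hybrid layer: a local message-passing branch plus
    a global self-attention branch, merged with residual connections and a
    feed-forward block. A sketch only, not the paper's exact architecture."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.msg = nn.Linear(dim, dim)        # transform neighbour features into messages
        self.upd = nn.Linear(2 * dim, dim)    # combine node state with aggregated messages
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 2 * dim), nn.ReLU(), nn.Linear(2 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (batch, nodes, dim)   node features
        # adj: (batch, nodes, nodes) dense adjacency, 1.0 where a bond exists
        m = torch.bmm(adj, self.msg(x))                  # sum messages from neighbours
        local = self.upd(torch.cat([x, m], dim=-1))      # local (message-passing) branch
        global_out, _ = self.attn(x, x, x)               # global (Transformer) branch
        h = self.norm1(x + local + global_out)           # merge branches residually
        return self.norm2(h + self.ffn(h))               # feed-forward block

# Tiny usage example: a batch of 2 molecules, 5 atoms each, 64-dim features.
layer = HybridGraphLayer(dim=64)
x = torch.randn(2, 5, 64)
adj = torch.randint(0, 2, (2, 5, 5)).float()
out = layer(x, adj)   # -> shape (2, 5, 64)
```

Stacking more such layers (depth) or widening `dim` (width) corresponds to the scaling dimensions the paper studies.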

Key Results

The findings highlight significant enhancements in model performance with increased scale:

  • Performance Gains: The paper reports improvements of up to 30.25% when scaling models to 1 billion parameters and up to 28.98% when the dataset size is expanded eightfold; a sketch of fitting such a scaling trend follows this list.
  • Depth and Width Effects: Both the depth and width of the models had pronounced effects on performance, reinforcing the benefit of larger and more expressive models.
  • Data Scaling: Increasing the number of molecules consistently improved model performance across all architectures, with the graph Transformer and hybrid models benefiting the most in lower data regimes.
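
To illustrate how such scaling trends are typically quantified, the snippet below fits a power law of the form metric ≈ a · N^b to (parameter count, score) pairs in log-log space. The numeric values are hypothetical placeholders for illustration, not results from the paper.

```python
import numpy as np

# Hypothetical (model size, average pretraining score) points -- placeholders,
# not values reported in the paper -- used only to show the fitting procedure.
params = np.array([1e6, 1e7, 1e8, 1e9])       # number of parameters N
score = np.array([0.62, 0.68, 0.73, 0.77])    # some aggregate performance metric

# Fit log(score) = b * log(N) + log(a), i.e. score ~= a * N**b.
b, log_a = np.polyfit(np.log(params), np.log(score), deg=1)
a = np.exp(log_a)
print(f"fitted trend: score ~= {a:.3f} * N^{b:.3f}")

# Extrapolate the fitted trend to a larger (hypothetical) model.
print(f"extrapolated score at 10B parameters: {a * (1e10) ** b:.3f}")
```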

Implications and Future Work

The research pushes the boundary of GNN applications in drug discovery by demonstrating the benefits of scale in molecular graph analysis. The practical implications for the pharmaceutical industry are substantial: more effective predictive models could accelerate drug discovery and reduce its costs.

Looking forward, the paper suggests further exploration of other scalability factors, such as the choice and optimization of aggregation functions in GNNs, which could yield deeper insights into improving the efficiency and accuracy of these models in high-dimensional spaces; a toy illustration of common aggregation functions follows.
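
For context, the snippet below shows the kinds of neighbourhood aggregation functions that remark refers to; the tensor is a made-up toy example, not data from the paper.

```python
import torch

# Toy example: 4 neighbour messages of dimension 8 arriving at one node.
messages = torch.randn(4, 8)

agg_sum = messages.sum(dim=0)          # sensitive to neighbourhood size
agg_mean = messages.mean(dim=0)        # size-invariant, but loses count information
agg_max = messages.max(dim=0).values   # keeps only the most salient neighbour signal
```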

Conclusion

In summary, this paper presents a thorough analysis of GNN scalability for molecular graphs, demonstrating not only performance improvements with increased scale but also laying the groundwork for future research in the field. The exploration of varied architectures and an extensive pretraining dataset provides a substantial foundation for advancing GNN applications in pharmaceuticals and other fields requiring molecular-level precision.

Authors (7)
  1. Maciej Sypetkowski (9 papers)
  2. Frederik Wenkel (14 papers)
  3. Farimah Poursafaei (11 papers)
  4. Nia Dickson (1 paper)
  5. Karush Suri (12 papers)
  6. Philip Fradkin (3 papers)
  7. Dominique Beaini (27 papers)
Citations (7)