Uncovering Neural Scaling Laws in Molecular Representation Learning (2309.15123v2)

Published 15 Sep 2023 in physics.chem-ph and cs.LG

Abstract: Molecular Representation Learning (MRL) has emerged as a powerful tool for drug and materials discovery in a variety of tasks such as virtual screening and inverse design. While there has been a surge of interest in advancing model-centric techniques, the influence of both data quantity and quality on molecular representations is not yet clearly understood within this field. In this paper, we delve into the neural scaling behaviors of MRL from a data-centric viewpoint, examining four key dimensions: (1) data modalities, (2) dataset splitting, (3) the role of pre-training, and (4) model capacity. Our empirical studies confirm a consistent power-law relationship between data volume and MRL performance across these dimensions. Additionally, through detailed analysis, we identify potential avenues for improving learning efficiency. To challenge these scaling laws, we adapt seven popular data pruning strategies to molecular data and benchmark their performance. Our findings underline the importance of data-centric MRL and highlight possible directions for future research.
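
The central empirical claim above is a power-law relationship between data volume N and model error, i.e. error ≈ a · N^(−b). As a rough, self-contained illustration of what fitting such a scaling curve looks like (this is not the paper's code, and the dataset sizes, error values, and initial parameters below are made up for demonstration), a minimal Python sketch using scipy's curve_fit:

```python
# Illustrative sketch only: fit a power-law learning curve error ≈ a * N^(-b)
# to hypothetical (dataset size, validation error) measurements, the kind of
# relationship the abstract describes between data volume and MRL performance.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b):
    """Predicted error for a training set of size n under a power-law fit."""
    return a * np.power(n, -b)

# Hypothetical measurements (not from the paper).
sizes = np.array([1_000, 2_000, 5_000, 10_000, 20_000, 50_000], dtype=float)
errors = np.array([0.42, 0.35, 0.27, 0.22, 0.18, 0.14])

# Fit the curve directly; fitting a line in log-log space is a common alternative.
(a, b), _ = curve_fit(power_law, sizes, errors, p0=(1.0, 0.3))
print(f"fitted scaling exponent b = {b:.3f}  (error ~ N^-{b:.3f})")
```

The fitted exponent b summarizes how quickly performance improves as more molecular data is added; a steeper exponent means greater data efficiency, which is what the paper's data-pruning benchmarks aim to influence.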

Authors (8)
  1. Dingshuo Chen (10 papers)
  2. Yanqiao Zhu (45 papers)
  3. Jieyu Zhang (63 papers)
  4. Yuanqi Du (52 papers)
  5. Zhixun Li (17 papers)
  6. Qiang Liu (405 papers)
  7. Shu Wu (109 papers)
  8. Liang Wang (512 papers)
Citations (12)
