Pushing the Limits of All-Atom Geometric Graph Neural Networks: Pre-Training, Scaling and Zero-Shot Transfer (2410.21683v1)
Abstract: Constructing transferable descriptors for conformation representation of molecular and biological systems has numerous applications in drug discovery, learning-based molecular dynamics, and protein mechanism analysis. Geometric graph neural networks (Geom-GNNs) with all-atom information have transformed atomistic simulations by serving as general learnable geometric descriptors for downstream tasks, including prediction of interatomic potentials and molecular properties. However, common practice is to supervise Geom-GNNs on specific downstream tasks, which suffer from a lack of high-quality data and from inaccurate labels, leading to poor generalization and degraded performance in out-of-distribution (OOD) scenarios. In this work, we explore the possibility of using pre-trained Geom-GNNs as transferable and highly effective geometric descriptors for improved generalization. To explore their representation power, we study the scaling behaviors of Geom-GNNs under self-supervised pre-training, supervised, and unsupervised learning setups. We find that the expressive power of different architectures can differ on the pre-training task. Interestingly, Geom-GNNs do not follow power-law scaling on the pre-training task, and universally lack predictable scaling behavior on supervised tasks with quantum chemical labels, which are important for screening and design of novel molecules. More importantly, we demonstrate how all-atom graph embeddings can be organically combined with other neural architectures to enhance expressive power. Meanwhile, the low-dimensional projection of the latent space shows excellent agreement with conventional geometrical descriptors.
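As a rough illustration of what testing for "power-law scaling" can involve, the sketch below fits a loss curve of the form L(N) = a · N^(-b) by linear regression in log-log space and reports the goodness of fit. It is not the procedure used in the paper: the function name `fit_power_law`, the choice of N as a model or dataset size, and the synthetic numbers are all assumptions made for illustration only.

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-b) via least squares on log-transformed data.

    Returns (a, b, r2), where r2 is the coefficient of determination of the
    linear fit in log-log space. A low r2 suggests the data are not well
    described by a single power law.
    """
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_l = np.log(np.asarray(losses, dtype=float))

    # log(loss) = intercept + slope * log(size), with slope = -b
    slope, intercept = np.polyfit(log_n, log_l, 1)

    pred = slope * log_n + intercept
    ss_res = np.sum((log_l - pred) ** 2)
    ss_tot = np.sum((log_l - log_l.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    return float(np.exp(intercept)), float(-slope), float(r2)


# Synthetic, hypothetical data generated from an exact power law
# (a = 2.0, b = 0.1); these are NOT results from the paper.
sizes = np.array([1e5, 1e6, 1e7, 1e8])
losses = 2.0 * sizes ** -0.1

a, b, r2 = fit_power_law(sizes, losses)
print(f"a = {a:.3f}, b = {b:.3f}, R^2 = {r2:.4f}")
```

On real measurements, a clearly curved or plateauing log-log plot (and a correspondingly poor fit) would be one sign that a single power law does not describe the scaling behavior.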