Molecular relaxation by reverse diffusion with time step prediction (2404.10935v2)
Abstract: Molecular relaxation, that is, finding the equilibrium state of a non-equilibrium structure, is an essential component of computational chemistry for understanding reactivity. Classical force field (FF) methods often rely on insufficient local energy minimization, while neural network FF models require large labeled datasets encompassing both equilibrium and non-equilibrium structures. As a remedy, we propose MoreRed, molecular relaxation by reverse diffusion, a conceptually novel and purely statistical approach in which non-equilibrium structures are treated as noisy instances of their corresponding equilibrium states. To enable the denoising of arbitrarily noisy inputs with a generative diffusion model, we further introduce a novel diffusion time step predictor. Notably, MoreRed learns a simpler pseudo potential energy surface (PES) instead of the complex physical PES. It is trained on a significantly smaller, and thus computationally cheaper, dataset consisting solely of unlabeled equilibrium structures, so that no non-equilibrium structures need to be computed at all. We compare MoreRed to classical FFs, to equivariant neural network FFs trained on a large dataset of equilibrium and non-equilibrium data, and to a semi-empirical tight-binding model. For a quantitative assessment, we evaluate the root-mean-square deviation (RMSD) between the equilibrium structures found by each method and the reference equilibrium structures, and we additionally compare their energies.
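Because training needs only equilibrium structures, noisy training inputs can be produced by forward-diffusing those structures. The sketch below is an illustration under stated assumptions, not MoreRed's code: it assumes a standard DDPM variance-preserving forward process with a linear schedule, and the helper names `make_training_example` and `losses` as well as the normalized time target `i / (T - 1)` are hypothetical choices.

```python
import numpy as np

# Assumed linear DDPM schedule; index i corresponds to diffusion step t = i + 1.
T = 1000
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 2e-2, T))


def make_training_example(x0, rng):
    """Forward-noise one equilibrium structure x0 of shape (n_atoms, 3) to a random step.

    Returns the noisy structure and two regression targets: the injected noise
    (for a denoiser) and the normalized step in [0, 1] (for a time-step predictor).
    """
    i = int(rng.integers(T))
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[i]) * x0 + np.sqrt(1.0 - alpha_bars[i]) * eps
    return x_t, eps, i / (T - 1)


def losses(eps_pred, eps, t_pred, t_target):
    """Mean-squared errors for both targets; eps_pred and t_pred would come from networks."""
    return float(np.mean((eps_pred - eps) ** 2)), float((t_pred - t_target) ** 2)


rng = np.random.default_rng(0)
x0 = rng.normal(size=(9, 3))  # stand-in for one equilibrium geometry
x_t, eps, t_target = make_training_example(x0, rng)
```

No energies or forces appear anywhere in this data pipeline, which is what makes unlabeled equilibrium geometries sufficient.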
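The central idea, treating a non-equilibrium structure as a noisy sample, predicting which diffusion step it corresponds to, and running the reverse diffusion from that step, can be sketched in NumPy on a toy problem. Everything here is an assumption for illustration rather than MoreRed's implementation: the linear noise schedule, the Gaussian toy "equilibrium" distribution N(mu, sigma0^2 I), and the closed-form helpers `predict_noise` and `predict_time_step`, which stand in for trained equivariant networks. Predicting the step only once at the start is one simple strategy among several possible ones.

```python
import numpy as np

# Assumed linear DDPM schedule with T = 1000 steps; index i is diffusion step t = i + 1.
T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def predict_noise(x, i, mu, sigma0):
    """Exact E[eps | x_t] for toy data x0 ~ N(mu, sigma0^2 I); stands in for a trained denoiser."""
    ab = alpha_bars[i]
    var_t = ab * sigma0**2 + (1.0 - ab)  # marginal per-coordinate variance of x_t
    return np.sqrt(1.0 - ab) * (x - np.sqrt(ab) * mu) / var_t


def predict_time_step(x, mu, sigma0):
    """Toy oracle: the step whose marginal variance best matches the observed deviation."""
    sqrt_ab = np.sqrt(alpha_bars)[:, None, None]
    residual = ((x[None] - sqrt_ab * mu[None]) ** 2).mean(axis=(1, 2))
    expected = alpha_bars * sigma0**2 + (1.0 - alpha_bars)
    return int(np.argmin(np.abs(residual - expected)))


def relax(x, mu, sigma0, rng):
    """Predict the starting step once, then run DDPM ancestral sampling down to step 0."""
    i_start = predict_time_step(x, mu, sigma0)
    for i in range(i_start, -1, -1):
        eps = predict_noise(x, i, mu, sigma0)
        x = (x - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps) / np.sqrt(alphas[i])
        if i > 0:
            # Posterior variance beta_tilde_t, one of the standard DDPM choices.
            var = betas[i] * (1.0 - alpha_bars[i - 1]) / (1.0 - alpha_bars[i])
            x = x + np.sqrt(var) * rng.standard_normal(x.shape)
    return x, i_start


def rmsd(a, b):
    """Root-mean-square deviation over atoms, without alignment."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


rng = np.random.default_rng(0)
mu = rng.normal(scale=1.5, size=(20, 3))            # toy "equilibrium" geometry, 20 atoms
x_init = mu + 0.1 * rng.standard_normal(mu.shape)   # toy "non-equilibrium" input
x_relaxed, i_start = relax(x_init, mu, sigma0=0.05, rng=rng)
print(i_start, rmsd(x_init, mu), rmsd(x_relaxed, mu))
```

The time-step prediction is what lets the sampler start at a noise level matched to the input instead of at pure noise; starting from step T would discard the input geometry entirely.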
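The quantitative evaluation compares found and reference equilibrium structures by RMSD. A common convention, assumed here because the abstract does not specify it, is to first remove rigid-body translation and rotation with the Kabsch algorithm while keeping the atom ordering fixed (no permutation matching). The function name `kabsch_rmsd` is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal rigid superposition of P onto Q."""
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


rng = np.random.default_rng(1)
ref = rng.normal(size=(12, 3))
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = ref @ rot_z.T + np.array([1.0, -2.0, 0.5])
print(kabsch_rmsd(moved, ref))  # rigid motion only, so ~0 up to floating-point error
```

Aligning first ensures that a relaxed structure which is merely rotated or shifted relative to the reference is not penalized.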