Zero Shot Molecular Generation via Similarity Kernels (2402.08708v1)
Abstract: Generative modelling aims to accelerate the discovery of novel chemicals by directly proposing structures with desirable properties. Recently, score-based, or diffusion, generative models have significantly outperformed previous approaches. Key to their success is the close relationship between the score and physical force, allowing the use of powerful equivariant neural networks. However, the behaviour of the learnt score is not yet well understood. Here, we analyse the score by training an energy-based diffusion model for molecular generation. We find that during generation the score resembles a restorative potential initially and a quantum-mechanical force at the end. In between the two endpoints, it exhibits special properties that enable the building of large molecules. Using insights from the trained model, we present Similarity-based Molecular Generation (SiMGen), a new method for zero-shot molecular generation. SiMGen combines a time-dependent similarity kernel with descriptors from a pretrained machine learning force field to generate molecules without any further training. Our approach allows full control over the molecular shape through point cloud priors and supports conditional generation. We also release an interactive web tool that allows users to generate structures with SiMGen online (https://zndraw.icp.uni-stuttgart.de).
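The mechanism summarised in the abstract can be sketched in a few lines: a pseudo log-density is defined by a time-dependent similarity kernel between per-atom descriptors of the evolving point cloud and a reference descriptor set, and its gradient with respect to the atomic positions acts as the generative force. The code below is a minimal, hedged illustration of that idea, not the paper's implementation: the `descriptors` function is a toy radial feature standing in for the pretrained MACE-OFF descriptors, and the kernel-width schedule, step size, and helper names are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def descriptors(positions):
    # Toy per-atom radial features (sums of Gaussians over pairwise distances);
    # in SiMGen these would come from a pretrained ML force field instead.
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = jnp.sqrt((diffs ** 2).sum(-1) + 1e-12)                 # (n, n), gradient-safe
    widths = jnp.array([1.0, 2.0, 4.0])                            # assumed length scales
    return jnp.exp(-dists[..., None] ** 2 / widths).sum(axis=1)    # (n, n_features)

def log_similarity(positions, ref_desc, sigma_t):
    # Time-dependent RBF kernel between generated-atom descriptors and a
    # reference descriptor set; log-sum over references gives a pseudo log-density.
    d = descriptors(positions)                                     # (n, f)
    sq = ((d[:, None, :] - ref_desc[None, :, :]) ** 2).sum(-1)     # (n, n_ref)
    return jnp.log(jnp.exp(-sq / (2.0 * sigma_t ** 2)).sum(-1) + 1e-12).sum()

# The generative "force" is the gradient of the pseudo log-density with respect
# to the atomic positions, used here in simple Langevin-style updates.
score_fn = jax.grad(log_similarity, argnums=0)

key = jax.random.PRNGKey(0)
pos = jax.random.normal(key, (8, 3))                               # initial noisy point cloud
ref = descriptors(jax.random.normal(jax.random.fold_in(key, 1), (16, 3)))  # placeholder reference
for step, sigma_t in enumerate(jnp.linspace(2.0, 0.2, 50)):        # anneal the kernel width
    noise = jax.random.normal(jax.random.fold_in(key, step + 2), pos.shape)
    pos = pos + 1e-2 * score_fn(pos, ref, sigma_t) + jnp.sqrt(2e-2) * sigma_t * noise
```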
Authors: Rokas Elijošius, Fabian Zills, Ilyes Batatia, Sam Walton Norwood, Dávid Péter Kovács, Christian Holm, Gábor Csányi