Mixed Continuous and Categorical Flow Matching for 3D De Novo Molecule Generation (2404.19739v1)
Abstract: Deep generative models that produce novel molecular structures have the potential to facilitate chemical discovery. Diffusion models currently achieve state-of-the-art performance for 3D molecule generation. In this work, we explore the use of flow matching, a recently proposed generative modeling framework that generalizes diffusion models, for the task of de novo molecule generation. Flow matching provides flexibility in model design; however, the framework is predicated on the assumption of continuously valued data. 3D de novo molecule generation requires jointly sampling continuous and categorical variables, such as atom positions and atom types. We extend the flow matching framework to categorical data by constructing flows that are constrained to exist on a continuous representation of categorical data known as the probability simplex. We call this extension SimplexFlow. We apply SimplexFlow to de novo molecule generation. However, we find that, in practice, a simpler approach that makes no accommodations for the categorical nature of the data yields equivalent or superior performance. Based on these experiments, we present FlowMol, a flow matching model for 3D de novo molecule generation that achieves improved performance over prior flow matching methods, and we raise important questions about the design of prior distributions for achieving strong performance in flow matching models. Code and trained models for reproducing this work are available at https://github.com/dunni3/FlowMol
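The abstract contrasts two ways of treating atom types within flow matching: constraining the categorical flow to the probability simplex (SimplexFlow) versus treating one-hot atom types as ordinary continuous vectors. The following is a minimal, hypothetical sketch of how a single training target could be built under each option, assuming a straight-line interpolant x_t = (1 - t) x_0 + t x_1 with regression target u_t = x_1 - x_0, as in the conditional flow matching and rectified flow literature. It is not FlowMol's implementation: the helper names (`flow_matching_pair`, `sample_uniform_simplex`, `cfm_loss`), the uniform Dirichlet(1, ..., 1) simplex prior, the standard Gaussian priors, and the 4-type atom vocabulary are illustrative assumptions, and equivariance, center-of-mass handling, bonds, charges, and the neural network itself are omitted.

```python
import random


def sample_uniform_simplex(k, rng):
    """Draw a point uniformly from the (k-1)-simplex, i.e. Dirichlet(1, ..., 1)."""
    # Normalizing i.i.d. Exponential(1) draws gives a Dirichlet(1, ..., 1) sample.
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def one_hot(index, k):
    """One-hot vector for class `index` out of `k` classes (a vertex of the simplex)."""
    return [1.0 if i == index else 0.0 for i in range(k)]


def interpolate(x0, x1, t):
    """Linear interpolant x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Time derivative of the linear interpolant: u = x1 - x0 (constant in t)."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_pair(positions, atom_types, num_types, t, rng, simplex_prior):
    """Return (x_t, u_t) for the positions and atom types of one molecule.

    Hypothetical helper for illustration only.
    positions:     list of [x, y, z] coordinates per atom (data endpoint).
    atom_types:    list of integer class labels per atom.
    simplex_prior: True  -> atom-type prior is uniform on the simplex;
                   False -> one-hot atom types are treated as ordinary
                            continuous vectors with a Gaussian prior.
    """
    xt_pos, ut_pos, xt_type, ut_type = [], [], [], []
    for pos, label in zip(positions, atom_types):
        p0 = [rng.gauss(0.0, 1.0) for _ in range(3)]  # Gaussian prior for coordinates
        xt_pos.append(interpolate(p0, pos, t))
        ut_pos.append(target_velocity(p0, pos))

        c1 = one_hot(label, num_types)
        if simplex_prior:
            c0 = sample_uniform_simplex(num_types, rng)  # prior supported on the simplex
        else:
            c0 = [rng.gauss(0.0, 1.0) for _ in range(num_types)]  # unconstrained prior
        xt_type.append(interpolate(c0, c1, t))
        ut_type.append(target_velocity(c0, c1))
    return xt_pos, ut_pos, xt_type, ut_type


def cfm_loss(predicted, target):
    """Mean squared error between predicted and target velocities (rows of vectors)."""
    diffs = [(p - q) ** 2 for prow, qrow in zip(predicted, target) for p, q in zip(prow, qrow)]
    return sum(diffs) / len(diffs)


# Usage: a hypothetical 3-atom, water-like geometry (angstroms) with a hypothetical
# 4-type vocabulary in which index 2 stands for oxygen and index 0 for hydrogen.
rng = random.Random(0)
positions = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
atom_types = [2, 0, 0]
xt_pos, ut_pos, xt_type, ut_type = flow_matching_pair(
    positions, atom_types, num_types=4, t=0.3, rng=rng, simplex_prior=True
)
# On the simplex branch, each interpolated atom-type vector sums to 1.
print([round(sum(row), 6) for row in xt_type])
```

The point the sketch illustrates is that a convex combination of two points on the simplex stays on the simplex, so with a simplex-supported prior every intermediate atom-type vector is a valid probability vector. With a Gaussian prior (`simplex_prior=False`), intermediate vectors generally are not, which corresponds to the kind of approach that "makes no accommodations for the categorical nature of the data." In training, a network's predicted velocity at (x_t, t) would be scored against u_t with a loss such as `cfm_loss`. The paper's actual SimplexFlow construction and priors may differ from these assumptions.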
Authors: Ian Dunn, David Ryan Koes