SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints (2405.01155v2)
Abstract: Generative models see increasing use in computer-aided drug design. However, while they capture distributions of molecular motifs well, they often produce synthetically inaccessible molecules. To address this, we introduce SynFlowNet, a GFlowNet model whose action space uses chemical reactions and buyable reactants to sequentially build new molecules. By incorporating forward synthesis as an explicit constraint of the generative mechanism, we aim to bridge the gap between in silico molecular generation and real-world synthesis capabilities. We evaluate our approach using synthetic accessibility scores and an independent retrosynthesis tool to assess the synthesizability of our compounds, and motivate the choice of GFlowNets through a considerable improvement in sample diversity over baselines. Additionally, we identify challenges with reaction encodings that can complicate traversal of the MDP in the backward direction. To address this, we introduce various strategies for learning the GFlowNet backward policy and thus demonstrate how additional constraints can be integrated into the GFlowNet MDP framework. This approach enables our model to successfully identify synthesis pathways for previously unseen molecules.
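To make the idea of a reaction-based action space concrete, here is a minimal toy sketch, not the authors' implementation. Everything in it is an assumption for illustration: strings stand in for molecules, `BUILDING_BLOCKS` stands in for buyable reactants, the `couple` action stands in for a bimolecular reaction template, and the forward policy is uniform rather than learned. The abstract does not name the training objective; the sketch uses the trajectory balance residual, one common GFlowNet objective.

```python
import math
import random

# Toy stand-ins (assumptions): strings play the role of molecules.
BUILDING_BLOCKS = ["A", "B", "C"]  # hypothetical "buyable reactants"
MAX_STEPS = 3                      # cap on the number of toy reaction steps


def forward_actions(state):
    """List the actions available in `state`.

    From the empty state the only choice is a first building block.
    Afterwards, an action is either ("stop", None) or a toy "couple"
    reaction that attaches one more building block.
    """
    if state == "":
        return [("init", b) for b in BUILDING_BLOCKS]
    actions = [("stop", None)]
    if len(state) < MAX_STEPS + 1:
        actions += [("couple", b) for b in BUILDING_BLOCKS]
    return actions


def apply_action(state, action):
    """Return the successor state; 'stop' leaves the molecule unchanged."""
    kind, block = action
    return state if kind == "stop" else state + block


def sample_trajectory(rng):
    """Roll out a uniform forward policy; return states and log P_F terms."""
    state, states, log_pf = "", [""], []
    while True:
        actions = forward_actions(state)
        action = rng.choice(actions)
        log_pf.append(-math.log(len(actions)))
        if action[0] == "stop":
            return states, log_pf
        state = apply_action(state, action)
        states.append(state)


def trajectory_balance_loss(log_z, log_pf, log_pb, reward):
    """Squared trajectory balance residual for a single trajectory."""
    residual = log_z + sum(log_pf) - math.log(reward) - sum(log_pb)
    return residual ** 2
```

In this toy every state has exactly one parent (drop the last character), so the backward log-probabilities are all zero. With real reaction templates a product can usually be reached from several parent states, and some candidate backward steps may not lead back to valid building blocks, which is the situation the abstract's backward-policy strategies address.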
Authors:
- Miruna Cretu
- Charles Harris
- Julien Roy
- Emmanuel Bengio
- Pietro Liò
- Ilia Igashov
- Arne Schneuing
- Marwin Segler
- Bruno Correia