SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints (2405.01155v2)

Published 2 May 2024 in cs.LG and q-bio.BM

Abstract: Generative models see increasing use in computer-aided drug design. However, while performing well at capturing distributions of molecular motifs, they often produce synthetically inaccessible molecules. To address this, we introduce SynFlowNet, a GFlowNet model whose action space uses chemical reactions and buyable reactants to sequentially build new molecules. By incorporating forward synthesis as an explicit constraint of the generative mechanism, we aim at bridging the gap between in silico molecular generation and real world synthesis capabilities. We evaluate our approach using synthetic accessibility scores and an independent retrosynthesis tool to assess the synthesizability of our compounds, and motivate the choice of GFlowNets through considerable improvement in sample diversity compared to baselines. Additionally, we identify challenges with reaction encodings that can complicate traversal of the MDP in the backward direction. To address this, we introduce various strategies for learning the GFlowNet backward policy and thus demonstrate how additional constraints can be integrated into the GFlowNet MDP framework. This approach enables our model to successfully identify synthesis pathways for previously unseen molecules.

Authors (9)
  1. Miruna Cretu
  2. Charles Harris
  3. Julien Roy
  4. Emmanuel Bengio
  5. Pietro Liò
  6. Ilia Igashov
  7. Arne Schneuing
  8. Marwin Segler
  9. Bruno Correia

Summary

Exploring SynFlowNet: A Novel Approach to Synthesis-Aware Molecular Design

Understanding the SynFlowNet Model

Generative models are increasingly used in AI-driven drug discovery, where the goal is to propose novel molecular structures efficiently. A significant limitation of existing generative models is their tendency to propose molecules that look promising in silico but are difficult or impossible to synthesize in the laboratory. The paper addresses this challenge with SynFlowNet, a GFlowNet-based model that builds synthetic feasibility into the generative process itself.

The Synthesis Accessibility Challenge

The goal is not only to conceive new molecules but to ensure they can be made with known chemical reactions from available starting materials. Many generative models capture distributions of molecular motifs well and propose novel structures with attractive predicted properties, yet they fall short when it comes to practical synthesis.

SynFlowNet's Advantage: The model incorporates chemical reactions directly into its generative process, so every output comes with a forward synthesis route from buyable reactants. This is a considerable advantage over conventional models that focus only on mimicking molecular patterns learned from data.

Core Methodology

At the heart of SynFlowNet is a GFlowNet whose action space consists of chemical reactions and buyable reactants, so molecules are generated through a sequence of reaction steps starting from commercially available compounds. This is more practical than approaches that may generate molecules requiring complex or unknown synthesis paths.

  • GFlowNets help maintain diversity among the generated molecules. They are trained to sample candidates in proportion to reward, which spreads exploration across many synthetic possibilities rather than converging prematurely on a narrow set of high-reward options.
  • The action space is built from reactions rather than atoms or molecular fragments, so every proposed structure is accompanied by a synthetic pathway composed of existing chemical reactions.
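To make the diversity argument concrete, the block below writes out the sampling target of a GFlowNet and one widely used training objective, trajectory balance. It is a sketch of the general GFlowNet setup, not a statement of the exact loss used for SynFlowNet, which this summary does not specify. The symbols are assumptions of the sketch: R(x) is the reward of a finished molecule x, P_F and P_B are the forward and backward policies, and Z_θ is a learned estimate of the total reward.

```latex
% Target: terminal molecules x are sampled with probability proportional to reward
\pi(x) \propto R(x)

% Trajectory balance loss for a trajectory \tau = (s_0 \to s_1 \to \dots \to s_n = x)
\mathcal{L}_{\mathrm{TB}}(\tau) =
\left(
  \log \frac{Z_\theta \prod_{t=0}^{n-1} P_F(s_{t+1} \mid s_t; \theta)}
            {R(x) \prod_{t=0}^{n-1} P_B(s_t \mid s_{t+1}; \theta)}
\right)^{2}
```

Because the target distribution is proportional to reward rather than concentrated on its maximum, many distinct high-reward molecules keep non-negligible probability. The backward policy P_B in the denominator is the component for which the abstract says the authors introduce dedicated learning strategies.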

Operational Dynamics: SynFlowNet starts from purchasable building blocks and incrementally builds more complex molecules by applying one chemical reaction per step. This progression mirrors how a synthesis is carried out in the lab, so each proposed molecule is more than a theoretical construct.
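The step-by-step construction can be sketched as a small rollout over a reaction-based action space. This is a toy illustration only: molecules are represented as names with sets of reactive groups instead of real chemical structures, the building blocks and the two reactions are invented for the example, and a uniform random choice stands in for the learned forward policy. None of the names below come from the paper's code.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Mol:
    """Toy molecule: a name plus the reactive groups still available."""
    name: str
    groups: frozenset


@dataclass(frozen=True)
class Reaction:
    """Toy bimolecular reaction that consumes one group on each reactant."""
    name: str
    groups: Tuple[str, str]

    def match(self, state: Mol, partner: Mol) -> Optional[Tuple[str, str]]:
        a, b = self.groups
        if a in state.groups and b in partner.groups:
            return a, b
        if b in state.groups and a in partner.groups:
            return b, a
        return None

    def apply(self, state: Mol, partner: Mol) -> Mol:
        m = self.match(state, partner)
        if m is None:
            raise ValueError("reaction not applicable")
        used_state, used_partner = m
        remaining = (state.groups - {used_state}) | (partner.groups - {used_partner})
        return Mol(f"{self.name}({state.name},{partner.name})", frozenset(remaining))


# Hypothetical building blocks and reaction templates (assumptions of this sketch).
BUILDING_BLOCKS: List[Mol] = [
    Mol("A", frozenset({"acid", "halide"})),
    Mol("B", frozenset({"amine"})),
    Mol("C", frozenset({"boronate", "amine"})),
    Mol("D", frozenset({"acid"})),
]
REACTIONS: List[Reaction] = [
    Reaction("amide", ("acid", "amine")),
    Reaction("suzuki", ("halide", "boronate")),
]


def valid_actions(state: Mol) -> List[Tuple[Reaction, Mol]]:
    """All (reaction, building block) pairs applicable to the current molecule."""
    return [(r, bb) for r in REACTIONS for bb in BUILDING_BLOCKS
            if r.match(state, bb) is not None]


def sample_trajectory(rng: random.Random, max_steps: int = 3) -> Tuple[Mol, List[str]]:
    """Roll out one synthesis route; uniform choices stand in for a learned policy."""
    state = rng.choice(BUILDING_BLOCKS)
    route = [state.name]
    for _ in range(max_steps):
        actions = valid_actions(state)
        if not actions or rng.random() < 0.25:  # no move left, or "stop" action
            break
        rxn, bb = rng.choice(actions)
        state = rxn.apply(state, bb)
        route.append(f"{rxn.name} + {bb.name}")
    return state, route


if __name__ == "__main__":
    mol, route = sample_trajectory(random.Random(0))
    print(mol.name, route)
```

The point of the design is that the recorded route is itself the synthesis plan: every state reachable in this process is, by construction, the product of listed reactions applied to listed starting materials.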

Impressive Results

The model is reported to produce novel molecules whose properties are comparable to those of experimentally validated molecules, particularly in molecular weight and binding affinity, two metrics that matter in therapeutic development.

  • Synthetic Accessibility Scores: SynFlowNet's molecules achieve better synthetic accessibility scores than those of the baselines, and an independent retrosynthesis tool is used as a further check on synthesizability.
  • Diversity: The generated molecules remain diverse, with a considerable improvement in sample diversity over baselines, so the focus on synthetic feasibility does not collapse exploration of chemical space.
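The summary does not say how diversity is measured. One common choice for sets of molecules is the mean pairwise Tanimoto distance between fingerprints, sketched below under that assumption. Fingerprints are modelled here as plain sets of integer feature IDs; in practice they would come from a cheminformatics toolkit.

```python
from itertools import combinations
from typing import FrozenSet, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_diversity(fps: Sequence[FrozenSet[int]]) -> float:
    """Average of (1 - Tanimoto similarity) over all unordered pairs."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0  # diversity is undefined for fewer than two molecules
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up fingerprints for illustration only.
    sample = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})]
    print(mean_pairwise_diversity(sample))
```

A value near 1 means the sampled molecules share few features, and a value near 0 means the model keeps producing near-duplicates.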

Practical Implications and Future Prospects

Building a realistic synthetic pathway into the generation process could streamline the drug discovery pipeline. Because each proposed molecule is constructed from known reactions and buyable reactants, SynFlowNet narrows the gap between in silico design and real-world synthesis.

Looking forward, there are exciting possibilities for expanding this model:

  1. Expanding the Reaction Space: Including a broader range of chemical reactions could further enhance the model's ability to generate a wider variety of feasible molecules.
  2. Multi-Objective Optimization: Adapting SynFlowNet to optimize for multiple properties simultaneously, such as solubility and toxicity, could make it an even more powerful tool in drug design.

Conclusion

SynFlowNet is a step forward in molecular design, bridging the gap between generative models and practical synthetic chemistry. By generating molecules that are both desirable for their properties and accompanied by a plausible synthesis route, the approach makes generative drug discovery more reliable and efficient.
