Sculpting Molecules in Text-3D Space: A Flexible Substructure Aware Framework for Text-Oriented Molecular Optimization (2403.03425v2)
Abstract: The integration of deep learning, particularly AI-Generated Content, with high-quality data derived from ab initio calculations has emerged as a promising avenue for transforming the landscape of scientific research. However, the challenge of designing molecular drugs or materials that incorporate multi-modality prior knowledge remains a critical and complex undertaking. Specifically, achieving a practical molecular design necessitates not only meeting the diversity requirements but also addressing structural and textual constraints with various symmetries outlined by domain experts. In this article, we present an innovative approach to tackle this inverse design problem by formulating it as a multi-modality guidance optimization task. Our proposed solution involves a textual-structure alignment symmetric diffusion framework for the implementation of molecular optimization tasks, namely 3DToMolo. 3DToMolo aims to harmonize diverse modalities, including textual description features and graph structural features, aligning them seamlessly to produce molecular structures that adhere to the symmetric structural and textual constraints specified by experts in the field. Experimental trials across three guidance optimization settings have shown superior hit optimization performance compared to state-of-the-art methodologies. Moreover, 3DToMolo demonstrates the capability to discover potential novel molecules that incorporate specified target substructures, without the need for prior knowledge. This work not only holds general significance for the advancement of deep learning methodologies but also paves the way for a transformative shift in molecular design strategies. 3DToMolo creates opportunities for a more nuanced and effective exploration of the vast chemical space, opening new frontiers in the development of molecular entities with tailored properties and functionalities.
- Kaiwei Zhang
- Yange Lin
- Guangcheng Wu
- Yuxiang Ren
- Xuecang Zhang
- Xiaoyu Zhang
- Weitao Du
- Bo Wang