
DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization (2403.13829v1)

Published 7 Mar 2024 in q-bio.BM and cs.LG

Abstract: Recently, 3D generative models have shown promising performance in structure-based drug design by learning to generate ligands given target binding sites. However, modeling only the target-ligand distribution can hardly fulfill one of the main goals in drug discovery: designing novel ligands with desired properties, such as high binding affinity and ease of synthesis. This challenge becomes particularly pronounced when the target-ligand pairs used for training do not align with these desired properties. Moreover, most existing methods target the de novo design task, while many generative scenarios requiring flexible controllability, such as R-group optimization and scaffold hopping, have received little attention. In this work, we propose DecompOpt, a structure-based molecular optimization method based on a controllable and decomposed diffusion model. DecompOpt presents a new generation paradigm that combines optimization with conditional diffusion models to achieve desired properties while adhering to the molecular grammar. Additionally, DecompOpt offers a unified framework covering both de novo design and controllable generation. To achieve this, ligands are decomposed into substructures, which allows fine-grained control and local optimization. Experiments show that DecompOpt can efficiently generate molecules with better properties than strong de novo baselines and demonstrates great potential in controllable generation tasks.

Authors (6)
  1. Xiangxin Zhou
  2. Xiwei Cheng
  3. Yuwei Yang
  4. Yu Bao
  5. Liang Wang
  6. Quanquan Gu