Molecule Design by Latent Prompt Transformer (2402.17179v2)

Published 27 Feb 2024 in cs.LG and q-bio.BM

Abstract: This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task, where target biological properties or desired chemical constraints serve as conditioning variables. We propose the Latent Prompt Transformer (LPT), a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution modeled by a neural transformation of Gaussian white noise; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt. LPT can be learned by maximum likelihood estimation on molecule-property pairs. During property optimization, the latent prompt is inferred from target properties and constraints through posterior sampling and then used to guide the autoregressive molecule generation. After initial training on existing molecules and their properties, we adopt an online learning algorithm to progressively shift the model distribution towards regions that support desired target properties. Experiments demonstrate that LPT not only effectively discovers useful molecules across single-objective, multi-objective, and structure-constrained optimization tasks, but also exhibits strong sample efficiency.

Authors (11)
  1. Deqian Kong
  2. Yuhao Huang
  3. Jianwen Xie
  4. Edouardo Honig
  5. Ming Xu
  6. Shuanghong Xue
  7. Pei Lin
  8. Sanping Zhou
  9. Sheng Zhong
  10. Nanning Zheng
  11. Ying Nian Wu
Citations (1)
