De Novo Drug Design with Joint Transformers

Published 3 Oct 2023 in cs.LG and cs.AI | (2310.02066v3)

Abstract: De novo drug design requires simultaneously generating novel molecules outside of training data and predicting their target properties, making it a hard task for generative models. To address this, we propose the Joint Transformer, which combines a Transformer decoder, a Transformer encoder, and a predictor in a joint generative model with shared weights. We formulate a probabilistic black-box optimization algorithm that uses the Joint Transformer to generate novel molecules with improved target properties and that outperforms other SMILES-based optimization methods in de novo drug design.
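
The abstract gives no algorithmic detail, so the following is only a toy sketch of the general generate-then-rank idea behind black-box optimization with a generator and a predictor. It is not the paper's algorithm. Every name here (`sample_candidates`, `predicted_score`, `oracle`, `propose`) is hypothetical, the "molecules" are random strings over a three-letter toy alphabet rather than valid SMILES, and the predictor and oracle are trivial stand-ins.

```python
import random

ALPHABET = "CNO"  # toy token set (assumption), not a real SMILES vocabulary


def sample_candidates(rng, n, length=6):
    """Stand-in for a generative model: draw n random token strings."""
    return ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(n)]


def predicted_score(candidate):
    """Stand-in for a cheap property predictor: fraction of 'N' tokens."""
    return candidate.count("N") / len(candidate)


def oracle(candidate):
    """Stand-in for the expensive black-box objective (identical here for simplicity)."""
    return candidate.count("N") / len(candidate)


def propose(rng, n_samples, budget):
    """Sample candidates, rank unique ones by predicted score, keep the top `budget`."""
    pool = set(sample_candidates(rng, n_samples))
    # Sort by descending predicted score; break ties alphabetically for determinism.
    ranked = sorted(pool, key=lambda c: (-predicted_score(c), c))
    return ranked[:budget]


rng = random.Random(0)
chosen = propose(rng, n_samples=200, budget=5)

# Only the shortlisted candidates are sent to the expensive oracle.
best = max(chosen, key=oracle)
print(best, oracle(best))
```

The point of the sketch is the budget: the generator proposes many candidates, the cheap predictor filters them, and the costly objective is evaluated on only a few.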
