De Novo Drug Design with Joint Transformers
Published 3 Oct 2023 in cs.LG and cs.AI | (2310.02066v3)
Abstract: De novo drug design requires simultaneously generating novel molecules outside of the training data and predicting their target properties, making it a hard task for generative models. To address this, we propose the Joint Transformer, which combines a Transformer decoder, a Transformer encoder, and a predictor in a joint generative model with shared weights. We formulate a probabilistic black-box optimization algorithm that employs the Joint Transformer to generate novel molecules with improved target properties, and that outperforms other SMILES-based optimization methods in de novo drug design.
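The generate-then-predict idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or model: `toy_generator`, `toy_predictor`, and `sample_and_filter` are hypothetical names, the "molecules" are random strings over a three-letter alphabet rather than valid SMILES, and the acceptance rule (keep candidates whose predicted score meets a threshold) is an assumed, simplified form of black-box optimization with a fixed evaluation budget.

```python
import random
from typing import Callable, List, Tuple


def toy_generator(rng: random.Random, max_len: int = 12) -> str:
    """Hypothetical stand-in for a generative model: samples a random token string."""
    alphabet = "CNO"  # toy vocabulary, not real SMILES
    length = rng.randint(1, max_len)
    return "".join(rng.choice(alphabet) for _ in range(length))


def toy_predictor(candidate: str) -> float:
    """Hypothetical stand-in for a property predictor: fraction of 'C' tokens."""
    return candidate.count("C") / len(candidate)


def sample_and_filter(
    generate: Callable[[random.Random], str],
    predict: Callable[[str], float],
    threshold: float,
    budget: int,
    rng: random.Random,
) -> List[Tuple[str, float]]:
    """Draw up to `budget` candidates and keep unique ones scoring >= threshold."""
    accepted: List[Tuple[str, float]] = []
    seen = set()
    for _ in range(budget):
        candidate = generate(rng)
        if candidate in seen:
            continue  # skip duplicates so each accepted candidate is distinct
        seen.add(candidate)
        score = predict(candidate)
        if score >= threshold:
            accepted.append((candidate, score))
    # Best predicted candidates first.
    accepted.sort(key=lambda pair: pair[1], reverse=True)
    return accepted


rng = random.Random(0)
hits = sample_and_filter(toy_generator, toy_predictor, threshold=0.8, budget=500, rng=rng)
```

In a real setting the generator and predictor would be two heads of one trained model, and the accepted candidates would still need evaluation by the true (expensive) objective; the sketch only shows the control flow of sampling, scoring, and filtering.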