Molecular Generative Adversarial Network with Multi-Property Optimization (2404.00081v1)
Abstract: Deep generative models, such as generative adversarial networks (GANs), have been employed for de novo molecular generation in drug discovery. Most prior studies have used reinforcement learning (RL) algorithms, particularly Monte Carlo tree search (MCTS), to handle the discrete nature of molecular representations in GANs. However, because training GANs and RL models is inherently unstable and MCTS sampling is computationally expensive, MCTS RL-based GANs struggle to scale to large chemical databases. To tackle these challenges, this study introduces InstGAN, a novel GAN based on actor-critic RL with instant and global rewards, which generates molecules at the token level with multi-property optimization. Furthermore, maximized information entropy is leveraged to alleviate mode collapse. The experimental results demonstrate that InstGAN outperforms other baselines, achieves performance comparable to state-of-the-art models, and efficiently generates molecules with multi-property optimization. The source code will be released upon acceptance of the paper.
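The abstract names three ingredients: token-level (instant) rewards, a sequence-level (global) reward, and an entropy term against mode collapse. The sketch below illustrates how such terms could be combined in a generic actor-critic objective. It is not the InstGAN implementation: the function names, the linear blending weight `mix`, and the coefficient `entropy_weight` are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of token logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def token_rewards(instant, global_reward, mix=0.5):
    """Blend per-token (instant) rewards with one sequence-level (global) reward.
    The linear blend and `mix` are hypothetical, not taken from the paper."""
    return [mix * r + (1.0 - mix) * global_reward for r in instant]

def actor_loss(step_logits, actions, rewards, values, entropy_weight=0.01):
    """Generic actor-critic policy loss with an entropy bonus.
    advantage = reward - critic value; subtracting the entropy term
    rewards policies that keep their token distributions spread out."""
    loss = 0.0
    for logits, a, r, v in zip(step_logits, actions, rewards, values):
        probs = softmax(logits)
        advantage = r - v
        loss += -math.log(probs[a]) * advantage - entropy_weight * entropy(probs)
    return loss / len(actions)
```

In this sketch, a larger `entropy_weight` pushes the generator toward more diverse token choices, which is the general mechanism by which entropy regularization counteracts mode collapse; the appropriate value would have to be tuned empirically.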