Deep learning for molecular design - a review of the state of the art (1903.04388v3)

Published 11 Mar 2019 in cs.LG, physics.chem-ph, and stat.ML

Abstract: In the space of only a few years, deep generative modeling has revolutionized how we think of artificial creativity, yielding autonomous systems which produce original images, music, and text. Inspired by these successes, researchers are now applying deep generative modeling techniques to the generation and optimization of molecules - in our review we found 45 papers on the subject published in the past two years. These works point to a future where such systems will be used to generate lead molecules, greatly reducing resources spent downstream synthesizing and characterizing bad leads in the lab. In this review we survey the increasingly complex landscape of models and representation schemes that have been proposed. The four classes of techniques we describe are recursive neural networks, autoencoders, generative adversarial networks, and reinforcement learning. After first discussing some of the mathematical fundamentals of each technique, we draw high level connections and comparisons with other techniques and expose the pros and cons of each. Several important high level themes emerge as a result of this work, including the shift away from the SMILES string representation of molecules towards more sophisticated representations such as graph grammars and 3D representations, the importance of reward function design, the need for better standards for benchmarking and testing, and the benefits of adversarial training and reinforcement learning over maximum likelihood based training.

Authors (4)

Daniel C. Elton
Zois Boukouvalas
Mark D. Fuge
Peter W. Chung

Citations (321)


Summary

Deep Learning for Molecular Design: A State-of-the-Art Review

In recent years, deep generative modeling has advanced rapidly, producing systems that generate original images, music, and text. This momentum has spurred interest in applying the same techniques to molecular design, particularly to generating and optimizing molecular structures for drug discovery and other applications. The review surveys this work, covering the main methodologies and representation schemes along with the field's unresolved challenges.

Overview of Techniques and Architectures

The paper groups current approaches into four major classes: recurrent neural networks (RNNs, which the abstract calls "recursive neural networks"), autoencoders (AEs), generative adversarial networks (GANs), and reinforcement learning (RL). Each technique plays a distinct role in molecular generation and optimization.

Recurrent Neural Networks (RNNs): RNNs are primarily used for sequence generation, typically with long short-term memory (LSTM) units, which mitigate the vanishing-gradient problem that hampers training on long sequences. They have proven capable of generating valid molecular structures expressed as SMILES strings.
Autoencoders (AEs): AEs, particularly variational autoencoders (VAEs), offer a promising direction by encoding molecular representations into continuous latent spaces, which can then be decoded back into molecular structures. This provides the dual benefits of generating novel compounds and predicting their properties through learned latent spaces.
Generative Adversarial Networks (GANs): GANs set a generator network against a discriminator network, which can address the poor sample quality sometimes seen in models trained by maximum likelihood estimation (MLE). Variants such as Wasserstein GANs show improved training stability and convergence.
Reinforcement Learning (RL): RL techniques enhance generative models by optimizing molecules towards desirable properties through reward mechanisms. This approach aligns well with combinatorial chemical design, offering a framework to incrementally improve candidate molecules based on specific objectives like synthesizability and bioactivity.
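
The token-by-token generation described for RNNs above can be illustrated with a minimal sketch. This is not the method of any surveyed paper: to stay self-contained, a first-order (bigram) character model is swapped in for the LSTM, and the sentinel tokens, function names, and training strings are all hypothetical. Only the sampling loop, with a temperature parameter controlling how sharply the model favors frequent transitions, mirrors how an RNN decoder emits SMILES characters.

```python
import math
import random
from collections import defaultdict

START, END = "^", "$"  # hypothetical sentinel tokens, not SMILES characters


def fit_bigram(smiles_list):
    """Count character-to-character transitions over the training strings."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in smiles_list:
        chars = [START] + list(s) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts


def sample(counts, rng, temperature=1.0, max_len=40):
    """Generate one string token by token, conditioning on the previous token."""
    out, prev = [], START
    for _ in range(max_len):
        tokens = sorted(counts[prev])
        # Temperature rescales log-counts: weight_i = count_i ** (1 / T).
        # T < 1 sharpens the distribution, T > 1 flattens it.
        weights = [math.exp(math.log(counts[prev][t]) / temperature) for t in tokens]
        prev = rng.choices(tokens, weights=weights, k=1)[0]
        if prev == END:
            break
        out.append(prev)
    return "".join(out)


if __name__ == "__main__":
    model = fit_bigram(["CCO", "CCN", "c1ccccc1"])  # toy training set
    rng = random.Random(0)
    for _ in range(3):
        print(sample(model, rng, temperature=0.8))
```

Because a bigram model sees only one character of context, many of its samples will not be valid SMILES (for example, a ring-closure digit may never be paired). That limited memory is the gap LSTM units are meant to narrow. Character-level splitting also breaks multi-character atoms such as Cl, which a real tokenizer would keep together.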

Challenges and Methodological Advances

Several high-level themes have surfaced in the research landscape:

Representation Learning: There is a marked shift from using traditional SMILES strings to more chemically intuitive representations like graph-based methods and 3D molecular descriptors. This transition aims to better capture molecular structure and enhance the generation of valid molecular forms.
Reward Function Design: The design of effective reward functions in RL is critical for guiding molecular optimization. Functions that balance diversity, stability, and synthesizability of generated molecules are under continuous development.
Standards and Benchmarking: There is an acknowledged need for standardized benchmarks that evaluate model performance quantitatively. Initiatives like MOSES and GuacaMol aim to provide comprehensive frameworks for assessing the diversity, novelty, and drug-likeness of generated molecules.
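
To make the benchmarking theme concrete, the sketch below computes three metrics commonly reported for generative models: validity, uniqueness, and novelty. The definitions follow common usage, but benchmark suites differ in details such as canonicalization, so treat this as an illustration rather than the procedure of MOSES or GuacaMol. The function names are hypothetical, and `crude_smiles_check` is a deliberately rough stand-in for a real chemistry parser.

```python
def crude_smiles_check(smiles):
    """Hypothetical stand-in for a real SMILES parser.

    Checks only that the string is non-empty, parentheses are balanced, and
    every ring-closure digit appears an even number of times. It ignores
    valence, bracket atoms, and two-digit (%nn) ring closures.
    """
    if not smiles:
        return False
    depth = 0
    open_rings = set()
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch.isdigit():
            # A digit opens a ring bond on first sight and closes it on the next.
            open_rings ^= {ch}
    return depth == 0 and not open_rings


def generation_metrics(generated, training, is_valid=crude_smiles_check):
    """Return validity, uniqueness, and novelty as fractions in [0, 1]."""
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        # Share of all generated strings that pass the validity check.
        "validity": len(valid) / len(generated) if generated else 0.0,
        # Share of valid strings that are distinct.
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        # Share of distinct valid strings absent from the training set.
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


if __name__ == "__main__":
    generated = ["CCO", "CCO", "c1ccccc1", "C1CC", "CC(C", "CCN"]
    print(generation_metrics(generated, training=["CCO"]))
```

Passing the validity check as a parameter keeps the metric logic independent of the chemistry toolkit. In practice one would supply a parser-backed check and compare canonical forms, so that two spellings of the same molecule are not counted as distinct.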

Implications and Future Directions

Applying deep learning to molecular design has implications for domains such as pharmaceuticals, materials science, and chemical engineering. The prospect of reducing the time and cost of synthesizing and testing drug candidates is particularly noteworthy. However, whether deep generative models outperform traditional combinatorial and genetic algorithms remains an open question.

Looking forward, future research will likely focus on:

Improved Representations: Developing more chemically accurate 3D representations and incorporating quantum mechanical properties should make generative models more useful.
Hybrid Models: Combining different AI methodologies, such as integrating RL with unsupervised learning models, could yield more robust frameworks for molecular discovery.
Integration with Automated Synthesis: Closing the loop between generative design and automated synthesis and screening systems is a step toward "self-driving" labs for autonomous molecular discovery.

Overall, the integration of deep learning into molecular design is set to reshape how molecules are conceived and optimized, heralding a new era of innovation in chemical sciences and beyond.
