Conditional molecular design with deep generative models (1805.00108v3)

Published 30 Apr 2018 in cs.LG and stat.ML

Abstract: Although machine learning has been successfully used to propose novel molecules that satisfy desired properties, it is still challenging to explore a large chemical space efficiently. In this paper, we present a conditional molecular design method that facilitates generating new molecules with desired properties. The proposed model, which simultaneously performs both property prediction and molecule generation, is built as a semi-supervised variational autoencoder trained on a set of existing molecules with only a partial annotation. We generate new molecules with desired properties by sampling from the generative distribution estimated by the model. We demonstrate the effectiveness of the proposed model by evaluating it on drug-like molecules. The model improves the performance of property prediction by exploiting unlabeled molecules, and efficiently generates novel molecules fulfilling various target conditions.

Authors (2)

Seokho Kang
Kyunghyun Cho

Summary

Conditional Molecular Design with Deep Generative Models: An Expert Overview

The paper "Conditional Molecular Design with Deep Generative Models" addresses a central difficulty in molecular design: chemical space is too large to explore exhaustively. The authors propose a conditional design model built on a semi-supervised variational autoencoder (SSVAE) that performs property prediction and molecule generation in a single model. New molecules are sampled directly from the learned conditional distribution, so no separate optimization step is needed for each set of target properties. This overview covers the paper's core contributions, numerical results, and implications for future research.

Core Methodology

The proposed model unifies property prediction and molecule generation in a single SSVAE, trained on a set of existing molecules of which only some have known property values. New molecules with designated properties are generated by sampling from the learned conditional distribution rather than by searching chemical space directly. The key adaptation is extending the SSVAE, originally formulated for discrete class labels, to continuous output variables, so that both labeled and unlabeled molecules contribute to training.
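To make the labeled/unlabeled distinction concrete, here is a minimal sketch of how the two kinds of examples might contribute different loss terms. It is a simplified illustration, not the paper's exact objective: the reconstruction term is omitted, properties are assumed to be standardized so that a standard normal prior on y is reasonable, and the function names and the `beta` weight are hypothetical.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one example."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )

def gaussian_nll(y, mu, log_var):
    """Negative log-density of y under N(mu, diag(exp(log_var)))."""
    return 0.5 * sum(
        lv + (t - m) ** 2 / math.exp(lv) + math.log(2.0 * math.pi)
        for t, m, lv in zip(y, mu, log_var)
    )

def non_reconstruction_loss(z_mu, z_log_var, y_mu, y_log_var, y=None, beta=1.0):
    """Loss terms other than reconstruction for a single molecule.

    z_mu, z_log_var: encoder output for the latent code z.
    y_mu, y_log_var: predictor output for the continuous properties y.
    y: observed property values, or None if the molecule is unlabeled.
    beta: assumed weight on the supervised regression term.
    """
    kl_z = kl_to_standard_normal(z_mu, z_log_var)
    if y is not None:
        # Labeled: supervise the property predictor with the observed y.
        return kl_z + beta * gaussian_nll(y, y_mu, y_log_var)
    # Unlabeled: no target is available, so only regularize the predicted
    # property distribution toward the (assumed standard normal) prior.
    return kl_z + kl_to_standard_normal(y_mu, y_log_var)
```

With this split, an unlabeled molecule still shapes the encoder and predictor through the regularization terms (and, in a full model, through reconstruction), which is how semi-supervised training can benefit from data without property annotations.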

Experimental Results

Evaluations used 310,000 drug-like molecules from the ZINC database. In property prediction, the SSVAE outperformed the baseline models, including GraphConv and ECFP, achieving lower mean absolute error (MAE). The advantage was most pronounced when only a small fraction of the molecules was labeled, which is the setting where semi-supervised learning is expected to help most.
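For reference, MAE is the average absolute difference between predicted and true property values. The helper below is a generic sketch of that metric, not code from the paper, and the numbers in the usage line are made up for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all examples for one property."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative values only: two molecules, one property.
example_mae = mean_absolute_error([1.0, 3.0], [1.5, 2.0])  # 0.75
```

A lower MAE means predictions sit closer to the measured values on average, in the same units as the property itself.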

The model was also evaluated on conditional molecular design, where 3,000 novel molecules were generated for varying target properties. The property values of the generated molecules clustered tightly around the targets for molecular weight (MolWt), LogP, and QED. Conditional generation achieved this without a supplementary optimization step, in contrast to models that rely on Bayesian optimization.
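Conditional generation of this kind can be pictured as fixing the property vector y at the target and drawing the latent code z from its prior, then handing both to the decoder. The sketch below only builds those decoder inputs; the decoder itself is not shown, and the function name, the latent dimension, the standard normal prior on z, and the target values are all assumptions made for illustration.

```python
import random

def make_decoder_inputs(y_target, n_samples, latent_dim, seed=0):
    """Build [y_target, z] rows with z drawn from a standard normal prior.

    Each row would be passed to a trained decoder (hypothetical here) that
    emits one molecule, e.g. as a SMILES string.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        rows.append(list(y_target) + z)
    return rows

# Illustrative target for (MolWt, LogP, QED); the values are placeholders.
inputs = make_decoder_inputs([250.0, 2.5, 0.8], n_samples=5, latent_dim=8)
```

Because y is held fixed and only z varies, each sample is a different candidate molecule aimed at the same target, which is why no per-target optimization loop is required.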

Implications and Future Directions

The approach has practical implications for molecular design, particularly in the pharmaceutical industry. Because the SSVAE generates candidate molecules directly for a given set of target properties, it could reduce the time and cost of proposing candidates in drug discovery. A further advantage is its ability to exploit unlabeled data, which matters in practice because labeled biochemical data is often scarce and expensive to obtain.

From a theoretical standpoint, the paper opens avenues for research into representations of chemical structure beyond SMILES strings. Future work could explore graph-based neural architectures, which might broaden coverage of chemical space and improve the generative capabilities of such models.

In summary, the paper contributes a computationally efficient approach to conditional molecular design with deep generative models. It strengthens the case for machine learning in chemical informatics and provides a framework on which further optimization and broader applications can build.
