GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation (2001.09382v2)

Published 26 Jan 2020 in cs.LG and stat.ML

Abstract: Molecular graph generation is a fundamental problem for drug discovery and has been attracting growing attention. The problem is challenging since it requires not only generating chemically valid molecular structures but also optimizing their chemical properties in the meantime. Inspired by the recent progress in deep generative models, in this paper we propose a flow-based autoregressive model for graph generation called GraphAF. GraphAF combines the advantages of both autoregressive and flow-based approaches and enjoys: (1) high model flexibility for data density estimation; (2) efficient parallel computation for training; (3) an iterative sampling process, which allows leveraging chemical domain knowledge for valency checking. Experimental results show that GraphAF is able to generate 68% chemically valid molecules even without chemical knowledge rules and 100% valid molecules with chemical rules. The training process of GraphAF is two times faster than the existing state-of-the-art approach GCPN. After fine-tuning the model for goal-directed property optimization with reinforcement learning, GraphAF achieves state-of-the-art performance on both chemical property optimization and constrained property optimization.

Authors: Chence Shi, Minkai Xu, Zhaocheng Zhu, Weinan Zhang, Ming Zhang, Jian Tang

Summary

An Overview of GraphAF: A Flow-based Autoregressive Model for Molecular Graph Generation

Introduction and Background

Molecular graph generation is a fundamental problem in computational drug discovery: the goal is to design novel molecules with specific desired properties. The challenge lies in generating chemically valid molecular structures while simultaneously optimizing their chemical properties. This paper introduces GraphAF, a deep generative model built specifically for generating molecular graphs.

Deep generative models have advanced significantly through approaches such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and autoregressive models. Normalizing flows are particularly noteworthy because they define an invertible transformation between a simple base (latent) distribution and the data distribution, which makes exact likelihood computation possible. GraphAF combines two of these ideas: it is an autoregressive, flow-based model designed for molecular graph generation.
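For reference, the exact likelihood of a normalizing flow follows from the standard change-of-variables formula. The notation here is chosen for illustration and is not taken from the summary: f_θ is the invertible map from data x to latent z, and p_Z is the base density.

```latex
\log p_X(x) \;=\; \log p_Z\bigl(f_\theta(x)\bigr) \;+\; \log \left| \det \frac{\partial f_\theta(x)}{\partial x} \right|
```

When f_θ is autoregressive, the Jacobian is triangular, so the determinant reduces to a product of its diagonal entries and is cheap to evaluate.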

GraphAF Methodology

GraphAF combines the strengths of autoregressive models and normalizing flows. Its autoregressive nature lets the model build a graph by adding nodes and edges one at a time, treating generation as a sequential decision-making problem. Because each step is explicit, domain knowledge such as valency checks can be applied during generation to ensure chemical validity. Meanwhile, the flow-based formulation provides exact likelihood estimation and efficient parallel computation during training, resulting in a model that is both flexible and efficient.
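The sequential generation loop with a valency check can be sketched as follows. This is a toy illustration, not the authors' implementation: the atom vocabulary, the `MAX_VALENCE` table, and the helper names are hypothetical, and uniform random choices stand in for sampling from a learned model. Lowering the bond order on a failed check is a simple stand-in for resampling.

```python
import random

# Hypothetical maximum valences for a toy atom vocabulary (illustrative only).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}


def bond_is_valid(atoms, used_valence, i, j, order):
    """True if a bond of this order keeps both endpoint atoms within their valence."""
    return (used_valence[i] + order <= MAX_VALENCE[atoms[i]]
            and used_valence[j] + order <= MAX_VALENCE[atoms[j]])


def generate_molecule(num_atoms, rng):
    """Add atoms one at a time; for each new atom, decide its bonds to earlier atoms."""
    atoms, bonds, used_valence = [], {}, []
    for new in range(num_atoms):
        # Step 1: pick the type of the new node (random here, learned in a real model).
        atoms.append(rng.choice(sorted(MAX_VALENCE)))
        used_valence.append(0)
        # Step 2: pick an edge type between the new node and each existing node.
        for prev in range(new):
            order = rng.choice([0, 1, 2, 3])  # 0 means "no bond"
            # Valency check: reduce the proposed order until the bond is allowed.
            while order > 0 and not bond_is_valid(atoms, used_valence, prev, new, order):
                order -= 1
            if order > 0:
                bonds[(prev, new)] = order
                used_valence[prev] += order
                used_valence[new] += order
    return atoms, bonds


if __name__ == "__main__":
    atoms, bonds = generate_molecule(6, random.Random(0))
    print(atoms)
    print(bonds)
```

The sketch only enforces valency; it does not guarantee that the result is a connected molecule, which a practical generator would also need to handle.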

The model is trained by density estimation: it learns the distribution of existing molecules from a large dataset of molecular graphs by maximizing their likelihood. GraphAF defines a feedforward network that maps a molecular graph to the base distribution, so the likelihood of every node and edge can be computed in parallel during training. This is a significant advantage over prior autoregressive models, whose training is slowed by step-by-step sequential computation.
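The contrast between parallel likelihood computation and sequential sampling can be sketched with a small affine autoregressive flow on a continuous vector. This is an illustration under stated assumptions, not GraphAF itself: the masked linear conditioner, the function names, and the standard Gaussian base distribution are choices made for the sketch, and no graph neural network or handling of discrete atom and bond types is included.

```python
import numpy as np


def make_params(dim, seed=0):
    """Random masked weights so that output i depends only on inputs j < i."""
    rng = np.random.default_rng(seed)
    mask = np.triu(np.ones((dim, dim)), k=1)  # mask[j, i] = 1 only when j < i
    w_mu = rng.normal(size=(dim, dim)) * mask
    w_alpha = rng.normal(size=(dim, dim)) * mask
    return w_mu, w_alpha


def conditioner(x, w_mu, w_alpha):
    """Shift mu_i and log-scale log_alpha_i, each a function of x_{<i} only."""
    mu = x @ w_mu
    log_alpha = np.tanh(x @ w_alpha)  # bounded for numerical stability
    return mu, log_alpha


def log_likelihood(x, w_mu, w_alpha):
    """Data -> latent in one pass: all dimensions are processed in parallel."""
    mu, log_alpha = conditioner(x, w_mu, w_alpha)
    z = (x - mu) * np.exp(-log_alpha)
    log_base = -0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi))  # standard Gaussian
    log_det = -np.sum(log_alpha)  # triangular Jacobian: sum of diagonal log terms
    return z, log_base + log_det


def sample(z, w_mu, w_alpha):
    """Latent -> data: each x_i needs x_{<i}, so generation is sequential."""
    x = np.zeros_like(z)
    for i in range(len(z)):
        mu, log_alpha = conditioner(x, w_mu, w_alpha)
        x[i] = z[i] * np.exp(log_alpha[i]) + mu[i]
    return x


if __name__ == "__main__":
    w_mu, w_alpha = make_params(5)
    z = np.random.default_rng(1).normal(size=5)
    x = sample(z, w_mu, w_alpha)
    z_back, ll = log_likelihood(x, w_mu, w_alpha)
    print(np.allclose(z, z_back), ll)
```

Training would maximize `log_likelihood` over a dataset using a single parallel pass per example, while the per-step loop in `sample` is where checks such as valency constraints can be applied during generation.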

Experimental Results

The authors evaluated GraphAF on the standard ZINC dataset. Training GraphAF was reported to be twice as fast as training the previous state-of-the-art model, GCPN. GraphAF generated 100% chemically valid molecules when chemical rules were applied during generation, and it still achieved a validity rate of 68% without any such rules.

For goal-directed generation, GraphAF was fine-tuned with reinforcement learning to optimize chemical properties such as logP and QED. In this setting it achieved state-of-the-art performance on both chemical property optimization and constrained property optimization. This shows that the model can not only generate valid molecules but also steer them toward desired chemical characteristics, a key requirement for practical use in drug discovery.

Implications and Future Work

GraphAF provides a robust framework for molecular graph generation and highlights the value of combining autoregressive models with normalizing flows. The reported results indicate that the method can efficiently generate chemically valid molecular structures with optimized properties.

This research has implications for both theory and practice, particularly in drug discovery and materials science. Faster training and guaranteed validity under chemical rules could make it practical to explore a large chemical space more efficiently than earlier models allowed.

Future research could extend GraphAF by training on larger and more diverse datasets. The approach could also be adapted to other kinds of graph-structured data, such as social networks or biochemical interaction networks, which would test how broadly the model applies.

In conclusion, the paper presents GraphAF as an innovative and effective contribution to the field of computational chemistry and graph generation methodologies, with demonstrated efficient training, high validity in molecule generation, and successful property optimization.