GraphNVP: An Invertible Flow Model for Generating Molecular Graphs (1905.11600v1)

Published 28 May 2019 in stat.ML, cs.AI, and cs.LG

Abstract: We propose GraphNVP, the first invertible, normalizing flow-based molecular graph generation model. We decompose the generation of a graph into two steps: generation of (i) an adjacency tensor and (ii) node attributes. This decomposition yields the exact likelihood maximization on graph-structured data, combined with two novel reversible flows. We empirically demonstrate that our model efficiently generates valid molecular graphs with almost no duplicated molecules. In addition, we observe that the learned latent space can be used to generate molecules with desired chemical properties.

Authors (4)

Kaushalya Madhawa (3 papers)
Katsuhiko Ishiguro (1 paper)
Kosuke Nakago (7 papers)
Motoki Abe (4 papers)

Citations (173)

Summary

The paper presents GraphNVP, an invertible flow model that generates molecular graphs in two steps and is trained by exact likelihood maximization.
It leverages reversible coupling layers to transform both adjacency tensors and node attributes into latent space, ensuring perfect reconstruction.
Experiments on QM9 and ZINC-250k demonstrate near 100% uniqueness, highlighting its promise for efficient lead optimization in drug discovery.

Overview of GraphNVP: An Invertible Flow Model for Generating Molecular Graphs

This paper presents GraphNVP, an invertible flow-based model for generating molecular graphs. The authors describe it as the first model to apply the normalizing flow framework to molecular graph generation. Because every transformation in the model is invertible, GraphNVP can be trained by maximizing the exact likelihood of graph-structured data, whereas VAE-based models optimize a lower bound on the likelihood and GAN-based models do not compute a likelihood at all.

Key Contributions

GraphNVP splits generation into two steps: first an adjacency tensor, which encodes the bonds between atoms, and then the node attributes, which encode the atom types. This decomposition makes exact likelihood maximization possible on graph-structured data. The model pairs it with two reversible flows, one for each of the two components of a molecular graph.
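The exact likelihood mentioned above follows from the standard change-of-variables formula for normalizing flows. The notation below is chosen here for illustration and is not taken from this summary: f is the invertible map, p_Z is a simple prior over latents, A is the adjacency tensor, X is the node attribute matrix, and z_A and z_X are their latent counterparts.

```latex
\log p_G(G) = \log p_Z\bigl(f(G)\bigr)
            + \log\left|\det \frac{\partial f(G)}{\partial G}\right|,
\qquad G = (A, X), \quad z = f(G) = (z_A, z_X)
```

Training maximizes the left-hand side directly, which is only tractable when the determinant term is cheap to compute, as it is for coupling layers.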

Methodology

The model is built from invertible transformations known as coupling layers, a standard building block of normalizing flow architectures. GraphNVP uses two types:

Adjacency Coupling Layers: Transform the adjacency tensor into a latent space.
Node Feature Coupling Layers: Map the feature matrix (node attributes) to a latent space while considering adjacency information.

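Because each layer is exactly invertible, an encoded molecule can be reconstructed perfectly from its latent representation. This contrasts with VAE models, whose decoders are stochastic, and with GAN models, which have no encoder to map an existing molecule into the latent space.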
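The invertibility of a coupling layer can be illustrated with a minimal NumPy sketch in the spirit of a node feature coupling layer: one row of the feature matrix is rescaled and shifted using values computed from the remaining rows and the adjacency information. Everything named here is an assumption for illustration. The helper `scale_shift`, its random weights, and the single dense adjacency matrix are toy stand-ins; the paper uses learned networks and an adjacency tensor.

```python
import numpy as np

rng = np.random.default_rng(0)

def scale_shift(x_masked, adj, w_s, w_t):
    """Toy stand-in for the learned networks (hypothetical helper).

    Aggregates neighbor features once, pools over nodes, and returns a
    log-scale and a shift, each as wide as one node's feature row.
    """
    h = np.tanh(adj @ x_masked)       # one graph-convolution-like step
    pooled = h.sum(axis=0)            # summarize the graph as one vector
    return np.tanh(pooled @ w_s), pooled @ w_t

def coupling_forward(x, adj, row, w_s, w_t):
    """Update only `row` of x; every other row passes through unchanged."""
    x_masked = x.copy()
    x_masked[row] = 0.0               # the updated row must not feed s and t
    log_s, t = scale_shift(x_masked, adj, w_s, w_t)
    z = x.copy()
    z[row] = x[row] * np.exp(log_s) + t
    log_det = log_s.sum()             # Jacobian is triangular
    return z, log_det

def coupling_inverse(z, adj, row, w_s, w_t):
    """Recover x from z by recomputing the same log-scale and shift."""
    z_masked = z.copy()
    z_masked[row] = 0.0               # identical input to the forward pass
    log_s, t = scale_shift(z_masked, adj, w_s, w_t)
    x = z.copy()
    x[row] = (z[row] - t) * np.exp(-log_s)
    return x

# Toy graph: 4 nodes on a path, 3 features per node.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = rng.normal(size=(4, 3))
w_s = rng.normal(size=(3, 3))
w_t = rng.normal(size=(3, 3))

z, log_det = coupling_forward(x, adj, row=1, w_s=w_s, w_t=w_t)
x_rec = coupling_inverse(z, adj, row=1, w_s=w_s, w_t=w_t)
```

The inverse works because the rows that drive the scale and shift are never modified by the layer, so both directions see the same inputs. The same property makes the log-determinant a plain sum of log-scales.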

Results and Insights

Experiments on the QM9 and ZINC-250k datasets show that GraphNVP generates valid molecular graphs with very few duplicates. The model achieves almost 100% uniqueness, addressing a common weakness of molecular graph generators, which often produce the same molecule repeatedly.
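The two metrics can be made concrete with a short sketch. It assumes validity is the fraction of generated samples that are chemically valid and uniqueness is the fraction of distinct molecules among the valid ones; the summary does not spell out the definitions. The `is_valid` predicate is a hypothetical placeholder, and the toy check below only rejects empty strings. A real evaluation would use a cheminformatics toolkit to validate and canonicalize each molecule.

```python
def validity_and_uniqueness(samples, is_valid):
    """Return (validity, uniqueness) for a list of generated molecules.

    validity   = valid samples / all samples
    uniqueness = distinct valid samples / valid samples
    """
    valid = [s for s in samples if is_valid(s)]
    validity = len(valid) / len(samples) if samples else 0.0
    uniqueness = len(set(valid)) / len(valid) if valid else 0.0
    return validity, uniqueness

# Toy SMILES-like strings; the empty string stands in for an invalid sample.
generated = ["CCO", "CCN", "CCO", "", "c1ccccc1"]
validity, uniqueness = validity_and_uniqueness(generated, lambda s: bool(s))
```

In this toy run four of the five samples count as valid and three of those four are distinct.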

Moreover, GraphNVP's latent space is shown to be smooth, which facilitates molecule optimization. By moving a latent vector along a direction associated with a desired property (e.g., increasing a molecule's quantitative estimate of drug-likeness, QED) and decoding the result, the model supports exploration of chemical space, a practical advantage for lead optimization in drug discovery.
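Moving a latent vector along a property direction can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the latents and property scores are synthetic, the direction is estimated with an ordinary least-squares fit, and the names `shift_latent`, `latents`, and `scores` are hypothetical. Decoding the shifted vectors back into molecules would require the trained flow's inverse, which is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 latent vectors and a property score for each.
latents = rng.normal(size=(200, 8))
hidden_direction = np.zeros(8)
hidden_direction[2] = 1.0
scores = latents @ hidden_direction + 0.01 * rng.normal(size=200)

# Fit scores ~ latents @ w + b by least squares.
design = np.hstack([latents, np.ones((len(latents), 1))])
coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
direction = coef[:-1] / np.linalg.norm(coef[:-1])  # unit property direction

def shift_latent(z, direction, step):
    """Move z by `step` along the estimated property direction."""
    return z + step * direction

z0 = latents[0]
path = [shift_latent(z0, direction, s) for s in (0.0, 0.5, 1.0)]

# Property predicted by the linear fit at each point along the path.
predicted = [float(np.append(z, 1.0) @ coef) for z in path]
```

Under the linear fit, the predicted property rises with the step size. Whether the decoded molecules actually improve depends on how smooth the learned latent space is.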

Implications and Future Work

This research extends invertible flow models, which were developed largely for image data, to graph-structured domains. The results suggest that flow-based approaches are a viable alternative to VAE- and GAN-based generators when exact reconstruction of the input structure matters.

Future developments may focus on enhancing the permutation-invariance of the model to handle the inherent variability of graph representations. Exploring broader applications within graph-structured data and further refining the latent space for targeted molecular optimization are potential avenues for future research.
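The permutation issue arises because the same graph has many adjacency matrices, one per node ordering. The small NumPy sketch below uses a toy three-node path graph and an arbitrary relabeling, both chosen here for illustration, to show that the matrix changes while the graph does not.

```python
import numpy as np

# Path graph 0 - 1 - 2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])

perm = np.array([1, 0, 2])            # relabel the nodes: swap 0 and 1
permuted = adj[np.ix_(perm, perm)]    # same graph, different node order

same_matrix = np.array_equal(adj, permuted)

# An order-independent summary (the sorted degree sequence) is unchanged.
same_degrees = sorted(adj.sum(axis=0)) == sorted(permuted.sum(axis=0))
```

A model that is not permutation-invariant treats `adj` and `permuted` as different inputs and may assign them different likelihoods.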

GraphNVP's use of invertible flows marks a promising direction for molecular graph generation and shows the value of exact likelihood maximization for complex, structured data. As research progresses, such methods could have a significant impact on computational chemistry and related fields.

GitHub

GitHub - chainer/chainer-chemistry: Chainer Chemistry: A Library for Deep Learning in Biology and Chemistry (671 stars)