NeVAE: A Deep Generative Model for Molecular Graphs (1802.05283v4)

Published 14 Feb 2018 in cs.LG, physics.soc-ph, and stat.ML

Abstract: Deep generative models have been praised for their ability to learn smooth latent representations of images, text, and audio, which can then be used to generate new, plausible data. However, current generative models are unable to work with molecular graphs due to their unique characteristics: their underlying structure is not Euclidean or grid-like, they remain isomorphic under permutation of the node labels, and they come with a different number of nodes and edges. In this paper, we first propose a novel variational autoencoder for molecular graphs, whose encoder and decoder are specially designed to account for the above properties by means of several technical innovations. Moreover, in contrast with the state of the art, our decoder is able to provide the spatial coordinates of the atoms of the molecules it generates. Then, we develop a gradient-based algorithm to optimize the decoder of our model so that it learns to generate molecules that maximize the value of a certain property of interest and, given a molecule of interest, it is able to optimize the spatial configuration of its atoms for greater stability. Experiments reveal that our variational autoencoder can discover plausible, diverse and novel molecules more effectively than several state of the art models. Moreover, for several properties of interest, our optimized decoder is able to identify molecules with property values 121% higher than those identified by several state of the art methods based on Bayesian optimization and reinforcement learning.

NeVAE: A Deep Generative Model for Molecular Graphs

The paper introduces NeVAE, a novel variational autoencoder (VAE) specifically designed for generating molecular graphs. This model presents innovative solutions to several challenges inherent in molecular graph generation, such as permutation invariance of node labels, varying numbers of atoms and bonds, and the provision of spatial coordinates for generated molecules. This research contributes to the field of molecular design within computational chemistry, offering a promising tool for drug discovery and material design.

Architectures and Innovations

NeVAE sets itself apart by addressing key shortcomings of existing generative models for molecular graphs. The paper identifies six primary limitations in current methods: fixed atom counts, non-invariance to node permutation, quadratic training complexity, constrained diversity due to molecular graphlets, lack of spatial coordinates, and suboptimal property maximization strategies. NeVAE tackles these through several innovations:

Probabilistic Encoder: The encoder aggregates information from neighbors at a variable number of hops to capture the local structure around each node, using symmetric aggregation functions to ensure permutation invariance. It maps graphs of different sizes into a continuous latent space.
Probabilistic Decoder: Whereas prior models use Bernoulli edge processes, NeVAE draws edges from a multinomial distribution, which reduces training complexity from quadratic in the number of nodes to O(l), where l is the number of edges. The decoder also outputs spatial coordinates for the atoms, modeling their positions with Gaussian distributions that take the chemical bonds into account.
Masking Strategies: The decoder optionally employs masks to enforce local structural properties in generated graphs, such as valid atomic valences. This aspect, while common in text-based molecule generation, is underutilized in molecular graph models.
Gradient-Based Property Optimization: Apart from generating realistic molecular graphs, NeVAE includes a mechanism to optimize them for specific properties (e.g., solubility) using a gradient-based approach, outperforming Bayesian optimization and reinforcement learning methods.
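
The symmetric aggregation in the encoder and the masked multinomial edge sampling in the decoder can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names (aggregate_hops, sample_edge, sample_edges), the scalar node features, the plain sum update, and the single-bond valence mask are hypothetical and are not taken from the paper. For readability the sampler enumerates every node pair at each step, so it does not reproduce the efficiency the paper reports.

```python
import math
import random


def aggregate_hops(features, adjacency, num_hops):
    """Sum-aggregate neighbor information over `num_hops` rounds.

    Summation is symmetric, so relabeling the nodes only reorders the
    per-node outputs; it does not change their values.
    (Toy update rule, assumed for illustration.)
    """
    n = len(features)
    hidden = list(features)
    for _ in range(num_hops):
        hidden = [
            features[u] + sum(hidden[v] for v in range(n) if adjacency[u][v])
            for u in range(n)
        ]
    return hidden


def sample_edge(scores, degree, max_valence, existing_edges, rng):
    """Draw one edge from a masked softmax (multinomial) over node pairs."""
    n = len(degree)
    candidates, weights = [], []
    for u in range(n):
        for v in range(u + 1, n):
            # Mask out pairs that are already bonded or would exceed a valence.
            if (u, v) in existing_edges:
                continue
            if degree[u] >= max_valence[u] or degree[v] >= max_valence[v]:
                continue
            candidates.append((u, v))
            weights.append(math.exp(scores[u][v]))
    if not candidates:
        return None  # every remaining pair is masked
    return rng.choices(candidates, weights=weights, k=1)[0]


def sample_edges(scores, max_valence, num_edges, seed=0):
    """Sample up to `num_edges` single bonds, updating the mask after each draw."""
    rng = random.Random(seed)
    degree = [0] * len(max_valence)
    edges = set()
    for _ in range(num_edges):
        edge = sample_edge(scores, degree, max_valence, edges, rng)
        if edge is None:
            break
        edges.add(edge)
        degree[edge[0]] += 1
        degree[edge[1]] += 1
    return edges
```

Calling aggregate_hops on a graph and on a relabeled copy of it returns the same values in permuted order, which is the property a symmetric aggregator is meant to provide. In sample_edges, the mask is recomputed after every draw, so no node ever receives more bonds than its entry in max_valence allows.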

Experimental Results

The experiments use two datasets, ZINC and QM9, to validate NeVAE against several baselines, including GraphVAE, GrammarVAE, and JTVAE. Results show that NeVAE outperforms these baselines on validity, novelty, and uniqueness metrics, indicating that it generates chemically plausible and diverse molecular structures. Furthermore, NeVAE's continuous latent space supports interpolation, producing smooth transitions between molecular structures.
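
As a rough illustration of how validity, uniqueness, and novelty are often computed for a batch of generated molecules, here is a minimal sketch. It follows one common convention, and the paper's exact definitions may differ. The function name generation_metrics and the is_valid callback are hypothetical, and molecules are assumed to be represented by canonical string identifiers so that equal strings mean equal molecules.

```python
def generation_metrics(generated, training_set, is_valid):
    """Return (validity, uniqueness, novelty) as fractions in [0, 1].

    validity   = valid molecules / all generated molecules
    uniqueness = distinct valid molecules / valid molecules
    novelty    = distinct valid molecules absent from training / distinct valid
    (One common convention, assumed here for illustration.)
    """
    valid = [m for m in generated if is_valid(m)]
    unique = set(valid)
    novel = unique - set(training_set)
    validity = len(valid) / len(generated) if generated else 0.0
    uniqueness = len(unique) / len(valid) if valid else 0.0
    novelty = len(novel) / len(unique) if unique else 0.0
    return validity, uniqueness, novelty
```

In practice is_valid would wrap a chemistry toolkit's sanity check; any such check is outside the scope of this sketch.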

The property-oriented decoder further demonstrates NeVAE's utility in property optimization. Experiments show that it generates molecules with higher penalized logP and QED scores, with property values as much as 121% higher than those found by state-of-the-art methods based on Bayesian optimization and reinforcement learning.
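
The general idea of tuning a generator's parameters by gradient ascent on an expected property value can be shown with a toy sketch. This is not the paper's algorithm: a categorical distribution over a fixed handful of candidates stands in for the decoder, the gradient is computed exactly rather than estimated from samples, and the names (softmax, expected_value, ascend) and default hyperparameters are hypothetical.

```python
import math


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_value(logits, values):
    """Expected property value under the categorical distribution."""
    return sum(p * v for p, v in zip(softmax(logits), values))


def ascend(logits, values, learning_rate=0.5, steps=200):
    """Gradient ascent on the expected property value (toy stand-in decoder)."""
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        mean = sum(p * v for p, v in zip(probs, values))
        # d E[value] / d logit_j = p_j * (value_j - E[value])
        logits = [
            x + learning_rate * p * (v - mean)
            for x, p, v in zip(logits, probs, values)
        ]
    return logits
```

Starting from uniform logits, repeated updates shift probability mass toward the candidates with the highest property value, which is the behavior a property-oriented decoder aims for on a far larger space of molecules.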

Implications and Future Work

NeVAE illustrates significant advancements in the generation of molecular graphs, with immediate applications in drug discovery and synthetic chemistry. The model's ability to generate three-dimensional molecular structures with desired properties holds promise for accelerating the design of new compounds.

Future work could involve enhancing the VAE framework to encompass dynamic graphs or adapting the methodology to other domains, such as network biology or materials science. Additional open questions remain around better integrating expert chemical knowledge into the generative process and improving scalability to larger molecular spaces.

Overall, NeVAE represents a substantial contribution to molecular graph generation, offering a flexible and robust tool for researchers exploring the vast chemical space in search of novel compounds with targeted properties.

Authors (6)

Bidisha Samanta (14 papers)
Abir De (36 papers)
Gourhari Jana (2 papers)
Pratim Kumar Chattaraj (5 papers)
Niloy Ganguly (95 papers)
Manuel Gomez-Rodriguez (40 papers)

Citations (204)
