MolGAN: An implicit generative model for small molecular graphs (1805.11973v2)

Published 30 May 2018 in stat.ML and cs.LG

Abstract: Deep generative models for graph-structured data offer a new angle on the problem of chemical synthesis: by optimizing differentiable models that directly generate molecular graphs, it is possible to side-step expensive search procedures in the discrete and vast space of chemical structures. We introduce MolGAN, an implicit, likelihood-free generative model for small molecular graphs that circumvents the need for expensive graph matching procedures or node ordering heuristics of previous likelihood-based methods. Our method adapts generative adversarial networks (GANs) to operate directly on graph-structured data. We combine our approach with a reinforcement learning objective to encourage the generation of molecules with specific desired chemical properties. In experiments on the QM9 chemical database, we demonstrate that our model is capable of generating close to 100% valid compounds. MolGAN compares favorably both to recent proposals that use string-based (SMILES) representations of molecules and to a likelihood-based method that directly generates graphs, albeit being susceptible to mode collapse. Code at https://github.com/nicola-decao/MolGAN

Authors (2)

Nicola De Cao
Thomas Kipf

Summary

MolGAN: An Implicit Generative Model for Small Molecular Graphs

The paper "MolGAN: An Implicit Generative Model for Small Molecular Graphs" by Nicola De Cao and Thomas Kipf applies deep generative models to the generation of molecular graphs. The work targets the problem of chemical synthesis, focusing on generating small molecules with desired properties without relying on string-based representations or likelihood-based training.

Methodology

MolGAN builds on Generative Adversarial Networks (GANs) and adopts a likelihood-free approach that operates directly on graph-structured data. This sets it apart from earlier likelihood-based methods, which needed node ordering heuristics or expensive graph matching procedures to compare generated graphs with training graphs. MolGAN's architecture consists of three main components: a generator, a discriminator, and a reward network.

Generator: The generator produces molecular graphs by sampling from a latent prior distribution. It predicts an entire graph at once: a node feature matrix (representing atom types) and an adjacency tensor (representing bond types). The generator is a multi-layer perceptron (MLP), which avoids the complications introduced by sequential graph generation.
Discriminator: The discriminator evaluates the authenticity of generated molecular graphs, distinguishing between real samples from the training dataset and synthetic graphs produced by the generator. It leverages relational graph convolutional layers to ensure permutation invariance.
Reward Network: This network is utilized in a reinforcement learning setup to optimize the generation process based on desired chemical properties. It learns to predict rewards derived from external evaluations, guiding the generator to produce molecules that meet specific criteria, such as solubility or synthesizability.
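
The permutation invariance mentioned for the discriminator can be illustrated with a minimal sketch. This is not the authors' implementation: the layer below is a simplified relational graph convolution in pure Python, it normalizes messages by the number of nodes, and it uses a plain sum as the graph-level readout, all of which are assumptions made for brevity. The names rgcn_layer, readout, and permute_graph are hypothetical. The sketch shows that relabeling the atoms of a toy three-atom graph leaves the graph-level vector unchanged up to floating-point rounding.

```python
import math
import random


def rgcn_layer(X, A, W_self, W_rel):
    """Simplified relational graph convolution (illustrative, not the paper's code).

    X:      N x D node features (e.g. one-hot atom types)
    A:      Y x N x N adjacency, one N x N slice per bond type
    W_self: D x H weights applied to a node's own features
    W_rel:  Y x D x H weights, one D x H matrix per bond type
    """
    n = len(X)

    def project(x, W):
        # Row vector x (length D) times matrix W (D x H).
        return [sum(xd * row[k] for xd, row in zip(x, W)) for k in range(len(W[0]))]

    out = []
    for i in range(n):
        acc = project(X[i], W_self)
        for A_y, W_y in zip(A, W_rel):
            for j in range(n):
                if A_y[i][j]:
                    msg = project(X[j], W_y)
                    # Assumption: normalize by N rather than by neighborhood size.
                    acc = [a + A_y[i][j] * m / n for a, m in zip(acc, msg)]
        out.append([math.tanh(a) for a in acc])
    return out


def readout(H):
    """Sum node embeddings so that node order cannot affect the result."""
    return [sum(col) for col in zip(*H)]


def permute_graph(X, A, perm):
    """Relabel nodes: new node i is old node perm[i]."""
    Xp = [X[p] for p in perm]
    Ap = [[[A_y[p][q] for q in perm] for p in perm] for A_y in A]
    return Xp, Ap


rng = random.Random(0)
D, H, Y = 2, 4, 2  # atom types, hidden size, bond types (toy values)
W_self = [[rng.uniform(-1, 1) for _ in range(H)] for _ in range(D)]
W_rel = [[[rng.uniform(-1, 1) for _ in range(H)] for _ in range(D)] for _ in range(Y)]

# Toy graph C-C=O: one-hot atom types over [C, O], one slice per bond type.
X = [[1, 0], [1, 0], [0, 1]]
single = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
double = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
A = [single, double]

score = readout(rgcn_layer(X, A, W_self, W_rel))
Xp, Ap = permute_graph(X, A, [2, 0, 1])
score_permuted = readout(rgcn_layer(Xp, Ap, W_self, W_rel))

invariant = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(score, score_permuted))
print(invariant)
```

Because each node aggregates over its neighbors with shared weights and the readout sums over all nodes, no step depends on how the atoms are numbered, which is why no node ordering heuristic is needed.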

Key Findings

The authors conducted extensive experiments using the QM9 chemical database to evaluate MolGAN’s performance against several baselines, including SMILES-based methods and likelihood-based graph generation models. The results indicate several critical findings:

Validity: MolGAN achieves nearly 100% valid molecular graph generation, a significant improvement compared to other methods. This high validity is maintained even when the model incorporates the reinforcement learning component to optimize specific chemical properties.
Efficiency: Compared with ORGAN, a sequential GAN model that operates on SMILES representations, MolGAN trains approximately five times faster while achieving higher chemical property scores.
Versatility: The model demonstrates adaptability to different optimization objectives, achieving higher scores for various chemical properties such as druglikeness (QED), synthesizability, and solubility. This flexibility is primarily due to the reinforcement learning objective integrated into the model.
Mode Collapse: MolGAN is susceptible to mode collapse, in which the generator produces only a limited variety of outputs. Early stopping and careful training procedures mitigated some of these effects. The uniqueness score, which measures diversity among the generated molecules, is the metric that exposes this tendency to converge on a few modes.
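
The validity and uniqueness figures discussed above can be made concrete with a small sketch. The definitions used here are the common ones for this kind of evaluation (validity as the valid fraction of all samples, uniqueness as the distinct fraction of valid samples, novelty as the fraction of valid samples absent from the training set) and should be read as assumptions rather than the paper's exact evaluation code. The function generation_metrics and the toy is_valid check are hypothetical; a real pipeline would validate molecules with a cheminformatics toolkit.

```python
def generation_metrics(samples, training_set, is_valid):
    """Compute validity, uniqueness, and novelty for a batch of generated molecules.

    samples:      list of generated molecules (here, SMILES-like strings)
    training_set: set of molecules seen during training
    is_valid:     callable returning True for chemically valid molecules
                  (assumed to be supplied by the caller)
    """
    valid = [s for s in samples if is_valid(s)]
    n_valid = len(valid)
    return {
        "validity": n_valid / len(samples) if samples else 0.0,
        "uniqueness": len(set(valid)) / n_valid if n_valid else 0.0,
        "novelty": sum(s not in training_set for s in valid) / n_valid if n_valid else 0.0,
    }


# Toy stand-in for a chemistry check: only tests that parentheses are balanced in count.
def toy_is_valid(smiles):
    return smiles.count("(") == smiles.count(")")


samples = ["CCO", "CCO", "CCO", "C1CC1", "C(("]
metrics = generation_metrics(samples, training_set={"CCO"}, is_valid=toy_is_valid)
print(metrics)  # {'validity': 0.8, 'uniqueness': 0.5, 'novelty': 0.25}
```

The toy batch shows how the metrics pull apart: four of five samples are valid, yet three of them are the same molecule, so a model can score high on validity while uniqueness reveals that it has collapsed onto a few outputs.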

Implications and Future Work

MolGAN is a promising framework for molecular graph generation, improving on the compared methods in both the quality of generated molecules and training cost. This matters for fields such as drug discovery and material science, where generating novel, valid molecular structures with desired properties could accelerate research and reduce costs.

Future work could focus on several areas to enhance MolGAN’s capabilities:

Addressing Mode Collapse: Developing more robust techniques, perhaps through the design of novel reward functions or advanced regularization strategies, can help mitigate mode collapse, ensuring consistent diversity in the generated outputs.
Scalability: Extending the model to handle larger graphs could involve integrating recurrent graph-based generative models or exploring hierarchical approaches that allow for the generation of complex molecular structures without a pre-defined maximum size.
Benchmarking and Validation: Further benchmarking on other chemical and molecular datasets and validation in real-world experimental settings would solidify MolGAN's applicability and generalize its use across different domains of synthetic chemistry.
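
The scalability point above follows from the generator predicting a whole dense graph in one shot. As a rough sketch, assuming the output consists of an N x T atom-type matrix and an N x N x Y bond-type tensor, the number of predicted values grows quadratically with the maximum number of atoms N. The function dense_output_size is hypothetical, and the values N = 9, T = 5, Y = 4 are illustrative assumptions for a QM9-scale setting, not figures taken from this summary.

```python
def dense_output_size(n_atoms, n_atom_types, n_bond_types):
    """Number of values a one-shot generator must predict for a dense graph.

    Assumes an n_atoms x n_atom_types node matrix plus an
    n_atoms x n_atoms x n_bond_types adjacency tensor.
    """
    node_entries = n_atoms * n_atom_types
    bond_entries = n_atoms * n_atoms * n_bond_types
    return node_entries + bond_entries


# Illustrative sizes only: small molecules versus a larger atom budget.
small = dense_output_size(9, 5, 4)
large = dense_output_size(50, 5, 4)
print(small, large)  # 369 10250
```

Raising the atom budget from 9 to 50 multiplies the output size by roughly 28 under these assumptions, which is one reason a fixed maximum size becomes limiting for larger molecules.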

In summary, "MolGAN: An Implicit Generative Model for Small Molecular Graphs" provides a significant contribution to the field of generative models for chemical synthesis, showcasing the potential of GANs in generating valid and optimized molecular structures while circumventing the limitations of previous methods.
