
Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation (1806.02473v3)

Published 7 Jun 2018 in cs.LG, cs.AI, and stat.ML

Abstract: Generating novel graph structures that optimize given objectives while obeying some given underlying rules is fundamental for chemistry, biology and social science research. This is especially important in the task of molecular graph generation, whose goal is to discover novel molecules with desired properties such as drug-likeness and synthetic accessibility, while obeying physical laws such as chemical valency. However, designing models to find molecules that optimize desired properties while incorporating highly complex and non-differentiable rules remains to be a challenging task. Here we propose Graph Convolutional Policy Network (GCPN), a general graph convolutional network based model for goal-directed graph generation through reinforcement learning. The model is trained to optimize domain-specific rewards and adversarial loss through policy gradient, and acts in an environment that incorporates domain-specific rules. Experimental results show that GCPN can achieve 61% improvement on chemical property optimization over state-of-the-art baselines while resembling known molecules, and achieve 184% improvement on the constrained property optimization task.

Authors (5)
  1. Jiaxuan You (51 papers)
  2. Bowen Liu (63 papers)
  3. Rex Ying (90 papers)
  4. Vijay Pande (13 papers)
  5. Jure Leskovec (233 papers)
Citations (843)

Summary

  • The paper introduces a GCPN that integrates reinforcement learning and graph convolutional networks to optimize molecular graph generation while ensuring chemical validity.
  • It employs advanced validation, valency, steric strain, and reactive group filters to produce realistic molecular structures.
  • The approach significantly advances drug discovery and material science by enabling the targeted design of molecules with desired properties.

Overview of "Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation"

The paper "Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation" by Jiaxuan You, Bowen Liu, Rex Ying, Vijay Pande, and Jure Leskovec presents an approach to generating molecular graphs with desired properties using a Graph Convolutional Policy Network (GCPN). The methodology addresses key challenges in molecular design, leveraging deep learning to optimize molecular generation in a goal-directed manner. The main contributions and findings can be summarized as follows:

Methodology

The proposed GCPN integrates the benefits of reinforcement learning (RL) and graph convolutional networks (GCNs) to produce molecular graphs. The process focuses on several critical components:

  1. Validation: Generated molecules must pass RDKit's sanitization checks to be considered valid.
  2. Valency Check: Ensures that atoms in partially completed molecular graphs do not exceed their maximum allowed valency.
  3. Steric Strain Filter: Incorporates an MMFF94 forcefield minimization to penalize molecules with high steric strain, specifically those with an average angle bend energy exceeding 0.82 kcal/mol.
  4. Reactive Functional Group Filter: Penalizes molecules containing known problematic reactive functional groups using rules from the ZINC dataset as implemented in RDKit.
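As a rough illustration of the second check above, a valency constraint on a partial graph can be sketched in plain Python. The `MAX_VALENCE` table and the `(atoms, bonds)` encoding are assumptions for this example; the paper's environment performs the equivalent check (plus full sanitization) through RDKit.

```python
# Illustrative valency check on a partial molecular graph.
# MAX_VALENCE is a simplified, hypothetical table for this sketch only.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}

def valency_ok(atoms, bonds):
    """atoms: list of element symbols; bonds: list of (i, j, order) tuples.

    Returns True if no atom's total bond order exceeds its maximum valence.
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return all(used[k] <= MAX_VALENCE[atoms[k]] for k in range(len(atoms)))

# Example: a C=C double bond is fine; a bond order of 5 is not.
atoms = ["C", "C"]
print(valency_ok(atoms, [(0, 1, 2)]))  # True: each carbon uses 2 of 4
print(valency_ok(atoms, [(0, 1, 5)]))  # False: exceeds carbon's valence
```

In the RL environment, an action that would violate this constraint is rejected (or penalized), so the agent only ever extends chemically plausible intermediates.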

Reward Design

The reward function in the GCPN is critical for guiding the optimization process:

  • Property Optimization: Linear functions map the minimum and maximum property scores of the ZINC dataset to the desired reward range.
  • Property Targeting: Maps the absolute difference between the target and actual property scores to a reward range, ensuring rewards do not exceed the predefined thresholds.
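The two mappings above can be sketched as simple linear functions. This is a minimal sketch of the idea as described, not the authors' implementation: the actual score ranges, reward bounds, and tolerance are hyperparameters in their released code, and the values below are placeholders.

```python
# Illustrative reward mappings (placeholder ranges, not the paper's values).

def optimization_reward(score, score_min, score_max, r_min=0.0, r_max=1.0):
    """Linearly map a property score from the dataset's observed range
    [score_min, score_max] onto the reward range [r_min, r_max]."""
    t = (score - score_min) / (score_max - score_min)
    return r_min + t * (r_max - r_min)

def targeting_reward(score, target, tolerance, r_max=1.0):
    """Map |score - target| to a reward that is r_max exactly at the
    target, falls off linearly, and is clipped to stay within [0, r_max]."""
    r = r_max * (1.0 - abs(score - target) / tolerance)
    return max(0.0, min(r_max, r))

print(optimization_reward(2.5, 0.0, 5.0))  # 0.5: midpoint of the score range
print(targeting_reward(2.0, 2.0, 1.0))     # 1.0: exactly on target
print(targeting_reward(4.0, 2.0, 1.0))     # 0.0: clipped, outside tolerance
```

The clipping in `targeting_reward` reflects the thresholding mentioned above: rewards never leave the predefined range, regardless of how far a molecule's property is from the target.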

Detailed parameter settings and the implementation of the reward design are available in the open-sourced code, which facilitates reproducibility and further research.

Results and Implications

The GCPN framework is shown to generate valid, realistic molecular graphs efficiently while adhering to the predefined chemical constraints. The key implications of this research include:

  • Practical Applications: This approach can significantly advance the design of novel molecules in drug discovery and material science by systematically generating compounds with targeted properties.
  • Theoretical Contributions: The integration of GCNs with RL in the context of molecular graph generation provides a new perspective for combining machine learning paradigms, showcasing their potential in cheminformatics and computational chemistry.

Future Directions

Further research could focus on extending the model's capabilities to handle more complex molecular properties and exploring hybrid architectures that combine GCPN with other deep learning models. Additionally, investigating the generalization of this approach to other domains requiring graph-based representations, like social network analysis or combinatorial optimization problems, is a promising direction.

In conclusion, the paper offers a substantial contribution to goal-directed molecular generation, providing a solid foundation for future advancements in both the theoretical and practical aspects of this burgeoning field.