MoFlow: An Invertible Flow Model for Generating Molecular Graphs (2006.10137v1)

Published 17 Jun 2020 in stat.ML, cs.LG, and physics.chem-ph

Abstract: Generating molecular graphs with desired chemical properties driven by deep graph generative models provides a very promising way to accelerate drug discovery process. Such graph generative models usually consist of two steps: learning latent representations and generation of molecular graphs. However, to generate novel and chemically-valid molecular graphs from latent representations is very challenging because of the chemical constraints and combinatorial complexity of molecular graphs. In this paper, we propose MoFlow, a flow-based graph generative model to learn invertible mappings between molecular graphs and their latent representations. To generate molecular graphs, our MoFlow first generates bonds (edges) through a Glow based model, then generates atoms (nodes) given bonds by a novel graph conditional flow, and finally assembles them into a chemically valid molecular graph with a posthoc validity correction. Our MoFlow has merits including exact and tractable likelihood training, efficient one-pass embedding and generation, chemical validity guarantees, 100% reconstruction of training data, and good generalization ability. We validate our model by four tasks: molecular graph generation and reconstruction, visualization of the continuous latent space, property optimization, and constrained property optimization. Our MoFlow achieves state-of-the-art performance, which implies its potential efficiency and effectiveness to explore large chemical space for drug discovery.

An Overview of "MoFlow: An Invertible Flow Model for Generating Molecular Graphs"

The paper presents MoFlow, a flow-based deep generative model for generating molecular graphs efficiently. It addresses the problem of constructing novel, chemically valid molecular graphs from latent representations. This problem matters for drug discovery, where large chemical spaces must be explored efficiently and effectively.

Technical Contributions

MoFlow integrates two key components to address the generation of molecular graphs: a Glow-based model for bond generation and a novel graph conditional flow for atom generation conditional on bonds. This decomposition utilizes a multi-type edge structure for bonds and leverages the relationship between atom structures to enforce chemical validity in the generated molecules.
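The bonds-then-atoms decomposition can be written as a factorization of the molecule's probability. The notation below is an assumption chosen for illustration (an atom matrix A over k atom types, a bond tensor B over c bond types, and parameter symbols θ); it is a sketch of the idea, not a quotation of the paper's equations.

```latex
% Assumed notation: a molecule M with n atoms is a pair (A, B), where
%   A \in \{0,1\}^{n \times k}          one-hot atom types (nodes)
%   B \in \{0,1\}^{c \times n \times n} one-hot bond types (multi-type edges)
\begin{align}
  P(M) = P(A, B) \approx
    \underbrace{P(B;\, \theta_{B})}_{\text{Glow-based bond flow}}
    \;\cdot\;
    \underbrace{P(A \mid B;\, \theta_{A \mid B})}_{\text{graph conditional flow}}
\end{align}
```

Read right to left at generation time: bonds are produced first, and atoms are then produced conditioned on those bonds.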

  1. Invertible Mapping: The core of MoFlow's approach is leveraging invertible neural networks to establish a one-to-one mapping between molecular graphs and a continuous latent space. This mapping not only facilitates efficient sampling and exact likelihood estimation but also ensures the reconstruction of any given training data.
  2. Graph Conditional Flow: The model generates atoms conditioned on the existing bond structure. The conditioning uses a graph convolutional network that captures the dependencies and relational structure among atoms within a molecule, which helps maintain chemical validity.
  3. Validity Correction: MoFlow includes a post-hoc correction mechanism to ensure the chemical validity of generated molecules. This mechanism checks whether the valency constraints of atoms within the molecule are satisfied and corrects any deviations while preserving the larger molecular structure, which is crucial for ensuring that all outputs of the model are chemically viable.
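The invertible mapping in item 1 can be illustrated with a generic affine coupling layer of the kind used in RealNVP and Glow. This is a minimal sketch in plain Python, not MoFlow's actual layers: the function `shift_and_log_scale` is a hypothetical stand-in for a learned network, and the input is a flat list of even length rather than a bond tensor or atom matrix.

```python
import math

def shift_and_log_scale(x_a):
    """Hypothetical stand-in for a learned network; any function of x_a works."""
    log_scale = [math.tanh(0.5 * v) for v in x_a]  # bounded for numerical stability
    shift = [0.3 * v + 1.0 for v in x_a]
    return log_scale, shift

def coupling_forward(x):
    """Map x -> z and return (z, log|det Jacobian|)."""
    assert len(x) % 2 == 0, "this sketch assumes an even-length input"
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    s, t = shift_and_log_scale(x_a)
    # The first half passes through unchanged; the second half is scaled and shifted.
    z_b = [xb * math.exp(si) + ti for xb, si, ti in zip(x_b, s, t)]
    # The Jacobian is triangular, so its log-determinant is the sum of log-scales.
    return x_a + z_b, sum(s)

def coupling_inverse(z):
    """Map z -> x exactly, reusing the same scale and shift."""
    assert len(z) % 2 == 0, "this sketch assumes an even-length input"
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    s, t = shift_and_log_scale(z_a)  # z_a equals x_a, so s and t are recovered exactly
    x_b = [(zb - ti) * math.exp(-si) for zb, si, ti in zip(z_b, s, t)]
    return z_a + x_b

x = [0.2, -1.0, 0.7, 1.5]
z, log_det = coupling_forward(x)
x_rec = coupling_inverse(z)
print(max(abs(a - b) for a, b in zip(x, x_rec)))  # reconstruction error, near zero
```

Because the untouched half determines the scale and shift, the inverse needs no approximation, which is why a flow can reconstruct its inputs and evaluate an exact likelihood through the log-determinant term.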

Numerical and Comparative Analysis

MoFlow was empirically evaluated on the QM9 and ZINC250K datasets, which are standard benchmarks for molecular generation. The model achieved several noteworthy results:

  • 100% Reconstruction: MoFlow achieves perfect reconstruction of training data due to its invertible mapping, a significant advantage over VAE-based models which typically suffer from reconstruction errors due to their stochastic nature.
  • Chemical Validity: The model substantially outperforms existing methods such as GraphNVP and GRF in terms of generating chemically valid molecular graphs, achieving a 100% validity rate.
  • Exploration of Chemical Space: With the ability to generate more novel, unique, and valid molecules, MoFlow shows promise for deeper exploration in the vast $10^{60}$ chemical space.
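The 100% validity figure in the list above relies on the post-hoc valency correction. The following is a simplified sketch of such a check-and-repair loop, written under loud assumptions: the `MAX_VALENCE` table is a hypothetical, reduced table that ignores formal charges, and the repair rule (lower the highest-order bond on a violating atom by one) is an illustrative choice that may differ in detail from the paper's procedure.

```python
# Hypothetical, simplified valence table (no formal charges).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}

def bond_order_sum(bonds, i):
    """Total bond order attached to atom index i."""
    return sum(order for pair, order in bonds.items() if i in pair)

def correct_valency(atoms, bonds):
    """Return a copy of `bonds` in which no atom exceeds its maximum valence.

    atoms: list of element symbols, e.g. ["C", "O"]
    bonds: dict mapping (i, j) index pairs to integer bond orders
    """
    bonds = dict(bonds)  # do not mutate the caller's dict
    while True:
        violating = [i for i, el in enumerate(atoms)
                     if bond_order_sum(bonds, i) > MAX_VALENCE[el]]
        if not violating:
            return bonds
        i = violating[0]
        # Lower the highest-order bond on the violating atom by one.
        key = max((k for k in bonds if i in k), key=lambda k: bonds[k])
        bonds[key] -= 1
        if bonds[key] == 0:
            del bonds[key]  # an order-zero bond is no bond at all

atoms = ["C", "O", "O", "N"]
bonds = {(0, 1): 2, (0, 2): 2, (0, 3): 1}  # carbon carries bond order 5
print(correct_valency(atoms, bonds))
```

Each pass removes one unit of bond order, so the loop always terminates, and it touches only bonds on atoms that violate their valence, leaving the rest of the structure intact.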

Implications and Future Directions

The development of MoFlow represents an advance in molecular graph generation, specifically in accelerating de novo drug design. The flow-based architecture allows for more precise and explainable transformations between molecular structures and latent representations. The model's strengths in maintaining chemical validity and in efficient generation with exact likelihood estimation pave the way for applications beyond drug discovery, potentially encompassing materials design and chemical property optimization.

Future directions for this line of research may involve extending MoFlow to incorporate more complex chemical properties directly into the flow framework or scaling the model to handle even larger and more diverse datasets. Moreover, the integration of MoFlow with existing cheminformatics tools could amplify its utility in industrial settings, where rapid prototyping of molecular candidates is beneficial. Furthermore, research could explore the fusion of MoFlow’s one-shot generation approach with sequential generative frameworks to further improve the efficiency and validity of generated molecules.

In summary, MoFlow stands as a significant contribution to the domain of molecular graph generation, providing a robust framework for generating valid and novel molecular structures with implications for accelerating drug discovery and beyond.

Authors (2)
  1. Chengxi Zang (8 papers)
  2. Fei Wang (574 papers)
Citations (255)