An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming (2105.07246v2)

Published 15 May 2021 in cs.LG and q-bio.BM

Abstract: Predicting molecular conformations (or 3D structures) from molecular graphs is a fundamental problem in many applications. Most existing approaches are usually divided into two steps by first predicting the distances between atoms and then generating a 3D structure through optimizing a distance geometry problem. However, the distances predicted with such two-stage approaches may not be able to consistently preserve the geometry of local atomic neighborhoods, making the generated structures unsatisfying. In this paper, we propose an end-to-end solution for molecular conformation prediction called ConfVAE based on the conditional variational autoencoder framework. Specifically, the molecular graph is first encoded in a latent space, and then the 3D structures are generated by solving a principled bilevel optimization program. Extensive experiments on several benchmark data sets prove the effectiveness of our proposed approach over existing state-of-the-art approaches. Code is available at https://github.com/MinkaiXu/ConfVAE-ICML21.

Authors (7)

Minkai Xu
Wujie Wang
Shitong Luo
Chence Shi
Yoshua Bengio
Rafael Gomez-Bombarelli
Jian Tang

Summary

The paper introduces ConfVAE, an integrated framework that uses bilevel programming to directly predict 3D molecular structures from graphs.
It employs a conditional variational autoencoder whose decoder predicts interatomic distances with continuous normalizing flows; because distances do not change under rotation or translation, the model is invariant to both.
Empirical benchmarks on GEOM-QM9 and GEOM-Drugs demonstrate improved coverage and matching scores compared to traditional two-step methods.

The paper proposes ConfVAE, an end-to-end framework for molecular conformation generation based on bilevel programming. The framework predicts molecular conformations directly from molecular graphs, addressing a limitation of prior approaches, which treat distance prediction and conformation generation as separate stages.

Problem Context

Predicting molecular conformations from graphs is vital for applications in computational chemistry and drug design, since determining conformations experimentally is costly. Prior methods typically use a two-step process: they first predict interatomic distances and then generate the 3D structure by solving a distance geometry problem. Because the two steps are trained separately, the predicted distances often fail to preserve the geometry of local atomic neighborhoods, which leads to unsatisfactory structures.

Methodology

ConfVAE integrates both stages into a single framework built on the conditional variational autoencoder (CVAE). The molecular graph is first encoded into a latent space, and the 3D structures are then generated by solving a bilevel optimization problem:

Inner Loop: Solves the distance geometry problem, deriving Cartesian coordinates from the predicted pairwise atomic distances.
Outer Loop: Models the distribution of conformations directly from the molecular graph, using a training objective that is invariant to rotation and translation. This end-to-end design is akin to the approach adopted by AlphaFold2 in protein structure prediction.
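In generic notation, the two levels described above can be written as a single constrained problem. The symbols below are assumptions chosen for illustration and are not necessarily the paper's: \theta denotes the model parameters, d_{ij}(\theta) a predicted distance between atoms i and j, \mathcal{E} the set of atom pairs with a predicted distance, R a reference conformation, and \mathcal{L}_{\text{outer}} the training objective.

```latex
\min_{\theta}\; \mathcal{L}_{\text{outer}}\bigl(R^{*}(\theta),\, R\bigr)
\quad \text{s.t.} \quad
R^{*}(\theta) \in \arg\min_{\hat{R}}
\sum_{(i,j) \in \mathcal{E}}
\bigl( \lVert \hat{r}_i - \hat{r}_j \rVert_2 - d_{ij}(\theta) \bigr)^{2}
```

The inner objective depends on the coordinates \hat{r}_i only through pairwise distances, so rotating or translating \hat{R} leaves it unchanged.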

Technical Framework

The CVAE model has three key components:

Encoder: Generates latent embeddings from the molecular graph.
Decoder: Uses continuous normalizing flows (CNFs) to predict pairwise atomic distances, which are invariant to rotation and translation.
Optimization: Implements bilevel programming via gradient descent within the inner loop to minimize the distance error, and hypergradient descent for the outer loop training objective.
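The inner loop can be sketched as plain gradient descent on the squared distance error. This is a minimal illustration in pure Python, not the authors' implementation: the function names, step size, step count, and the three-atom example are all assumptions made for the sketch.

```python
import math
import random


def pair_distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_error(coords, targets):
    """Sum of squared gaps between current and target pair distances."""
    return sum((pair_distance(coords[i], coords[j]) - d) ** 2
               for (i, j), d in targets.items())


def solve_distance_geometry(targets, n_atoms, steps=2000, lr=0.05, seed=0):
    """Gradient descent on the distance error from a random start (hypothetical helper)."""
    rng = random.Random(seed)
    coords = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_atoms)]
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in range(n_atoms)]
        for (i, j), d in targets.items():
            dist = pair_distance(coords[i], coords[j])
            if dist < 1e-9:
                continue  # gradient is undefined when two atoms coincide
            # d/dr_i of (dist - d)^2 is 2 * (dist - d) * (r_i - r_j) / dist
            scale = 2.0 * (dist - d) / dist
            for k in range(3):
                g = scale * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for i in range(n_atoms):
            for k in range(3):
                coords[i][k] -= lr * grads[i][k]
    return coords


# Illustrative targets: three atoms forming a triangle (arbitrary units).
targets = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.5}
coords = solve_distance_geometry(targets, n_atoms=3)
print(distance_error(coords, targets))
```

Because the error is a function of distances only, any rotated or translated copy of the returned coordinates has the same error; the solver therefore fixes the structure only up to a rigid motion.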

During training, the model maximizes the likelihood of the observed conformations. The hypergradients required by this outer objective are computed efficiently with reverse-mode automatic differentiation.
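To make the idea of a hypergradient concrete, the sketch below differentiates an outer loss through unrolled inner gradient-descent steps by hand, in reverse order. It is a scalar toy problem, assumed purely for illustration: the inner loss 0.25 * (x^2 - theta)^2 stands in for a distance error (x plays the role of a coordinate, theta of a predicted squared distance), and none of the names come from the paper or its code.

```python
def unrolled_inner(theta, x0, steps, lr):
    """Run gradient descent on the inner loss 0.25 * (x**2 - theta)**2.

    Every iterate is stored because the backward pass needs it.
    """
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - lr * x * (x * x - theta))
    return xs


def hypergradient(theta, target, x0=1.0, steps=100, lr=0.05):
    """Outer loss 0.5 * (x_T - target)**2 and its derivative w.r.t. theta."""
    xs = unrolled_inner(theta, x0, steps, lr)
    outer_loss = 0.5 * (xs[-1] - target) ** 2
    grad_x = xs[-1] - target          # d(outer) / d(x_T)
    grad_theta = 0.0
    # Walk the unrolled updates backwards; x is the input x_t of step t -> t+1.
    for x in reversed(xs[:-1]):
        grad_theta += grad_x * (lr * x)               # d(x_{t+1}) / d(theta)
        grad_x *= 1.0 - lr * (3.0 * x * x - theta)    # d(x_{t+1}) / d(x_t)
    return outer_loss, grad_theta


loss, grad = hypergradient(theta=4.0, target=1.5)
print(loss, grad)
```

An automatic differentiation library performs the same backward walk mechanically; the manual version only shows what is being computed. A finite-difference check on theta is a simple way to validate such a gradient.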

Results

Empirical evaluation on the GEOM-QM9 and GEOM-Drugs benchmarks shows that ConfVAE outperforms existing models on conformation generation, achieving higher coverage (COV) scores and lower matching (MAT) scores, where lower MAT is better. The method also models the underlying distance distributions more accurately than baselines such as RDKit, as measured by maximum mean discrepancy (MMD).
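The two conformation metrics can be sketched as follows, using their usual definitions: COV is the fraction of reference conformations that have at least one generated conformation within an RMSD threshold, and MAT is the mean, over reference conformations, of the RMSD to the closest generated one. The function name, the threshold value, and the example matrix are assumptions for illustration; the aligned RMSD values are taken as precomputed input.

```python
def cov_mat(rmsd, threshold):
    """Coverage and matching scores from an RMSD matrix.

    rmsd[i][j] is the aligned RMSD between reference conformation i
    and generated conformation j (computed elsewhere).
    """
    best = [min(row) for row in rmsd]   # closest generated match per reference
    cov = sum(1 for b in best if b < threshold) / len(best)
    mat = sum(best) / len(best)
    return cov, mat


# Made-up values: 3 reference conformations, 2 generated ones.
example_rmsd = [
    [0.2, 0.9],
    [0.7, 0.6],
    [1.4, 0.3],
]
cov, mat = cov_mat(example_rmsd, threshold=0.5)
print(cov, mat)
```

In this example two of the three references are matched within the threshold, so COV is 2/3, and MAT is the mean of 0.2, 0.6, and 0.3.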

Implications and Future Directions

ConfVAE presents a robust framework that directly tackles limitations of prior molecule modeling approaches by ensuring generation consistency through an integrated optimization approach. This advancement offers significant potential in computational chemistry, expediting the discovery of new drug candidates and molecular designs. Future work may explore extensions of this framework to accommodate larger and more intricate molecular structures such as proteins or multi-molecular systems, enhancing applicability in bioinformatics and materials science.

GitHub

MinkaiXu/ConfVAE-ICML21: An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming (ICML'21)