DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking (2210.01776v2)

Published 4 Oct 2022 in q-bio.BM, cs.LG, and physics.bio-ph

Abstract: Predicting the binding structure of a small molecule ligand to a protein -- a task known as molecular docking -- is critical to drug design. Recent deep learning methods that treat docking as a regression problem have decreased runtime compared to traditional search-based methods but have yet to offer substantial improvements in accuracy. We instead frame molecular docking as a generative modeling problem and develop DiffDock, a diffusion generative model over the non-Euclidean manifold of ligand poses. To do so, we map this manifold to the product space of the degrees of freedom (translational, rotational, and torsional) involved in docking and develop an efficient diffusion process on this space. Empirically, DiffDock obtains a 38% top-1 success rate (RMSD<2A) on PDBBind, significantly outperforming the previous state-of-the-art of traditional docking (23%) and deep learning (20%) methods. Moreover, while previous methods are not able to dock on computationally folded structures (maximum accuracy 10.4%), DiffDock maintains significantly higher precision (21.7%). Finally, DiffDock has fast inference times and provides confidence estimates with high selective accuracy.

Summary

The paper introduces DiffDock, a diffusion generative model that redefines molecular docking by modeling the distribution of ligand poses.
It employs a reverse diffusion process to iteratively refine ligand translations, rotations, and torsion angles, improving speed and prediction accuracy.
Empirical tests on PDBBind show a top-1 success rate (RMSD < 2 Å) of 38.2%, ahead of both traditional search-based methods (23%) and previous deep learning methods (20%).

Overview of DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking

The paper presents DiffDock, a novel approach to molecular docking framed as a generative modeling problem. Unlike traditional docking methods that rely on scoring functions and optimization algorithms to predict ligand poses within protein structures, DiffDock leverages diffusion generative models (DGMs) to learn a distribution over ligand poses, thereby improving both the speed and accuracy of predictions.

Motivation and Background

Molecular docking is a critical task in computational drug design: it predicts a ligand's position, orientation, and conformation when bound to a target protein. Classical docking methods, such as AutoDock and Glide, use search-based algorithms to explore possible ligand poses, and the size and ruggedness of the search space often leave them with suboptimal solutions. Recent deep learning approaches instead treat docking as a regression problem. These models are faster but have not significantly improved accuracy, partly because a regression objective does not match how docking is evaluated: success is measured by whether a predicted pose falls under an RMSD threshold, whereas a regression model that is uncertain between several plausible poses tends to predict a compromise between them that may match none.

The authors argue that viewing docking as a generative problem better aligns with its objective: the model learns a distribution over possible poses, which can represent uncertainty between several candidate binding modes instead of averaging over them.

Methodology

The core of the paper is the development of DiffDock, a diffusion generative model specifically tailored for the docking task. The model defines a diffusion process over ligand poses characterized by three kinds of degrees of freedom: the ligand's translation, its rigid-body rotation, and the torsion angles of its rotatable bonds. DiffDock samples ligand poses by running a reverse diffusion process that progressively refines poses drawn from an uninformed prior distribution toward the distribution learned from training data. This can be seen as iteratively updating the ligand's translation, rotation, and torsion angles to fit the protein structure.
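To make the reverse diffusion idea concrete, here is a minimal sketch for the translational component only, assuming a variance-exploding noise schedule and Euler-Maruyama steps. The function names, the noise schedule, and `toy_score` are illustrative assumptions, not the paper's code: `toy_score` is an analytic stand-in for the learned score model, exact only when the "data" is a single target position.

```python
import numpy as np

def reverse_diffusion_translation(score_fn, sigmas, rng, dim=3):
    """Run Euler-Maruyama steps of a variance-exploding reverse SDE in R^dim.

    score_fn(x, sigma) approximates the gradient of the log-density of the
    noised data at noise level sigma; sigmas must be decreasing.
    """
    # Start from the uninformed prior: a wide Gaussian at the largest noise level.
    x = sigmas[0] * rng.standard_normal(dim)
    for s_hi, s_lo in zip(sigmas[:-1], sigmas[1:]):
        step = s_hi**2 - s_lo**2  # variance removed in this step
        noise = rng.standard_normal(dim)
        x = x + step * score_fn(x, s_hi) + np.sqrt(step) * noise
    return x

# Hypothetical stand-in for a learned score model: the exact score when the
# data distribution is a point mass at `target`.
target = np.array([1.0, -2.0, 0.5])

def toy_score(x, sigma):
    return -(x - target) / sigma**2

sigmas = np.geomspace(10.0, 0.01, 50)  # assumed geometric noise schedule
sample = reverse_diffusion_translation(toy_score, sigmas, np.random.default_rng(0))
```

In the actual method the score depends on the protein and the current ligand pose, and analogous updates are applied to the rotation and torsion components; this toy only shows how a sample drifts from the prior toward the high-density region as the noise level shrinks.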

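The method rests on a mapping between ligand poses and the transformations that generate them. Starting from a seed conformation, the poses consistent with that conformation are those reachable by a translation, a rigid rotation, and a set of torsion-angle changes, so they form a low-dimensional submanifold of the full Euclidean space of atomic coordinates. The bijection between this submanifold and the product space of translations, rotations, and torsion angles lets the diffusion process run over these few degrees of freedom while respecting the chemical constraints inherent in feasible molecular conformations.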
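The pose parametrization can be illustrated with a small NumPy sketch that applies torsion changes, a rigid rotation, and a translation to a set of atomic coordinates. The function names and the `(i, j, moving, angle)` torsion format are assumptions made for this example. It also omits a detail of the paper, where torsion updates are defined so that they do not induce a net rigid-body motion of the ligand.

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def apply_pose_update(coords, translation, rot_axis, rot_angle, torsions):
    """Apply torsion, rotation, and translation updates to (n, 3) coordinates.

    torsions: iterable of (i, j, moving, angle), where atoms i-j form the
    rotatable bond and `moving` lists the atom indices on the j side.
    """
    x = np.array(coords, dtype=float)
    # Torsion updates: rotate one side of each bond about the bond axis.
    for i, j, moving, angle in torsions:
        R = rotation_matrix(x[j] - x[i], angle)
        x[moving] = (x[moving] - x[j]) @ R.T + x[j]
    # Rigid rotation about the ligand's centroid, followed by translation.
    center = x.mean(axis=0)
    R = rotation_matrix(rot_axis, rot_angle)
    x = (x - center) @ R.T + center
    return x + np.asarray(translation, dtype=float)

# Toy four-atom chain with one rotatable bond between atoms 1 and 2.
seed = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [2, 1, 0]], dtype=float)
pose = apply_pose_update(seed, [1.0, 2.0, 3.0], [0, 0, 1], 0.5, [(1, 2, [3], 1.0)])
```

Because every operation is a rotation or a translation of whole rigid fragments, bond lengths and bond angles of the seed conformation are preserved; only the position, orientation, and torsion angles change.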

At inference time, DiffDock samples multiple ligand poses and ranks them with a confidence model that estimates how likely each pose is to fall within the accepted error range of the true structure (RMSD below 2 Å). This approach sits between the exhaustive search of traditional methods and the expediency of one-shot predictions.
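The sample-then-rank step can be sketched as follows. `rank_poses` and `toy_confidence` are hypothetical names for this example; in particular, `toy_confidence` scores poses by closeness to a known reference purely for demonstration, whereas a real confidence model is learned and has no access to the true structure.

```python
import numpy as np

def rank_poses(poses, confidence_fn):
    """Sort candidate poses by descending confidence score."""
    scores = np.array([confidence_fn(p) for p in poses])
    order = np.argsort(-scores)
    return [poses[i] for i in order], scores[order]

# Toy setup: candidates are noisy copies of a reference pose.
rng = np.random.default_rng(0)
reference = rng.standard_normal((5, 3))
noise_levels = [3.0, 0.1, 1.0, 2.0]
candidates = [reference + s * rng.standard_normal((5, 3)) for s in noise_levels]

def toy_confidence(pose):
    # Stand-in for a learned model: higher score for poses nearer the reference.
    return -np.sqrt(np.mean(np.sum((pose - reference) ** 2, axis=1)))

ranked, scores = rank_poses(candidates, toy_confidence)
top1 = ranked[0]  # the pose that would be reported as the top-1 prediction
```

The design point is that sampling supplies diversity while the ranking step selects one answer, so the quality of the top-1 prediction depends on both the sampler and the confidence model.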

Empirical Results

Empirically, DiffDock substantially outperforms both traditional methods and prior machine learning models in docking accuracy. On the PDBBind benchmark, DiffDock achieves a top-1 success rate (RMSD < 2 Å) of 38.2%, compared with 20.4% for the best previous deep learning model (TANKBind) and 23% for the best search-based methods. These results also come at lower computational cost: DiffDock is reported to run 3 to 12 times faster on a GPU than the best-performing search-based methods.
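For reference, the success-rate metric quoted here can be computed as in the sketch below. It is a simplified version under stated assumptions: plain coordinate RMSD with a fixed atom ordering and no alignment, whereas docking benchmarks typically also account for molecular symmetry. The function names are illustrative.

```python
import numpy as np

def rmsd(pred, ref):
    """Root-mean-square deviation between two (n, 3) coordinate arrays."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.sqrt(np.mean(np.sum((pred - ref) ** 2, axis=1))))

def success_rate(rmsds, threshold=2.0):
    """Fraction of complexes whose top-1 pose has RMSD below `threshold` (in Å)."""
    return float(np.mean(np.asarray(rmsds, dtype=float) < threshold))

# Toy usage: one top-1 RMSD per test complex.
example_rmsds = [1.0, 3.0, 1.5, 2.5]
rate = success_rate(example_rmsds)
```

Because the metric only asks whether a pose lands under the threshold, a prediction that is moderately wrong everywhere can score worse than one that is exactly right on some complexes and far off on others.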

Moreover, DiffDock generalizes better to docking on computationally predicted structures from ESMFold, where no ligand-bound crystal structure is available. In this setting previous methods reach at most 10.4% accuracy, while DiffDock retains 21.7%, which makes it a more robust choice where traditional approaches suffer from their reliance on crystallographic holo-structures.

Implications and Future Directions

The introduction of DiffDock suggests a shift in the paradigm for molecular docking towards generative models that better capture the logical structure of the problem at hand. By focusing on the distribution of ligand poses, DiffDock provides more informed predictions and confidence measures, valuable in high-throughput drug discovery pipelines.

Future research could explore extending DiffDock to related problems such as protein-protein and protein-nucleic acid interactions. Integrating downstream analyses, like affinity prediction, into the generative framework may also yield more comprehensive solutions to molecular interaction modeling. Additionally, the scalability and efficiency of DGMs open opportunities for their application in broader molecular and materials science tasks.

In summary, DiffDock represents a significant advancement in molecular docking by aligning the model framework with the task's fundamental objectives, providing scalable, accurate, and efficient solutions to a longstanding challenge in computational chemistry.
