DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models (2304.03889v1)

Published 8 Apr 2023 in q-bio.BM and cs.LG

Abstract: Understanding how proteins structurally interact is crucial to modern biology, with applications in drug discovery and protein design. Recent machine learning methods have formulated protein-small molecule docking as a generative problem with significant performance boosts over both traditional and deep learning baselines. In this work, we propose a similar approach for rigid protein-protein docking: DiffDock-PP is a diffusion generative model that learns to translate and rotate unbound protein structures into their bound conformations. We achieve state-of-the-art performance on DIPS with a median C-RMSD of 4.85, outperforming all considered baselines. Additionally, DiffDock-PP is faster than all search-based methods and generates reliable confidence estimates for its predictions. Our code is publicly available at https://github.com/ketatam/DiffDock-PP

Summary

The paper introduces a diffusion-based generative framework for rigid protein-protein docking, achieving a top-1 median C-RMSD of 4.85 on DIPS.
It uses SE(3)-equivariant graph neural networks to model and sample rigid-body transformations between the two proteins of a complex.
With GPU inference, the approach runs 5 to 60 times faster than search-based docking methods.

An Overview of DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models

The paper "DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models" applies diffusion generative models (DGMs) to protein-protein docking. Because understanding how proteins interact structurally matters for drug discovery and protein design, the work aims to predict the structure of a protein complex from its individual unbound proteins while keeping each protein's internal geometry fixed, which is the rigid docking setting. Traditional methods are computationally intensive because they rely on exhaustive search; DiffDock-PP instead frames docking as a generative problem, building on recent advances in protein-small molecule docking.

DiffDock-PP uses a DGM to learn a probability distribution over poses, that is, over the translations and rotations that place one protein relative to the other. It achieves state-of-the-art results on the Database of Interacting Protein Structures (DIPS), with a top-1 median complex root mean square deviation (C-RMSD) of 4.85, outperforming all baselines considered in the paper. On GPU, DiffDock-PP is also 5 to 60 times faster than the search-based methods it is compared against.

Background and Related Work

Protein-protein docking methods traditionally fall into two groups: search-based methods and deep learning-based methods. Search-based approaches enumerate and score large numbers of candidate complexes, which is computationally expensive. Recent deep learning methods instead treat docking as a regression problem and predict the final pose directly, but so far with limited success compared to search-based methods.

DiffDock-PP takes inspiration from recent work that uses generative models for molecular docking. The underlying argument is that generative modeling is better suited than deterministic prediction to the multi-modal distributions found in biological data: when several poses are plausible, a generative model can sample from each mode rather than commit to a single point estimate. This formulation lets DiffDock-PP generate plausible and diverse candidate poses by modeling a distribution over possible configurations.
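The multi-modality argument can be illustrated with a toy one-dimensional example. This is only a sketch under invented assumptions, not anything from the paper: the "poses" are scalar values, there are two equally valid modes at -1 and +1, and the regression model is replaced by its best possible constant output under squared error, which is the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two equally valid "poses" at -1 and +1, with small noise.
modes = np.array([-1.0, 1.0])
samples = rng.choice(modes, size=1000) + rng.normal(scale=0.05, size=1000)

# The constant prediction that minimizes mean squared error is the sample mean.
mse_optimal = samples.mean()

# Distance from that point estimate to the nearest valid mode.
regression_error = np.min(np.abs(mse_optimal - modes))

# A draw from the data distribution (standing in for a generative model's sample)
# lands close to one of the modes instead.
generative_draw = samples[0]
generative_error = np.min(np.abs(generative_draw - modes))

print(f"MSE-optimal point estimate: {mse_optimal:.3f}")
print(f"distance to nearest mode (regression): {regression_error:.3f}")
print(f"distance to nearest mode (sampled):    {generative_error:.3f}")
```

The point estimate falls between the two modes and matches neither, while an individual sample sits near one of them. Real docking poses live in a much larger space, so this only conveys the intuition.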

Methodology and Model Architecture

In rigid protein-protein docking, one protein (the receptor) serves as the reference and the task reduces to placing the other (the ligand) relative to it. DiffDock-PP models this as a probability distribution over the ligand's rotations and translations relative to the receptor, and defines a diffusion generative model directly on that space, the manifold of rigid-body transformations. The architecture is built from SE(3)-equivariant graph neural networks adapted to protein inputs, so that its outputs transform consistently when the input structures are rotated or translated.
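To make the notion of a rigid-body transformation concrete, here is a minimal NumPy sketch that rotates and translates a set of ligand coordinates and checks that internal distances are unchanged. The function names, the rotation-vector parameterization, and the choice to rotate about the ligand centroid are assumptions made for illustration; they are not taken from the DiffDock-PP code.

```python
import numpy as np

def axis_angle_to_matrix(rotvec):
    """Convert a rotation vector (unit axis times angle in radians) to a 3x3 matrix."""
    rotvec = np.asarray(rotvec, dtype=float)
    angle = np.linalg.norm(rotvec)
    if angle < 1e-12:
        return np.eye(3)
    k = rotvec / angle
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    # Rodrigues' rotation formula.
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def apply_rigid_transform(coords, rotvec, translation):
    """Rotate coords (N x 3) about their centroid, then translate them."""
    coords = np.asarray(coords, dtype=float)
    center = coords.mean(axis=0)
    R = axis_angle_to_matrix(rotvec)
    return (coords - center) @ R.T + center + np.asarray(translation, dtype=float)

def pairwise_distances(coords):
    """All-pairs Euclidean distances for an N x 3 coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Hypothetical ligand: five random points standing in for residue positions.
rng = np.random.default_rng(0)
ligand = rng.normal(size=(5, 3))
moved = apply_rigid_transform(ligand, rotvec=[0.0, 0.0, np.pi / 2],
                              translation=[1.0, 2.0, 3.0])

# A rigid transform changes the pose but not the internal geometry.
print(np.allclose(pairwise_distances(ligand), pairwise_distances(moved)))
```

A pose in this setting is fully described by six numbers (three for the rotation, three for the translation), which is what makes a distribution over poses tractable to model.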

The model proceeds in stages: it represents the proteins as graphs, passes them through several SE(3)-equivariant convolutional layers, and outputs translational and rotational scores. A separate confidence model then ranks the sampled poses, and the pose with the highest confidence is returned as the final prediction.
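The final selection step can be sketched in a few lines of plain Python. This is an illustrative sketch only: `rank_poses` and `select_top_pose` are hypothetical helper names, the poses are placeholder strings, and the confidence values are made up. In the real pipeline the poses would come from the diffusion sampler and the scores from the trained confidence model.

```python
from typing import List, Sequence, Tuple, TypeVar

Pose = TypeVar("Pose")

def rank_poses(poses: Sequence[Pose],
               confidences: Sequence[float]) -> List[Tuple[Pose, float]]:
    """Return (pose, confidence) pairs sorted from most to least confident."""
    if len(poses) != len(confidences):
        raise ValueError("poses and confidences must have the same length")
    return sorted(zip(poses, confidences), key=lambda pair: pair[1], reverse=True)

def select_top_pose(poses: Sequence[Pose],
                    confidences: Sequence[float]) -> Tuple[Pose, float]:
    """Return the single pose with the highest confidence (the top-1 prediction)."""
    ranked = rank_poses(poses, confidences)
    if not ranked:
        raise ValueError("at least one pose is required")
    return ranked[0]

# Hypothetical sampled poses and made-up confidence scores.
sampled_poses = ["pose_a", "pose_b", "pose_c"]
scores = [0.12, 0.87, 0.45]

best_pose, best_score = select_top_pose(sampled_poses, scores)
print(best_pose, best_score)
```

Because only the top-ranked pose is reported, the quality of the top-1 prediction depends on the confidence model as much as on the sampler.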

Experimental Results and Implications

The evaluation measures performance on the DIPS test set using two metrics, C-RMSD and interface RMSD. DiffDock-PP substantially outperforms many traditional and recent models, including EquiDock. However, the paper notes that the confidence model, which selects the final pose at inference time, leaves room for improvement, and that a better confidence model could further improve accuracy.
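As a rough illustration of the C-RMSD metric, the sketch below superimposes a predicted complex onto the ground-truth complex with the Kabsch algorithm and then computes the RMSD over all coordinates. It assumes the commonly used definition (receptor and ligand coordinates concatenated, optimal rigid superposition, then RMSD); the exact atom selection and the function names here are assumptions, not taken from the paper's evaluation code.

```python
import numpy as np

def kabsch_align(mobile, target):
    """Rigidly superimpose mobile (N x 3) onto target (N x 3); return aligned coords."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mu_m, mu_t = mobile.mean(axis=0), target.mean(axis=0)
    P, Q = mobile - mu_m, target - mu_t
    # SVD of the 3x3 covariance matrix between the centered point sets.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return P @ R.T + mu_t

def rmsd(a, b):
    """Root mean square deviation between two matched N x 3 coordinate arrays."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def complex_rmsd(pred_receptor, pred_ligand, true_receptor, true_ligand):
    """C-RMSD sketch: align the whole predicted complex to the true one, then RMSD."""
    pred = np.concatenate([pred_receptor, pred_ligand], axis=0)
    true = np.concatenate([true_receptor, true_ligand], axis=0)
    return rmsd(kabsch_align(pred, true), true)

# Hypothetical coordinates standing in for receptor and ligand residues.
rng = np.random.default_rng(0)
true_rec = rng.normal(size=(6, 3))
true_lig = rng.normal(size=(4, 3)) + 5.0

# Case 1: the whole complex is rigidly moved, so the docking itself is perfect.
c, s = np.cos(0.7), np.sin(0.7)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
shift = np.array([2.0, -1.0, 0.5])
perfect = complex_rmsd(true_rec @ Rz.T + shift, true_lig @ Rz.T + shift,
                       true_rec, true_lig)

# Case 2: the ligand is misplaced relative to the receptor.
bad_lig = true_lig + np.array([3.0, 0.0, 0.0])
misplaced = complex_rmsd(true_rec, bad_lig, true_rec, true_lig)

print(f"perfect docking:  {perfect:.6f}")
print(f"misplaced ligand: {misplaced:.6f}")
```

The superposition step makes the metric insensitive to where the complex sits in space, so only the relative placement of the two proteins is penalized. An interface RMSD would apply the same computation restricted to residues near the binding interface.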

Concluding Remarks and Future Directions

DiffDock-PP is a significant step in applying diffusion generative modeling to rigid protein-protein docking, with gains in both accuracy and computational efficiency. Two natural extensions are stronger confidence models and flexible docking, where the proteins' internal geometry is allowed to change upon binding.

More broadly, the work offers a promising framework for using generative models in biomolecular science, and future work might extend these ideas to more complex forms of biological interaction.

GitHub

GitHub - ketatam/DiffDock-PP: Implementation of DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models in PyTorch (ICLR 2023 - MLDD Workshop) (208 stars)