- The paper demonstrates that conventional docking workflows outperform DiffDock when the binding site is known, with Surflex-Dock achieving Top-1/Top-5 success rates of 68/81% versus DiffDock's 45/51% at the 2.0 Å RMSD threshold.
- The paper reveals that DiffDock's reliance on training-test data overlap inflates its performance metrics, highlighting the need for more rigorous evaluation strategies.
- The paper underscores the potential for hybrid approaches that integrate deep learning with traditional docking methods to advance computer-aided drug design.
Deep-Learning Based Docking Methods: A Critical Analysis
The paper "Deep-Learning Based Docking Methods: Fair Comparisons to Conventional Docking Workflows" compares the performance and application context of the deep-learning docking method DiffDock with conventional docking approaches. It highlights DiffDock's limitations relative to well-established tools such as Surflex-Dock, Glide, AutoDock Vina, and Gnina, emphasizing the importance of a fair evaluation framework.
Overview
The primary focus of this paper is to assess the capability of DiffDock, a diffusion-based learning method for docking small-molecule ligands into protein binding sites, against traditional docking workflows. The authors contend that previous evaluations overstated DiffDock's efficacy because the test data overlapped substantially with the training data: many test cases had near neighbors in DiffDock's extensive training set, which inflated its apparent performance.
Key Findings and Numerical Outcomes
The paper reports that Surflex-Dock, along with other conventional docking tools, outperforms DiffDock. Under the known binding site condition, Surflex-Dock achieved significantly higher success rates at the 2.0 Å RMSD threshold, with Top-1/Top-5 success rates of 68/81% over DiffDock's 45/51%. Similarly, Glide reached 67/73% for known binding sites, also surpassing DiffDock. AutoDock Vina achieved comparable outcomes, noticeably outperforming DiffDock under the known binding site condition.
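The Top-N success metric reported above is standard in pose-prediction benchmarks: a target counts as a success if any of the N best-ranked poses lies within 2.0 Å RMSD of the crystallographic pose. A minimal sketch of that computation (function names and the toy RMSD values are illustrative, not from the paper):

```python
def topn_success_rate(rmsds_per_target, n, threshold=2.0):
    """Fraction of targets for which at least one of the top-n ranked
    poses has RMSD <= threshold (Angstroms) to the reference pose."""
    hits = sum(
        1 for rmsds in rmsds_per_target
        if any(r <= threshold for r in rmsds[:n])
    )
    return hits / len(rmsds_per_target)

# Toy example: three targets, five ranked poses each (RMSD in Angstroms).
poses = [
    [1.2, 0.8, 3.5, 4.0, 2.1],  # success already at Top-1
    [2.6, 1.9, 2.2, 5.0, 3.3],  # success only within Top-5
    [4.1, 3.8, 2.9, 2.6, 3.0],  # no pose within 2.0 A
]
top1 = topn_success_rate(poses, 1)  # 1 of 3 targets
top5 = topn_success_rate(poses, 5)  # 2 of 3 targets
```

Note that Top-5 success is always at least as high as Top-1, which is why the paired 68/81% style of reporting is informative: the gap between the two numbers reflects how often the scoring function ranks a good pose below a bad one.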
In blind docking scenarios, where no a priori knowledge of binding sites is provided, Surflex-Dock continued to demonstrate superior results, although the performance margin narrowed. Notably, DiffDock's roughly 98/2 training/test split led to an overestimation of its performance due to the close similarity between training and test ligands, a fact corroborated by comparative RMSD analysis.
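The overlap concern raised here can be quantified by asking what fraction of test ligands have a near neighbor in the training set. A minimal sketch, assuming a pluggable similarity metric (the function names are hypothetical, and set-based Jaccard similarity stands in here for a chemical fingerprint similarity such as Tanimoto):

```python
def fraction_with_near_neighbor(test_set, train_set, sim, cutoff=0.9):
    """Fraction of test items with at least one training-set neighbor
    whose similarity meets or exceeds the cutoff."""
    flagged = sum(
        1 for t in test_set
        if any(sim(t, x) >= cutoff for x in train_set)
    )
    return flagged / len(test_set)

def jaccard(a, b):
    """Jaccard similarity between two feature-key sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Toy feature-key sets standing in for ligand fingerprints.
train = [{1, 2, 3, 4}, {5, 6, 7}]
test = [{1, 2, 3}, {8, 9}]
overlap = fraction_with_near_neighbor(test, train, jaccard, cutoff=0.7)
```

A high value of this fraction means test performance partly measures memorization of near-duplicate training cases, which is exactly the evaluation pitfall the paper attributes to DiffDock's benchmark split.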
Implications for AI and Structural Biology
The findings suggest that while AI and deep learning offer novel opportunities in computer-aided drug design (CADD), their application must be contextualized against the backdrop of existing methodologies. Deep learning's promise in CADD can be realized, provided there is an acknowledgment of the potential for overfitting, especially in scenarios involving large and intricate datasets such as PDBBind.
Practically, the paper indicates the need for AI models like DiffDock to generalize beyond the confines of their training data, especially for prospective applications involving novel targets and ligand structures. Conventional methods currently offer robust performance and can handle unknown binding sites without relying on prior knowledge, which is critical for practical drug discovery processes.
Theoretical Perspectives and Future Directions
Theoretically, the evaluation raises questions about how deep learning models can better generalize from the limited data characteristic of novel ligand discovery tasks. The paper encourages further exploration of hybrid models that combine the strengths of AI-driven and traditional docking methods. Enhanced strategies for reducing data dependency and improving the generalization of docking predictions will be pivotal.
For the field of AI in drug design, future research could focus on optimizing data usage during training and validating models under stricter conditions. By curating more challenging benchmark datasets with minimal train/test overlap, the research community can ensure that reported performance reflects genuine predictive capability rather than memorization.
In conclusion, this paper provides valuable insights into the comparison of deep learning and conventional docking methods, urging caution in the interpretation of AI-based methods' performance claims. The rigorous baseline offered by conventional approaches sets a foundation for future AI advancements in molecular docking.