Development and evaluation of a deep learning model for protein-ligand binding affinity prediction (1712.07042v2)

Published 19 Dec 2017 in stat.ML, cs.LG, and q-bio.BM

Abstract: Structure based ligand discovery is one of the most successful approaches for augmenting the drug discovery process. Currently, there is a notable shift towards ML methodologies to aid such procedures. Deep learning has recently gained considerable attention as it allows the model to "learn" to extract features that are relevant for the task at hand. We have developed a novel deep neural network estimating the binding affinity of ligand-receptor complexes. The complex is represented with a 3D grid, and the model utilizes a 3D convolution to produce a feature map of this representation, treating the atoms of both proteins and ligands in the same manner. Our network was tested on the CASF "scoring power" benchmark and Astex Diverse Set and outperformed classical scoring functions. The model, together with usage instructions and examples, is available as a git repository at http://gitlab.com/cheminfIBB/pafnucy

Authors (3)

Marta M. Stepniewska-Dziubinska (2 papers)
Piotr Zielenkiewicz (2 papers)
Pawel Siedlecki (8 papers)

Citations (410)

Summary

Insights into a Deep Learning Approach for Protein-Ligand Binding Affinity Prediction

The paper "Development and evaluation of a deep learning model for protein-ligand binding affinity prediction" presents a study of deep neural networks (DNNs) for predicting the binding affinity of protein-ligand complexes. The approach matters because affinity prediction is a computational bottleneck in drug discovery, where traditional scoring functions often estimate binding strength inaccurately.

Model Architecture and Methodology

The authors introduce Pafnucy, a DNN model that processes molecular complexes with a 3D convolutional architecture. The input is a 4D tensor representation of the complex: three spatial axes of a grid plus one axis of per-atom features. The architecture consists of convolutional layers with max pooling, followed by fully connected layers. This design lets the model learn spatial patterns in protein-ligand interactions directly from the grid.
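To make the 4D input concrete, the following is a minimal sketch of how atoms might be mapped onto such a grid. It is not the authors' code: the function name `make_grid`, the box half-width `max_dist`, the `resolution`, and the choice to sum features of atoms that share a voxel are all illustrative assumptions, and the grid is stored sparsely as a dictionary rather than as a dense tensor.

```python
from math import floor


def make_grid(coords, features, max_dist=10.0, resolution=1.0):
    """Map atoms onto a sparse 4D grid: (i, j, k) -> feature vector.

    coords   -- list of (x, y, z) tuples, assumed already centered on the ligand
    features -- list of per-atom feature vectors, all of equal length
    max_dist and resolution are hypothetical defaults, not values from the source.
    Atoms whose nearest voxel falls outside the grid are dropped; atoms that
    share a voxel have their feature vectors summed.
    """
    if len(coords) != len(features):
        raise ValueError("coords and features must have the same length")
    box_size = int(floor(2 * max_dist / resolution)) + 1  # voxels per axis
    grid = {}
    for xyz, feat in zip(coords, features):
        # Shift so the box starts at 0, then snap to the nearest voxel index.
        idx = tuple(int(round((c + max_dist) / resolution)) for c in xyz)
        if all(0 <= i < box_size for i in idx):
            cell = grid.setdefault(idx, [0.0] * len(feat))
            for n, value in enumerate(feat):
                cell[n] += value
    return box_size, grid


# Toy input: four atoms with two made-up features each.
coords = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (3.0, -2.0, 9.6), (11.0, 0.0, 0.0)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
box_size, grid = make_grid(coords, features)
```

In this toy example the first two atoms land in the same central voxel and their features are added, while the last atom lies outside the box and is discarded. A dense tensor of shape `(box_size, box_size, box_size, n_features)` could be filled from the dictionary before being passed to a 3D convolution.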

The network stands out by not necessitating extensive feature engineering; instead, it relies on an adaptable representation of both proteins and ligands, signifying less dependence on a priori expert-defined rules. This flexibility might offer a robust pathway to uncovering underlying biochemical interactions that are less apparent through conventional methods.

Dataset and Evaluation

The model was trained on complexes from the PDBbind database (v. 2016) and tested on the CASF "scoring power" benchmark and the Astex Diverse Set. During training, each protein-ligand complex is presented in 24 different orientations. Because a 3D grid representation changes when the complex is rotated, this augmentation reduces the model's sensitivity to the orientation of the input; it does not make the network strictly rotation-invariant.
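One natural reading of "24 orientations" is the set of 24 rotations that map a cube onto itself, that is, combinations of 90° turns about the grid axes. The sketch below enumerates them under that assumption; the function names `cube_rotations` and `rotate` are hypothetical and the code is an illustration, not the training pipeline described in the paper.

```python
from itertools import permutations, product


def cube_rotations():
    """Return the 24 proper rotations of a cube as (perm, signs) pairs.

    A rotation sends a point p to q with q[i] = signs[i] * p[perm[i]].
    Only signed axis permutations with determinant +1 are kept, which
    excludes reflections (mirror images would change ligand chirality).
    """
    rotations = []
    for perm in permutations(range(3)):
        # Parity of the permutation: +1 if even, -1 if odd.
        inversions = sum(
            perm[a] > perm[b] for a in range(3) for b in range(a + 1, 3)
        )
        parity = -1 if inversions % 2 else 1
        for signs in product((1, -1), repeat=3):
            if parity * signs[0] * signs[1] * signs[2] == 1:
                rotations.append((perm, signs))
    return rotations


def rotate(coords, rotation):
    """Apply one (perm, signs) rotation to a list of (x, y, z) tuples."""
    perm, signs = rotation
    return [tuple(signs[i] * p[perm[i]] for i in range(3)) for p in coords]


# Augment a toy "complex" of two atoms with every orientation.
atoms = [(1.0, 2.0, 3.0), (-1.0, 0.0, 2.0)]
augmented = [rotate(atoms, r) for r in cube_rotations()]
```

Because these rotations only permute and flip axes, a grid-aligned box maps exactly onto itself and no interpolation of voxel values is needed.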

Pafnucy was evaluated on the CASF "scoring power" benchmark, where it predicted affinities more accurately than classical scoring functions such as X-Score. On the PDBbind core set it achieved a root mean square error (RMSE) of 1.42 and a Pearson correlation coefficient ($R$) of 0.78.
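For readers unfamiliar with the two metrics, here is a small self-contained sketch of how RMSE and the Pearson correlation coefficient are computed from measured and predicted affinities. The function names and the toy values are illustrative and are not data from the paper.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / sqrt(var_t * var_p)


# Hypothetical affinities: every prediction is off by exactly one unit.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
error = rmse(measured, predicted)
correlation = pearson_r(measured, predicted)
```

In this toy case the predictions are perfectly correlated with the measurements yet carry a constant offset, so the correlation is at its maximum while the RMSE is not zero. This is why the two metrics are usually reported together: correlation captures ranking quality, while RMSE captures absolute error.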

Implications and Future Directions

Pafnucy has practical implications for computational drug design. More accurate binding affinity predictions can help prioritize candidate molecules for experimental validation and so shorten the drug discovery pipeline. The DNN approach can also help identify which molecular interactions most strongly affect binding affinity, providing insights that can guide further biochemical exploration.

From a theoretical perspective, the model shows that a network can learn from 3D structural representations without exhaustive feature engineering, which is a promising direction for machine learning in bioinformatics. Future research could explore combining Pafnucy with other predictive frameworks, possibly incorporating multi-modal data to capture a wider range of interactions.

Additionally, extending the model to handle an even broader range of ligand diversity and integrating biophysical simulations could further enhance its applicability in drug design, possibly facilitating the discovery of novel drug-like molecules with desired therapeutic profiles.

Conclusion

The paper offers a rigorous examination of a deep learning approach to protein-ligand binding affinity prediction. Using a 3D convolutional neural network, Pafnucy predicts affinities more accurately than the classical scoring functions it was compared against, making it a useful tool for computational drug discovery. The results illustrate the potential of deep learning in computational biology and suggest directions for future studies in the domain.
