Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity (1703.10603v1)

Published 30 Mar 2017 in cs.LG, physics.chem-ph, and stat.ML

Abstract: Empirical scoring functions based on either molecular force fields or cheminformatics descriptors are widely used, in conjunction with molecular docking, during the early stages of drug discovery to predict potency and binding affinity of a drug-like molecule to a given target. These models require expert-level knowledge of physical chemistry and biology to be encoded as hand-tuned parameters or features rather than allowing the underlying model to select features in a data-driven procedure. Here, we develop a general 3-dimensional spatial convolution operation for learning atomic-level chemical interactions directly from atomic coordinates and demonstrate its application to structure-based bioactivity prediction. The atomic convolutional neural network is trained to predict the experimentally determined binding affinity of a protein-ligand complex by direct calculation of the energy associated with the complex, protein, and ligand given the crystal structure of the binding pose. Non-covalent interactions present in the complex that are absent in the protein-ligand sub-structures are identified and the model learns the interaction strength associated with these features. We test our model by predicting the binding free energy of a subset of protein-ligand complexes found in the PDBBind dataset and compare with state-of-the-art cheminformatics and machine learning-based approaches. We find that all methods achieve experimental accuracy and that atomic convolutional networks either outperform or perform competitively with the cheminformatics based methods. Unlike all previous protein-ligand prediction systems, atomic convolutional networks are end-to-end and fully-differentiable. They represent a new data-driven, physics-based deep learning model paradigm that offers a strong foundation for future improvements in structure-based bioactivity prediction.

Authors (4)

Joseph Gomes (10 papers)
Bharath Ramsundar (30 papers)
Evan N. Feinberg (6 papers)
Vijay S. Pande (38 papers)

Citations (183)


Summary

The paper presents atomic convolutional neural networks (ACNNs), which use 3D atomic convolutions to learn chemical interactions directly from atomic coordinates, removing the need for hand-tuned force-field parameters.
ACNNs achieve a low mean unsigned error (under 1 kcal/mol) and high squared Pearson correlations on the PDBBind dataset, outperforming ligand-only models.
ACNNs generalize well to larger molecular systems, paving the way for scalable, data-driven drug discovery approaches.


In the paper titled "Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity," the authors introduce a deep learning model for predicting protein-ligand binding affinity, a crucial step in drug discovery. The proposed Atomic Convolutional Networks (ACNNs) use three-dimensional spatial convolutions to learn atomic-level chemical interactions directly from atomic coordinates.

Overview of Methodology

Traditional empirical scoring functions for predicting ligand binding rely on molecular force fields or fixed cheminformatics descriptors, and parameterizing them requires significant domain expertise. ACNNs depart from this convention with a data-driven, end-to-end learning approach that needs no hand-tuned parameters. Their atomic convolutions operate on each atom's local environment (its nearby neighbor atoms), and the features they extract are learned by training on experimental data rather than chosen in advance.
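An atom's "local environment" can be made concrete as a neighbor list. The following is a minimal NumPy sketch, assuming each atom keeps its M nearest other atoms and that only neighbors inside a distance cutoff are flagged as interacting. The name `neighbor_list` and its parameters are hypothetical, and the brute-force O(N²) distance matrix is chosen for clarity rather than speed:

```python
import numpy as np

def neighbor_list(coords: np.ndarray, max_neighbors: int, cutoff: float):
    """Hypothetical helper: the M nearest neighbors of every atom.

    coords:   (N, 3) Cartesian coordinates (e.g. in angstroms).
    Returns   nbr_idx (N, M) neighbor indices, nbr_dist (N, M) distances,
              and mask (N, M), True where the neighbor lies inside `cutoff`.
    """
    diff = coords[:, None, :] - coords[None, :, :]   # (N, N, 3) displacement vectors
    dist = np.linalg.norm(diff, axis=-1)             # (N, N) pairwise distances
    np.fill_diagonal(dist, np.inf)                   # an atom is not its own neighbor
    m = min(max_neighbors, coords.shape[0] - 1)
    nbr_idx = np.argsort(dist, axis=1)[:, :m]        # indices of the m closest atoms
    nbr_dist = np.take_along_axis(dist, nbr_idx, axis=1)
    mask = nbr_dist < cutoff
    return nbr_idx, nbr_dist, mask

# Three clustered atoms and one distant atom.
coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [5.0, 5.0, 5.0]])
nbr_idx, nbr_dist, mask = neighbor_list(coords, max_neighbors=2, cutoff=3.0)
# nbr_idx[3] is [2, 1], but mask[3] is all False: the distant atom has no neighbor within 3.0
```

Fixing the list length at M gives every atom a tensor row of the same shape regardless of molecule size; the mask records which of those slots represent genuine close contacts.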

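ACNNs process molecular input through two novel operations: atom type convolution and radial pooling. The atom type convolution uses a neighbor-listed distance matrix to extract features encoding local chemical environments from an input (a list of Cartesian coordinates) that does not itself encode spatial locality. Radial pooling then sums filtered distances over each atom's neighbors, producing a compact representation that is invariant to the ordering of the neighbor list. The pooled features are fed into an atomistic fully connected network whose per-atom outputs are summed into the energy of the molecular system; the binding energy is then the energy of the complex minus those of the isolated protein and ligand.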
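The two operations and the atom-wise energy sum can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' code. Its assumptions include the Gaussian radial filters multiplied by a cosine cutoff (in the style of Behler-Parrinello symmetry functions), applying the atom type indicator as a mask after filtering, and a single one-hidden-layer tanh network shared by all atoms. The function names are hypothetical, and every filter parameter and weight is fixed here, whereas in the real model they would be learned:

```python
import numpy as np

def atom_type_mask(nbr_types, atom_types):
    """Atom type convolution kernel: K[i, j, a] = 1.0 if neighbor j of atom i has type atom_types[a]."""
    return (nbr_types[:, :, None] == np.asarray(atom_types)[None, None, :]).astype(float)

def cosine_cutoff(r, r_c):
    """Smooth cutoff: 1 at r = 0, falling to 0 at r = r_c and staying 0 beyond it."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def radial_pooling(nbr_dist, type_mask, r_s, sigma_s, beta_s, b_s, r_c):
    """P[i, a, s] = beta_s * sum_j f_s(r_ij) * K[i, j, a] + b_s, with Gaussian radial filters f_s.

    nbr_dist: (N, M) neighbor distances; type_mask: (N, M, A) from atom_type_mask;
    r_s, sigma_s, beta_s, b_s: (S,) filter parameters (learned in the real model).
    """
    r = nbr_dist[:, :, None]                                               # (N, M, 1)
    f = np.exp(-((r - r_s) ** 2) / sigma_s ** 2) * cosine_cutoff(r, r_c)  # (N, M, S)
    pooled = np.einsum("nma,nms->nas", type_mask, f)                      # sum over neighbors j
    return beta_s * pooled + b_s                                          # (N, A, S)

def total_energy(pooled, w1, b1, w2, b2):
    """Shared per-atom network on flattened (A * S) features; atom energies are summed."""
    x = pooled.reshape(pooled.shape[0], -1)   # (N, A * S)
    hidden = np.tanh(x @ w1 + b1)             # (N, H)
    per_atom = hidden @ w2 + b2               # (N,) per-atom energy contributions
    return per_atom.sum()

# Toy input: 4 atoms, each with 3 neighbors of element 6 (C) or 8 (O).
rng = np.random.default_rng(0)
N, M, A, S, H = 4, 3, 2, 3, 5
nbr_dist = rng.uniform(1.0, 6.0, size=(N, M))
nbr_types = rng.choice([6, 8], size=(N, M))
mask = atom_type_mask(nbr_types, [6, 8])
r_s, sigma_s = np.array([1.5, 3.0, 4.5]), np.ones(S)
beta_s, b_s = np.ones(S), np.zeros(S)
P = radial_pooling(nbr_dist, mask, r_s, sigma_s, beta_s, b_s, r_c=8.0)
w1, b1 = rng.normal(size=(A * S, H)), np.zeros(H)
w2, b2 = rng.normal(size=H), 0.0
E = total_energy(P, w1, b1, w2, b2)
```

Reordering the neighbor columns leaves `P` unchanged because pooling sums over j. Reordering the atoms leaves `E` unchanged because the energy is a sum of per-atom terms, and for the same reason the same weights apply to a system with any number of atoms.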

To validate the model, the researchers use the PDBBind dataset, focusing on predicting the binding free energy of protein-ligand complexes. They benchmark the prediction accuracy of ACNNs against existing cheminformatics and machine learning techniques, including GRID-RF, GRID-NN, GCNN, and ECFP-based models. ACNNs achieve competitive or superior performance, particularly on the core and refined subsets of the dataset, which contain high-quality structural information.
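The predicted binding energy follows the difference described in the abstract: the energy of the complex minus the energies of the protein and the ligand, each computed separately. Below is a minimal sketch of that bookkeeping. `toy_energy` is a hypothetical short-range pair potential standing in for the learned ACNN energy and is not part of the paper; only the difference structure comes from the source:

```python
import numpy as np

def toy_energy(coords, r_c=6.0):
    """Hypothetical stand-in for a learned energy: smooth attractive pair terms inside a cutoff."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    iu = np.triu_indices(len(coords), k=1)        # each unordered atom pair once
    r = d[iu]
    return np.where(r < r_c, -np.exp(-r), 0.0).sum()

def binding_energy(protein, ligand, energy=toy_energy):
    """Delta G = E(complex) - E(protein) - E(ligand); the complex is the union of both atom sets."""
    complex_coords = np.vstack([protein, ligand])
    return energy(complex_coords) - energy(protein) - energy(ligand)

protein = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
far_ligand = np.array([[20.0, 0.0, 0.0], [21.5, 0.0, 0.0]])   # beyond the cutoff
near_ligand = np.array([[3.0, 0.0, 0.0], [4.5, 0.0, 0.0]])    # in contact with the protein
dg_far = binding_energy(protein, far_ligand)
dg_near = binding_energy(protein, near_ligand)
```

With this toy pair energy, every intra-protein and intra-ligand term cancels exactly. A ligand placed beyond the cutoff therefore gives a binding energy of zero, and only protein-ligand contacts contribute. With a learned per-atom energy, atoms far from the binding site also cancel, so only atoms whose neighborhoods change on binding contribute. This is how the model isolates interactions that are present in the complex but absent from its parts.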

Numerical Results and Performance

Across the dataset splits, ACNN models consistently achieve a low mean unsigned error (MUE), typically under 1 kcal/mol, the threshold commonly cited as chemical accuracy for drug design. Squared Pearson correlation coefficients ($R^2$) show that ACNNs perform on par with leading structure-based methods such as GRID-RF and outperform them in some settings, such as the scaffold split.
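Both reported metrics are straightforward to compute. A short sketch follows, with hypothetical function names; the input values are illustrative and not results from the paper:

```python
import numpy as np

def mean_unsigned_error(y_true, y_pred):
    """Mean absolute deviation between predictions and experiment (same units, e.g. kcal/mol)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_pred - y_true))

def pearson_r2(y_true, y_pred):
    """Square of the Pearson correlation coefficient between predictions and experiment."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return r ** 2

# Illustrative binding energies in kcal/mol (not data from the paper).
mue = mean_unsigned_error([-8.0, -9.0, -10.0, -7.0], [-8.5, -9.0, -9.5, -7.0])  # 0.25
r2 = pearson_r2([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])                     # 0.64
```

The squared Pearson correlation equals the coefficient of determination $1 - SS_{res}/SS_{tot}$ only for a least-squares linear fit. It ignores any systematic offset or scaling of the predictions, which is why it is usually reported alongside an absolute error such as MUE.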

The paper highlights that ACNNs outperform methods that rely solely on ligand information, such as GCNN and ECFP-based models, which generalize less well. This underscores the importance of incorporating protein structure into predictive models for drug discovery. Notably, ACNNs can generalize to systems larger than those in the training data, which showcases their scalability.

Implications and Future Directions

ACNNs offer a data-driven alternative to traditional force-field-based methods for predicting molecular interactions. Because the framework is end-to-end and fully differentiable, the strengths of protein-ligand interactions are learned at the atomic level from coordinates and experimental affinities rather than specified by hand. The authors present this as a foundation for structure-based bioactivity prediction that can be refined iteratively as more high-quality structural data become available.

Future research may address ACNNs’ sensitivity to dataset quality and the difficulty of extrapolating beyond the chemical space covered by the training data. Applying ACNNs beyond drug discovery, for example to the automated fitting of potential energy surfaces or the screening of new materials, is also promising. Improved regularization techniques and tests of the model's adaptability across varied chemical and biological systems could further strengthen ACNNs’ utility and performance.
