DeepAffinity: Interpretable Deep Learning of Compound-Protein Affinity through Unified Recurrent and Convolutional Neural Networks (1806.07537v2)

Published 20 Jun 2018 in q-bio.BM, cs.LG, and stat.ML

Abstract: Motivation: Drug discovery demands rapid quantification of compound-protein interaction (CPI). However, there is a lack of methods that can predict compound-protein affinity from sequences alone with high applicability, accuracy, and interpretability. Results: We present a seamless integration of domain knowledges and learning-based approaches. Under novel representations of structurally-annotated protein sequences, a semi-supervised deep learning model that unifies recurrent and convolutional neural networks has been proposed to exploit both unlabeled and labeled data, for jointly encoding molecular representations and predicting affinities. Our representations and models outperform conventional options in achieving relative error in IC$_{50}$ within 5-fold for test cases and 20-fold for protein classes not included for training. Performances for new protein classes with few labeled data are further improved by transfer learning. Furthermore, separate and joint attention mechanisms are developed and embedded to our model to add to its interpretability, as illustrated in case studies for predicting and explaining selective drug-target interactions. Lastly, alternative representations using protein sequences or compound graphs and a unified RNN/GCNN-CNN model using graph CNN (GCNN) are also explored to reveal algorithmic challenges ahead. Availability: Data and source codes are available at https://github.com/Shen-Lab/DeepAffinity Supplementary Information: Supplementary data are available at http://shen-lab.github.io/deep-affinity-bioinf18-supp-rev.pdf

Authors (4)
  1. Mostafa Karimi (7 papers)
  2. Di Wu (478 papers)
  3. Zhangyang Wang (375 papers)
  4. Yang Shen (98 papers)
Citations (341)

Summary

  • The paper introduces a unified RNN-CNN framework that leverages sequence data to accurately predict compound-protein affinities.
  • The paper employs attention mechanisms to enhance interpretability by pinpointing key molecular fragments responsible for binding.
  • The paper utilizes transfer learning and innovative molecular representations to generalize predictions even for sparsely labeled protein classes.

Interpretable Deep Learning of Compound-Protein Affinity: A Synthesis of RNN and CNN Approaches

The paper, titled "DeepAffinity: Interpretable Deep Learning of Compound-Protein Affinity through Unified Recurrent and Convolutional Neural Networks," presents a novel deep learning framework designed to predict the affinity between compounds and proteins using only sequence data. This work addresses the challenges of broad applicability, accuracy, and interpretability within the context of compound-protein interactions (CPI), which are critical for drug discovery and development.

The authors introduce a semi-supervised learning model that unifies recurrent neural networks (RNNs) and convolutional neural networks (CNNs) to exploit both labeled and unlabeled data. The model jointly encodes molecular representations and predicts binding affinities. Using structurally annotated protein sequences as input, it outperforms conventional representations and models, attaining a relative error in IC₅₀ prediction within 5-fold for test cases and within 20-fold for protein classes not included in training. Transfer learning further improves predictions for new protein classes that have few labeled examples.
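To make the "within 5-fold" figure concrete, the following minimal sketch computes a fold error between a predicted and a measured IC₅₀ and shows the equivalent bound on a log10 scale. The function names and the example concentrations are hypothetical and chosen only for illustration; the paper's exact error metric may be defined differently.

```python
import math


def fold_error(predicted_ic50: float, measured_ic50: float) -> float:
    """Multiplicative error between two IC50 values, always >= 1."""
    if predicted_ic50 <= 0 or measured_ic50 <= 0:
        raise ValueError("IC50 values must be positive")
    ratio = predicted_ic50 / measured_ic50
    # Over- and under-prediction by the same factor count equally.
    return max(ratio, 1.0 / ratio)


def log10_error(predicted_ic50: float, measured_ic50: float) -> float:
    """Absolute error on the log10 scale (the same scale as pIC50)."""
    return abs(math.log10(predicted_ic50) - math.log10(measured_ic50))


# Hypothetical values in nM (assumption, not taken from the paper).
print(fold_error(250.0, 100.0))      # 2.5
# A 5-fold error corresponds to about 0.7 log10 units.
print(round(math.log10(5), 3))       # 0.699
```

Under this reading, a 5-fold relative error means the prediction lies between one fifth of and five times the measured value, and a 20-fold error corresponds to about 1.3 log10 units.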

Key Contributions and Innovations

  1. Unified RNN-CNN Framework: The proposed model integrates RNNs and CNNs, leveraging the strengths of each architecture in handling sequential biological data and allowing for enhanced feature extraction from protein sequences and chemical structures. This architecture is trained end-to-end, thereby refining task-specific protein and compound representations.
  2. Interpretability through Attention Mechanisms: Separate and joint attention mechanisms enhance the model's interpretability by highlighting the molecular fragments that contribute most to a predicted binding interaction. The paper illustrates this in case studies that predict and explain selective drug-target interactions.
  3. Innovative Data Representation: The paper represents compounds as SMILES strings and proteins as sequences annotated with secondary structure elements (SSEs) and physicochemical properties. These compact representations significantly reduce the dimensionality of the input data compared to conventional molecular descriptors and Pfam domains, leading to more efficient and interpretable model training.
  4. Generalization and Transfer Learning: Deep transfer learning is employed to generalize predictions to new protein classes with few labeled examples. This methodological advance shows promise for extending model applicability in drug discovery environments where experimental data is scarce.
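As an illustration of the sequence-based input described in item 3, the sketch below turns SMILES strings into fixed-length integer sequences of the kind a recurrent encoder could consume. It is a simplified assumption, not the authors' preprocessing: the helper names `build_vocab` and `encode` are hypothetical, and the character-level split treats two-letter atoms such as Cl or Br as two separate symbols.

```python
def build_vocab(sequences):
    """Map each distinct character to an integer ID; 0 is reserved for padding."""
    symbols = sorted({ch for seq in sequences for ch in seq})
    return {ch: i + 1 for i, ch in enumerate(symbols)}


def encode(seq, vocab, max_len):
    """Convert a string to a list of IDs, truncated or zero-padded to max_len."""
    ids = [vocab[ch] for ch in seq[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Two small example molecules: ethanol and benzene.
smiles = ["CCO", "c1ccccc1"]
vocab = build_vocab(smiles)
encoded = [encode(s, vocab, max_len=10) for s in smiles]

print(vocab)       # {'1': 1, 'C': 2, 'O': 3, 'c': 4}
print(encoded[0])  # [2, 2, 3, 0, 0, 0, 0, 0, 0, 0]
print(encoded[1])  # [4, 1, 4, 4, 4, 4, 4, 1, 0, 0]
```

The same scheme would apply to protein sequences once each residue or structural segment has been mapped to a symbol from a small alphabet, which is what keeps the input dimensionality low.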

Practical and Theoretical Implications

The integration of RNN and CNN architectures within the DeepAffinity framework meets the demands of modern drug discovery, which requires both high-throughput computational predictions and mechanistic insights into drug-target interactions. By offering interpretable predictions, DeepAffinity provides a foundation for subsequent experimental validations and therapeutic exploration. The attention mechanisms embedded within the model offer a transparent view of the prediction process, enabling researchers to better understand critical binding sites and selectivity origins at a molecular level.
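The way attention exposes which positions drive a prediction can be sketched with a generic soft-attention pooling step. This is an assumption-laden illustration: dot-product scoring, the `attention_pool` helper, and the toy vectors are all hypothetical, and the paper's separate and joint attention formulations may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each position's hidden vector by its similarity to a query.

    hidden_states: list of equal-length vectors, one per sequence position.
    query: vector of the same length (assumed learned in a real model).
    Returns (weights, context), where weights sum to 1.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Toy hidden states for three sequence positions (hypothetical values).
states = [[0.1, 0.0], [2.0, 1.0], [0.3, 0.2]]
weights, context = attention_pool(states, query=[1.0, 1.0])

# The position with the largest weight is the one the model "attends" to most.
most_attended = max(range(len(weights)), key=lambda i: weights[i])
print(most_attended)  # 1
```

Inspecting `weights` position by position is what makes such a model interpretable: high-weight positions can be mapped back to protein segments or compound atoms and compared against known binding sites.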

The paper also explores alternative representations, including plain protein sequences and graph-based compound representations processed by a graph CNN (GCNN) in a unified RNN/GCNN-CNN model, and these point to algorithmic challenges that remain open for future research. Moreover, addressing the difficulty of training RNNs on long protein sequences could further improve model performance and broaden its applicability.

In summary, the DeepAffinity framework advances the computational prediction of compound-protein interactions from sequence data alone. Its combination of accurate affinity prediction and interpretable attention-based explanations aligns well with the goals of precision medicine and targeted therapeutics, and it offers a basis for further work in AI-driven drug discovery.