DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences (1811.02114v1)

Published 6 Nov 2018 in q-bio.QM and cs.LG

Abstract: Identification of drug-target interactions (DTIs) plays a key role in drug discovery. The high cost and labor-intensive nature of in vitro and in vivo experiments have highlighted the importance of in silico-based DTI prediction approaches. In several computational models, conventional protein descriptors are shown to be not informative enough to predict accurate DTIs. Thus, in this study, we employ a convolutional neural network (CNN) on raw protein sequences to capture local residue patterns participating in DTIs. With CNN on protein sequences, our model performs better than previous protein descriptor-based models. In addition, our model performs better than the previous deep learning model for massive prediction of DTIs. By examining the pooled convolution results, we found that our model can detect binding sites of proteins for DTIs. In conclusion, our prediction model for detecting local residue patterns of target proteins successfully enriches the protein features of a raw protein sequence, yielding better prediction results than previous approaches.

Authors (3)

Ingoo Lee
Jongsoo Keum
Hojung Nam

Summary

An Expert Overview of DeepConv-DTI: Predicting Drug-Target Interactions Using Convolutional Neural Networks

The paper "DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences" by Ingoo Lee, Jongsoo Keum, and Hojung Nam explores a deep learning approach to predicting drug-target interactions (DTIs), a crucial task in the drug discovery process. The paper applies convolutional neural networks (CNNs) to raw protein sequences to improve the accuracy of DTI prediction, and reports better performance than both earlier descriptor-based models and a previous deep learning model.

Methodology and Innovations

The authors introduce a deep learning framework that applies CNNs to raw protein sequences in order to detect the local residue patterns that contribute to DTIs. CNNs suit this task because their filters scan a sequence window by window and respond to local patterns wherever they occur, giving a richer representation of the protein than traditional pre-defined descriptors such as composition, transition, and distribution (CTD).
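To make the idea of convolution over a raw sequence concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes a random residue embedding, random filter weights, a ReLU activation, and global max pooling. The names make_embedding, make_filters, and conv_global_max, as well as all sizes, are hypothetical choices for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def make_embedding(dim, seed=0):
    """Hypothetical random embedding: one dim-length vector per residue."""
    rng = random.Random(seed)
    return {aa: [rng.uniform(-1, 1) for _ in range(dim)] for aa in AMINO_ACIDS}


def make_filters(n_filters, window, dim, seed=1):
    """Hypothetical random filters, each of shape window x dim."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(window)]
        for _ in range(n_filters)
    ]


def conv_global_max(sequence, embedding, filters):
    """Slide each filter over the embedded sequence, apply ReLU,
    and keep only the maximum response per filter (global max pooling)."""
    embedded = [embedding[aa] for aa in sequence]
    window = len(filters[0])
    n_positions = len(embedded) - window + 1
    if n_positions < 1:
        raise ValueError("sequence is shorter than the filter window")
    features = []
    for filt in filters:
        best = 0.0  # ReLU output is never below zero
        for start in range(n_positions):
            window_rows = embedded[start:start + window]
            score = sum(
                w * x
                for row_w, row_x in zip(filt, window_rows)
                for w, x in zip(row_w, row_x)
            )
            best = max(best, max(0.0, score))
        features.append(best)
    return features


embedding = make_embedding(dim=4)
filters = make_filters(n_filters=3, window=5, dim=4)
short_features = conv_global_max("MKTAYIAKQR", embedding, filters)
long_features = conv_global_max("MKTAYIAKQRQISFVKSHFSRQ", embedding, filters)
print(len(short_features), len(long_features))
```

Because pooling keeps one value per filter, proteins of different lengths map to feature vectors of the same size, which is what lets a fixed-size network consume whole sequences.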

This approach addresses inherent limitations in current feature-based and similarity-based models, which often fail to capture detailed local residue patterns in proteins. By processing entire protein sequences directly, the model is able to mitigate information loss typically observed when using compressed descriptors or protein similarity scores.
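The information loss mentioned above can be illustrated with a small sketch. It covers only the composition part of a CTD-style descriptor (the transition and distribution parts do retain some positional information), and the function name composition and the two toy sequences are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Fraction of each of the 20 residues in the sequence (order is ignored)."""
    counts = Counter(sequence)
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]


seq_a = "GKSTLLAA"
seq_b = seq_a[::-1]  # same residues, reversed order

same_descriptor = composition(seq_a) == composition(seq_b)
print(seq_a != seq_b, same_descriptor)
```

Two sequences with different residue order receive an identical composition vector, so any local motif that distinguishes them is invisible to a model that sees only this descriptor.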

Results and Performance Evaluation

The model was evaluated on an independent test dataset drawn from PubChem BioAssays and KinaseSARfari. The results indicated higher accuracy and F1 score for the CNN model than for models built on conventional descriptors or on similarity-based methods. Specifically, the convolutional model discriminated interactions better across different protein classes and protein lengths, as evidenced by an area under the precision-recall curve (AUPR) of 0.832.
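For readers less familiar with the metrics named above, here is a small sketch of how F1 and a precision-recall summary can be computed. It uses average precision as a step-wise estimate of AUPR and does not treat tied scores specially. The paper's exact evaluation procedure may differ, and the labels and scores below are toy values, not results from the paper.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = interaction)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each true positive,
    with examples ranked by descending score."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Toy data only: four drug-target pairs with made-up scores.
toy_labels = [1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_predictions = [1 if s >= 0.75 else 0 for s in toy_scores]

print(average_precision(toy_labels, toy_scores))
print(f1_score(toy_labels, toy_predictions))
```

Precision-recall summaries are commonly preferred over accuracy when true interactions are rare relative to non-interactions, since they focus on how well the positives are ranked.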

The paper reports a significant improvement over baseline models, notably the method of Wen et al., which employed deep belief networks (DBNs). The model predicts DTIs more accurately, shows less bias towards input sequence length, and applies across a wide range of protein classes.

Implications and Future Prospects

The successful application of CNNs to DTI prediction points to further potential for deep learning architectures in drug discovery, particularly for processing biological sequences. Beyond improving on existing methods, the approach offers insight into binding mechanisms: by examining the pooled convolution results, the authors found that the model can detect protein binding sites from sequence data.
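One way to picture how pooled convolution results can point back to sequence regions is to record which window produced a filter's maximum response. The sketch below is an assumption-laden illustration, not the paper's analysis: the filter is a hand-written table of per-position residue weights (equivalent to a convolution over a one-hot encoded sequence), and top_window, motif_filter, and the toy sequence are hypothetical.

```python
def top_window(sequence, filt):
    """Return (start, residues, score) for the window with the highest
    filter response, i.e. the window that global max pooling would keep."""
    window = len(filt)
    if len(sequence) < window:
        raise ValueError("sequence is shorter than the filter window")
    best_start, best_score = 0, float("-inf")
    for start in range(len(sequence) - window + 1):
        residues = sequence[start:start + window]
        # Each filter position holds weights for the residues it responds to.
        score = sum(weights.get(aa, 0.0) for weights, aa in zip(filt, residues))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, sequence[best_start:best_start + window], best_score


# Hand-written filter that responds to the toy motif "GKS".
motif_filter = [{"G": 1.0}, {"K": 1.0}, {"S": 1.0}]
start, residues, score = top_window("MAAGKSTLL", motif_filter)
print(start, residues, score)
```

In a trained model the filter weights are learned rather than hand-written, and the recovered windows would still need to be compared against known binding-site annotations before drawing any biological conclusion.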

Future developments could focus on refining these models to handle larger datasets more efficiently, incorporating additional biological contexts such as three-dimensional protein structures or combining multimodal data inputs (e.g., genetic or phenotypic annotations). Further, expanding the application beyond protein sequence data to include other significant biochemical entities in the drug-target landscape will likely yield broader insights and predictions.

Conclusion

Overall, the DeepConv-DTI paper significantly contributes to computational drug discovery by enhancing the predictive accuracy of drug-target interactions through the use of CNNs on raw protein sequences. This approach not only overcomes the limitations of existing feature-based models but also provides a scalable solution that can adapt to varying lengths and classes of proteins, thus offering a robust framework for future drug discovery endeavors.