DeepDTA: Deep Drug-Target Binding Affinity Prediction (1801.10193v2)

Published 30 Jan 2018 in stat.ML and cs.LG

Abstract: The identification of novel drug-target (DT) interactions is a substantial part of the drug discovery process. Most of the computational methods that have been proposed to predict DT interactions have focused on binary classification, where the goal is to determine whether a DT pair interacts or not. However, protein-ligand interactions assume a continuum of binding strength values, also called binding affinity and predicting this value still remains a challenge. The increase in the affinity data available in DT knowledge-bases allows the use of advanced learning techniques such as deep learning architectures in the prediction of binding affinities. In this study, we propose a deep-learning based model that uses only sequence information of both targets and drugs to predict DT interaction binding affinities. The few studies that focus on DT binding affinity prediction use either 3D structures of protein-ligand complexes or 2D features of compounds. One novel approach used in this work is the modeling of protein sequences and compound 1D representations with convolutional neural networks (CNNs). The results show that the proposed deep learning based model that uses the 1D representations of targets and drugs is an effective approach for drug target binding affinity prediction. The model in which high-level representations of a drug and a target are constructed via CNNs achieved the best Concordance Index (CI) performance in one of our larger benchmark data sets, outperforming the KronRLS algorithm and SimBoost, a state-of-the-art method for DT binding affinity prediction.

Authors

Hakime Öztürk
Elif Ozkirimli
Arzucan Özgür

Summary

The paper "DeepDTA: Deep Drug-Target Binding Affinity Prediction" presents a deep learning approach, built on convolutional neural networks (CNNs), for predicting the binding affinity of drug-target pairs. It is authored by Hakime Öztürk, Elif Ozkirimli, and Arzucan Özgür of Bogazici University, Istanbul, Turkey. Rather than treating drug-target interaction as a binary classification problem, the work predicts continuous binding affinity values, a contribution to both pharmacology and bioinformatics.

Introduction and Motivation

Identifying and characterizing drug-target interactions (DTIs) is central to drug discovery. Traditional computational approaches have largely treated DTI prediction as binary classification: determining whether a drug interacts with a target or not. However, drug-target interactions exhibit a continuum of binding strengths, and predicting these continuous values, the binding affinities, is crucial for understanding drug efficacy and specificity.

Existing work on binding affinity prediction has applied machine learning to large datasets, but it predominantly relies on 2D or 3D structural information, which is not always available. This motivates models that use only the sequence information of proteins and drugs to predict binding affinities.

Proposed Method

The authors propose DeepDTA, a model that uses CNNs to learn high-level representations of protein sequences and of compound SMILES (Simplified Molecular Input Line Entry System) strings. It operates directly on these 1D representations and predicts a continuous binding affinity for each drug-target pair.
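The two-branch idea described above can be illustrated with a minimal sketch. This summary does not spell out the layer configuration, so everything below is an assumption made for illustration: a single convolutional layer per branch, global max pooling, concatenation of the two pooled vectors, and an untrained linear readout standing in for the prediction layers. The toy inputs and random weights are hypothetical, and the output is not a meaningful affinity.

```python
import random
from typing import List

Matrix = List[List[float]]  # shape: (sequence length, channels)


def conv1d_relu(x: Matrix, kernels: List[Matrix]) -> Matrix:
    """Valid 1D convolution along the sequence axis, followed by ReLU.

    Each kernel has shape (width, in_channels); the output has one
    channel per kernel and len(x) - width + 1 positions.
    """
    width = len(kernels[0])
    channels = len(x[0])
    out = []
    for start in range(len(x) - width + 1):
        row = []
        for k in kernels:
            s = sum(k[i][c] * x[start + i][c]
                    for i in range(width) for c in range(channels))
            row.append(max(0.0, s))
        out.append(row)
    return out


def global_max_pool(x: Matrix) -> List[float]:
    """Collapse the sequence axis, keeping the strongest response per channel."""
    return [max(col) for col in zip(*x)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)

# Hypothetical embedded inputs: 20 SMILES characters and 50 residues, 4 dims each.
drug = random_matrix(20, 4, rng)
target = random_matrix(50, 4, rng)

# Assumed sizes: 8 filters per branch, width 3 for drugs and 5 for proteins.
drug_kernels = [random_matrix(3, 4, rng) for _ in range(8)]
target_kernels = [random_matrix(5, 4, rng) for _ in range(8)]

drug_vec = global_max_pool(conv1d_relu(drug, drug_kernels))
target_vec = global_max_pool(conv1d_relu(target, target_kernels))

# Concatenate the two learned representations into one pair representation.
pair_vec = drug_vec + target_vec

# Untrained linear readout standing in for the final regression layers.
weights = [rng.uniform(-1, 1) for _ in pair_vec]
affinity = sum(w * v for w, v in zip(weights, pair_vec))
```

Global max pooling is what lets each branch accept sequences of different lengths while still producing a fixed-size vector for the pair.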

Dataset and Representations

The paper utilizes two primary datasets:

Davis Dataset: Contains binding affinities measured as dissociation constants ($K_d$) for interactions between 442 kinase proteins and 68 ligands, resulting in 30,056 interactions.
KIBA Dataset: Combines several inhibitor bioactivity measures ($K_i$, $K_d$, $IC_{50}$) for 229 protein targets and 2,111 drugs, yielding 118,254 interactions.
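A quick arithmetic check on the figures above shows how differently the two datasets are populated. The short sketch below uses only the counts quoted in this summary; the `density` helper is a name introduced here for illustration.

```python
# Counts as quoted in the dataset descriptions above.
datasets = {
    "Davis": {"targets": 442, "drugs": 68, "interactions": 30_056},
    "KIBA": {"targets": 229, "drugs": 2_111, "interactions": 118_254},
}


def density(targets: int, drugs: int, interactions: int) -> float:
    """Fraction of all possible drug-target pairs that have a measured value."""
    return interactions / (targets * drugs)


for name, d in datasets.items():
    possible = d["targets"] * d["drugs"]
    print(f"{name}: {possible} possible pairs, density = {density(**d):.3f}")
```

By these counts, Davis covers every one of its 442 × 68 possible pairs, whereas the KIBA interactions cover roughly a quarter of the 229 × 2,111 possible pairs.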

For the input representations, SMILES strings for compounds and amino acid sequences for proteins are encoded into fixed-length numerical vectors. CNNs, which are effective at detecting local patterns, are then applied to learn features from these encoded sequences.
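One simple way to obtain such fixed-length vectors is character-level label encoding with truncation and zero padding. The sketch below assumes that scheme; the vocabularies, the maximum lengths, and the example protein fragment are hypothetical and chosen only to keep the example small.

```python
from typing import Dict, List


def build_vocab(symbols: str) -> Dict[str, int]:
    """Map each character to a positive integer; 0 is reserved for padding."""
    return {ch: i + 1 for i, ch in enumerate(symbols)}


def encode(seq: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Label-encode a sequence, truncating or zero-padding it to max_len.

    Characters missing from the vocabulary are also mapped to 0 in this sketch.
    """
    ids = [vocab.get(ch, 0) for ch in seq[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Hypothetical, deliberately small vocabularies for illustration only.
SMILES_VOCAB = build_vocab("CNOSFcnos()=#123456[]@+-Hl")
PROTEIN_VOCAB = build_vocab("ACDEFGHIKLMNPQRSTVWY")

# Aspirin as a SMILES string, and an arbitrary short peptide fragment.
smiles_ids = encode("CC(=O)Oc1ccccc1C(=O)O", SMILES_VOCAB, max_len=32)
protein_ids = encode("MKTAYIAKQR", PROTEIN_VOCAB, max_len=16)
```

Every encoded sequence then has the same length regardless of the original string, which lets a batch of drugs or proteins be stacked into a single array before it reaches the convolutional layers.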

Experimental Setup and Evaluation

The model's performance was evaluated using the Concordance Index (CI) and Mean Squared Error (MSE) metrics. The authors compared their CNN-based model with two state-of-the-art baseline methods:

KronRLS: Uses a Kronecker regularized least squares algorithm over pairwise similarity scores, with compound similarities derived from 2D structures and protein similarities from Smith-Waterman sequence alignment.
SimBoost: Uses a gradient boosting machine with extensively engineered features built from compound and target similarities.
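The "Kronecker" in KronRLS refers to how a kernel over drug-target pairs is formed from a drug similarity matrix and a target similarity matrix. The sketch below builds only that pairwise kernel; the regularized least squares fit that follows it is omitted. The similarity values are made up for illustration, and `pair_index` is a helper introduced here.

```python
from typing import List

Matrix = List[List[float]]


def kronecker(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: entry ((i, k), (j, l)) equals a[i][j] * b[k][l]."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def pair_index(drug: int, target: int, n_targets: int) -> int:
    """Row or column of the (drug, target) pair in the pairwise kernel."""
    return drug * n_targets + target


# Toy similarity matrices (hypothetical values): 2 drugs and 3 targets.
drug_sim = [[1.0, 0.4],
            [0.4, 1.0]]
target_sim = [[1.0, 0.2, 0.5],
              [0.2, 1.0, 0.1],
              [0.5, 0.1, 1.0]]

# 6 x 6 kernel over all (drug, target) pairs.
pair_kernel = kronecker(drug_sim, target_sim)

# Similarity between pair (drug 0, target 1) and pair (drug 1, target 2).
k = pair_kernel[pair_index(0, 1, 3)][pair_index(1, 2, 3)]
```

Under this construction, two drug-target pairs are considered similar only when both their drugs and their targets are similar, since the pair similarity is the product of the two.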

Results and Analysis

The findings indicate that the DeepDTA model, when using only CNN-learned representations from sequence data, performed comparably to or better than both KronRLS and SimBoost. Specifically:

On the Davis dataset, DeepDTA achieved a CI score of 0.878 and an MSE of 0.261, aligning closely with the baseline methods' performance.
On the KIBA dataset, DeepDTA outperformed the baselines with a CI score of 0.863 and an MSE of 0.194, indicating that the approach benefits from the larger dataset.
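For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how they can be computed. It uses the common pairwise definition of the Concordance Index, in which tied predictions count as half; the toy affinity values are invented for illustration and are unrelated to the paper's data.

```python
from itertools import combinations
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of comparable pairs whose predictions are in the right order.

    Pairs with equal true values are skipped; tied predictions score 0.5.
    """
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # no defined ordering for this pair
        comparable += 1
        if p_i == p_j:
            concordant += 0.5
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("need at least two distinct true values")
    return concordant / comparable


# Invented toy values: only the last two predictions are in the wrong order.
measured = [5.0, 6.2, 7.1, 8.4]
predicted = [5.3, 6.0, 7.5, 7.2]

ci = concordance_index(measured, predicted)   # 5 of 6 pairs ordered correctly
error = mse(measured, predicted)
```

The two metrics are complementary: CI depends only on how the predictions are ranked, while MSE penalizes the size of each error, so a model can score well on one and poorly on the other.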

Implications and Future Directions

The results demonstrate that deep learning methodologies, especially those employing CNNs, can effectively predict binding affinities using only sequence data. This has significant implications:

Practical Applications: This approach can enhance drug discovery pipelines by accurately predicting affinities without relying on extensive structural data, which can be challenging to obtain.
Theoretical Contributions: It opens new avenues for leveraging deep learning in bioinformatics, promoting the use of sequence data for complex biochemical interaction predictions.

Future work may involve integrating other neural network architectures, such as Long Short-Term Memory (LSTM) networks, to better capture long-range dependencies in protein sequences. Exploring transfer learning could also improve the model's performance on novel targets and drugs, broadening its generalizability and range of applications.

Conclusion

The paper "DeepDTA: Deep Drug-Target Binding Affinity Prediction" shows that CNNs can learn effective representations from raw sequence data for binding affinity prediction, which supports further application of deep learning in pharmacology and bioinformatics. The results, particularly on the larger KIBA dataset, suggest that sequence-only models can reduce the dependence on structural data that has constrained earlier drug-target prediction methods.
