Drug response prediction by ensemble learning and drug-induced gene expression signatures (1802.03800v3)

Published 11 Feb 2018 in q-bio.GN, cs.LG, and stat.ML

Abstract: Chemotherapeutic response of cancer cells to a given compound is one of the most fundamental pieces of information one requires to design anti-cancer drugs. Recent advances in producing large drug screens against cancer cell lines provided an opportunity to apply machine learning methods for this purpose. In addition to cytotoxicity databases, a considerable amount of drug-induced gene expression data has also become publicly available. Following this, several methods that exploit omics data were proposed to predict drug activity on cancer cells. However, due to the complexity of cancer drug mechanisms, none of the existing methods are perfect. One possible direction, therefore, is to combine the strengths of both the methods and the databases for improved performance. We demonstrate that integrating a large number of predictions by the proposed method improves the performance for this task. The predictors in the ensemble differ in several aspects such as the method itself, the number of tasks the method considers (multi-task vs. single-task) and the subset of data considered (sub-sampling). We show that all these different aspects contribute to the success of the final ensemble. In addition, we attempt to use the drug screen data together with two novel signatures produced from the drug-induced gene expression profiles of cancer cell lines. Finally, we evaluate the method predictions by in vitro experiments in addition to the tests on data sets. The predictions of the methods, the signatures and the software are available from http://mtan.etu.edu.tr/drug-response-prediction/.

Authors (5)

Mehmet Tan (4 papers)
Ozan Fırat Özgül (1 paper)
Batuhan Bardak (4 papers)
Işıksu Ekşioğlu (2 papers)
Suna Sabuncuoğlu (1 paper)

Citations (31)

Summary

The paper introduces a novel ensemble model that integrates KBMTL, Elastic-Net, PSVR, and Neural Networks to improve prediction accuracy.
It introduces two novel signatures, drug activity signatures (ActSig) and cell line sensitivity signatures (CLSS), derived from GDSC, CCLE, and LINCS data, to capture how drugs and cell lines interact.
The ensemble achieves lower mean squared error and higher Spearman correlation than the individual methods, and its predictions were also tested in vitro on A549 cells.

Ensemble Learning and Drug-Induced Gene Expression Signatures for Drug Response Prediction

The paper "Drug response prediction by ensemble learning and drug-induced gene expression signatures" addresses the problem of predicting drug efficacy in cancer cell lines with computational models. The authors combine ensemble learning with two novel gene expression signatures to improve the accuracy of anti-cancer drug activity predictions.

Context and Motivation

Cancer drug mechanisms are complex, and no single existing method predicts drug activity perfectly. At the same time, large drug screens and a growing volume of drug-induced gene expression data have become publicly available. Because cancer cells vary widely in how they respond to treatment, models that could replace some experimental procedures with in silico predictions are valuable. The paper responds by combining several machine learning methods with several data sources rather than relying on any one of them.

Methodology

The principal novelty of this work is the ensemble learning model the authors propose. It combines several prediction methods, including Kernelized Bayesian Multi-task Learning (KBMTL), Elastic-Net, Pairwise Support Vector Regression (PSVR), and Neural Networks, through stacked generalization. The predictors in the ensemble differ in the method itself, in the number of tasks the method considers (multi-task vs. single-task), and in the subset of data considered (sub-sampling), and the authors report that each of these sources of diversity contributes to the final ensemble's performance.
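To make the idea of stacked generalization concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the two base learners (`fit_line`, `fit_nearest`) are toy stand-ins for methods such as Elastic-Net or KBMTL, the single-feature data is invented, and all function names are hypothetical. The sketch only shows the general pattern: base models produce out-of-fold predictions, and a meta-learner is fitted on those predictions.

```python
from statistics import mean

def fit_line(xs, ys):
    """Toy base learner: least-squares line on one feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def fit_nearest(xs, ys):
    """Toy base learner: 1-nearest-neighbour regression."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def out_of_fold(fit, xs, ys, k=3):
    """Predict every sample with a model that never saw it during training."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held:
            preds[i] = model(xs[i])
    return preds

def fit_stack(fit_a, fit_b, xs, ys, k=3):
    """Stack two base learners with a linear meta-learner (no intercept).

    The meta-weights solve the 2x2 normal equations of the least-squares
    problem y ~ w_a * p_a + w_b * p_b on the out-of-fold predictions.
    """
    pa = out_of_fold(fit_a, xs, ys, k)
    pb = out_of_fold(fit_b, xs, ys, k)
    a = sum(u * u for u in pa)
    b = sum(u * v for u, v in zip(pa, pb))
    c = sum(v * v for v in pb)
    d = sum(u * y for u, y in zip(pa, ys))
    e = sum(v * y for v, y in zip(pb, ys))
    det = a * c - b * b
    if abs(det) < 1e-12:
        weights = (0.5, 0.5)  # base predictions are collinear: plain average
    else:
        weights = ((d * c - b * e) / det, (a * e - b * d) / det)
    # Refit the base learners on all data for use at prediction time.
    model_a, model_b = fit_a(xs, ys), fit_b(xs, ys)
    predict = lambda x: weights[0] * model_a(x) + weights[1] * model_b(x)
    return predict, weights

# Invented data: a perfectly linear response, so the line learner should win.
xs = list(range(9))
ys = [2 * x + 1 for x in xs]
predict, weights = fit_stack(fit_line, fit_nearest, xs, ys)
print(weights, predict(10))
```

Fitting the meta-learner on out-of-fold predictions, rather than on in-sample ones, keeps it from rewarding a base learner that merely memorizes the training data. A real ensemble of this kind would use many more base predictors and a regularized meta-learner.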

The paper also introduces two novel signatures: drug activity signatures (ActSig) and cell line sensitivity signatures (CLSS). Both are produced from drug-induced gene expression profiles of cancer cell lines, drawing on large-scale resources such as the Genomics of Drug Sensitivity in Cancer (GDSC), the Cancer Cell Line Encyclopedia (CCLE), and LINCS, and are used together with the drug screen data to relate expression changes to pharmacological effects.
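This summary does not describe how ActSig and CLSS are computed or compared, so the following is only an illustration of what a signature similarity could look like. It assumes, hypothetically, that a signature is a mapping from gene symbol to an expression-change value and that similarity is the cosine over shared genes. The function name and all numbers are made up.

```python
from math import sqrt

def cosine_similarity(sig_a, sig_b):
    """Cosine similarity of two signatures over the genes they share.

    Assumption: each signature is a dict {gene_symbol: expression_change}.
    Returns 0.0 when there are no shared genes or one vector is all zeros.
    """
    shared = sorted(set(sig_a) & set(sig_b))
    if not shared:
        return 0.0
    dot = sum(sig_a[g] * sig_b[g] for g in shared)
    norm_a = sqrt(sum(sig_a[g] ** 2 for g in shared))
    norm_b = sqrt(sum(sig_b[g] ** 2 for g in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Invented values, for illustration only.
drug_x = {"TP53": 1.2, "EGFR": -0.8, "MYC": 0.3}
drug_y = {"TP53": 1.0, "EGFR": -1.0, "KRAS": 0.5}
print(cosine_similarity(drug_x, drug_y))
```

A similarity of this kind could feed a predictor as an extra feature or kernel, but whether the paper does so in this form is not stated here.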

Strong Empirical Results

The paper's ensemble method, ELDAP (Ensemble Learning for Drug Activity Prediction), improves on its component methods. Measured by mean squared error (MSE) and Spearman correlation (ρ), the ensemble outperformed single methods such as KBMTL and Elastic-Net, especially on large datasets like GDSC. Including signature similarities further improved prediction accuracy on the subsets of the data where signatures were available.
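For readers less familiar with the two metrics, the sketch below computes them in plain Python. Lower MSE means smaller errors in the predicted values, while higher Spearman correlation means the predicted ranking of responses agrees better with the measured ranking. The function names are illustrative and the numbers are invented; none of this is taken from the paper's code.

```python
from math import sqrt

def mse(y_true, y_pred):
    """Mean squared error between measured and predicted responses."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation; NaN if either input has zero variance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")
    return cov / sqrt(va * vb)

def spearman(y_true, y_pred):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(y_true), average_ranks(y_pred))

# Invented example values.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [10.0, 20.0, 25.0, 100.0]
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), spearman(measured, predicted))
```

The example shows why both metrics are reported: `predicted` is far from `measured` in absolute terms, yet it orders the samples perfectly, so its Spearman correlation is 1. In practice a library routine such as `scipy.stats.spearmanr` would normally be used instead of hand-written code.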

In addition to the in silico evaluations, the authors tested the approach in vitro on a human epithelial lung carcinoma cell line (A549) using commercially available drugs, and the results supported the model's predictions.

Implications and Future Directions

The implications of this paper are twofold. Practically, computational predictions of drug efficacy could reduce reliance on extensive and costly wet-lab experiments. Methodologically, the paper shows that ensembles of diverse predictors, combined with gene expression signatures, can improve drug response prediction, which is relevant to precision medicine in oncology.

Future research could incorporate pathway data sets, which might capture aspects of gene-drug interactions that expression signatures miss, and could use active learning to choose which experiments to run. Another possible enhancement is to expand the ensemble with more diverse models, such as Random Forests, which showed promise in external evaluations.

In conclusion, the paper contributes an ensemble approach to computational drug response prediction that combines multiple learning methods with signatures derived from drug-induced gene expression data. The same strategy may carry over to other biomedical prediction tasks that depend on heterogeneous biological data.
