Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties

Published 3 Dec 2012 in q-bio.GN, cs.CE, cs.LG, and q-bio.CB | (1212.0504v3)

Abstract: Predicting the response of a specific cancer to a therapy is a major goal in modern oncology that should ultimately lead to a personalised treatment. High-throughput screenings of potentially active compounds against a panel of genomically heterogeneous cancer cell lines have unveiled multiple relationships between genomic alterations and drug responses. Various computational approaches have been proposed to predict sensitivity based on genomic features, while others have used the chemical properties of the drugs to ascertain their effect. In an effort to integrate these complementary approaches, we developed machine learning models to predict the response of cancer cell lines to drug treatment, quantified through IC50 values, based on both the genomic features of the cell lines and the chemical properties of the considered drugs. Models predicted IC50 values in an 8-fold cross-validation and an independent blind test with coefficients of determination $R^2$ of 0.72 and 0.64, respectively. Furthermore, models were able to predict IC50s of cell lines from a tissue not used in the training stage with comparable accuracy ($R^2$ of 0.61). Our in silico models can be used to optimise the experimental design of drug-cell screenings by estimating a large proportion of missing IC50 values rather than measuring them experimentally. The implications of our results go beyond virtual drug screening design: potentially thousands of drugs could be probed in silico to systematically test their potential efficacy as anti-tumour agents based on their structure, thus providing a computational framework to identify new drug repositioning opportunities and, ultimately, to support personalized medicine by linking the genomic traits of patients to drug sensitivity.


Authors (7)

Citations (420)


Summary

The paper presents a machine learning framework that predicts IC$_{50}$ values of cancer cell lines from integrated genomic and chemical features.
The methodology employs neural networks and random forests; in cross-validation the models reached an average Pearson correlation of 0.85 ($R^2$ of 0.72).
The results support in silico drug screening and personalized medicine by estimating drug sensitivity, including for cell lines from a tissue not seen during training ($R^2$ of 0.61).

Machine Learning Prediction of Cancer Cell Sensitivity to Drugs

The paper "Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties" presents machine learning models that predict the response of cancer cell lines to a panel of anticancer drugs. Each prediction integrates two primary types of data: the genomic features of the cell line and the chemical descriptors of the drug.

Methodology Overview

The research leverages high-throughput screening data from the Genomics of Drug Sensitivity in Cancer (GDSC) project, which covers a diverse panel of cancer cell lines. The prediction target is the half-maximal inhibitory concentration (IC$_{50}$). Predictions are made with machine learning models, in particular neural networks and random forests, using a total of 827 features derived from cell line genomic data (including the mutational status and copy number variations of oncogenes) and from chemical properties of the drugs computed with the PaDEL software.
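To make the data integration concrete, here is one way to build a single model input per (cell line, drug) pair by concatenating the cell line's genomic vector with the drug's descriptor vector. This is a minimal sketch, not the authors' code: the function name `build_pair_features`, the identifiers `CL1`/`D1`, and all feature values are hypothetical, and the paper's exact feature encoding is not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical toy inputs: real features would come from GDSC genomic data
# (mutation / copy-number status) and PaDEL chemical descriptors.
CellFeatures = Dict[str, List[float]]
DrugFeatures = Dict[str, List[float]]


def build_pair_features(
    cell_features: CellFeatures,
    drug_features: DrugFeatures,
    pairs: List[Tuple[str, str]],
) -> List[List[float]]:
    """Return one feature row per (cell line, drug) pair.

    Each row is the cell line's genomic vector followed by the drug's
    descriptor vector, so a single model sees both views at once.
    List concatenation builds a new row and leaves the inputs unchanged.
    """
    rows = []
    for cell_id, drug_id in pairs:
        rows.append(cell_features[cell_id] + drug_features[drug_id])
    return rows


if __name__ == "__main__":
    # Binary genomic flags, e.g. [gene A mutated, gene B amplified] (hypothetical)
    cells = {"CL1": [1.0, 0.0], "CL2": [0.0, 1.0]}
    # Continuous drug descriptors (hypothetical values)
    drugs = {"D1": [3.5, 2.1], "D2": [4.2, -0.3]}
    X = build_pair_features(cells, drugs, [("CL1", "D1"), ("CL2", "D2")])
    print(X)  # [[1.0, 0.0, 3.5, 2.1], [0.0, 1.0, 4.2, -0.3]]
```

Because every row mixes both data types, the same drug can receive different predicted IC$_{50}$ values on different cell lines, which a drug-only model cannot express.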

The models were trained on genomic and chemical-descriptor features for 608 cell lines treated with 111 drugs. Performance was evaluated through 8-fold cross-validation and on a blind test set consisting of newly generated experimental IC$_{50}$ values.
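The 8-fold cross-validation can be sketched as follows: the labeled samples are shuffled and dealt into eight folds, and each fold serves once as the held-out set while a model is trained on the other seven. This sketch assumes folds are formed over individual (cell line, drug) pairs; the paper's exact partitioning may differ, and the helper names `k_fold_indices` and `cross_validation_splits` are hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 8, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible folds
    # Round-robin slicing keeps fold sizes within one of each other.
    return [indices[i::k] for i in range(k)]


def cross_validation_splits(
    n_samples: int, k: int = 8, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every fold is the test set exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # 20 hypothetical (cell line, drug) pairs split into 8 folds.
    for train, test in cross_validation_splits(20, k=8):
        print(len(train), len(test))
```

The blind test set is separate from this loop: it is scored only once, by a model fitted on all cross-validation data.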

Numerical Results and Performance

The models showed strong predictive performance. In cross-validation, the neural network achieved an average Pearson correlation coefficient $R_p$ of 0.85 and a coefficient of determination $R^2$ of 0.72; on the independent blind test, $R^2$ was 0.64. The models also predicted IC$_{50}$ values for cell lines from a tissue not included in the training data with comparable accuracy ($R^2$ of 0.61), indicating that they generalize beyond the tissues seen during training. Random forests provided comparable results, reinforcing the reliability of the approach.
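The two reported metrics measure different things, and a small example shows why both are useful. The sketch below computes the Pearson correlation $R_p$ and, assuming the usual definition, $R^2 = 1 - SS_{res}/SS_{tot}$. Predictions that are perfectly correlated with the observations but systematically offset score $R_p = 1$ yet a low $R^2$. Under this definition $R^2$ can never exceed $R_p^2$, because the best affine rescaling of the predictions attains exactly $R_p^2$; if the paper computed $R^2$ this way, the closeness of $0.85^2 \approx 0.72$ to the reported $R^2$ would indicate little systematic bias in the cross-validated predictions. All values below are toy data, not results from the paper.

```python
import math
from typing import Sequence


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]  # toy IC50 values (hypothetical units)
    shifted = [2.0, 3.0, 4.0, 5.0]   # perfectly correlated, but every value is 1 too high
    print(pearson_r(observed, shifted))  # 1.0
    print(r_squared(observed, shifted))  # approximately 0.2
```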

Implications and Future Prospects

The study has significant implications for in silico drug screening and personalized medicine. By estimating missing IC$_{50}$ values, these models can optimize the design of experimental drug screens, reducing the number of drug-cell combinations that must be measured empirically. The same approach could be used to probe thousands of compounds in silico, based on their structure, for potential efficacy as anti-tumor agents, thereby helping to identify new drug repositioning opportunities.
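Using a trained model to complete a partially measured screen amounts to filling the empty entries of a cell line by drug matrix with predictions while keeping track of which entries were actually measured. The sketch below assumes a generic `predict(cell, drug)` callable standing in for a trained model; the names `fill_missing`, `rank_drugs_for_cell`, `toy_model`, and all values are hypothetical. Ranking by ascending IC$_{50}$ lists first the drugs to which a cell line is predicted to be most sensitive.

```python
from typing import Callable, Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # (cell line id, drug id)


def fill_missing(
    observed: Dict[Pair, Optional[float]],
    predict: Callable[[str, str], float],
) -> Dict[Pair, Tuple[float, bool]]:
    """Map each pair to (IC50, was_predicted); measured values are kept as-is."""
    filled: Dict[Pair, Tuple[float, bool]] = {}
    for (cell, drug), value in observed.items():
        if value is None:
            filled[(cell, drug)] = (predict(cell, drug), True)
        else:
            filled[(cell, drug)] = (value, False)
    return filled


def rank_drugs_for_cell(filled: Dict[Pair, Tuple[float, bool]], cell_id: str) -> List[str]:
    """Drugs for one cell line, lowest IC50 (highest sensitivity) first."""
    scores = [(value, drug) for (cell, drug), (value, _) in filled.items() if cell == cell_id]
    return [drug for _, drug in sorted(scores)]


if __name__ == "__main__":
    screen = {
        ("CL1", "D1"): 0.4,
        ("CL1", "D2"): None,  # not measured
        ("CL1", "D3"): 2.0,
    }

    def toy_model(cell: str, drug: str) -> float:
        # Hypothetical stand-in for a trained regressor.
        return 1.5

    filled = fill_missing(screen, toy_model)
    print(filled[("CL1", "D2")])  # (1.5, True)
    print(rank_drugs_for_cell(filled, "CL1"))  # ['D1', 'D2', 'D3']
```

IC$_{50}$ values are often analysed on a log scale; the ranking is unaffected by that choice because the logarithm is monotonic.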

Integrating genomic and chemical features combines two complementary views of drug response, in contrast to traditional QSAR models, which consider only the chemical properties of compounds and therefore cannot account for genomic differences between cell lines. By linking a patient's genomic traits to drug sensitivity, the framework could ultimately support precision oncology and therapies tailored to a patient's specific genomic makeup.

Future developments could involve incorporating additional layers of biological data, such as transcriptional or proteomic profiles, which might further enhance the model's predictive accuracy. Additionally, expanding this framework to incorporate network biology perspectives could allow for the identification of dysregulated pathways, offering greater insights into drug action mechanisms.

Conclusion

This research represents a substantive advancement in computational oncology, demonstrating the potential of machine learning to unravel complex genotype-phenotype relationships and provide actionable predictions regarding cancer drug sensitivities. The model's design and results offer a foundational step towards more comprehensive and predictive models, laying the groundwork for integrating more extensive multi-omic datasets in the pursuit of personalized cancer treatment strategies.
