drGAT: Attention-Guided Gene Assessment of Drug Response Utilizing a Drug-Cell-Gene Heterogeneous Network (2405.08979v1)
Abstract: Drug development is a lengthy process with a high failure rate. Machine learning is increasingly used to facilitate drug development. These models aim to improve our understanding of drug characteristics, including their activity in biological contexts. However, a major challenge in drug response (DR) prediction is model interpretability, which is needed to validate findings. This matters in biomedicine, where model outputs must be understandable in light of established knowledge of drug interactions with proteins. drGAT, a graph deep learning model, leverages a heterogeneous graph composed of relationships between proteins, cell lines, and drugs. drGAT is designed with two objectives: DR prediction, framed as binary sensitivity prediction, and elucidation of drug mechanisms from attention coefficients. drGAT outperformed existing models, achieving 78% accuracy (and precision) and a 76% F1 score for 269 DNA-damaging compounds of the NCI60 drug response dataset. To assess the model's interpretability, we reviewed drug-gene co-occurrences in PubMed abstracts and compared them with the top 5 genes with the highest attention coefficients for each drug. We also examined whether known relationships were retained in the model by inspecting the neighborhoods of topoisomerase-related drugs. For example, our model retained TOP1 as a highly weighted predictive feature for irinotecan and topotecan, in addition to other genes that could potentially be regulators of the drugs. Our method can be used to accurately predict sensitivity to drugs and may be useful in identifying biomarkers relevant to the treatment of cancer patients.
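The interpretability check described in the abstract relies on ranking genes by attention coefficient for each drug. The following is a minimal sketch of that ranking step only, not the authors' implementation. The function name `top_genes_per_drug`, the `(drug, gene, attention)` tuple format, the choice to keep the largest value when a pair repeats, and all numeric values and placeholder gene names are assumptions made for illustration.

```python
from collections import defaultdict
import heapq


def top_genes_per_drug(edges, k=5):
    """Return the k genes with the highest attention coefficient per drug.

    edges: iterable of (drug, gene, attention) tuples. This format is a
    hypothetical stand-in for coefficients exported from a trained model.
    """
    by_drug = defaultdict(dict)
    for drug, gene, attention in edges:
        # Assumption: if a drug-gene pair appears more than once
        # (for example, once per attention head), keep the largest value.
        previous = by_drug[drug].get(gene)
        if previous is None or attention > previous:
            by_drug[drug][gene] = attention
    # heapq.nlargest returns (gene, attention) pairs sorted from high to low.
    return {
        drug: heapq.nlargest(k, genes.items(), key=lambda item: item[1])
        for drug, genes in by_drug.items()
    }


# Made-up values for illustration only; these are not results from the paper.
toy_edges = [
    ("topotecan", "TOP1", 0.42),
    ("topotecan", "GENE_A", 0.10),
    ("topotecan", "GENE_B", 0.31),
    ("irinotecan", "TOP1", 0.38),
    ("irinotecan", "GENE_C", 0.05),
]
ranking = top_genes_per_drug(toy_edges, k=2)
```

With `k=5`, the resulting per-drug gene lists could then be compared against drug-gene co-occurrences found in PubMed abstracts, as the abstract describes.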