XAI meets Biology: A Comprehensive Review of Explainable AI in Bioinformatics Applications (2312.06082v1)
Abstract: AI, particularly machine learning and deep learning models, has significantly impacted bioinformatics research by offering powerful tools for analyzing complex biological data. However, the lack of interpretability and transparency in these models makes it difficult to use them for deeper biological insight or for generating testable hypotheses. Explainable AI (XAI) has emerged as a promising way to enhance the transparency and interpretability of AI models in bioinformatics. This review provides a comprehensive analysis of XAI techniques and their applications across bioinformatics domains, including DNA, RNA, and protein sequence analysis, structural analysis, gene expression and genome analysis, and bioimaging analysis. We introduce the most pertinent machine learning and XAI methods, then discuss their applications and address the current limitations of available XAI tools. By offering insights into XAI's potential and challenges, this review aims to facilitate its practical implementation in bioinformatics research and help researchers navigate the landscape of XAI tools.
- Zhongliang Zhou
- Mengxuan Hu
- Mariah Salcedo
- Nathan Gravel
- Wayland Yeung
- Aarya Venkat
- Dongliang Guo
- Jielu Zhang
- Natarajan Kannan
- Sheng Li