Relevance of Genes Selected by ML Explainability in Expression-Based Phenotype Classification
Determine whether the gene sets selected by explainability methods are biologically relevant for machine-learning classifiers trained on gene expression data. The methods in question include integrated gradients applied to logistic regression, multilayer perceptron, and graph neural network models. Relevance would be assessed by checking whether the top-ranked genes consistently correspond to phenotype-associated biomarkers and established biological processes (e.g., via over-representation analysis against MSigDB collections).
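To make the proposed assessment concrete, the following is a minimal sketch of the two steps for the simplest model named above: integrated gradients on a logistic regression classifier, followed by an over-representation test of the top-ranked genes against one gene set. It is an illustration under stated assumptions, not the procedure used in the cited paper. The gene names, weights, sample, all-zero baseline, and the gene set `toy_gene_set` are hypothetical; a real analysis would load gene sets from MSigDB, use a trained model, and correct for multiple testing across gene sets.

```python
from math import comb, exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def integrated_gradients(weights, bias, x, baseline, steps=200):
    """Approximate integrated gradients for the output sigmoid(w . x + b).

    Uses a midpoint Riemann sum along the straight path from baseline to x.
    """
    diffs = [xi - bi for xi, bi in zip(x, baseline)]
    avg_slope = 0.0
    for k in range(steps):
        alpha = (k + 0.5) / steps
        z = bias + sum(w * (b + alpha * d)
                       for w, b, d in zip(weights, baseline, diffs))
        p = sigmoid(z)
        avg_slope += p * (1.0 - p)  # derivative of sigmoid at this path point
    avg_slope /= steps
    # d(output)/d(x_i) = p * (1 - p) * w_i, so each attribution is
    # (x_i - baseline_i) * w_i * (path-averaged p * (1 - p)).
    return [d * w * avg_slope for d, w in zip(diffs, weights)]


def ora_pvalue(selected, gene_set, universe):
    """One-sided hypergeometric p-value: P(overlap >= observed overlap)."""
    universe = set(universe)
    selected = set(selected) & universe
    gene_set = set(gene_set) & universe
    n_universe, n_set, n_selected = len(universe), len(gene_set), len(selected)
    overlap = len(selected & gene_set)
    tail = sum(comb(n_set, i) * comb(n_universe - n_set, n_selected - i)
               for i in range(overlap, min(n_set, n_selected) + 1))
    return tail / comb(n_universe, n_selected)


# Hypothetical toy data: 8 genes, one sample, all-zero baseline.
genes = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"]
weights = [2.0, -1.5, 0.1, 0.0, 1.0, -0.05, 0.02, 0.3]
bias = 0.0
sample = [1.0] * len(genes)
baseline = [0.0] * len(genes)

attributions = integrated_gradients(weights, bias, sample, baseline)

# Rank genes by absolute attribution and keep the top 3.
order = sorted(range(len(genes)), key=lambda i: abs(attributions[i]), reverse=True)
top_genes = [genes[i] for i in order[:3]]

# Hypothetical gene set standing in for one MSigDB collection entry.
toy_gene_set = {"G1", "G2", "G5", "G8"}
p_value = ora_pvalue(top_genes, toy_gene_set, genes)

print("top genes:", top_genes)
print("ORA p-value:", p_value)
```

For a single sample the ranking reflects only that sample's attributions; averaging absolute attributions over many samples, or per class, is one common way to obtain a model-level ranking, and that choice itself can change which genes are selected.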
References
Still, the question of the relevance of genes selected through ML models explainability remains unsolved.
— A Comparative Analysis of Gene Expression Profiling by Statistical and Machine Learning Approaches
(2402.00926 - Bontonou et al., 1 Feb 2024) in Abstract