Feasibility of overcoming spurious “causal” feature identification in ML-based whole-genome phenotype prediction
Determine whether machine learning pipelines for bacterial whole-genome phenotype prediction can reliably avoid falsely identifying features as "causal": that is, whether interpretations of models trained on high-dimensional genomic data can distinguish truly causal genetic variants from spurious associations.
References
Though it is not yet clear whether we can overcome this hurdle, significant efforts are being made towards discovering potential high-risk bacterial genetic variants.
Source: Whole-Genome Phenotype Prediction with Machine Learning: Open Problems in Bacterial Genomics (2502.07749, James et al., 11 Feb 2025), in Abstract