Survey and Improvement Strategies for Gene Prioritization with Large Language Models (2501.18794v1)

Published 30 Jan 2025 in q-bio.GN and cs.AI

Abstract: Rare diseases are challenging to diagnose due to limited patient data and genetic diversity. Despite advances in variant prioritization, many cases remain undiagnosed. While large language models (LLMs) have performed well on medical exams, their effectiveness in diagnosing rare genetic diseases has not been assessed. To identify causal genes, we benchmarked various LLMs for gene prioritization. Using multi-agent and Human Phenotype Ontology (HPO) classification, we categorized patients based on phenotypes and solvability levels. As gene set size increased, LLM performance deteriorated, so we used a divide-and-conquer strategy to break the task into smaller subsets. At baseline, GPT-4 outperformed other LLMs, achieving nearly 30% accuracy in ranking causal genes correctly. The multi-agent and HPO approaches helped distinguish confidently solved cases from challenging ones, highlighting the importance of known gene-phenotype associations and phenotype specificity. We found that cases with specific phenotypes or clear associations were more accurately solved. However, we observed biases toward well-studied genes and sensitivity to input order, both of which hindered gene prioritization. Our divide-and-conquer strategy improved accuracy by overcoming these biases. By utilizing HPO classification, novel multi-agent techniques, and our LLM strategy, we improved causal gene identification accuracy compared to our baseline evaluation. This approach streamlines rare disease diagnosis, facilitates reanalysis of unsolved cases, and accelerates gene discovery, supporting the development of targeted diagnostics and therapies.
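
The divide-and-conquer idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how subsets are formed or merged, so the chunking scheme, the per-round shuffle (a simple guard against input-order sensitivity), and the parameters `chunk_size`, `keep`, and `seed` are all assumptions. `rank_subset` is a hypothetical callable standing in for an LLM call that returns a small gene list ordered from most to least likely causal, and the gene names are placeholders.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical interface: takes a small gene list, returns it ranked best-first.
# In practice this would wrap an LLM prompt; here it is only a stand-in.
Ranker = Callable[[Sequence[str]], List[str]]


def divide_and_conquer_rank(
    genes: Sequence[str],
    rank_subset: Ranker,
    chunk_size: int = 5,   # assumed subset size, not taken from the paper
    keep: int = 2,         # assumed number of survivors per subset
    seed: int = 0,
) -> List[str]:
    """Rank a large candidate list by repeatedly ranking small subsets.

    Returns a ranking of the final surviving genes only, not of every input.
    """
    if chunk_size < 2 or not 1 <= keep < chunk_size:
        raise ValueError("require chunk_size >= 2 and 1 <= keep < chunk_size")
    rng = random.Random(seed)
    pool = list(genes)
    while len(pool) > chunk_size:
        # Shuffle each round so no gene is favored by its input position.
        rng.shuffle(pool)
        chunks = [pool[i:i + chunk_size] for i in range(0, len(pool), chunk_size)]
        # Keep only the top-ranked genes from each small subset.
        pool = [gene for chunk in chunks for gene in rank_subset(chunk)[:keep]]
    return rank_subset(pool)


# Toy stand-in for the LLM: ranks by a made-up score (higher is better).
toy_scores = {f"GENE{i}": i for i in range(23)}


def toy_ranker(subset: Sequence[str]) -> List[str]:
    return sorted(subset, key=lambda gene: toy_scores[gene], reverse=True)


final_ranking = divide_and_conquer_rank(list(toy_scores), toy_ranker)
```

Because `keep` is smaller than `chunk_size`, the pool shrinks every round, so the loop terminates. With a consistent ranker such as the toy one, the best-scoring gene survives every round; a real LLM ranker would offer no such guarantee.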

Authors (12)

Matthew Neeley (26 papers)
Guantong Qi (2 papers)
Guanchu Wang (33 papers)
Ruixiang Tang (44 papers)
Dongxue Mao (1 paper)
Chaozhong Liu (1 paper)
Sasidhar Pasupuleti (1 paper)
Bo Yuan (151 papers)
Fan Xia (26 papers)
Pengfei Liu (191 papers)
Zhandong Liu (6 papers)
Xia Hu (186 papers)


