An Agentic System for Rare Disease Diagnosis with Traceable Reasoning

Published 25 Jun 2025 in cs.CL, cs.AI, cs.CV, and cs.MA | (2506.20430v2)

Abstract: Rare diseases collectively affect over 300 million individuals worldwide, yet timely and accurate diagnosis remains a pervasive challenge. This is largely due to their clinical heterogeneity, low individual prevalence, and the limited familiarity most clinicians have with rare conditions. Here, we introduce DeepRare, the first rare disease diagnosis agentic system powered by an LLM, capable of processing heterogeneous clinical inputs. The system generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning that links intermediate analytic steps to verifiable medical evidence. DeepRare comprises three key components: a central host with a long-term memory module; specialized agent servers responsible for domain-specific analytical tasks, integrating over 40 specialized tools; and web-scale, up-to-date medical knowledge sources, ensuring access to the most current clinical information. This modular and scalable design enables complex diagnostic reasoning while maintaining traceability and adaptability. We evaluate DeepRare on eight datasets. The system demonstrates exceptional diagnostic performance among 2,919 diseases, achieving 100% accuracy for 1,013 diseases. In HPO-based evaluations, DeepRare significantly outperforms 15 other methods, including traditional bioinformatics diagnostic tools, LLMs, and other agentic systems, achieving an average Recall@1 score of 57.18% and surpassing the second-best method (Reasoning LLM) by a substantial margin of 23.79 percentage points. For multi-modal input scenarios, DeepRare achieves 70.60% at Recall@1 compared to Exomiser's 53.20% in 109 cases. Manual verification of reasoning chains by clinical experts achieves 95.40% agreement. Furthermore, the DeepRare system has been implemented as a user-friendly web application at http://raredx.cn/doctor.

Authors (13)

Summary

The paper introduces DeepRare, which generates a ranked list of rare disease hypotheses with evidence-based reasoning.
It uses a tiered architecture combining a central host with a long-term memory module, specialized agent servers, and external data sources, and processes clinical inputs with over 40 tools.
It achieved remarkable diagnostic performance, including 70.60% Recall@1 for multi-modal cases and 95.40% agreement on reasoning chains by clinical experts.

The paper introduces DeepRare, an agentic system designed for rare disease diagnosis that uses an LLM to process complex clinical inputs, including free-text descriptions, Human Phenotype Ontology (HPO) terms, and genomic variants. DeepRare generates a ranked list of diagnostic hypotheses for rare diseases, each with a transparent reasoning chain that links intermediate analytic steps to verifiable medical evidence. This transparency is essential for clinical adoption because it lets clinicians check each step, enabling human-AI collaboration in diagnostic workflows.
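One way to picture a ranked hypothesis with a traceable reasoning chain is as a small data structure in which every reasoning step carries references to its supporting evidence. The sketch below is illustrative only: the class names, fields, scores, and placeholder references are assumptions made for this example and are not taken from the DeepRare implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReasoningStep:
    claim: str                 # one intermediate analytic statement
    evidence_refs: List[str]   # identifiers of the sources backing the claim


@dataclass
class Hypothesis:
    disease: str
    score: float               # hypothetical ranking score, higher is better
    steps: List[ReasoningStep] = field(default_factory=list)

    def untraceable_steps(self) -> List[ReasoningStep]:
        """Return the steps that cite no evidence at all."""
        return [step for step in self.steps if not step.evidence_refs]


def rank(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Order hypotheses from most to least likely by score."""
    return sorted(hypotheses, key=lambda h: h.score, reverse=True)


# Placeholder data: disease names and references are invented for illustration.
candidates = [
    Hypothesis("Disease B", 0.35, [
        ReasoningStep("Phenotype overlaps partially", ["ref:placeholder-3"]),
        ReasoningStep("Inheritance pattern is plausible", []),
    ]),
    Hypothesis("Disease A", 0.80, [
        ReasoningStep("Core phenotypes match", ["ref:placeholder-1"]),
        ReasoningStep("Candidate variant lies in an associated gene",
                      ["ref:placeholder-2"]),
    ]),
]

ranked = rank(candidates)
for position, hypothesis in enumerate(ranked, start=1):
    flagged = len(hypothesis.untraceable_steps())
    print(f"{position}. {hypothesis.disease} "
          f"(score={hypothesis.score:.2f}, steps without evidence={flagged})")
```

Keeping the evidence references on each step, rather than on the hypothesis as a whole, is what lets a reviewer flag individual unsupported claims.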

System Architecture and Workflow

DeepRare employs a tiered architecture comprising three key components: a central host with a long-term memory module, specialized agent servers, and extensive external data sources. The central host is responsible for orchestrating diagnostic processes and integrating collected evidence into a coherent context. Specialized agent servers handle domain-specific analytical tasks such as phenotype extraction and variant prioritization, utilizing over 40 tools and up-to-date medical knowledge sources. This modular design promotes complex diagnostic reasoning while ensuring traceability and adaptability.

Figure 1: DeepRare: An agentic framework for rare disease prioritization. (a) System workflow: Multi-modal patient data (HPO terms, genomic variants) are processed through a tiered MCP-inspired architecture, generating a ranked Top-K diagnosis list with evidence-supported reasoning chains. (b) Knowledge architecture: Sunburst visualization depicting hierarchical integration of diagnostic tools and biomedical knowledge sources within DeepRare. (c) Multi-center benchmark characteristics: Case distributions, phenotypic complexity (HPO metrics), disease spectrum, provenance, and genetic annotation status (solid: confirmed pathogenic variants; half-solid: candidate variants extracted; hollow: no genetic data). (d) Performance benchmarking: Comparative evaluation across diagnostic APIs, general-purpose LLMs, reasoning-enhanced LLMs, medically-tuned LLMs, and agentic systems.
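The division of labor described in this section (a central host that orchestrates, agent servers that each handle one kind of analysis, and a memory that accumulates evidence into one context) can be sketched as follows. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, the two servers are trivial stand-ins for real tools, and the host calls servers in a fixed order, whereas the paper describes an LLM-driven host.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Evidence:
    source: str    # name of the agent server that produced this item
    content: str   # what the server reported


@dataclass
class MemoryBank:
    """Stand-in for the host's memory: stores evidence and builds the context."""
    items: List[Evidence] = field(default_factory=list)

    def add(self, evidence: Evidence) -> None:
        self.items.append(evidence)

    def context(self) -> str:
        return "\n".join(f"[{e.source}] {e.content}" for e in self.items)


# An agent server is modeled here as a plain function from a case to a report.
AgentServer = Callable[[dict], str]


def phenotype_server(case: dict) -> str:
    # Hypothetical stand-in for phenotype extraction tooling.
    terms = case.get("hpo_terms", [])
    return f"{len(terms)} HPO terms received: {', '.join(terms)}"


def genotype_server(case: dict) -> str:
    # Hypothetical stand-in for variant prioritization tooling.
    variants = case.get("variants", [])
    return f"{len(variants)} candidate variants received"


class CentralHost:
    """Calls each server and integrates the results into a single context."""

    def __init__(self, servers: Dict[str, AgentServer]) -> None:
        self.servers = servers
        self.memory = MemoryBank()

    def run(self, case: dict) -> str:
        for name, server in self.servers.items():
            self.memory.add(Evidence(source=name, content=server(case)))
        return self.memory.context()


host = CentralHost({"phenotype": phenotype_server, "genotype": genotype_server})
# Example HPO identifiers; the case itself is invented for illustration.
context = host.run({"hpo_terms": ["HP:0001250", "HP:0001263"], "variants": []})
print(context)
```

Because each piece of evidence is tagged with the server that produced it, the assembled context stays traceable back to its origin, which is the property the modular design is meant to preserve.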

Performance and Benchmarking

DeepRare was evaluated on eight datasets from Asia, North America, and Europe, covering 2,919 diseases across specialties such as neurology and genetics. It reached 100% accuracy for 1,013 diseases and outperformed 15 comparison methods, including traditional bioinformatics tools, LLMs, and other agentic systems. In HPO-based evaluations, DeepRare's average Recall@1 of 57.18% exceeded the second-best method (a reasoning LLM) by 23.79 percentage points. With multi-modal input, DeepRare achieved 70.60% Recall@1 versus Exomiser's 53.20% on 109 cases.
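Recall@K, as commonly defined for ranked diagnosis lists, counts a case as a hit when the confirmed diagnosis appears among the top K hypotheses. The sketch below shows that computation on toy data; the function name and the placeholder cases are assumptions for illustration and do not come from the paper's evaluation code. The final lines work out what the reported 23.79-point margin implies for the second-best method's average Recall@1.

```python
from typing import Sequence


def recall_at_k(ranked_predictions: Sequence[Sequence[str]],
                true_diagnoses: Sequence[str],
                k: int) -> float:
    """Fraction of cases whose true diagnosis is within the top-k predictions."""
    if len(ranked_predictions) != len(true_diagnoses):
        raise ValueError("need exactly one ranked list per case")
    if not true_diagnoses:
        raise ValueError("need at least one case")
    hits = sum(
        1
        for predictions, truth in zip(ranked_predictions, true_diagnoses)
        if truth in predictions[:k]
    )
    return hits / len(true_diagnoses)


# Toy cases with placeholder disease labels (not real evaluation data).
predictions = [
    ["disease_a", "disease_b", "disease_c"],  # truth ranked 1st
    ["disease_d", "disease_e", "disease_f"],  # truth ranked 2nd
    ["disease_g", "disease_h", "disease_i"],  # truth ranked 3rd
    ["disease_j", "disease_k", "disease_l"],  # truth not in the list
]
truths = ["disease_a", "disease_e", "disease_i", "disease_z"]

recall_1 = recall_at_k(predictions, truths, k=1)
recall_3 = recall_at_k(predictions, truths, k=3)
print(f"Recall@1 = {recall_1:.2f}, Recall@3 = {recall_3:.2f}")

# Reported figures: DeepRare averages 57.18% Recall@1 with a 23.79-point lead.
second_best = round(57.18 - 23.79, 2)
print(f"Implied second-best average Recall@1: {second_best}%")
```

On the toy data this gives Recall@1 of 0.25 and Recall@3 of 0.75, and the reported margin implies an average Recall@1 of about 33.39% for the second-best method.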

Manual verification by clinical experts yielded 95.40% agreement on the reasoning chains, indicating that the system's intermediate steps are largely medically valid and traceable. This level of agreement supports DeepRare's potential as a trustworthy decision support tool in rare disease diagnostics.

Diagnostic Accuracy Across Specialties

The system performed well across diverse medical specialties. In the Endocrine System category, it achieved a top-1 diagnostic accuracy of 60%, notably higher than competing methods, and in the Kidneys and Urinary System category it reached 66%.

Web Application Deployment

The DeepRare system has been deployed as a user-friendly web application to facilitate clinical adoption. It allows users to input patient demographics, clinical presentations, and family histories to obtain diagnostic predictions. The platform supports the upload of supplementary materials such as case reports and diagnostic imaging, promoting comprehensive patient assessments.

Implications for Future Developments

DeepRare addresses critical challenges such as the dynamic nature of rare disease knowledge, the scarcity of data, and the necessity for transparency and traceability in clinical diagnostics. By offering evidence-based reasoning chains, the system reduces the time for literature review, thereby enhancing diagnostic efficiency in healthcare settings.

Future avenues may include refining retrieval mechanisms for more precise knowledge curation and expanding the agentic system to encompass rare disease treatment and prognosis prediction. This could transform DeepRare into an even more versatile ecosystem for rare disease management.

Conclusion

DeepRare provides an integrated framework for rare disease diagnosis that improves substantially on existing methods in both diagnostic accuracy and reasoning transparency. Its deployment as a web application and its validation across eight datasets underscore its practical applicability to rare disease diagnostics.