- The paper introduces a hierarchical multi-agent system combining chain-of-thought case banking with dichotomy-based inference for multimodal survival prediction.
- It integrates multi-scale whole slide image analysis and gene-stratified reports to deliver transparent, interpretable predictions with enhanced clinical utility.
- Experiments on five TCGA cohorts report an overall C-index of 0.713, ahead of the strongest conventional multimodal baseline (0.706) and well ahead of proprietary MLLMs and agent-based baselines, together with statistically significant risk stratification.
SurvAgent: Hierarchical CoT-Enhanced Case Banking and Dichotomy-Based Multi-Agent System for Multimodal Survival Prediction
Introduction and Motivation
Survival prediction is central to precision oncology, especially for cancer prognosis and treatment planning. While multimodal approaches, particularly those leveraging whole slide images (WSIs) and genomics, have demonstrated improved prognostic power, state-of-the-art (SOTA) models still offer limited explainability and transparency, both of which are essential for clinical adoption. Current LLM-based medical and pathology agents deliver explainable diagnostics, but they have several limitations: they are frequently unimodal, explore regions of interest (ROIs) suboptimally, and lack mechanisms for experiential learning from historical cases.
SurvAgent addresses these challenges via a two-stage, hierarchical multi-agent system: (1) CoT-enhanced (chain-of-thought) case banking for both WSIs and genomics and (2) a dichotomy-based multi-expert inference agent leveraging retrieval-augmented generation (RAG) and progressive interval refinement for transparent, multimodal survival prediction.
SurvAgent Framework
SurvAgent’s architecture comprises two principal stages, visualized in the following overview:
Figure 1: The SurvAgent pipeline integrates hierarchical WSI analysis and gene-stratified analysis for CoT-enhanced case bank construction, followed by multi-expert, RAG-based inference via progressive survival interval refinement.
WSI-Gene CoT-Enhanced Case Bank Construction
Hierarchical WSI Case Bank
WSI analysis is realized through a multi-magnification pipeline:
- Low-Magnification Screening (LMScreen): PathAgent generates slide-level reports at 2.5× capturing architectural context.
- CoSMining (Cross-Modal Similarity-Aware Patch Mining): At 10×, redundant patches are excluded based on both feature space (self-patch similarity) and text space (self-report similarity).
- ConfMining (Confidence-Aware Patch Mining): At 20×, high-magnification subdivision is selectively triggered for patches with low analytic confidence, ensuring efficient yet thorough exploration of uncertain ROIs.
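The two patch-mining steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the cosine-similarity measure, the greedy keep-first strategy, the rule that a patch is redundant only when both its feature and its report embedding match a kept patch, and all thresholds and field names (`feat`, `text`, `conf`) are assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cos_mining(patches, feat_thresh=0.9, text_thresh=0.9):
    """Greedily keep patches that are not redundant with an already kept one.

    Assumption: a patch counts as redundant only if BOTH its image feature
    and its report embedding are near-duplicates of a kept patch.
    """
    kept = []
    for p in patches:
        redundant = any(
            cosine(p["feat"], k["feat"]) >= feat_thresh
            and cosine(p["text"], k["text"]) >= text_thresh
            for k in kept
        )
        if not redundant:
            kept.append(p)
    return kept

def conf_mining(patches, conf_thresh=0.6):
    """Return ids of low-confidence patches to re-examine at higher magnification."""
    return [p["id"] for p in patches if p["conf"] < conf_thresh]

# Toy patches with hypothetical embeddings and confidence scores
patches = [
    {"id": "a", "feat": [1.0, 0.0], "text": [1.0, 0.0], "conf": 0.9},
    {"id": "b", "feat": [0.99, 0.05], "text": [0.98, 0.1], "conf": 0.8},  # near-duplicate of "a"
    {"id": "c", "feat": [0.0, 1.0], "text": [0.1, 1.0], "conf": 0.4},
]
kept = cos_mining(patches)
zoom = conf_mining(kept)
print([p["id"] for p in kept], zoom)
```

In this toy run, patch `b` is dropped as a near-duplicate of `a`, and only the low-confidence patch `c` is flagged for the 20× pass.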
All patch- and global-level outputs are standardized using a curated WSI attribute checklist, producing structured, interpretable multi-scale reports.
Figure 2: The WSI Attribute Checklist standardizes extraction of prognostic histopathological features from whole slide images, supporting machine-driven reporting and clinical interpretability.
Every report is further processed by PathAgent for chain-of-thought explanation, with a self-critique mechanism (using Qwen2.5-32B quality assessment) ensuring high-fidelity reasoning trajectories. This triplet—summarized report, CoT, and ground truth—is deposited in the WSI CoT case bank for future retrieval.
Gene-Stratified Case Bank
Genomic features are abstracted and organized into six clinically relevant gene categories: tumor suppressors, oncogenes, kinases, differentiation markers, transcription factors, and cytokines/growth factors. For each category, GenAgent computes global and mutation statistics, selects type-specific key genes via integrative knowledge retrieval, and produces exhaustive, structured genomic reports.
Similar to WSIs, chain-of-thought explanations and refinements are generated for each case and deposited in the gene case bank.
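One way to picture a case bank entry and its later retrieval is shown below. This is a hedged sketch: the `CaseEntry` fields, the use of survival time in months as the ground truth, the embedding vectors, and cosine-similarity ranking are all illustrative assumptions, and the case identifiers are made up.

```python
import math
from dataclasses import dataclass

@dataclass
class CaseEntry:
    case_id: str             # hypothetical identifier
    report: str              # summarized structured report
    cot: str                 # chain-of-thought explanation
    survival_months: float   # ground-truth outcome (unit is an assumption)
    embedding: list          # vector used for similarity search (assumption)

def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

class CaseBank:
    """In-memory store of (report, CoT, ground truth) entries."""

    def __init__(self):
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)

    def retrieve(self, query_embedding, k=2):
        """Return the k entries most similar to the query embedding."""
        ranked = sorted(
            self.entries,
            key=lambda e: _cosine(query_embedding, e.embedding),
            reverse=True,
        )
        return ranked[:k]

bank = CaseBank()
bank.add(CaseEntry("case-001", "high-grade, necrosis", "reasoning A", 14.0, [1.0, 0.0]))
bank.add(CaseEntry("case-002", "high-grade, no necrosis", "reasoning B", 22.0, [0.9, 0.1]))
bank.add(CaseEntry("case-003", "low-grade", "reasoning C", 80.0, [0.0, 1.0]))

top = bank.retrieve([1.0, 0.0], k=2)
print([e.case_id for e in top])
```

Keeping the CoT and the outcome next to each report means a retrieved neighbor can supply both a worked rationale and a known survival time to the downstream reasoning step.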
Dichotomy-Based Multi-Expert Inference
At test time, SurvAgent performs:
- Hierarchical WSI and Gene Analysis: Test samples are processed identically to training samples, producing structured, multi-scale, and multi-type reports.
- Retrieval-Augmented Generation: The system retrieves the K most similar cases (in aggregate WSI-gene feature space) from both banks, leveraging their CoTs and outcomes.
- Multi-Expert Integration: SurvAgent combines predictions from several deep survival models (including multimodal co-attention transformers and other baselines).
- Dichotomy Reasoning: Instead of regressing survival time directly, the reasoning agent employs progressive, hierarchical binary partitioning. It first assigns the patient to a broad survival interval, then recursively refines that interval into narrower ones, and finally regresses the survival time within the chosen stratum.
- Comprehensive Logging: The agent outputs the final survival prediction, structured reports, and a transparent decision rationale, mimicking clinical reasoning.
This is operationalized through agent-specific prompts (see Figures 8–11) that enforce modular, interpretable outputs at every stage.
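The progressive interval refinement described above can be sketched as a repeated binary choice followed by a bounded regression. In this illustration, the callbacks `choose_upper` and `regress` stand in for the LLM reasoning agent, and the midpoint split, the 0 to 120 month range, the depth of 3, and the expert predictions are all assumptions rather than details taken from the paper.

```python
def dichotomy_predict(choose_upper, regress, lo=0.0, hi=120.0, depth=3):
    """Narrow [lo, hi] by repeated binary choices, then regress inside it.

    choose_upper(lo, mid, hi) -> bool : True selects the longer-survival half.
    regress(lo, hi) -> float          : point estimate for the final stratum.
    """
    trace = []
    for _ in range(depth):
        mid = (lo + hi) / 2.0
        if choose_upper(lo, mid, hi):
            lo = mid
        else:
            hi = mid
        trace.append((lo, hi))  # log each step for a transparent rationale
    estimate = regress(lo, hi)
    # Clamp so the final estimate stays inside the chosen stratum
    estimate = min(max(estimate, lo), hi)
    return estimate, trace

# Hypothetical expert-model predictions (months) acting as the evidence
expert_preds = [40.0, 50.0, 46.0]
consensus = sum(expert_preds) / len(expert_preds)

estimate, trace = dichotomy_predict(
    choose_upper=lambda lo, mid, hi: consensus >= mid,
    regress=lambda lo, hi: consensus,
)
print(trace, round(estimate, 2))
```

The recorded `trace` plays the role of the decision rationale: each entry shows which interval was selected at that level before the final estimate was produced.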
Experimental Results
SurvAgent was benchmarked against classic unimodal/multimodal models, proprietary frontier MLLMs (e.g., Gemini-2.5-Pro, Claude-4.5, GPT-5), and advanced medical agents (MedAgent, MDAgent) on five TCGA cancer cohorts (BLCA, BRCA, GBMLGG, LUAD, UCEC) using cross-validated C-index.
| Model Category | Top Baseline (C-index) | SurvAgent (C-index, overall) | SurvAgent Gain (absolute, percentage points) |
| --- | --- | --- | --- |
| Conventional Multimodal (MOTCat) | 0.706 | 0.713 | +0.7 |
| Proprietary MLLMs (Gemini-2.5-Pro) | 0.541 | 0.713 | +17.2 |
| Multi-Agent (MDAgent) | 0.514 | 0.713 | +19.9 |
| Pathology-Specific Agents (WSI-Agent) | 0.524 | 0.713 | +19.0* |
*SurvAgent surpasses all SOTA comparators both in absolute C-index and through consistent improvements across cancer types.
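For readers unfamiliar with the metric, the following is a minimal sketch of Harrell's concordance index, the standard definition of the C-index. Whether the paper's evaluation handles ties and censoring exactly this way is not stated in the summary; the toy data are invented, and a model that predicts survival time could supply the negated prediction as its risk score.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: a higher risk score should mean shorter survival.

    times  : observed follow-up times
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk scores (e.g. negative predicted survival time)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had an observed event before time j
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risks count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy cohort: four patients, the second one censored
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
risks = [0.9, 0.4, 0.1, 0.6]

c_index = concordance_index(times, events, risks)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the table's values should be read.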
Kaplan-Meier Stratification
SurvAgent’s dichotomy-based inferential logic produced statistically significant (p < 0.05) separation between low-risk and high-risk groups on all five cancer cohorts, whereas proprietary MLLMs and non-task-specific multi-agent architectures yielded inconsistent or insignificant stratification.
Figure 3: Kaplan-Meier survival curves for SurvAgent-predicted high- and low-risk subgroups demonstrate robust, significant stratification across five TCGA datasets.
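As background for the curves in Figure 3, here is a minimal sketch of the Kaplan-Meier product-limit estimator applied to one risk group. The data are invented for illustration, and the sketch covers only the curve itself; how the paper assigns patients to high- and low-risk groups and computes the p-value is not specified in this summary.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    """
    curve = []
    surv = 1.0
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        at_risk = sum(1 for ti in times if ti >= t)
        if deaths > 0:
            # Product-limit update: multiply by the fraction surviving past t
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
    return curve

# Toy high-risk group: five patients, two censored
times = [2, 4, 4, 6, 8]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Computing one such curve per predicted risk group and comparing them, typically with a log-rank test, is the usual way to obtain the kind of separation reported above.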
Ablation Analysis
Removing either the WSI or gene case bank decreases performance substantially, with the largest drop observed when eliminating dichotomy-based multi-agent inference. This quantifies the additive value of each module and highlights the synergistic effect of multimodal, experiential, and interpretable reasoning mechanisms.
Explainability and Case Study Analysis
SurvAgent provides granular, interpretable reasoning for each prediction, as illustrated via case analysis.
Figure 4: Example of SurvAgent’s multi-level, cross-modal explainability on case TCGA-XF-A9SU, visualizing detailed WSI and gene analysis and full CoT trajectories.
Structured WSI and gene summaries, extracted prognostic attributes (e.g., sarcomatoid differentiation, perineural invasion, TP53 amplification), negative/positive evidence across modalities, and explicit documentation of confidence and analytical uncertainty are presented for each case. In handling contradictory findings—such as sarcomatoid histology with variable genomic profiles—the agent transparently weighs and resolves evidence through dichotomy-based reasoning.
Figure 5: Complete SurvAgent reasoning outputs—including WSI and gene summaries, final prediction, and ground truth—for TCGA-XF-A9SJ.
Figure 6: Example generation and CoT outputs for TCGA-G2-A2EL, illustrating the agentic inferential pipeline.
Agentic Infrastructure and Prompt Engineering
SurvAgent is built from scratch, integrating PathGen-LLaVA and Qwen2.5-32B-Instruct for vision-language and generative capacities, with DeepSeek-V3.2 for knowledge retrieval. The system uses tailored prompts for each agent to enforce conformity to clinical checklists, harmonize gene category analytics, and scaffold the dichotomy-based multi-agent inference routine.
Figure 7: Visualization of SurvAgent’s CoT Case Bank, storing multi-level reasoning traces for efficient RAG-based retrieval and experiential learning.
The agent prompts are shown in Figures 8–11.
Figure 8: WSI report extraction via attribute checklist prompt.
Figure 9: Gene class statistical feature analysis and selection prompt for tumor suppressor genes.
Figure 10: Inference prompt for exact survival time prediction (using retrieved case reports and summaries).
Figure 11: Inference prompt for coarse survival interval assignment (integrating RAG and multi-expert model outputs).
Practical and Theoretical Implications
In practice, SurvAgent provides explainable, multimodal survival predictions that align with the reasoning paradigms used by oncology clinicians. Its chain-of-thought and case-bank design facilitates clinical validation, transparent patient counseling, and the integration of experiential (“case memory”) knowledge absent in most LLM agents.
Theoretically, SurvAgent operationalizes hierarchical CoT case banks and dichotomous agentic inference as a paradigm for combining retrieval, multi-scale/multimodal mining, and agent collaboration. The approach advocates for combining vision-language foundation models with modular agent design and self-critique, setting a strong precedent for interpretable medical AI going forward.
Conclusion
SurvAgent introduces a new multi-agent architecture for multimodal survival prediction, outperforming both classic and emergent methods in accuracy and interpretability. Its case-based, CoT-enhanced structure and dichotomy-based agentic inference set a benchmark for next-generation, clinically aligned AI systems in oncology and broader medical decision support. Future work could fuse deeper longitudinal evidence, extend to further modalities (e.g., radiomics, clinical narratives), and instantiate continuous self-evolution via interactive learning from new cases.