SurvAgent: Hierarchical CoT-Enhanced Case Banking and Dichotomy-Based Multi-Agent System for Multimodal Survival Prediction (2511.16635v1)

Published 20 Nov 2025 in cs.CV and cs.CL

Abstract: Survival analysis is critical for cancer prognosis and treatment planning, yet existing methods lack the transparency essential for clinical adoption. While recent pathology agents have demonstrated explainability in diagnostic tasks, they face three limitations for survival prediction: inability to integrate multimodal data, ineffective region-of-interest exploration, and failure to leverage experiential learning from historical cases. We introduce SurvAgent, the first hierarchical chain-of-thought (CoT)-enhanced multi-agent system for multimodal survival prediction. SurvAgent consists of two stages: (1) WSI-Gene CoT-Enhanced Case Bank Construction employs hierarchical analysis through Low-Magnification Screening, Cross-Modal Similarity-Aware Patch Mining, and Confidence-Aware Patch Mining for pathology images, while Gene-Stratified analysis processes six functional gene categories. Both generate structured reports with CoT reasoning, storing complete analytical processes for experiential learning. (2) Dichotomy-Based Multi-Expert Agent Inference retrieves similar cases via RAG and integrates multimodal reports with expert predictions through progressive interval refinement. Extensive experiments on five TCGA cohorts demonstrate SurvAgent's superiority over conventional methods, proprietary MLLMs, and medical agents, establishing a new paradigm for explainable AI-driven survival prediction in precision oncology.

Summary

The paper introduces a hierarchical multi-agent system combining chain-of-thought case banking with dichotomy-based inference for multimodal survival prediction.
It integrates multi-scale whole slide image analysis and gene-stratified reports to deliver transparent, interpretable predictions with enhanced clinical utility.
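Experimental results on five TCGA cohorts report an overall C-index of 0.713, ahead of the strongest conventional multimodal baseline (0.706) and well ahead of proprietary MLLMs and medical agents, together with statistically significant risk stratification.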

Introduction and Motivation

Survival prediction is central to precision oncology, especially for cancer prognosis and treatment planning. Multimodal approaches, particularly those combining whole slide images (WSIs) and genomics, have demonstrated improved prognostic power, but state-of-the-art (SOTA) models still offer little of the explainability and transparency that clinical adoption requires. Current LLM-based medical and pathology agents deliver explainable diagnostics, yet they have three constraints for survival prediction: they are frequently unimodal, they explore regions of interest (ROIs) inefficiently, and they lack mechanisms for experiential learning from historical cases.

SurvAgent addresses these challenges via a two-stage, hierarchical multi-agent system: (1) CoT-enhanced (chain-of-thought) case banking for both WSIs and genomics and (2) a dichotomy-based multi-expert inference agent leveraging retrieval-augmented generation (RAG) and progressive interval refinement for transparent, multimodal survival prediction.

SurvAgent Framework

SurvAgent’s architecture comprises two principal stages, visualized in the following overview:

Figure 1: The SurvAgent pipeline integrates hierarchical WSI analysis and gene-stratified analysis for CoT-enhanced case bank construction, followed by multi-expert, RAG-based inference via progressive survival interval refinement.

WSI-Gene CoT-Enhanced Case Bank Construction

Hierarchical WSI Case Bank

WSI analysis is realized through a multi-magnification pipeline:

Low-Magnification Screening (LMScreen): PathAgent generates slide-level reports at $2.5\times$ magnification, capturing architectural context.
CoSMining (Cross-Modal Similarity-Aware Patch Mining): At $10\times$, redundant patches are excluded based on both feature space (self-patch similarity) and text space (self-report similarity).
ConfMining (Confidence-Aware Patch Mining): At $20\times$, high-magnification subdivision is selectively triggered for patches with low analytic confidence, ensuring efficient yet thorough exploration of uncertain ROIs.

All patch- and global-level outputs are standardized using a curated WSI attribute checklist, producing structured, interpretable multi-scale reports.
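The two patch-mining steps can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the dictionary layout of a patch, the cosine similarity measure, the thresholds, and the choice to call a patch redundant only when it is similar in both spaces. The text above does not specify how the two similarities are combined.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cos_mining(patches, feat_thresh=0.9, text_thresh=0.9):
    """Greedy redundancy filter (hypothetical rule): drop a patch when it is
    close to an already kept patch in BOTH image-feature and report space."""
    kept = []
    for p in patches:
        redundant = any(
            cosine(p["feat"], q["feat"]) >= feat_thresh
            and cosine(p["text"], q["text"]) >= text_thresh
            for q in kept
        )
        if not redundant:
            kept.append(p)
    return kept

def conf_mining(patches, conf_thresh=0.6):
    """Return ids of low-confidence patches, i.e. the ones that would be
    subdivided and re-examined at higher magnification."""
    return [p["id"] for p in patches if p["conf"] < conf_thresh]

# Toy patches: "b" nearly duplicates "a"; "c" looks different but is uncertain.
patches = [
    {"id": "a", "feat": [1.0, 0.0], "text": [1.0, 0.0], "conf": 0.9},
    {"id": "b", "feat": [1.0, 0.01], "text": [1.0, 0.02], "conf": 0.4},
    {"id": "c", "feat": [0.0, 1.0], "text": [1.0, 0.0], "conf": 0.3},
]
kept = cos_mining(patches)
to_zoom = conf_mining(kept)
```

Running the redundancy filter before the confidence check means that high-magnification analysis is only spent on patches that are both non-redundant and uncertain.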

Figure 2: The WSI Attribute Checklist standardizes extraction of prognostic histopathological features from whole slide images, supporting machine-driven reporting and clinical interpretability.

Every report is further processed by PathAgent to produce a chain-of-thought explanation, and a self-critique mechanism (quality assessment by Qwen2.5-32B) screens the resulting reasoning trajectories for fidelity. The resulting triplet (summarized report, CoT, and ground truth) is deposited in the WSI CoT case bank for future retrieval.

Gene-Stratified Case Bank

Genomic features are abstracted and organized into six clinically relevant gene categories: tumor suppressors, oncogenes, kinases, differentiation markers, transcription factors, and cytokines/growth factors. For each category, GenAgent computes global and mutation statistics, selects type-specific key genes via integrative knowledge retrieval, and produces exhaustive, structured genomic reports.
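A minimal sketch of per-category statistics follows. The gene lists are small illustrative placeholders rather than the paper's actual category membership, and the chosen statistics (mean expression, mutation rate, top genes by absolute expression) are assumptions standing in for whatever GenAgent computes.

```python
from statistics import mean

# Illustrative category membership only; not the lists used by SurvAgent.
GENE_CATEGORIES = {
    "tumor_suppressors": ["TP53", "PTEN"],
    "oncogenes": ["KRAS", "MYC"],
    "kinases": ["EGFR", "BRAF"],
    "differentiation_markers": ["KRT5", "KRT20"],
    "transcription_factors": ["FOXA1", "SOX2"],
    "cytokines_growth_factors": ["IL6", "VEGFA"],
}

def summarize_category(name, genes, expression, mutated, top_k=2):
    """Summarize one gene category for a single patient."""
    present = [g for g in genes if g in expression]
    values = [expression[g] for g in present]
    return {
        "category": name,
        "n_genes": len(present),
        "mean_expr": mean(values) if values else None,
        "mutation_rate": (
            sum(g in mutated for g in present) / len(present) if present else None
        ),
        # Hypothetical key-gene rule: largest absolute expression first.
        "top_genes": sorted(present, key=lambda g: abs(expression[g]), reverse=True)[:top_k],
    }

def stratify(expression, mutated):
    """Build one summary per category, keyed by category name."""
    return {
        name: summarize_category(name, genes, expression, mutated)
        for name, genes in GENE_CATEGORIES.items()
    }

# Toy patient: normalized expression values and a set of mutated genes.
expression = {"TP53": -1.5, "PTEN": 0.5, "KRAS": 2.0}
report = stratify(expression, mutated={"TP53"})
```

Each summary dictionary is the kind of structured input that could then be rendered into a per-category text report.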

Similar to WSIs, chain-of-thought explanations and refinements are generated for each case and deposited in the gene case bank.
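The case banks, and the top-$K$ retrieval they support at inference time, can be pictured with the sketch below. The `CaseEntry` and `CaseBank` names, the stored embedding, and cosine ranking are assumptions for illustration; the text only states that each entry holds a summarized report, a CoT, and the ground truth.

```python
import math
from dataclasses import dataclass

@dataclass
class CaseEntry:
    case_id: str
    embedding: list          # assumed feature vector used for retrieval
    report: str              # summarized structured report
    cot: str                 # chain-of-thought explanation
    survival_months: float   # ground-truth outcome

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

class CaseBank:
    """In-memory case bank with brute-force similarity search."""

    def __init__(self):
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)

    def retrieve(self, query, k=3):
        """Return the k stored cases most similar to the query embedding."""
        ranked = sorted(
            self.entries, key=lambda e: cosine(query, e.embedding), reverse=True
        )
        return ranked[:k]

bank = CaseBank()
bank.add(CaseEntry("case-A", [1.0, 0.0], "report A", "reasoning A", 14.0))
bank.add(CaseEntry("case-B", [0.7, 0.7], "report B", "reasoning B", 40.0))
bank.add(CaseEntry("case-C", [0.0, 1.0], "report C", "reasoning C", 75.0))
top = bank.retrieve([1.0, 0.05], k=2)
```

Because the whole entry is returned, the retrieved reasoning and outcome can be placed directly into the prompt of the inference agent.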

Dichotomy-Based Multi-Expert Inference

At test time, SurvAgent performs:

Hierarchical WSI and Gene Analysis: Test samples are processed identically to training samples, producing structured, multi-scale, and multi-type reports.
Retrieval-Augmented Generation: The system retrieves the $K$ most similar cases (in aggregate WSI-gene feature space) from both banks, leveraging their CoTs and outcomes.
Multi-Expert Integration: SurvAgent combines predictions from several deep survival models (including multimodal co-attention transformers and other baselines).
Dichotomy Reasoning: Instead of regressing survival time directly, the reasoning agent employs progressive, hierarchical binary partitioning. It first assigns the patient to a broad survival interval, then recursively refines that interval into narrower ones, and finally regresses the survival time within the chosen stratum.
Comprehensive Logging: The agent outputs the final survival prediction, structured reports, and a transparent decision rationale, mimicking clinical reasoning.
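The control flow of the dichotomy step can be sketched as follows. In SurvAgent the binary decisions are made by an LLM reasoning agent; here a simple average of expert predictions and retrieved outcomes stands in for that agent. The midpoint splits, the 0 to 120 month range, the depth of three, and all function names are assumptions for illustration.

```python
from statistics import mean

def dichotomy_predict(choose_upper, lo=0.0, hi=120.0, depth=3):
    """Repeatedly halve [lo, hi]; choose_upper(lo, mid, hi) picks the half.
    Returns the final interval and a trace of every decision."""
    trace = []
    for _ in range(depth):
        mid = (lo + hi) / 2
        upper = choose_upper(lo, mid, hi)
        trace.append((lo, hi, "upper" if upper else "lower"))
        lo, hi = (mid, hi) if upper else (lo, mid)
    return lo, hi, trace

def make_vote(expert_months, retrieved_months):
    """Stand-in for the reasoning agent: pool expert-model predictions and
    retrieved-case outcomes, then compare the pooled value to each split."""
    evidence = mean(list(expert_months) + list(retrieved_months))

    def choose_upper(lo, mid, hi):
        return evidence >= mid

    return choose_upper, evidence

def predict_survival(expert_months, retrieved_months):
    choose_upper, evidence = make_vote(expert_months, retrieved_months)
    lo, hi, trace = dichotomy_predict(choose_upper)
    # Final within-interval estimate, clamped to the selected stratum.
    estimate = min(max(evidence, lo), hi)
    return estimate, (lo, hi), trace

estimate, interval, trace = predict_survival(
    expert_months=[30.0, 40.0], retrieved_months=[35.0, 27.0]
)
```

The returned trace records each interval and decision, which is the kind of stepwise rationale the logging step is meant to expose.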

This is operationalized with agent-specific prompts (see Figures 9–12) so that each stage produces modular, interpretable outputs.

Experimental Results

Performance Benchmarking

SurvAgent was benchmarked against classic unimodal/multimodal models, proprietary frontier MLLMs (e.g., Gemini-2.5-Pro, Claude-4.5, GPT-5), and advanced medical agents (MedAgent, MDAgent) on five TCGA cancer cohorts (BLCA, BRCA, GBMLGG, LUAD, UCEC) using cross-validated C-index.

Model Category (Top Baseline)	Top Baseline C-index	SurvAgent C-index (overall)	SurvAgent Gain (absolute, percentage points)
Conventional Multimodal (MOTCat)	0.706	0.713	+0.7
Proprietary MLLMs (Gemini-2.5-Pro)	0.541	0.713	+17.2
Multi-Agent (MDAgent)	0.514	0.713	+19.9
Pathology-Specific Agents (WSI-Agent)	0.524	0.713	+19.0*

*SurvAgent surpasses all SOTA comparators both in absolute C-index and through consistent improvements across cancer types.
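For readers unfamiliar with the metric, the sketch below computes Harrell's concordance index on toy data. It is a generic textbook implementation, not the paper's evaluation code. When a model predicts survival time rather than risk, the negated predicted time can serve as the risk score, which is an assumption about how such predictions would be scored.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index.

    A pair (i, j) is comparable when subject i had an observed event and
    a shorter time than subject j. The pair is concordant when subject i
    also has the higher predicted risk; tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy cohort: follow-up in months, event flag (1 = death, 0 = censored).
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, risks=[4, 3, 2, 1])
reversed_order = concordance_index(times, events, risks=[1, 2, 3, 4])
uninformative = concordance_index(times, events, risks=[1, 1, 1, 1])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the gap between 0.713 and baselines near 0.52 to 0.54 in context.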

Kaplan-Meier Stratification

SurvAgent’s dichotomy-based inferential logic produced statistically significant (p < 0.05) separation in low-risk vs. high-risk groups on all five cancer cohorts, compared to the inconsistent or insignificant stratification of proprietary MLLMs and non-task-specific multi-agent architectures.

Figure 3: Kaplan-Meier survival curves for SurvAgent-predicted high- and low-risk subgroups demonstrate robust, significant stratification across five TCGA datasets.
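The following sketch shows a Kaplan-Meier estimator and a two-group log-rank test on toy data. The text above does not name the significance test, so the log-rank test is assumed here as the conventional choice for comparing survival curves; the data and function names are illustrative.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time."""
    surv = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for x, _, _ in pooled if x >= t)
        n_a = sum(1 for x, _, g in pooled if x >= t and g == 0)
        d = sum(1 for x, e, _ in pooled if x == t and e)
        d_a = sum(1 for x, e, g in pooled if x == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance is undefined for a single subject
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2))

# Toy groups: a high-risk group with early deaths and a low-risk group.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
stat, p_value = log_rank([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
```

A p-value below 0.05 from such a test is what the reported significant separation between predicted high- and low-risk groups refers to.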

Ablation Analysis

Removing either the WSI or gene case bank decreases performance substantially, with the largest drop observed when eliminating dichotomy-based multi-agent inference. This quantifies the additive value of each module and highlights the synergistic effect of multimodal, experiential, and interpretable reasoning mechanisms.

Explainability and Case Study Analysis

SurvAgent provides granular, interpretable reasoning for each prediction, as illustrated via case analysis.

Figure 4: Example of SurvAgent’s multi-level, cross-modal explainability on case TCGA-XF-A9SU, visualizing detailed WSI and gene analysis and full CoT trajectories.

Structured WSI and gene summaries, extracted prognostic attributes (e.g., sarcomatoid differentiation, perineural invasion, TP53 amplification), negative/positive evidence across modalities, and explicit documentation of confidence and analytical uncertainty are presented for each case. In handling contradictory findings—such as sarcomatoid histology with variable genomic profiles—the agent transparently weighs and resolves evidence through dichotomy-based reasoning.

Figure 5: Complete SurvAgent reasoning outputs—including WSI and gene summaries, final prediction, and ground truth—for TCGA-XF-A9SJ.

Figure 6: Example generation and CoT outputs for TCGA-G2-A2EL, illustrating the agentic inferential pipeline.

Agentic Infrastructure and Prompt Engineering

SurvAgent is built from scratch, integrating PathGen-LLaVA and Qwen2.5-32B-Instruct for vision-language and generative capacities, with DeepSeek-V3.2 for knowledge retrieval. The system uses tailored prompts for each agent to enforce conformity to clinical checklists, harmonize gene category analytics, and scaffold the dichotomy-based multi-agent inference routine.

Figure 7: Visualization of SurvAgent’s CoT Case Bank, storing multi-level reasoning traces for efficient RAG-based retrieval and experiential learning.

(Figures 9–12)

Figure 8: WSI report extraction via attribute checklist prompt.

Figure 9: Gene class statistical feature analysis and selection prompt for tumor suppressor genes.

Figure 10: Inference prompt for exact survival time prediction (using retrieved case reports and summaries).

Figure 11: Inference prompt for coarse survival interval assignment (integrating RAG and multi-expert model outputs).

Practical and Theoretical Implications

Practical implications are substantial: SurvAgent provides explainable, multimodal survival predictions directly aligned with the reasoning paradigms used by oncology clinicians. Its chain-of-thought and case-retentive design facilitate clinical validation, transparent patient counseling, and the integration of experiential (“case memory”) knowledge absent in most LLM agents.

Theoretically, SurvAgent operationalizes hierarchical CoT case banks and dichotomous agentic inference as a paradigm for combining retrieval, multi-scale/multimodal mining, and agent collaboration. The approach advocates for combining vision-language foundation models with modular agent design and self-critique, setting a strong precedent for interpretable medical AI going forward.

Conclusion

SurvAgent introduces a new multi-agent architecture for multimodal survival prediction, outperforming both classic and emergent methods in accuracy and interpretability. Its case-based, CoT-enhanced structure and dichotomy-based agentic inference set a benchmark for next-generation, clinically aligned AI systems in oncology and broader medical decision support. Future work could fuse deeper longitudinal evidence, extend to further modalities (e.g., radiomics, clinical narratives), and instantiate continuous self-evolution via interactive learning from new cases.