CT-Agent Framework
- CT-Agent is an agent-based AI framework for CT imaging that integrates computer vision, large language models, and planning tools to perform complex radiological tasks.
- It employs a modular, multi-agent architecture that decomposes CT analysis into specialized, tractable subtasks such as segmentation, protocol management, and structured report generation.
- The framework enhances transparency and robustness in medical imaging by maintaining structured memory and evidence-based reasoning, with reported state-of-the-art results on clinical benchmarks.
A CT-Agent is an agent-based artificial intelligence framework designed to perform perception, reasoning, decision support, and interactive manipulation in the context of computed tomography (CT) imaging. Modern CT-Agents are characterized by multi-stage, modular, or multi-agent architectures that orchestrate heterogeneous task-specialized submodules—including computer vision models, LLMs, and domain-specific planning tools—enabling end-to-end pipelines for CT interpretation, acquisition control, reconstruction, protocol management, and clinical reporting. CT-Agents are rapidly proliferating in medical AI, with distinct variants developed for radiology report generation, protocol management, image reconstruction, nodule detection, denoising, segmentation, reinforcement-learning-based localization, and device interaction (Mao et al., 22 May 2025, Kang et al., 24 Sep 2025, Suarez-Rodriguez et al., 18 Sep 2025, Yang et al., 26 Nov 2025, Lin et al., 17 Apr 2026, Wang et al., 20 Feb 2026, Maksudov et al., 11 May 2026).
1. Core Principles and Architecture
The CT-Agent paradigm rests on three foundational principles: (1) decomposition of complex CT tasks into tractable subtasks (e.g., regional analysis, sequential reporting, protocol editing), (2) explicit multi-tool or multi-agent collaboration, and (3) maintenance of a structured memory or reasoning trace (Mao et al., 22 May 2025, Yang et al., 26 Nov 2025, Roschewitz et al., 16 Apr 2026, Lin et al., 17 Apr 2026, Wang et al., 20 Feb 2026). Architectures typically comprise:
- A planning module (controller, often an LLM or finite-state machine) responsible for decomposing user queries, dispatching subtasks, and orchestrating specialized tools.
- An action/tool space consisting of anatomy-aware vision models, domain-specific LoRA adapters, segmentation or detection heads, retrieval engines, and reporting generators.
- A memory module for tracking intermediate states, tool outputs, reports, and conversation history, often structured to allow multi-agent state sharing, region-specific caching, or evidence-based traceability.
Formally, a generic CT-Agent can be described as a tuple $(\mathcal{P}, \mathcal{A}, \mathcal{M})$ of planner, action/tool space, and memory, with the planning process recursively updating the system state via $s_{t+1} = \mathcal{P}(s_t, a_t, c)$, where $a_t \in \mathcal{A}$ is a tool invocation, and $c$ encodes task- and volume-specific context (Mao et al., 22 May 2025). This paradigm allows seamless integration of perception, reasoning, and interaction within a unified pipeline.
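The planner, tool space, and memory described above can be illustrated with a minimal Python sketch. Every name here (`Memory`, `run_agent`, `toy_planner`, `toy_tools`) is hypothetical and chosen for illustration. The fixed two-step plan stands in for an LLM or finite-state controller, and the lambda tools stand in for real vision and reporting models.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

Action = Tuple[str, Dict[str, Any]]  # (tool name, keyword arguments)


@dataclass
class Memory:
    """Structured memory: an ordered trace of tool calls and their outputs."""
    trace: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, tool: str, args: Dict[str, Any], output: Any) -> None:
        self.trace.append({"tool": tool, "args": args, "output": output})


def run_agent(
    planner: Callable[[Memory, Dict[str, Any]], Optional[Action]],
    tools: Dict[str, Callable[..., Any]],
    context: Dict[str, Any],
    max_steps: int = 10,
) -> Memory:
    """Repeatedly ask the planner for the next tool call until it returns None."""
    memory = Memory()
    for _ in range(max_steps):
        action = planner(memory, context)
        if action is None:  # planner signals that the task is complete
            break
        name, args = action
        output = tools[name](**args)       # dispatch to the selected tool
        memory.record(name, args, output)  # keep the step inspectable
    return memory


def toy_planner(memory: Memory, context: Dict[str, Any]) -> Optional[Action]:
    """Hypothetical controller: segment a region, then report on it."""
    plan: List[Action] = [
        ("segment", {"region": context["region"]}),
        ("report", {"region": context["region"]}),
    ]
    step = len(memory.trace)
    return plan[step] if step < len(plan) else None


# Placeholder tools; real systems would call segmentation and reporting models.
toy_tools: Dict[str, Callable[..., Any]] = {
    "segment": lambda region: f"mask:{region}",
    "report": lambda region: f"No acute findings in {region}.",
}

memory = run_agent(toy_planner, toy_tools, {"region": "lung"})
for record in memory.trace:
    print(record["tool"], "->", record["output"])
```

Because every tool call passes through `Memory.record`, the final output can be traced back step by step to the intermediate results that produced it.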
2. Modularity and Multi-Agent Collaboration
Contemporary CT-Agents exhibit explicit modularity through role specialization and agent collaboration. Key exemplars include:
- Multi-agent radiology:
- "LungNoduleAgent" structures lung nodule diagnosis into three intercommunicating agents: Nodule Spotter (detection), Simulated Radiologist (region-aware reporting), and Doctor Agent System (malignancy grading, knowledge graph reasoning). All modules interact via a central Memory, supporting region-level semantic alignment, structured report generation, and evidence-based consensus (Yang et al., 26 Nov 2025).
- "MARCH" models the conventional clinical hierarchy using Resident, Fellow, and Attending agents, with retrieval-grounded case review and multi-round consensus dialogue for report fidelity and hallucination suppression (Lin et al., 17 Apr 2026).
- Agentic tool orchestration:
- "RadAgent" and "3DMedAgent" adopt an iterative Reason–Act–Observe loop, chaining tool calls for perception, segmentation, VQA, reporting, and cross-modal reasoning. Their decision process is governed by checklists or diagnostic protocols and produces a machine-interpretable, clinician-inspectable trace for all intermediate steps (Roschewitz et al., 16 Apr 2026, Wang et al., 20 Feb 2026).
This modularization yields several advantages: improved robustness (errors are contained within individual modules rather than propagating through the pipeline), transparency (fine-grained traceability), and extensibility (plug-and-play tool addition, fusion with retrieval or clinical databases).
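As a rough illustration of how several role-specialized agents might reconcile their outputs, the sketch below applies a strict-majority vote over proposed findings. This is a deliberately simple stand-in: it is not the consensus protocol of LungNoduleAgent or MARCH, and the function name `consensus_findings`, the `quorum` parameter, and the example findings are all assumptions made for illustration. The role names are borrowed only to label the toy inputs.

```python
from collections import Counter
from typing import Dict, List, Set


def consensus_findings(proposals: Dict[str, Set[str]], quorum: float = 0.5) -> List[str]:
    """Keep findings endorsed by more than `quorum` of the participating agents."""
    counts = Counter(f for findings in proposals.values() for f in findings)
    needed = quorum * len(proposals)
    return sorted(f for f, c in counts.items() if c > needed)


# Hypothetical per-agent findings for a single study.
proposals = {
    "resident": {"nodule RUL", "pleural effusion", "cardiomegaly"},
    "fellow": {"nodule RUL", "pleural effusion"},
    "attending": {"nodule RUL"},
}

agreed = consensus_findings(proposals)
print(agreed)
```

A finding proposed by only one agent (here, the cardiomegaly call) is dropped, which is one simple way an unsupported statement can be filtered out before it reaches the final report.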
3. CT-Agent Applications
CT-Agents are instantiated in a diverse range of clinical and technical applications:
- Radiology report generation and VQA: CT-Agent frameworks decompose CT interpretation into region-guided question answering, structured reporting, and retrieval-augmented synthesis for improved clinical F1 and reduced hallucination rates (Mao et al., 22 May 2025, Roschewitz et al., 16 Apr 2026, Lin et al., 17 Apr 2026, Yang et al., 26 Nov 2025, Liang et al., 16 Mar 2026).
- Protocol management: LLM-based CT-Agents interpret and execute complex protocol modification requests, acting as interface layers between clinicians and scanner configuration, and supporting device-agnostic representation, tool calling, and validation (Kang et al., 24 Sep 2025).
- Reconstruction and denoising: Agent-based equilibrium frameworks (e.g., DICE) combine measurement-consistency agents (proximal solvers) with diffusion-model priors, achieving state-of-the-art sparse-view CT reconstruction performance. Agent-integrated denoising experts (A-IDE) route low-dose CT (LDCT) images to anatomy-specialized denoisers using zero-shot LLM-based semantic routing, improving RMSE, PSNR, and SSIM (Suarez-Rodriguez et al., 18 Sep 2025, Cho et al., 21 Mar 2025).
- Reinforcement learning and active localization: RL-based CT-Agents learn to localize organs via sequential box-transform actions, outperforming classic regression and region-proposal algorithms in IoU and wall/centroid distance with substantially lower data requirements (Navarro et al., 2020, Wang et al., 2022, Wang et al., 19 Feb 2026).
- Interactive device control and benchmark evaluation: The ABRA benchmark operationalizes a standardized evaluation environment where CT-Agents interact with an OHIF viewer and Orthanc DICOM server via 21 function-calling tools, spanning slice navigation, window/level control, annotation, metadata QA, and longitudinal reporting (Maksudov et al., 11 May 2026).
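The sequential box-transform idea from the localization item above can be sketched as follows. This is an assumption-laden toy, not any cited method: a greedy search that sees the ground-truth box replaces the learned RL policy, the action set (moving one box face by a fixed step) is one plausible choice among many, and all names (`iou`, `candidate_moves`, `greedy_localize`) are hypothetical.

```python
from typing import Iterator, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner), any dimensionality


def volume(box: Box) -> float:
    """Volume of an axis-aligned box; inverted extents count as zero."""
    v = 1.0
    for a, b in zip(box[0], box[1]):
        v *= max(0.0, b - a)
    return v


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    lo = tuple(max(x, y) for x, y in zip(a[0], b[0]))
    hi = tuple(min(x, y) for x, y in zip(a[1], b[1]))
    inter = volume((lo, hi))
    union = volume(a) + volume(b) - inter
    return inter / union if union > 0 else 0.0


def candidate_moves(box: Box, step: float) -> Iterator[Box]:
    """Action space: move one face of the box by +/- step along one axis."""
    lo, hi = box
    for axis in range(len(lo)):
        for delta in (-step, step):
            yield (lo[:axis] + (lo[axis] + delta,) + lo[axis + 1:], hi)
            yield (lo, hi[:axis] + (hi[axis] + delta,) + hi[axis + 1:])


def greedy_localize(start: Box, target: Box, step: float = 1.0, max_steps: int = 100) -> Box:
    """Oracle stand-in for a policy: take the move that most increases IoU."""
    box = start
    for _ in range(max_steps):
        best = max(candidate_moves(box, step), key=lambda b: iou(b, target))
        if iou(best, target) <= iou(box, target):  # no improving action left
            break
        box = best
    return box


start: Box = ((0.0, 0.0, 0.0), (4.0, 4.0, 4.0))
target: Box = ((3.0, 3.0, 3.0), (7.0, 7.0, 7.0))
final = greedy_localize(start, target)
print(final, iou(final, target))
```

In an RL setting the change in IoU after each action would typically serve as the reward signal, and the policy would have to choose actions from image features alone rather than from the target box.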
4. Technical Innovations and Methodologies
CT-Agents integrate numerous innovations that underpin performance and extensibility:
- Hierarchical region-guided processing: Decomposition of CT volumes into anatomical regions, each handled by LoRA-adapted tools, supports fine-grained interpretation and region-specific QA/reporting (Mao et al., 22 May 2025).
- Global-local token compression: Hierarchical token aggregation and dominant-token selection drastically reduce the number of visual tokens while preserving cross-slice and context information, enabling LLM-based reasoning over entire volumes (Mao et al., 22 May 2025).
- Consensus and aggregation mechanisms: Multi-agent systems leverage iterative consensus (voting, averaging, stance-discourse) for robust finding adjudication and hallucination reduction (Yang et al., 26 Nov 2025, Lin et al., 17 Apr 2026).
- Region-level semantic alignment: Mask clustering (via DBSCAN on IoU-distance) and focal prompting align candidate detections with image regions, supporting accurate attribution and morphological correlation (Yang et al., 26 Nov 2025).
- Adaptive retrieval augmentation: Detection of embedding bottlenecks in 3D contrastive encoders motivates adaptive retrieval-augmented generation, as in AdaRAG-CT, which overcomes pathology coverage limitations by adaptively injecting text evidence at generation hot spots (Liang et al., 16 Mar 2026).
- Structured memory and reasoning trace: Agent memory systems archive image masks, measurement metrics, reports, intermediate summaries, and multi-agent dialogue, allowing transparent backward tracing from final output to evidence.
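To make the region-level alignment item above more concrete, the following sketch clusters overlapping candidate detections with a small DBSCAN implementation over the distance 1 − IoU. It is a simplified illustration under stated assumptions: 2D bounding boxes stand in for 3D masks, the `eps` and `min_samples` values are arbitrary, and `box_iou` and `dbscan_iou` are hypothetical names rather than functions from the cited work.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned 2D boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def dbscan_iou(boxes: Sequence[Box], eps: float = 0.5, min_samples: int = 2) -> List[int]:
    """DBSCAN with distance 1 - IoU; returns a cluster id per box, -1 for noise."""
    n = len(boxes)
    dist = [[1.0 - box_iou(a, b) for b in boxes] for a in boxes]

    def neighbors(i: int) -> List[int]:
        return [j for j in range(n) if dist[i][j] <= eps]  # includes i itself

    labels: List[Optional[int]] = [None] * n  # None means not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_samples:  # j is a core point, keep expanding
                queue.extend(reach)
    return [label if label is not None else -1 for label in labels]


# Three overlapping candidates for one lesion plus an isolated detection.
candidates: List[Box] = [
    (10, 10, 20, 20),
    (11, 11, 21, 21),
    (12, 10, 22, 20),
    (50, 50, 60, 60),
]
labels = dbscan_iou(candidates)
print(labels)
```

Detections sharing a cluster id can then be merged into a single region before attribution, while isolated detections are flagged as noise for separate handling.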
5. Evaluations, Metrics, and Benchmarks
CT-Agents are evaluated on task- and context-specific metrics, typically including:
- Detection: Mean average precision (mAP), F1-score, region-level IoU.
- Reporting: Clinical F1, macro/micro-F1 across pathologies, BLEU-n, ROUGE-L, METEOR, LLM-Judge score (fluency, relevance, consistency, rationality) (Lin et al., 17 Apr 2026, Yang et al., 26 Nov 2025).
- Segmentation/localization: Absolute wall/centroid distance, intersection-over-union.
- Reconstruction/denoising: PSNR, SSIM, RMSE.
- Protocol management: Syntax correctness rate (SCR), plan accuracy, plan faithfulness (cosine embedding), retrieval F1 (Kang et al., 24 Sep 2025).
- Agentic operation: Planning, Execution, Outcome (ABRA); trace coherence (tool-judge), trace faithfulness, outcome IoU (Maksudov et al., 11 May 2026).
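Several of the metrics listed above have short closed forms. The sketch below computes RMSE, PSNR (as 20·log10(data_range / RMSE)), and F1 from raw counts using only the standard library. The function names and the toy intensity values are illustrative assumptions. SSIM and the LLM-judge scores are omitted because they need windowed statistics or an external model.

```python
import math
from typing import Sequence


def rmse(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Root-mean-square error between two equally sized intensity sequences."""
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))


def psnr(reference: Sequence[float], estimate: Sequence[float], data_range: float) -> float:
    """Peak signal-to-noise ratio in dB for a given intensity range."""
    err = rmse(reference, estimate)
    return math.inf if err == 0 else 20.0 * math.log10(data_range / err)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true-positive, false-positive, and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy intensities; every voxel is off by exactly 10 units.
reference = [0.0, 100.0, 200.0, 300.0]
denoised = [10.0, 90.0, 210.0, 290.0]

print(rmse(reference, denoised))
print(psnr(reference, denoised, data_range=1000.0))
print(f1_score(tp=8, fp=2, fn=2))
```

For per-pathology reporting metrics, macro-F1 averages `f1_score` over the pathology classes, whereas micro-F1 sums the counts across classes before applying the same formula.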
Recent CT-Agents report F1 > 0.80 on clinical reporting and nodule grading, with multi-agent, modular, and retrieval-augmented systems demonstrating statistically significant improvements over end-to-end VLMs and monolithic models. Outcome gaps in real-image interactive tasks are attributed to perception failures rather than tool orchestration, as evidenced by sharp performance gains when oracle detections replace the agent's own detections (Maksudov et al., 11 May 2026). The DeepChestVQA and RadGenome-ChestCT datasets support granular task benchmarking and ablation studies (Wang et al., 20 Feb 2026, Lin et al., 17 Apr 2026).
6. Limitations and Future Directions
Current limitations of CT-Agents include:
- Coverage scope: Pathology graphs and structured knowledge bases often focus on a single cancer subtype (e.g., adenocarcinoma), limiting generalizability across pathologic spectra (Yang et al., 26 Nov 2025).
- Temporal reasoning: Integration of longitudinal/time-series analysis remains a target for future work (Yang et al., 26 Nov 2025).
- Device API fragmentation: Heterogeneous scanner APIs necessitate vendor-specific adapters and challenge tool interoperability (Kang et al., 24 Sep 2025).
- Visual bottlenecks: Dimensional collapse in visual embeddings restricts fine-grained finding extraction, partially mitigated by adaptive retrieval (Liang et al., 16 Mar 2026).
- Clinical deployment barriers: Real-world integration requires PACS connectivity, DICOM metadata handling, workflow validation, and regulatory approval (Yang et al., 26 Nov 2025).
Future research areas include generalized multi-pathology graphs, continuous-time RL for real-time scan control (Wang et al., 19 Feb 2026), multi-modal fusion (PET-CT/MRI-CT), uncertainty quantification via ensemble sampling, and open clinical trials leveraging interactive agent benchmarks.
7. Impact and Prospects
CT-Agents represent a paradigm shift from monolithic, black-box VLMs to transparent, modular, multi-agent systems that closely mimic human clinical workflows. By combining fine-grained region guidance, explicit agent collaboration, structured reasoning traces, and evidence-grounded decision support, CT-Agents deliver state-of-the-art performance for 3D CT interpretation, with robust sensitivity, specificity, and clinical faithfulness (Yang et al., 26 Nov 2025, Lin et al., 17 Apr 2026). The modularity and extensibility of the CT-Agent framework position it as a core architectural substrate for trustworthy, adaptive AI in next-generation radiology.