GPT-RadPlan: Automated Radiotherapy Planner

Updated 8 October 2025

GPT-RadPlan is an automated radiotherapy planning framework that uses GPT-4V’s multimodal capabilities to analyze both dosimetric images and DVH data.
It integrates clinical protocols through in-context learning, enabling iterative optimization that adjusts treatment parameters for optimal target coverage and OAR sparing.
The system enhances planning efficiency and consistency by mimicking expert workflows and producing plans that meet or exceed manual planning benchmarks.

GPT-RadPlan is an automated radiotherapy treatment planning framework that uses a multimodal large language model (LLM), specifically GPT-4Vision (GPT-4V), to integrate clinical protocols, expert plan evaluation, and iterative optimization. By combining in-context clinical knowledge with visual analysis, GPT-RadPlan acts as both a plan evaluator and a plan optimizer, delivering volumetric modulated arc therapy (VMAT) plans that match or exceed manually generated clinical plans in target coverage and organ-at-risk (OAR) sparing (Liu et al., 21 Jun 2024). The system is designed to operate as an interactive, protocol-guided expert: it systematically refines optimization parameters based on its analysis of dose distributions and dose-volume histograms (DVHs), mimicking human planning workflows while improving efficiency and consistency.

1. System Architecture

GPT-RadPlan relies on GPT-4Vision (GPT-4V), which processes both textual and visual inputs. In this architecture, GPT-4V ingests dose distribution images and tabular DVH data and interprets them against embedded clinical protocols, constraints, and a small set of approved example plans. The system uses in-context learning to assimilate the decision rules implicit in the planning objectives, such as acceptable planning target volume (PTV) coverage and upper bounds on OAR doses. GPT-4V’s output is free-text feedback specifying the modifications needed to the optimization parameters, for example adjusting the relative weights assigned to coverage and sparing in the inner-loop planning engine.

The framework is structured hierarchically: the planning engine (e.g., matRad or equivalent inverse planning software) performs fluence map optimization using current weights and constraints; GPT-4V acts as the outer-loop expert, evaluating the output and proposing iterative updates. Historical planning trajectories—images, DVHs, feedback, and parameter tweaks—are retained in a memory module to streamline learning from previous cases.
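The memory module mentioned above can be pictured as a simple per-case trajectory store. The following is a minimal sketch under stated assumptions, not the authors' implementation: `PlanStep`, `TrajectoryMemory`, and all field names are hypothetical, dose images are represented only by file paths, and DVHs are reduced to a dictionary of summary metrics.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PlanStep:
    """One outer-loop iteration (hypothetical schema)."""
    weights: Dict[str, float]   # structure name -> objective weight used by the inner loop
    dvh: Dict[str, float]       # summary DVH metrics, e.g. {"PTV_D95": 68.2}
    dose_image_path: str        # path to the rendered dose-distribution image
    feedback: str               # free-text evaluator feedback for this plan


@dataclass
class TrajectoryMemory:
    """Stores planning trajectories per case so earlier adjustments can be reused."""
    cases: Dict[str, List[PlanStep]] = field(default_factory=dict)

    def record(self, case_id: str, step: PlanStep) -> None:
        self.cases.setdefault(case_id, []).append(step)

    def already_tried(self, case_id: str, weights: Dict[str, float]) -> bool:
        """True if this exact weight setting was already evaluated for the case."""
        return any(s.weights == weights for s in self.cases.get(case_id, []))

    def as_context(self, case_id: str, last_n: int = 3) -> str:
        """Render the most recent steps as text that could be prepended to a prompt."""
        recent = self.cases.get(case_id, [])[-last_n:]
        return "\n".join(
            f"weights={s.weights} dvh={s.dvh} feedback={s.feedback}" for s in recent
        )


memory = TrajectoryMemory()
memory.record(
    "case-001",
    PlanStep({"PTV": 1.0, "rectum": 0.3}, {"PTV_D95": 68.2}, "dose_000.png", "PTV under-dosed"),
)
print(memory.as_context("case-001"))
```

Checking `already_tried` before launching an inner-loop run is one way such a store could avoid the redundant parameter searches the text describes.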

2. Clinical Protocol Incorporation and Optimization

Clinical protocols, encompassing prescribed target doses, dose constraints for various OARs, and qualitative objectives (e.g., homogeneity, conformity), are presented to GPT-RadPlan as contextual prompts. The model thus receives not only raw dosimetric data but also the explicit clinical criteria by which plan quality is measured. These criteria are incorporated directly into both the evaluation and optimization stages, enabling the system to autonomously assess compliance and prioritize plan revisions.
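One way to picture how a protocol becomes a contextual prompt is to keep it as structured constraints and render those to text, reusing the same constraints for a compliance check. The sketch below is illustrative only: the `Constraint` class, both helper functions, and the numeric limits are hypothetical and are not taken from any clinical protocol or from the source paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Constraint:
    structure: str   # e.g. "PTV", "rectum"
    metric: str      # e.g. "D95", "V30"
    limit: float     # threshold in the metric's unit
    unit: str        # "Gy" or "%"
    kind: str        # "min": value must be >= limit; "max": value must be <= limit


def protocol_prompt(prescription_gy: float, constraints: List[Constraint]) -> str:
    """Render the protocol as plain text usable as in-context instructions."""
    lines = [f"Prescription: {prescription_gy:g} Gy to the PTV."]
    for c in constraints:
        op = ">=" if c.kind == "min" else "<="
        lines.append(f"- {c.structure} {c.metric} {op} {c.limit:g} {c.unit}")
    return "\n".join(lines)


def violations(constraints: List[Constraint], measured: Dict[str, float]) -> List[str]:
    """List unmet constraints; `measured` maps "structure metric" to a value."""
    failed = []
    for c in constraints:
        value = measured[f"{c.structure} {c.metric}"]
        ok = value >= c.limit if c.kind == "min" else value <= c.limit
        if not ok:
            failed.append(f"{c.structure} {c.metric} = {value:g} {c.unit} (limit {c.limit:g})")
    return failed


# Made-up example values, for illustration only.
protocol = [
    Constraint("PTV", "D95", 70.0, "Gy", "min"),
    Constraint("rectum", "V30", 50.0, "%", "max"),
]
print(protocol_prompt(70.0, protocol))
print(violations(protocol, {"PTV D95": 69.0, "rectum V30": 40.0}))
```

Keeping the criteria in one structure means the text shown to the model and the numeric check applied to each plan cannot drift apart.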

During planning, a two-loop strategy is adopted. The inner loop executes canonical optimization (e.g., weighted quadratic cost minimization), while the outer loop uses GPT-4V’s feedback to iteratively adjust parameters—such as weight vectors and specific dose objectives—until convergence on clinical goals is achieved. Notably, the system’s domain adaptation via in-context learning permits rapid re-targeting to new disease sites and protocols using only a modest set of exemplars.
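To make the inner loop concrete, the sketch below minimizes a weighted quadratic cost over non-negative beamlet intensities for a toy two-voxel problem. It is an assumption-laden illustration: the dose-influence matrix, goals, and weights are invented, every function name is hypothetical, and projected gradient descent is used here only for brevity rather than because any particular planning engine uses it.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def dose(A: Matrix, x: Sequence[float]) -> List[float]:
    """Voxel doses d = A x for a dose-influence matrix A (voxels x beamlets)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def optimize_fluence(
    A: Matrix,
    voxels: Dict[str, List[int]],
    goals: Dict[str, float],
    weights: Dict[str, float],
    steps: int = 5000,
    lr: float = 0.02,
) -> List[float]:
    """Projected gradient descent on sum_s w_s * mean_{i in s} (d_i - goal_s)^2, with x >= 0."""
    n_beamlets = len(A[0])
    x = [0.0] * n_beamlets
    for _ in range(steps):
        d = dose(A, x)
        grad = [0.0] * n_beamlets
        for s, idx in voxels.items():
            scale = 2.0 * weights[s] / len(idx)
            for i in idx:
                residual = scale * (d[i] - goals[s])
                for j in range(n_beamlets):
                    grad[j] += residual * A[i][j]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, xj - lr * gj) for xj, gj in zip(x, grad)]
    return x


# Toy problem: voxel 0 is the target, voxel 1 is an OAR (all numbers invented).
A = [[1.0, 0.8], [0.6, 0.1]]
voxels = {"PTV": [0], "OAR": [1]}
goals = {"PTV": 60.0, "OAR": 0.0}   # an OAR goal of 0 penalizes any OAR dose

for oar_weight in (1.0, 10.0):
    x = optimize_fluence(A, voxels, goals, {"PTV": 1.0, "OAR": oar_weight})
    print(oar_weight, [round(v, 2) for v in dose(A, x)])
```

Raising the OAR weight lowers the OAR dose at the expense of target dose in this toy setting, which is the kind of trade-off the outer loop steers by adjusting the weight vector.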

3. Iterative Evaluation and Refinement

GPT-RadPlan replicates expert iterative planning through its feedback cycle. Each candidate plan is evaluated on both its spatial dose distribution and its DVH table:

The evaluation module (driven by GPT-4V) compares current outputs against the prescription and generates structured feedback describing under-dosed target regions, excessively irradiated OARs, and other deviations.
The planning module translates this text feedback into new optimization parameters, adjusting target/OAR weights and dose objectives in the subsequent inner-loop run.
Persistent memory tracks the evolution of plans, preventing inefficient or redundant parameter searches and allowing the system to “remember” effective adjustment strategies from previous trajectories.

Final convergence is judged by whether target coverage and OAR sparing reach protocol-defined thresholds, or after a prescribed number of optimization cycles.
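The feedback cycle and its stopping rule can be sketched as a short loop. Everything here is a stand-in chosen for illustration: `toy_engine` replaces the inverse-planning software with a closed-form trade-off, `toy_evaluator` replaces GPT-4V with two hard-coded rules, the thresholds are invented, and doubling a flagged weight is just one possible update rule.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, float]
Metrics = Dict[str, float]


def toy_engine(w: Weights) -> Metrics:
    """Stand-in for the inner-loop optimizer: a closed-form toy trade-off, not a dose engine."""
    x = 48.0 * w["PTV"] / (0.64 * w["PTV"] + 0.01 * w["OAR"])
    return {"PTV_D95": 0.8 * x, "OAR_mean": 0.1 * x}


def toy_evaluator(m: Metrics) -> List[str]:
    """Stand-in for the GPT-4V evaluator: rule-based text feedback, made-up thresholds."""
    feedback = []
    if m["PTV_D95"] < 58.0:
        feedback.append("PTV under-dosed: increase PTV weight")
    if m["OAR_mean"] > 7.4:
        feedback.append("OAR over-dosed: increase OAR weight")
    return feedback


def apply_feedback(w: Weights, feedback: List[str]) -> Weights:
    """Translate text feedback into new weights by doubling each flagged structure's weight."""
    new = dict(w)
    for line in feedback:
        structure = line.split()[0]   # "PTV" or "OAR"
        new[structure] *= 2.0
    return new


def outer_loop(w: Weights, max_cycles: int = 10) -> Tuple[Weights, Metrics, int, list]:
    """Stop when the evaluator raises no issue or after max_cycles (must be >= 1)."""
    history = []
    for cycle in range(1, max_cycles + 1):
        m = toy_engine(w)
        history.append((dict(w), m))      # trajectory kept for later reuse
        feedback = toy_evaluator(m)
        if not feedback or cycle == max_cycles:
            break
        w = apply_feedback(w, feedback)
    return w, m, cycle, history


final_w, final_m, cycles, history = outer_loop({"PTV": 1.0, "OAR": 10.0})
print(cycles, final_w, final_m)
```

The returned weights always correspond to the returned metrics, including when the cycle limit is hit, so the caller can tell a converged plan from a best-effort one by re-running the evaluator on the final metrics.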

4. Dosimetric Performance Metrics

Plan quality is assessed using conventional dosimetric endpoints:

Target Coverage: D₉₅ (the minimum dose received by 95% of the PTV volume) matches the prescribed dose exactly in most cases.
Homogeneity Index (HI): HI = (D₅ − D₉₅)/dₜ × 100, where dₜ is the prescribed target dose; GPT-RadPlan consistently achieves a lower HI than the clinical plans, indicating more homogeneous dose delivery (e.g., HI of 1.96 vs. 5.43 in clinical plans for prostate).
Conformity Index (CI): CI = (TV₉₅,PTV)² / (TV × TV₉₅,Body), where TV is the target volume and TV₉₅,PTV and TV₉₅,Body are the PTV volume and the total body volume enclosed by the 95% isodose; a higher CI reflects tighter confinement of high-dose regions to the target.
OAR Sparing: Metrics such as D₅, D₅₀, V₁₅, and V₃₀ for relevant organs are tracked; GPT-RadPlan reduces mean OAR dose by up to 5 Gy on average (15% for prostate, 10–15% for head and neck).
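The endpoints above can be computed directly from voxel doses. The sketch below is a rough illustration under several assumptions: all voxels have equal volume, dₜ is taken to be the prescribed dose, the "95" in the CI volumes is read as the isodose at 95% of the prescription, and the function names and dose values are invented.

```python
import math
from typing import Sequence


def d_percent(doses: Sequence[float], x: float) -> float:
    """D_x: the lowest dose among the hottest x% of voxels (equal voxel volumes assumed)."""
    ranked = sorted(doses, reverse=True)
    k = max(1, math.ceil(x * len(ranked) / 100))
    return ranked[k - 1]


def homogeneity_index(ptv: Sequence[float], prescription: float) -> float:
    """HI = (D5 - D95) / prescription * 100; lower means a more homogeneous target dose."""
    return (d_percent(ptv, 5) - d_percent(ptv, 95)) / prescription * 100


def conformity_index(ptv: Sequence[float], non_ptv: Sequence[float], prescription: float) -> float:
    """CI = TV95_PTV^2 / (TV * TV95_Body), counting voxels at or above 95% of the prescription."""
    level = 0.95 * prescription
    tv = len(ptv)
    tv95_ptv = sum(d >= level for d in ptv)
    tv95_body = tv95_ptv + sum(d >= level for d in non_ptv)
    if tv95_body == 0:
        return 0.0   # no voxel reaches the reference isodose
    return tv95_ptv ** 2 / (tv * tv95_body)


# Invented toy doses in Gy: 20 PTV voxels and 80 voxels elsewhere in the body.
ptv_doses = [72.0] + [70.0] * 18 + [68.0]
other_doses = [67.0] * 5 + [30.0] * 75
prescription = 70.0

print("D95:", d_percent(ptv_doses, 95))
print("HI:", round(homogeneity_index(ptv_doses, prescription), 2))
print("CI:", round(conformity_index(ptv_doses, other_doses, prescription), 2))
```

Clinical systems typically interpolate on the cumulative DVH and account for voxel volumes, so values from this simplified version would differ slightly from reported ones.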

Comparative benchmarks (tables and boxplots within the source data) demonstrate that the automated system meets or surpasses manual planning in both target coverage and critical structure sparing.

5. Clinical Workflow and Practical Implications

Workflow integration is a key advantage of GPT-RadPlan. The system’s language-based interface enables direct, protocol-informed communication with clinicians, facilitating transparent plan evaluation and adjustment without deep technical reconfiguration. In practice, this enables:

Efficiency: Eliminates much of the manual, trial-and-error parameter tuning, reducing time to successful plan delivery and increasing patient throughput.
Consistency: Automated protocol adherence leads to standardized high-quality plans, lowering inter-operator and inter-institutional variability.
Adaptability: The lightweight domain adaptation method enables rapid recalibration for new protocols or cancer types without extensive retraining.
Personalization: Clinicians can easily tailor plans to individual patient requirements by modifying contextual prompts or optimization constraints.

Challenges remain, particularly concerning alignment with implicitly encoded preferences of experienced planners or nuanced tradeoffs in highly complex cases. Continuous clinical validation, and possibly direct feedback loops with practitioners, will be necessary for full integration.

6. System Limitations and Future Research Directions

While GPT-RadPlan exhibits robust protocol-driven plan generation and evaluation, several limitations exist:

The system relies on a relatively small number of approved clinical plans for in-context learning; unseen edge cases may deviate from known optimization trajectories.
Optimization is parameterized primarily through target/OAR weights and DVH-based cost functions; more sophisticated objective structures (e.g., spatially varying penalties, biological models) could further refine planning.
The current evaluation does not address long-term clinical outcomes, nor does it directly accommodate inter-patient anatomical variability beyond the protocol constraints.

Future research will likely extend the system to integrate more nuanced clinical feedback, additional diagnostic imaging data, and possibly longitudinal outcome prediction. Extending it to multicriteria optimization, as in Pareto-based pipelines (Zhang et al., 2021), and integrating larger real-world datasets (Abdulkadir et al., 13 Nov 2024; Gao et al., 21 Jan 2025) could further improve performance and generalizability.

GPT-RadPlan builds on rapid recent progress in automated treatment planning and LLM-based clinical tooling, including generative adversarial networks (GANs) for automated dose prediction (Mahmood et al., 2018), semiautomated multicriteria optimization (Zhang et al., 2021), and scalable pipeline automation (Gao et al., 21 Jan 2025). Systems such as RadOnc-GPT (Liu et al., 2023) and retrieval-augmented LLMs (Cui et al., 25 Sep 2025) are beginning to merge structured clinical knowledge with modular toolchains for plan evaluation and report generation. Data integration solutions (Abdulkadir et al., 13 Nov 2024) and vision-language benchmarking (Zhu et al., 8 Mar 2024; Bassi et al., 8 Jan 2025) provide complementary capabilities for model training and validation. A plausible implication is that future AI-powered planning will combine continuous protocol adaptation, multimodal reasoning, and statistical history aggregated from multi-institutional datasets, enabling rapid, reproducible improvements in the quality of radiotherapy planning.