Clinical DVH metrics as a loss function for 3D dose prediction in head and neck radiotherapy

Published 31 Mar 2026 in cs.CV | (2603.29670v1)

Abstract: Purpose: Deep-learning-based three-dimensional (3D) dose prediction is widely used in automated radiotherapy workflows. However, most existing models are trained with voxel-wise regression losses, which are poorly aligned with clinical plan evaluation criteria based on dose-volume histogram (DVH) metrics. This study aims to develop a clinically guided loss formulation that directly optimizes clinically used DVH metrics while remaining computationally efficient for head and neck (H&N) dose prediction. Methods: We propose a clinical DVH metric loss (CDM loss) that incorporates differentiable D-metrics and surrogate V-metrics, together with a lossless bit-mask region-of-interest (ROI) encoding to improve training efficiency. The method was evaluated on 174 H&N patients using a temporal split (137 training, 37 testing). Results: Compared with MAE- and DVH-curve-based losses, CDM loss substantially improved target coverage and satisfied all clinical constraints. Using a standard 3D U-Net, the PTV Score was reduced from 1.544 (MAE) to 0.491 (MAE + CDM), while OAR sparing remained comparable. Bit-mask encoding reduced training time by 83% and lowered GPU memory usage. Conclusion: Directly optimizing clinically used DVH metrics enables 3D dose predictions that are better aligned with clinical treatment planning criteria than conventional voxel-wise or DVH-curve-based supervision. The proposed CDM loss, combined with efficient ROI bit-mask encoding, provides a practical and scalable framework for H&N dose prediction.


Authors (3)

Summary

The paper introduces a clinical DVH metric loss (CDM loss) that integrates differentiable D-metrics and sigmoid-based surrogate V-metrics into the training loss, aligning model outputs with clinical treatment constraints.
It employs a lossless bit-mask encoding to handle many overlapping ROIs efficiently, reducing training time by 83% and lowering GPU memory usage.
Experiments on 174 H&N patients show that, compared with MAE- and DVH-curve-based losses, CDM loss improves target coverage and satisfies all clinical constraints while keeping OAR sparing comparable.


Motivation and Problem Formulation

Three-dimensional (3D) dose prediction for radiotherapy planning is central to automating and standardizing cancer treatment workflows. In head and neck (H&N) radiotherapy, the complex anatomy and the large number of critical organs-at-risk (OARs) require a careful balance between target coverage and normal-tissue sparing. Deep learning (DL) models have rapidly become the dominant approach to dose prediction, but they are usually trained with voxel-wise losses such as mean absolute error (MAE) or mean squared error (MSE). These losses poorly reflect how plans are evaluated clinically: plan review relies on dose-volume histogram (DVH) metrics, which map directly onto clinical constraints and treatment objectives.

This paper introduces a clinical DVH metric loss (CDM loss) that incorporates differentiable formulations of D-metrics and differentiable surrogates of V-metrics, so that the network optimizes objectives aligned with clinical standards for H&N dose prediction. It also proposes a lossless bit-mask encoding that handles many overlapping regions of interest (ROIs) efficiently, removing a bottleneck in training scalability and computational cost.

Figure 1: Workflow of deep-learning-based head and neck dose prediction, showing multi-level PTVs and numerous OARs as network inputs.

Technical Contributions

Differentiable Clinical DVH Metric Loss

The proposed CDM loss combines analytically differentiable D-metrics (such as $D_{x\%}$ quantiles, $D_{0.03\,\text{cc}}$, maximum, minimum, and mean dose) with differentiable surrogate V-metrics (such as $V_{95\%}$), obtained by replacing the hard dose threshold with a logistic sigmoid. This preserves end-to-end gradient flow during optimization and allows direct supervision of the metric values specified in institutional clinical templates.
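The metric computations above can be sketched on the 1D array of dose values inside one ROI. The NumPy sketch below is an illustration under stated assumptions, not the paper's implementation: `d_metrics`, `soft_v_metric`, and `hard_v_metric` are hypothetical names, $D_{x\%}$ and $D_{0.03\,\text{cc}}$ are taken as the minimum dose received by the hottest $x\%$ or 0.03 cm³ of the ROI, and all voxels are assumed to have equal volume. NumPy is used for readability; in training, the same operations would be written in an autodiff framework (for example with `torch.sort`), where the gradient of each sorted value flows back to the voxel occupying that rank.

```python
import numpy as np


def d_metrics(dose, voxel_cc, x_percent=95.0, small_cc=0.03):
    """Example D-metrics for one ROI (hypothetical helper).

    dose:     1D array of voxel doses (Gy) inside the ROI (mask already applied).
    voxel_cc: volume of a single voxel in cm^3 (all voxels assumed equal).
    """
    d_sorted = np.sort(dose)[::-1]  # descending: hottest voxel first
    n = d_sorted.size
    # D_x%: minimum dose received by the hottest x% of the ROI volume.
    # The small tolerance guards against floating-point round-up before ceil.
    k_x = max(int(np.ceil(n * x_percent / 100.0 - 1e-9)), 1)
    # D_0.03cc: minimum dose received by the hottest 0.03 cm^3 (capped at n voxels).
    k_cc = min(max(int(np.ceil(small_cc / voxel_cc - 1e-9)), 1), n)
    return {
        "D_x": float(d_sorted[k_x - 1]),
        "D_0.03cc": float(d_sorted[k_cc - 1]),
        "D_max": float(d_sorted[0]),
        "D_min": float(d_sorted[-1]),
        "D_mean": float(d_sorted.mean()),
    }


def soft_v_metric(dose, threshold, alpha):
    """Sigmoid surrogate of V_t: approximate fraction of voxels with dose >= threshold."""
    return float(np.mean(1.0 / (1.0 + np.exp(-alpha * (dose - threshold)))))


def hard_v_metric(dose, threshold):
    """Exact (non-differentiable) V_t, shown for comparison."""
    return float(np.mean(dose >= threshold))
```

In the surrogate, every voxel near the threshold contributes a nonzero gradient, whereas the hard step function has zero gradient almost everywhere.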

Clinically, these metrics represent mandatory plan constraints (e.g., minimum PTV coverage, maximum dose in sensitive structures) and planning aims. By explicitly targeting these objectives, the network is steered towards solutions that satisfy clinical requirements rather than merely achieving numerical similarity to historical dose distributions.
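One plausible way to turn template limits into a training signal is a one-sided (hinge) penalty that is zero when a constraint is met and grows linearly with the violation, added to the voxel-wise MAE. The sketch below assumes this form; the exact CDM term in the paper may differ (for instance, it may instead match the clinical plan's metric values). `TEMPLATE`, `clinical_penalty`, `total_loss`, and the weight `lam` are hypothetical, the numeric limits are illustrative rather than the paper's template, and `metrics` stands for values produced by differentiable D- and V-metric functions applied to the predicted dose.

```python
import numpy as np

# Hypothetical template entries (illustrative limits, not the paper's template):
# (metric key, kind, limit in Gy, weight).
TEMPLATE = [
    ("PTV70:D95%", "min", 66.5, 1.0),          # coverage: penalize values below the limit
    ("SpinalCord:D0.03cc", "max", 45.0, 1.0),  # OAR: penalize values above the limit
]


def clinical_penalty(metrics, template=TEMPLATE):
    """Weighted hinge penalty summed over violated template limits."""
    total = 0.0
    for key, kind, limit, weight in template:
        value = metrics[key]
        violation = limit - value if kind == "min" else value - limit
        total += weight * max(violation, 0.0)  # zero once the limit is satisfied
    return total


def total_loss(pred, ref, metrics, lam=0.1):
    """Voxel-wise MAE plus a weighted clinical penalty (an illustrative 'MAE + CDM')."""
    mae = float(np.mean(np.abs(pred - ref)))
    return mae + lam * clinical_penalty(metrics)
```

With a hinge, satisfied constraints contribute no gradient, so the MAE term alone shapes the dose there; a two-sided matching term would instead keep pulling each metric toward a reference value.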

Figure 2: Transformation of 3D dose distributions for an ROI into 1D arrays for differentiable D-metric and V-metric evaluation.

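The surrogate V-metric is parameterized by a slope $\alpha$. A larger $\alpha$ approximates the hard threshold more closely but concentrates the gradient in a narrow dose band around it, so voxels far from the threshold receive almost no gradient (saturation); a smaller $\alpha$ keeps gradients alive at the cost of a larger approximation error. The paper derives an analytic lower bound for $\alpha$ from a tolerated approximation error $\varepsilon$ and the fraction of voxels lying within a margin $m$ of the metric's dose threshold.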
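The paper's exact expression is not reproduced here. As an illustration of how such a bound can arise, assume the surrogate replaces the indicator $\mathbb{1}[d_i \ge t]$ with $\sigma(\alpha(d_i - t))$, and let $p$ (our notation) be the fraction of the $N$ ROI voxels with $|d_i - t| < m$. Each voxel's error is $\sigma(-\alpha|d_i - t|)$, which is at most $1/2$ inside the margin and at most $\sigma(-\alpha m)$ outside it, so

$$|\tilde V_t - V_t| \le \frac{1}{N}\sum_i \sigma(-\alpha|d_i - t|) \le \frac{p}{2} + (1-p)\,\sigma(-\alpha m).$$

Requiring the right-hand side to be at most $\varepsilon$ (possible only when $\varepsilon > p/2$) gives

$$\alpha \ge \frac{1}{m}\ln\frac{1 - \varepsilon - p/2}{\varepsilon - p/2}.$$

The hypothetical helpers below evaluate this bound and let it be checked numerically on synthetic doses.

```python
import numpy as np


def alpha_lower_bound(eps, p, m):
    """Smallest alpha with p/2 + (1 - p) * sigmoid(-alpha * m) <= eps (illustrative bound).

    eps: tolerated absolute error on the V-metric (fraction of ROI volume).
    p:   fraction of ROI voxels within +/- m of the threshold.
    m:   margin around the threshold (Gy).
    """
    if eps <= p / 2:
        raise ValueError("bound unattainable: need eps > p / 2")
    return float(np.log((1.0 - eps - p / 2) / (eps - p / 2)) / m)


def v_metric_error(dose, threshold, alpha):
    """Absolute difference between the sigmoid surrogate and the hard V-metric."""
    soft = np.mean(1.0 / (1.0 + np.exp(-alpha * (dose - threshold))))
    hard = np.mean(dose >= threshold)
    return float(abs(soft - hard))
```

For example, with no voxels inside the margin ($p = 0$), $\varepsilon = 0.01$, and $m = 1$ Gy, the bound is $\ln 99 \approx 4.6\ \text{Gy}^{-1}$.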

Lossless Bit-Mask ROI Encoding

Conventional dose prediction pipelines encode each ROI as a separate input channel; with large numbers of overlapping ROIs, this leads to significant preprocessing and GPU overhead. The lossless bit-mask encoding scheme assigns a unique bit position to each ROI, compressing all ROI masks into a single integer-valued volume. On-demand decoding using bitwise operations restores the binary masks as needed for network input or loss evaluation.
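A minimal NumPy sketch of the encoding follows, assuming at most 64 ROIs packed into a `uint64` volume (the integer width used in the paper is not stated here); `encode_rois` and `decode_roi` are hypothetical names. Bit $k$ of each voxel is set exactly when the voxel belongs to ROI $k$, so overlapping ROIs simply set several bits and no information is lost.

```python
import numpy as np


def encode_rois(masks):
    """Pack K binary masks (K <= 64) into one uint64 volume; bit k <-> ROI k."""
    if len(masks) > 64:
        raise ValueError("a uint64 volume holds at most 64 ROIs")
    encoded = np.zeros(masks[0].shape, dtype=np.uint64)
    for k, mask in enumerate(masks):
        # Shift the 0/1 mask into bit position k and OR it into the volume.
        encoded |= mask.astype(np.uint64) << np.uint64(k)
    return encoded


def decode_roi(encoded, k):
    """Recover the binary mask of ROI k with a right shift and a bitwise AND."""
    return ((encoded >> np.uint64(k)) & np.uint64(1)).astype(bool)
```

Stored as separate `uint8` masks, $K$ ROIs take $K$ bytes per voxel; the packed volume takes 8 bytes per voxel for up to 64 ROIs (4 bytes with `uint32` for up to 32), and only the masks needed for a given input or loss term have to be decoded.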

Bit-mask encoding substantially shortens training and also lowers GPU memory usage, which makes it practical to supervise every clinically relevant structure and to run more experiments on the same hardware.

Figure 3: Visualization and decoding of lossless bit-mask encoding for multiple overlapping ROIs, showing efficient channel reduction and on-demand bitwise extraction.

Experimental Validation

A retrospective cohort of 174 H&N patients treated with volumetric modulated arc therapy (VMAT) was split chronologically into a training set ($n=137$) and an out-of-time test set ($n=37$), so that testing mimics deployment on patients treated after the training period. A standardized, JSON-based clinical template defined all optimized metrics, ROIs, planning aims, and constraints.
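The template's schema is not given in this summary, so the layout below is a hypothetical illustration of how a JSON template could list ROIs, metrics, constraints, and aims, and how a training script might read it. All keys, ROI names, limits, and the helpers `load_template` and `check` are assumptions, not the paper's values.

```python
import json

# Hypothetical template layout; the paper's actual JSON schema is not shown here.
TEMPLATE_JSON = """
{
  "prescription_gy": 70.0,
  "items": [
    {"roi": "PTV70", "metric": "D95%", "type": "constraint", "op": ">=", "value": 66.5},
    {"roi": "SpinalCord", "metric": "D0.03cc", "type": "constraint", "op": "<=", "value": 45.0},
    {"roi": "Parotid_L", "metric": "Dmean", "type": "aim", "op": "<=", "value": 26.0}
  ]
}
"""


def load_template(text):
    """Parse the template and split it into ROI names, constraints, and aims."""
    template = json.loads(text)
    rois = sorted({item["roi"] for item in template["items"]})
    constraints = [i for i in template["items"] if i["type"] == "constraint"]
    aims = [i for i in template["items"] if i["type"] == "aim"]
    return rois, constraints, aims


def check(item, value):
    """Return True if a metric value satisfies the template item."""
    return value >= item["value"] if item["op"] == ">=" else value <= item["value"]
```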

Performance Analysis

Among the loss configurations compared (MAE, MAE+DVH, MAE+DVH+CDM, and MAE+CDM), only the CDM-based ones achieved the required PTV coverage and satisfied all clinical constraints. MAE+CDM reduced the PTV Score from 1.544 (MAE alone) to 0.491 while keeping OAR sparing comparable or better. Boxplots and statistical tests showed that the constraints were met and that, for the prioritized metrics, the predictions did not differ significantly from the clinical plans.

Figure 4: Boxplots of clinically relevant dose metrics across loss functions, showing constraint satisfaction and statistical parity with clinical plans for MAE+CDM.

Axial dose maps show that the conventional losses produce inadequate target coverage or overdosing, whereas the CDM-based configurations yield clinically acceptable dose distributions.

Figure 5: Axial dose maps under different loss functions, highlighting superior target coverage and OAR sparing for CDM approaches.

Computational Efficiency

Ablation experiments quantified the benefit of bit-mask encoding: average epoch duration fell by 83% (from 241 s to 43 s), and peak GPU memory usage fell by more than 4%. The shorter training time makes broader clinical supervision practical and makes better use of the available hardware.

Architectural Generality

Under the clinically aligned MAE+CDM loss, a standard 3D U-Net matched or outperformed state-of-the-art architectures (SwinUNETR, MedNeXt, cascade models) in PTV and OAR metrics, suggesting that alignment of training objectives with clinical plans is a more critical determinant of dose prediction quality than architectural complexity.

Implications and Future Directions

Directly optimizing clinical DVH metrics narrows the gap between numerical accuracy and clinical acceptability in radiotherapy AI. Because the CDM loss is driven by a template, it can be adapted to other planning protocols by editing the JSON template, which could ease multi-institutional deployment. Bit-mask encoding relaxes the practical limit on the number of ROI channels, supporting large-scale evaluation and comprehensive clinical supervision.

In practice, CDM-based models could provide automated dose predictions that meet clinical standards, potentially reducing manual workload and interobserver variability. Because a simple network suffices given a clinically aligned loss, clinical integration, training, and validation become easier.

Methodologically, the sigmoid-based surrogate for V-metrics and the analytic lower bound on its slope offer a general recipe for supervised optimization against non-differentiable clinical criteria. Future work includes multi-institutional generalization, integration with downstream dose mimicking to obtain deliverable plans, and end-to-end automated radiotherapy (RT) planning systems.

Conclusion

This work establishes a clinically guided, differentiable DVH metric loss, coupled with lossless bit-mask ROI encoding, as a scalable and robust framework for 3D dose prediction in head and neck radiotherapy. CDM loss improves target coverage while keeping OAR sparing comparable, consistently satisfies clinical constraints, and, with bit-mask encoding, trains efficiently; these gains hold even with a standard 3D U-Net. The methodology is broadly adaptable and provides a foundation for clinically transparent, AI-assisted radiotherapy planning.
