FluenceFormer: Transformer for IMRT Planning

Updated 3 January 2026
  • FluenceFormer is a transformer architecture for automated IMRT planning that directly maps patient anatomy to per-beam fluence maps.
  • It employs a two-stage design with global dose prior prediction followed by geometry-conditioned fluence regression for enhanced planning accuracy.
  • The framework integrates a physics-informed FAR loss combining MSE, gradient, correlation, and energy terms to ensure structural and dosimetric fidelity.

FluenceFormer is a backbone-agnostic transformer architecture for direct, geometry-aware multi-beam fluence map regression in automated radiotherapy planning for intensity-modulated radiation therapy (IMRT). Addressing the inherent ill-posedness of predicting beam fluence maps from anatomical data, FluenceFormer employs a two-stage transformer design and introduces a physics-informed loss function (Fluence-Aware Regression, or FAR) to enforce structural, physical, and dosimetric plausibility. The framework shows statistically significant improvements in energy error and structural fidelity over existing single-stage and convolutional models, demonstrating generality across multiple transformer backbones (Mgboh et al., 27 Dec 2025).

1. Motivation and Problem Formulation

Automated radiotherapy planning aims to replace labor-intensive, non-standardized manual fluence design with machine learning–based approaches. In IMRT, optimal patient-specific therapy requires multi-beam fluence maps—2D modulator settings for each angle—yielding the prescribed 3D dose for tumor control while sparing organs-at-risk (OARs). Learning fluence maps directly from anatomical data constitutes a highly ill-posed inverse problem: many distinct multi-beam fluence configurations can deliver nearly identical spatial dose distributions, making anatomy-to-fluence mapping both non-unique and severely under-constrained.

Standard CNN-based direct regression models suffer from two main issues:

  • Limited receptive field: CNNs lack the capacity to model long-range inter-beam and anatomical context, yielding plans with degraded structural consistency.
  • Physical unrealizability: Output fluence maps may violate beam-by-beam energy conservation or lack the smoothness required for mechanical delivery.

FluenceFormer is designed to overcome these by decomposing the task and embedding domain physics directly in the learning objective (Mgboh et al., 27 Dec 2025).

2. Architecture: Two-Stage, Geometry-Conditioned Transformer Framework

FluenceFormer adopts a pipeline that mirrors clinical planning, partitioned into dose prior prediction and explicit geometry-aware fluence regression.

Stage 1: Global Dose Prior Prediction

  • Input: Axial CT slice $C_z$ and corresponding anatomical mask $R_z$, concatenated as $x_z = [C_z, R_z] \in \mathbb{R}^{2 \times H \times W}$.
  • Backbone: Transformer-based encoder (e.g., UNETR, Swin UNETR, nnFormer, MedFormer) extracts multi-scale features, which the decoder upsamples.
  • Regression Head: $1 \times 1$ convolution + ReLU, yielding the predicted dose slice $\hat{y}_z \in \mathbb{R}_{\ge 0}^{1 \times H \times W}$ (see the sketch after this list).
  • Supervision: Voxel-wise mean squared error (MSE) to clinical dose.
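
The Stage 1 interface above can be made concrete with a minimal PyTorch sketch. Everything here is illustrative: `backbone` stands in for any of the transformer encoder-decoders named above, the tensor shapes follow the $x_z = [C_z, R_z]$ construction, and names such as `DosePriorStage` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DosePriorStage(nn.Module):
    """Stage 1 sketch: anatomy in, non-negative dose prior out."""
    def __init__(self, backbone: nn.Module, feat_channels: int):
        super().__init__()
        self.backbone = backbone                      # any encoder-decoder returning (B, feat_channels, H, W)
        self.head = nn.Conv2d(feat_channels, 1, 1)    # 1x1 regression head

    def forward(self, ct_slice: torch.Tensor, mask_slice: torch.Tensor) -> torch.Tensor:
        x = torch.cat([ct_slice, mask_slice], dim=1)  # x_z = [C_z, R_z], shape (B, 2, H, W)
        feats = self.backbone(x)
        return F.relu(self.head(feats))               # dose prior >= 0, shape (B, 1, H, W)

# Stage 1 supervision is plain voxel-wise MSE against the clinical dose:
# loss = F.mse_loss(model(ct, masks), clinical_dose)
```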

Stage 2: Geometry-Aware Fluence Regression

  • Beam Conditioning: For each beam $b$ (gantry angle $\theta_b$), two constant maps $M_{\sin}$ and $M_{\cos}$, filled with $\sin\theta_b$ and $\cos\theta_b$, are concatenated with the dose prior to form the input $x^{(2)}_z = [\hat{y}_z, M_{\sin}, M_{\cos}] \in \mathbb{R}^{3 \times H \times W}$ (see the sketch after this list).
  • Backbone: The same or a different transformer-based encoder-decoder as in Stage 1.
  • Output: Per-beam fluence map $\hat{F}_z^b \in \mathbb{R}_{\ge 0}^{1 \times H \times W}$ via $1 \times 1$ convolution + ReLU.
  • Supervision: Physics-informed composite loss, as detailed below.
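
A small sketch of the beam-conditioning step, assuming the angle maps are constant planes broadcast to the dose prior's spatial size; the function name and the example angles are illustrative, not taken from the paper.

```python
import math
import torch

def condition_on_beam(dose_prior: torch.Tensor, gantry_angle_deg: float) -> torch.Tensor:
    """Build the Stage 2 input [dose prior, M_sin, M_cos] for one beam."""
    b, _, h, w = dose_prior.shape
    theta = math.radians(gantry_angle_deg)
    m_sin = dose_prior.new_full((b, 1, h, w), math.sin(theta))  # constant sin(theta_b) plane
    m_cos = dose_prior.new_full((b, 1, h, w), math.cos(theta))  # constant cos(theta_b) plane
    return torch.cat([dose_prior, m_sin, m_cos], dim=1)         # (B, 3, H, W)

# Usage: one forward pass of the Stage 2 backbone per beam angle.
dose_prior = torch.rand(1, 1, 128, 128)
stage2_inputs = [condition_on_beam(dose_prior, angle) for angle in range(0, 360, 40)]  # nine beams
```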

The backbone-agnostic nature arises from interchangeable transformer variants, all supporting the two-stage geometry-conditioned design.

3. Physics-Informed Objective: Fluence-Aware Regression (FAR) Loss

The FAR loss integrates multiple criteria to reflect clinical and physical constraints:

$$\mathcal{L}_{\mathrm{FAR}} = \alpha\,\mathcal{L}_{\mathrm{MSE}} + \beta\,\mathcal{L}_{\mathrm{Grad}} + \gamma\,\mathcal{L}_{\mathrm{Corr}} + \delta\,\mathcal{L}_{\mathrm{Energy}}$$

with weights $(\alpha, \beta, \gamma, \delta) = (1, 0.5, 0.3, 0.2)$.

  • Voxel-Level Fidelity ($\mathcal{L}_{\mathrm{MSE}}$): Average MSE between predicted and ground-truth fluence maps across all beams.
  • Gradient Smoothness ($\mathcal{L}_{\mathrm{Grad}}$): L1 penalty on spatial gradient differences, discouraging non-smooth, mechanically infeasible fluence.
  • Structural Consistency ($\mathcal{L}_{\mathrm{Corr}}$): $1 - \rho(\hat{F}^b, F^b)$, with $\rho$ the pixelwise Pearson correlation per beam, ensuring shape alignment regardless of scaling.
  • Beam-Wise Energy Conservation ($\mathcal{L}_{\mathrm{Energy}}$): Absolute difference in total monitor units per beam, keeping the predicted photon flux within clinical thresholds (a loss sketch follows this list).
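
A minimal PyTorch sketch of such a composite loss is given below. It assumes per-beam fluence tensors of shape (batch, n_beams, H, W); the exact normalizations and reductions are assumptions, and the weights default to the values reported above.

```python
import torch
import torch.nn.functional as F

def far_loss(pred, target, alpha=1.0, beta=0.5, gamma=0.3, delta=0.2, eps=1e-8):
    """Sketch of a Fluence-Aware Regression loss; pred and target are (B, n_beams, H, W)."""
    # Voxel-level fidelity
    l_mse = F.mse_loss(pred, target)

    # Gradient smoothness: L1 penalty on differences of spatial gradients
    def grads(x):
        gx = x[..., :, 1:] - x[..., :, :-1]   # horizontal differences
        gy = x[..., 1:, :] - x[..., :-1, :]   # vertical differences
        return gx, gy
    pgx, pgy = grads(pred)
    tgx, tgy = grads(target)
    l_grad = (pgx - tgx).abs().mean() + (pgy - tgy).abs().mean()

    # Structural consistency: 1 - Pearson correlation, computed per beam
    p = pred.flatten(2)                       # (B, n_beams, H*W)
    t = target.flatten(2)
    p_c = p - p.mean(dim=-1, keepdim=True)
    t_c = t - t.mean(dim=-1, keepdim=True)
    rho = (p_c * t_c).sum(-1) / (p_c.norm(dim=-1) * t_c.norm(dim=-1) + eps)
    l_corr = (1.0 - rho).mean()

    # Beam-wise energy conservation: absolute difference in total fluence per beam
    l_energy = (p.sum(-1) - t.sum(-1)).abs().mean()

    return alpha * l_mse + beta * l_grad + gamma * l_corr + delta * l_energy
```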

Empirical ablation indicates performance degradation when the gradient or energy terms are omitted; beam-wise (as opposed to global) computation of the correlation and energy terms yields the highest structural fidelity as measured by SSIM.

4. Transformer Backbone Flexibility

FluenceFormer achieves architecture-agnosticism by abstracting the specific backbone transformer type:

Backbone | Encoder Style | Key Feature
UNETR | ViT encoder + convolutional decoder | Long-range context via tokens
Swin UNETR | Shifted-window transformer | Multi-scale spatial hierarchy
nnFormer | Interleaved local/global attention | U-Net skip connections with MHSA
MedFormer | Medical-task–tuned MHSA | Domain-tailored attention blocks

All architectures support patch embedding/tokenization, multi-head self-attention for global anatomical context, and various forms of skip connection or hierarchical contextualization. The observed energy error and SSIM gains are consistent across backbones, with Swin UNETR yielding the strongest quantitative performance on the presented IMRT task (Mgboh et al., 27 Dec 2025).
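
In practice, "backbone-agnostic" amounts to a narrow shape contract: any module mapping (B, C_in, H, W) inputs to (B, C_feat, H, W) feature maps at the same resolution can be slotted into either stage. The toy module below illustrates that contract only; it is not one of the cited backbones.

```python
import torch
import torch.nn as nn

class ToyBackbone(nn.Module):
    """Illustrative stand-in satisfying the interchange contract used by both stages."""
    def __init__(self, c_in: int, c_feat: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_feat, 3, padding=1), nn.GELU(),
            nn.Conv2d(c_feat, c_feat, 3, padding=1), nn.GELU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)  # same H, W as the input

# Stage 1 consumes 2 channels ([CT, masks]); Stage 2 consumes 3 ([dose prior, sin, cos]).
# Any UNETR/Swin UNETR/nnFormer/MedFormer wrapper honoring this contract is interchangeable.
```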

5. Experimental Evaluation and Quantitative Results

  • Dataset: Ninety-nine prostate IMRT cases, including 3D CT, contours, dose, and nine-beam fluence per patient. Preprocessing (resampling to $128 \times 128$ axial resolution, normalization/scaling) standardizes input for transformer processing.
  • Splits: 70% training, 10% validation, 20% test.
  • Metrics: Mean absolute error (MAE), peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and, critically, mean relative beam-wise energy error (EE; a reference sketch of this metric follows this list).
  • Results: With two-stage Swin UNETR + FAR, EE reduces to $4.53\% \pm 2.54\%$ and SSIM rises to $0.76 \pm 0.08$ (versus single-stage direct regression EE $7.1\%$, SSIM $0.67$; U-Net EE $8.4\%$). Performance gains reach statistical significance ($p < 0.05$).
  • Baselines: Naive segmentation-style transformers (sigmoid heads) show high EE ($>20\%$) and low SSIM ($0.40$–$0.50$). Single-stage CNN/transformer regression with ReLU improves on this but lags the two-stage FAR approach.
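
For reference, one plausible reading of the beam-wise relative energy error metric is sketched below (NumPy, hypothetical function name; the paper's exact normalization may differ).

```python
import numpy as np

def beamwise_energy_error(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Mean relative beam-wise energy error: per-beam relative error of total fluence,
    averaged over beams. Arrays have shape (n_beams, H, W)."""
    pred_energy = pred.reshape(pred.shape[0], -1).sum(axis=1)      # total fluence per beam
    target_energy = target.reshape(target.shape[0], -1).sum(axis=1)
    return float(np.mean(np.abs(pred_energy - target_energy) / (target_energy + eps)))
```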

Ablation confirms that beam-wise statistics, higher input resolution, and linear regression heads outperform alternatives; FluenceFormer consistently recovers both sharp field edges and smooth internal fluence modulation.

6. Limitations and Future Directions

  • Domain Generalization: Current evaluation is limited to single-institution prostate IMRT; extension to other anatomic sites, modalities, and treatment protocols remains untested.
  • Dose-Coupling: No differentiable feedback from dose calculation to model; fluence prediction is performed independently, so final delivered dose accuracy is not directly optimized during training.
  • Generalization of Physics Constraints: Reliance on representative training data for physical statistics (e.g., beam energies, anatomical variation).
  • Future Work: Directions include clinical protocol adaptation, integration of differentiable dose engines for end-to-end optimization, and establishment of open benchmarks for standardized external comparison (Mgboh et al., 27 Dec 2025).

7. Significance and Contributions

FluenceFormer formally reframes direct fluence map regression as a geometry-conditioned, two-stage transformer problem. By interposing a predicted structural dose prior and grounding supervision in explicitly physics-informed loss terms that codify smoothness, correlation, and physical conservation, FluenceFormer achieves both superior quantitative accuracy (as measured by EE and SSIM) and physically plausible, deliverable IMRT plans. The design is compatible with diverse transformer backbones and offers a reproducible, extensible foundation for future automated radiotherapy planning research and deployment (Mgboh et al., 27 Dec 2025).

References

  • Mgboh et al., 27 December 2025.
