Differentiable Digitally Reconstructed Radiographs

Updated 12 November 2025

Differentiable DRRs are 2D synthetic X-ray images rendered from 3D volumetric data by a pipeline that supports automatic differentiation, bridging CT imaging with deep learning frameworks.
They enable gradient-based optimization over imaging parameters, scene geometry, and density fields, which improves registration, pose estimation, and segmentation in medical imaging.
Implementations range from voxel grid ray-tracing to neural scene representations like NeTT and mNeRF, balancing computational efficiency with increased physical realism.

Digitally Reconstructed Radiographs (DRRs) are 2D synthetic X-ray images formed from 3D volumetric data (typically CT), serving as the computational analogue of radiographic projections in medical imaging. Differentiable DRRs (dDRRs) advance this concept by reformulating the DRR synthesis pipeline to be compatible with automatic differentiation in modern deep learning frameworks, such as PyTorch or TensorFlow. This capability is foundational for integrating DRR generation within optimization and learning tasks that require gradient-based updates with respect to imaging parameters, scene geometry, volumetric density, and even neural scene representations. dDRRs underpin a diverse range of inverse problems in image-guided interventions, pose estimation, registration, segmentation, and simulation-to-real domain adaptation.

1. Mathematical Foundations of Differentiable DRR Rendering

dDRR frameworks are built on the Beer–Lambert law, which models radiographic imaging as exponential attenuation along rays through a density or attenuation volume. For a source position $s \in \mathbb{R}^3$ and a detector pixel position $p \in \mathbb{R}^3$, the DRR intensity is commonly taken to be the attenuation line integral, i.e. the negative logarithm of the transmitted fraction:

$I(u, v; \eta) = \int_{\ell_{s, p}} \mu(x)\, dx$

where $\mu(x)$ is the linear attenuation coefficient at $x$, $\ell_{s, p}$ is the ray from $s$ through $p$, $(u, v)$ indexes the detector pixel located at $p$, and $\eta$ denotes the pose parameters that determine $s$ and $p$.
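
As a quick sanity check on how the line integral relates to Beer–Lambert attenuation, consider a homogeneous slab. The values of $\mu_0$ and $L$ are illustrative assumptions rather than figures from the cited work, and $I_0$ and $I_{\text{out}}$ (incident and transmitted intensity) are symbols introduced only for this example.

$\int_{\ell_{s,p}} \mu(x)\, dx = \mu_0 L, \qquad I_{\text{out}} / I_0 = \exp(-\mu_0 L)$

With $\mu_0 = 0.2\ \text{cm}^{-1}$ and $L = 10\ \text{cm}$, the line integral equals $2$ and the transmitted fraction is $e^{-2} \approx 0.135$.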

Discretization typically follows ray tracing schemes:

Siddon's algorithm decomposes the ray into a sequence of intersections with the volume grid, yielding

$E(R) = \|p - s\|_2 \sum_{m=1}^{M-1} (\alpha_{m+1} - \alpha_m) V\left(s + \frac{\alpha_{m+1} + \alpha_m}{2}(p-s)\right)$

where $R$ denotes the ray from $s$ to $p$, $V$ is the discrete volumetric tensor, and $\alpha_1 < \dots < \alpha_M$ are the sorted parametric positions at which the ray intersects the planes of the volume grid (Gopalakrishnan et al., 2022).
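
The sum above can be sketched for a single ray in a few lines of NumPy. This is a minimal illustration, not the cited implementation: it assumes unit voxel spacing, a volume occupying $[0, n_x] \times [0, n_y] \times [0, n_z]$, and nearest-voxel lookup at each segment midpoint. The function name `siddon_ray` is hypothetical.

```python
import numpy as np

def siddon_ray(volume, s, p):
    """Radiologic path E(R) of the ray s -> p through a unit-spaced voxel grid.

    Assumption: the volume occupies [0, nx] x [0, ny] x [0, nz], and each
    segment takes the value of the voxel that contains its midpoint.
    """
    s, p = np.asarray(s, float), np.asarray(p, float)
    d = p - s
    alphas = [0.0, 1.0]
    for axis, n in enumerate(volume.shape):
        if d[axis] != 0.0:
            planes = np.arange(n + 1)                  # grid planes along this axis
            alphas.extend((planes - s[axis]) / d[axis])
    # Keep crossings on the segment s -> p, sorted and de-duplicated.
    alphas = np.unique(np.clip(alphas, 0.0, 1.0))
    mids = 0.5 * (alphas[1:] + alphas[:-1])
    lengths = alphas[1:] - alphas[:-1]
    points = s + mids[:, None] * d                     # segment midpoints in 3D
    idx = np.floor(points).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(volume.shape)), axis=1)
    values = volume[idx[inside, 0], idx[inside, 1], idx[inside, 2]]
    return np.linalg.norm(d) * np.sum(lengths[inside] * values)

volume = np.full((4, 4, 4), 0.5)
path = siddon_ray(volume, s=(-1.0, 2.5, 2.5), p=(5.0, 2.5, 2.5))
```

Here the ray crosses four unit-length voxels of value 0.5, so `path` is approximately 2.0. Segments whose midpoints fall outside the grid contribute nothing.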

For end-to-end differentiability, these integrals are implemented in deep learning frameworks as sequences of gather, multiply, sum, sort, and interpolation operations. The computational graph includes all parameters ($p$, $s$, and the scene densities), so gradients can flow back to each of them.

2. Algorithmic Implementations and Scene Representations

dDRRs have been realized using various scene representations and computational paradigms:

a. Voxel Grid and Siddon-Type Ray-Tracing

Detector grid points $P \in \mathbb{R}^{H \times W \times 3}$ and source $s$ are used to instantiate rays for each pixel.
Voxel volume $V$ is sampled trilinearly along each ray at the midpoints of each intersection segment.
All geometric transforms (translations and rotations) are differentiable, with angles and positions encoded as optimization variables.
Full vectorization in PyTorch accelerates both the forward and backward passes, achieving rendering latencies ($\sim$10–20 ms for $100 \times 100$ DRRs on GPUs) that match clinical fluoroscopy frame rates (Gopalakrishnan et al., 2022).
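
A vectorized, differentiable renderer along these lines might look like the sketch below. It is a simplification under stated assumptions: rays are sampled uniformly rather than at exact Siddon intersections, the volume is assumed to fill the normalized cube $[-1, 1]^3$, and `render_drr` is a hypothetical name. Trilinear sampling uses `torch.nn.functional.grid_sample`, which backpropagates into the sampling coordinates and hence into the source position.

```python
import torch
import torch.nn.functional as F

def render_drr(volume, source, detector_points, n_samples=64):
    """Approximate line integrals of `volume` along rays source -> detector.

    volume: (D, H, W) tensor assumed to occupy the cube [-1, 1]^3.
    source: (3,) tensor in (x, y, z) order; detector_points: (Hd, Wd, 3).
    """
    t = (torch.arange(n_samples, dtype=volume.dtype) + 0.5) / n_samples  # segment midpoints
    direction = detector_points - source                       # (Hd, Wd, 3)
    points = source + t[:, None, None, None] * direction       # (S, Hd, Wd, 3)
    # grid_sample takes (N, C, D, H, W) input and an (N, S, Hd, Wd, 3) grid whose
    # last dimension is ordered (x, y, z), matching (W, H, D) of the input.
    samples = F.grid_sample(volume[None, None], points[None], mode="bilinear",
                            padding_mode="zeros", align_corners=True)
    step = direction.norm(dim=-1) / n_samples                  # segment length per ray
    return samples[0, 0].sum(dim=0) * step                     # (Hd, Wd)

volume = torch.zeros(16, 16, 16)
volume[4:12, 4:12, 4:12] = 1.0                                 # a cube of unit density
source = torch.tensor([0.0, 0.0, -3.0], requires_grad=True)
u = torch.linspace(-1.0, 1.0, 8)
vv, uu = torch.meshgrid(u, u, indexing="ij")
detector = torch.stack([uu, vv, torch.full_like(uu, 3.0)], dim=-1)

drr = render_drr(volume, source, detector)
drr.sum().backward()                                           # gradient w.r.t. the source
```

After the backward pass, `source.grad` holds the derivative of the summed image with respect to the source position. The same mechanism applies to any pose parameters used to construct `source` and `detector`.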

b. Neural Scene Representations

dDRR renderers have also been combined with neural scene representations:

Neural Tuned Tomography (NeTT): An MLP with positional encoding refines the density field to match empirical fluoroscopy styles.
Masked NeRF (mNeRF): A coordinate-MLP with scene masking for continuous and generalizable density fields (Zhou et al., 2023).

Ray integration becomes

$I = \exp\left(-\sum_{i=1}^{N_s} \rho(x_i) \Delta t_i \right)$

where $\rho(x_i)$ may be read from a voxel, predicted by NeTT, or queried from mNeRF, all kept within the autodiff graph.
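
As an illustration of the neural case, the sketch below evaluates the exponential ray sum with a small coordinate MLP standing in for $\rho$. The architecture (`DensityMLP`, its layer sizes, and the positional encoding) is a hypothetical stand-in and does not reproduce NeTT or mNeRF; in particular it has no scene masking. It only shows that the network weights stay inside the autodiff graph.

```python
import math
import torch
import torch.nn as nn

def positional_encoding(x, n_freqs=4):
    """Map (..., 3) coordinates to sin/cos features at octave frequencies."""
    freqs = (2.0 ** torch.arange(n_freqs, dtype=x.dtype)) * math.pi
    angles = x[..., None] * freqs                          # (..., 3, n_freqs)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(-2)

class DensityMLP(nn.Module):
    """Hypothetical coordinate network predicting a positive density rho(x)."""
    def __init__(self, n_freqs=4, hidden=32):
        super().__init__()
        self.n_freqs = n_freqs
        self.net = nn.Sequential(
            nn.Linear(3 * 2 * n_freqs, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus())           # Softplus keeps rho > 0

    def forward(self, x):
        return self.net(positional_encoding(x, self.n_freqs)).squeeze(-1)

def transmitted_intensity(density_fn, source, target, n_samples=32):
    """I = exp(-sum_i rho(x_i) * dt_i) for rays source -> target, both (R, 3)."""
    t = (torch.arange(n_samples, dtype=source.dtype) + 0.5) / n_samples
    direction = target - source
    points = source[:, None] + t[None, :, None] * direction[:, None]   # (R, S, 3)
    dt = direction.norm(dim=-1, keepdim=True) / n_samples               # (R, 1)
    return torch.exp(-(density_fn(points) * dt).sum(dim=-1))            # (R,)

torch.manual_seed(0)
model = DensityMLP()
src = torch.tensor([[0.0, 0.0, -1.0]]).repeat(4, 1)
tgt = torch.tensor([[-0.5, -0.5, 1.0], [0.5, -0.5, 1.0],
                    [-0.5, 0.5, 1.0], [0.5, 0.5, 1.0]])
intensity = transmitted_intensity(model, src, tgt)
intensity.sum().backward()                                 # gradients reach the MLP weights
```

Replacing `model` with a function that interpolates a voxel grid would recover the voxel case with the same integration code.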

c. Incorporation of Semantic and Physics-Inspired Factors

Class-Weighted Projections: DRR4Covid introduces semantic weighting $W(m_{i,j,k})$ per tissue class (background, lung, infection) in the radiological integral, allowing explicit control over synthetic pathology appearance and fully labeled (mask-aligned) synthetic datasets (Zhang et al., 2020).
Anisotropy via Physics-Inspired Models: Recent approaches seek greater physical realism by modeling anisotropic phenomena (e.g., Compton scatter); the technical details are not covered here (Gao et al., 2024).

Scene Model	Forward Ray Integral	Differentiable Parameters
Voxel Grid	$\sum_k V(x_k)\Delta t_k$	$V$ , geometry ( $s,p$ ), transforms
NeTT	$\sum_k \mathrm{MLP}(V(x_k))\Delta t_k$	MLP weights, $V$ , transforms
mNeRF	$\sum_k \mathrm{NeRFMLP}(x_k)\Delta t_k$	MLP weights, transforms
Class-Weighted (DRR4Covid)	Numerator/denominator weighted sum	Class weights $W$ , thresholds

3. Differentiability and Gradient Computation

The core requirement for dDRRs is that gradients with respect to geometry, pose, scene, or auxiliary parameters are available, either analytically or through autodiff. For geometry, the pixel intensity $I_{uv}(\eta)$ (with pose $\eta$) admits derivatives via the chain rule:

$\frac{\partial I_{uv}}{\partial \eta} = \frac{\partial \|p-s\|}{\partial \eta} (\cdots) + \|p-s\| \left( \frac{\partial \alpha_m}{\partial \eta} (\cdots) + \nabla V(\text{midpoints}) \cdot \left[ \frac{\partial s}{\partial \eta} + \alpha_{mid} \frac{\partial (p-s)}{\partial \eta} \right] \right)$

In practice, implementations rely on automatic differentiation in PyTorch or TensorFlow, which avoids hand-coded Jacobians except in special cases (e.g., rotation).

When representing voxels or densities with MLPs (as in NeTT/mNeRF), the computational graph propagates derivatives through the neural network structure itself.

The practical result is the ability to directly optimize over pose, density, neural scene parameters, or augmentation/semantic weights using any network-compatible loss.
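
To make this concrete, the toy example below recovers an in-plane translation by gradient descent through a renderer. Everything in it is an assumption chosen for brevity: the "scene" is an analytic Gaussian blob, the geometry is parallel-beam, the loss is mean squared error, and `render_blob` is a hypothetical helper. It is not a clinical registration setup.

```python
import torch

def render_blob(shift, n_pix=16, n_samples=32, sigma=0.3):
    """Parallel-beam projection of a Gaussian density blob shifted in-plane by `shift`."""
    u = torch.linspace(-1.0, 1.0, n_pix)
    z = torch.linspace(-1.0, 1.0, n_samples)
    vv, uu, zz = torch.meshgrid(u, u, z, indexing="ij")
    dist2 = (uu - shift[0]) ** 2 + (vv - shift[1]) ** 2 + zz ** 2
    density = torch.exp(-dist2 / (2 * sigma ** 2))
    return density.sum(dim=-1) * (2.0 / n_samples)     # Riemann sum along the z rays

true_shift = torch.tensor([0.25, -0.15])
target = render_blob(true_shift)                        # plays the role of the fixed image

shift = torch.zeros(2, requires_grad=True)              # initial pose guess
optimizer = torch.optim.Adam([shift], lr=0.02)
for _ in range(300):
    optimizer.zero_grad()
    loss = torch.mean((render_blob(shift) - target) ** 2)
    loss.backward()                                     # autodiff through the renderer
    optimizer.step()
```

No Jacobian is written by hand here. The optimizer only sees `shift.grad`, which autodiff computes by differentiating through the sampling and summation steps.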

4. Integration into Optimization, Registration, and Learning Frameworks

dDRRs are foundational in several inverse problems in medical computer vision:

a. Pose Estimation and 2D/3D Registration

Slice-to-Volume Registration: Direct backpropagation through a dDRR generator allows gradient-based registration between a fixed “real” DRR and a synthetic, pose-adjusted DRR. Experiments show that the loss landscape is empirically convex near the optimum, which lets gradient descent converge rapidly (65.5 iterations and $\sim$1.9 s on an RTX 2080 Ti, reported over 1000 trials), whereas gradient-free methods are orders of magnitude slower (Gopalakrishnan et al., 2022).
3D/2D Registration in Interventions: The mutual information (MI) loss, implemented as a differentiable “soft” histogram, outperforms MSE, L1, SSIM, and Dice in both accuracy and convergence; MI yields smooth, globally convex rotational loss surfaces, suppressing local minima found in other metrics (Zhou et al., 2023).
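
One way to build a differentiable "soft" histogram for MI is with Gaussian (Parzen-window) bin memberships, sketched below. The kernel, bin count, and bandwidth are assumptions made for illustration and are not taken from the cited work; intensities are assumed to be scaled to $[0, 1]$.

```python
import torch

def soft_mutual_information(x, y, n_bins=16, bandwidth=0.05):
    """Differentiable MI estimate between two images with intensities in [0, 1].

    Each pixel contributes to every bin through a Gaussian kernel, so the
    joint histogram is a smooth function of the pixel intensities.
    """
    centers = torch.linspace(0.0, 1.0, n_bins, dtype=x.dtype)

    def memberships(img):
        w = torch.exp(-0.5 * ((img.reshape(-1, 1) - centers) / bandwidth) ** 2)
        return w / (w.sum(dim=1, keepdim=True) + 1e-12)    # (N, n_bins), rows sum to ~1

    wx, wy = memberships(x), memberships(y)
    joint = wx.t() @ wy / wx.shape[0]                       # (n_bins, n_bins) joint histogram
    px = joint.sum(dim=1, keepdim=True)                     # marginal of x, (n_bins, 1)
    py = joint.sum(dim=0, keepdim=True)                     # marginal of y, (1, n_bins)
    eps = 1e-12
    return (joint * (torch.log(joint + eps) - torch.log(px @ py + eps))).sum()

torch.manual_seed(0)
fixed = torch.rand(32, 32)
unrelated = torch.rand(32, 32)
mi_same = soft_mutual_information(fixed, fixed)
mi_indep = soft_mutual_information(fixed, unrelated)

moving = fixed.clone().requires_grad_(True)
(-soft_mutual_information(moving, fixed)).backward()       # use negative MI as a loss
```

MI is higher for the identical pair than for the unrelated pair, and `moving.grad` shows that the estimate can be backpropagated to the rendered image and, through a dDRR, to the pose.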

b. Segmentation and Simulation-to-Real Domain Adaptation

Supervised Learning from Synthetic DRRs: DRR4Covid uses an infection-aware dDRR generator to synthesize labeled radiographs from CTs with voxelwise segmentation. These labeled DRRs train FCN-style networks with classification and segmentation heads (Zhang et al., 2020).
Domain Adaptation via MMD: Maximum Mean Discrepancy (MMD), computed in feature space between synthetic and real (unlabeled) data, is added to the total training loss. The network achieves 0.954 accuracy and 0.989 AUC on COVID classification, and 0.957 accuracy and 0.981 AUC on segmentation, on held-out chest X-ray datasets.
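
A feature-space MMD term of this kind can be sketched as follows. The RBF kernel, the bandwidth, and the biased estimator are assumptions made for illustration; the cited pipeline may use a different kernel or estimator, and the random feature batches stand in for network activations.

```python
import torch

def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between feature batches x (n, d) and y (m, d)."""
    def kernel(a, b):
        sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(dim=-1)   # pairwise squared distances
        return torch.exp(-sq_dist / (2 * bandwidth ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

torch.manual_seed(0)
synthetic = torch.randn(64, 8)             # stand-in for features of synthetic DRRs
real_like = torch.randn(64, 8)             # same distribution as `synthetic`
shifted = torch.randn(64, 8) + 2.0         # features from a shifted domain

mmd_same = rbf_mmd2(synthetic, real_like, bandwidth=4.0)
mmd_shift = rbf_mmd2(synthetic, shifted, bandwidth=4.0)
```

The discrepancy is small when both batches come from the same distribution and grows under domain shift. In training, a weighted copy of this term would be added to the supervised loss so that the feature extractor is pushed toward domain-invariant features.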

c. Differentiable “Back-Projections” in 2D/3D Deformable Registration

LiftReg: Instead of differentiable forward DRR generation, LiftReg utilizes a differentiable back-projection of radiographs into 3D to enable network-based prediction of deformation fields for thoracic registration (Tian et al., 2022). The core differentiable routines—grid sampling and accumulation—are directly compatible with autodiff.
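
The idea of a differentiable back-projection can be illustrated with a simple point-source geometry, as below. This is not LiftReg's implementation: the source position, detector plane, grid extent, and the function name `backproject` are all assumptions. Each voxel is projected onto the detector, and the radiograph is sampled there with `grid_sample`.

```python
import torch
import torch.nn.functional as F

def backproject(image, grid_size=8, source_z=-3.0, detector_z=1.0):
    """Lift a radiograph (H, W) into a (D, H', W') volume by perspective back-projection.

    Assumes a point source at (0, 0, source_z), a detector plane at z = detector_z
    spanning [-1, 1]^2, and a voxel grid spanning [-0.5, 0.5]^3.
    """
    c = torch.linspace(-0.5, 0.5, grid_size, dtype=image.dtype)
    zz, yy, xx = torch.meshgrid(c, c, c, indexing="ij")       # voxel centres
    scale = (detector_z - source_z) / (zz - source_z)          # magnification per voxel
    uv = torch.stack([xx * scale, yy * scale], dim=-1)         # detector coords, (D, H', W', 2)
    # Treat depth as the batch dimension so every slice samples the same image.
    img = image[None, None].expand(grid_size, 1, -1, -1)
    lifted = F.grid_sample(img, uv, mode="bilinear", padding_mode="zeros",
                           align_corners=True)
    return lifted[:, 0]                                        # (D, H', W')

radiograph = torch.ones(32, 32, requires_grad=True)
lifted_volume = backproject(radiograph)
lifted_volume.sum().backward()                                 # gradients flow back to the image
```

With several views, the lifted volumes from each source position would be accumulated before being passed to a registration network.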

5. Computational Performance and Practical Considerations

Efficient implementation is critical for intraoperative and real-time tasks. Vectorized GPU Siddon (VGS) achieves DRR generation in 17.6 ms for $100 \times 100$ images and 72.7 ms for $200 \times 200$ images on an RTX 2080 Ti, so latencies under 20 ms, compatible with live fluoroscopic guidance, are reached at the lower resolution (Gopalakrishnan et al., 2022). Autograd-based backward passes (35.1 ms per gradient) are an order of magnitude faster than finite-difference approximations.

The choice of scene representation directly affects computational cost: voxelized volumes enable the fastest inference, NeTT MLPs require $\sim$30 min of training but can generalize across subjects, and mNeRF introduces subject-specific models with up to $\sim$5 hours of per-patient training and slower inference ($\sim$128 s per pose estimate at $128 \times 128$ resolution) (Zhou et al., 2023).

Recommended practice for clinical or research pipelines is to use voxel grid DRRs for time-sensitive tasks, fall back to NeTT for style-mismatched domains, and reserve mNeRF for cases that demand high flexibility in scene encoding.

6. Task-Specific Loss Function Design and Regularization

Loss function choice is critical for optimizing through dDRR layers in inverse tasks:

Zero-Normalized Cross-Correlation (ZNCC): Used in registration tasks, where ZNCC between DRRs yields loss surfaces that are empirically convex near the optimum and support fast convergence (Gopalakrishnan et al., 2022).
Mutual Information (MI): The differentiable MI loss enables robust pose estimation, particularly in the presence of photometric inconsistencies, and avoids local minima prevalent in MSE/SSIM landscapes (Zhou et al., 2023).
Feature Alignment via MMD: Facilitates unsupervised simulation-to-real adaptation in segmentation/classification pipelines, especially when synthetic DRRs do not perfectly match the real data distribution (Zhang et al., 2020).
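
Of these losses, ZNCC is the simplest to write down. The sketch below is a minimal global version over whole images; the `eps` stabilizer and the function name are assumptions, and patch-based variants are not shown.

```python
import torch

def zncc(a, b, eps=1e-8):
    """Zero-normalized cross-correlation between two images of equal shape."""
    a = (a - a.mean()) / (a.std(unbiased=False) + eps)     # zero mean, unit variance
    b = (b - b.mean()) / (b.std(unbiased=False) + eps)
    return (a * b).mean()

torch.manual_seed(0)
image = torch.rand(32, 32)
rescaled = 3.0 * image + 2.0                               # affine intensity change
score_affine = zncc(image, rescaled)
score_inverted = zncc(image, -image)
```

Because both images are standardized first, the score is unaffected by brightness and contrast changes: it is close to 1 for the rescaled copy and close to -1 for the inverted one. A registration loss would typically be the negative ZNCC (or one minus it).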

Segmentation/classification networks are typically built from off-the-shelf architectures (ResNet-18 backbones, FCN heads) and trained with weighted cross-entropy to address class imbalance, with learning rates decayed over epochs for stability. Regularization (e.g., deformation field diffusion, class-weighting) is incorporated according to the specifics of the structural prediction task.
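
A training configuration of this kind might be set up as in the sketch below. The class weights, learning rate, decay schedule, and the single convolution standing in for an FCN head are all hypothetical values chosen for illustration, not settings reported in the cited work.

```python
import torch
import torch.nn as nn

# Hypothetical 3-class problem (background, lung, infection) with rare infection pixels.
class_weights = torch.tensor([0.2, 1.0, 5.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

model = nn.Conv2d(1, 3, kernel_size=3, padding=1)          # stand-in for an FCN head
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Halve the learning rate every 10 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

images = torch.randn(2, 1, 16, 16)                          # placeholder DRR batch
labels = torch.randint(0, 3, (2, 16, 16))                   # placeholder per-pixel labels
for epoch in range(20):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)                 # weighted per-pixel cross-entropy
    loss.backward()
    optimizer.step()
    scheduler.step()

final_lr = optimizer.param_groups[0]["lr"]
```

After 20 epochs the learning rate has been halved twice, from 0.1 to 0.025. The weights make errors on the rare class cost more than errors on the background class.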

7. Impact, Limitations, and Future Directions

Differentiable DRR generators represent a foundational advance for integrating physics-driven forward models with modern gradient-based learning, optimization, and simulation-to-real transfer. Their impact is evident in accelerating image-guided interventions, enabling real-time registration, providing fully aligned training maps for semantic segmentation, and facilitating learned corrections of simulated imaging artifacts.

Limitations include:

High computational cost for neural scene representations (notably mNeRF).
Potential for domain gap between simulated DRRs and real fluoroscopy, mitigated by learning-based tuning (NeTT) or domain-adaptive loss (MMD).

Emerging research directions include:

Integration of scatter and anisotropic radiative physics in differentiable form for increased realism (Gao et al., 2024).
Hybrid pipelines that combine dDRR generation with joint neural optimization over both scene structure and imaging geometry.
Exploitation of computational sparsity and hardware accelerators to approach $\sim$ 1 ms per DRR, as suggested for ultrafast intraoperative tracking applications (Gopalakrishnan et al., 2022).

A plausible implication is that as differentiable X-ray forward models increase in physical fidelity while retaining computational tractability, they will catalyze new classes of closed-loop, data-driven methods for intraoperative image guidance, robotic navigation, and disease diagnosis.