Source of architecture-driven performance differences

Determine whether the performance differences observed among deep learning architectures for synthetic CT generation in the SynthRAD2023 tasks (MRI-to-CT and CBCT-to-CT) are attributable solely to architectural choices or are significantly influenced by other components of the end-to-end pipeline, including preprocessing, data augmentation, postprocessing, and training procedures.

Background

In SynthRAD2023, transformer-based approaches tended to outperform CNN encoder–decoder models, which in turn generally outperformed GANs, while diffusion models were less competitive. Although some of these differences were statistically significant, the margins were small, and training and preprocessing choices varied widely across teams.
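
The contrast between "statistically significant" and "small margin" can be made concrete with a paired test on per-patient metrics. The sketch below is illustrative only: the MAE values are simulated, the Wilcoxon signed-rank test is just one plausible choice of paired test, and none of this is the challenge's actual evaluation or ranking code.

```python
# Hypothetical sketch: paired significance test on per-patient metric values
# for two methods. The arrays are simulated placeholders, not SynthRAD2023 data.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Per-patient mean absolute error (HU) for two hypothetical methods on the
# same test patients; a small systematic offset mimics a "small margin".
mae_method_a = rng.normal(loc=60.0, scale=8.0, size=120)
mae_method_b = mae_method_a + rng.normal(loc=1.5, scale=2.0, size=120)

# Paired, non-parametric test: are the per-patient differences centred at zero?
stat, p_value = wilcoxon(mae_method_a, mae_method_b)
median_gap = np.median(mae_method_b - mae_method_a)

print(f"median per-patient MAE gap: {median_gap:.2f} HU, p = {p_value:.4f}")
# A tiny p-value alongside a ~1-2 HU median gap shows how a difference can be
# statistically significant yet practically small.
```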

Because these methodological variations can confound architecture comparisons, the authors explicitly note that it remains inconclusive whether the performance gaps are due to the architectures themselves or other elements of the complex training and processing pipeline.
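
One way to resolve this confound would be a controlled ablation in which every pipeline stage is frozen and only the architecture is swapped. The skeleton below sketches that experimental design under assumed interfaces; the Pipeline fields and the compare_architectures helper are hypothetical and do not correspond to the challenge infrastructure or to any team's code.

```python
# Hypothetical sketch of the controlled comparison the question calls for:
# hold every pipeline stage fixed and vary only the architecture.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Pipeline:
    """One shared, frozen pipeline so that only the architecture varies."""
    preprocess: Callable   # identical resampling, normalisation, registration
    augment: Callable      # identical augmentation policy
    train: Callable        # identical loss, optimiser, schedule, epochs
    postprocess: Callable  # identical HU rescaling / masking
    evaluate: Callable     # identical metrics (e.g. MAE, SSIM, PSNR)


def compare_architectures(builders: Dict[str, Callable],
                          data,
                          pipeline: Pipeline) -> Dict[str, object]:
    """Train and score each candidate architecture under the same pipeline."""
    prepared = pipeline.preprocess(data)
    results = {}
    for name, build in builders.items():
        model = build()                                    # only this varies
        trained = pipeline.train(model, pipeline.augment(prepared))
        predictions = pipeline.postprocess(trained, prepared)
        results[name] = pipeline.evaluate(predictions, data)
    return results
```

Under such a design, any remaining performance gap between, say, a CNN encoder–decoder and a transformer builder could be attributed to the architecture rather than to differences in preprocessing, augmentation, postprocessing, or training.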

References

Therefore, whether the observed differences stem solely from architectural choices or are significantly influenced by other aspects of the complex end-to-end pipeline, including preprocessing, data augmentation, postprocessing, and training procedures, remains inconclusive.

Huijben et al., "Generating Synthetic Computed Tomography for Radiotherapy: SynthRAD2023 Challenge Report," arXiv:2403.08447, 13 Mar 2024, Discussion (Section 6).